Local Inference with Ollama

Execute AI models locally on your own hardware for maximum privacy and offline capability.

Sypha supports local model execution via Ollama, enabling private, offline development. While this eliminates API costs and data transmission, it requires significant hardware resources and local configuration.

Official Site: ollama.com

Setting Expectations

Running models locally differs from cloud-hosted solutions:

  • Capability: Smaller local models (e.g., 24B or 30B parameters) tend to produce more errors than large cloud-hosted models such as Claude or GPT-4.
  • Speed: Inference speed is directly tied to your local GPU and RAM performance.
  • Operational tips: To increase speed, keep conversations concise and consider disabling specialized tools during intensive local processing.

Hardware Prerequisites

For a responsive experience with the recommended models, we suggest:

  • PC/Linux: GPU with 24GB+ of VRAM.
  • macOS: Apple silicon Mac with 32GB+ of unified memory.

Selecting Your Local Engine

You can browse the full library at ollama.com/library.

For Sypha workflows, we recommend:

  • Primary: qwen3-coder:30b (good balance of coding ability and speed).
  • Secondary: devstral:24b (Mistral-based coding specialist).
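
Either model can be test-driven from a terminal before configuring it in Sypha; this is a quick way to gauge output quality and speed on your hardware. The prompt below is only an illustration, and ollama run will download the model first if it is not already present:

ollama run qwen3-coder:30b "Write a Python function that reverses a string."

Running the same command without a prompt opens an interactive chat session instead.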

Implementation Guide

1. Initialize Ollama

Download and install the binary from ollama.com. Ensure the background service is operational:

ollama serve
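
The desktop installer usually starts this service automatically. If you are unsure whether it is already running, you can query the default port; a live instance typically responds with a short status message:

curl http://localhost:11434
# Typically prints: Ollama is running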

2. Provision the Model

In a separate terminal, pull the model to your local machine:

ollama pull qwen3-coder:30b
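
Once the download finishes, you can confirm that the model is available locally and inspect its metadata (ollama show reports details such as parameter count and context length):

ollama list
ollama show qwen3-coder:30b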

3. Calibrate Context Parameters

By default, Ollama may limit the context window. For engineering tasks, we suggest a minimum of 32k context. This can be adjusted in the Sypha provider settings under "Context Window Size (num_ctx)."
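
As an alternative, assuming you prefer the larger context baked into the model on the Ollama side rather than set from Sypha, a Modelfile can pin num_ctx directly. The file and model names below are only illustrative:

cat > Modelfile.qwen3-32k <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 32768
EOF

ollama create qwen3-coder-32k -f Modelfile.qwen3-32k

You would then select qwen3-coder-32k as the model in Sypha.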

4. Adjust Timeout Thresholds

Local inference can be slower than cloud APIs. If you experience interruptions, increase the API Request Timeout in the Sypha extension settings within your IDE.
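
If you are unsure whether a delay originates in Sypha or in the model itself, you can time a single request directly against Ollama's HTTP API; the prompt below is arbitrary, and the first request after a cold start also includes model load time:

curl http://localhost:11434/api/generate -d '{
  "model": "qwen3-coder:30b",
  "prompt": "Write a haiku about compilers.",
  "stream": false
}'

The JSON response includes timing fields such as total_duration, which can help you pick a sensible timeout value.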

Configuring Sypha

  1. Open the Sypha Sidebar.
  2. Select the Settings gear.
  3. Choose Ollama as the primary API Provider.
  4. Select the model you pulled earlier (e.g., qwen3-coder:30b).
  5. (Optional) Define a custom base URL if Ollama is running on a different machine (default: http://localhost:11434); see the example below.
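
When pointing Sypha at another machine, Ollama on that machine must listen on an external network interface rather than only on localhost; the OLLAMA_HOST environment variable controls the bind address. The IP address below is only an example:

OLLAMA_HOST=0.0.0.0 ollama serve

Then set the Sypha base URL to the remote machine's address, for example http://192.168.1.50:11434.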

For advanced configuration, refer to the Official Ollama Documentation.
