Groq

Learn how to configure and use Groq's lightning-fast inference with Sypha. Access models from OpenAI, Meta, DeepSeek, and more on Groq's purpose-built LPU architecture.

Groq delivers ultra-fast AI inference through its custom LPU™ (Language Processing Unit) architecture, which is purpose-built for inference rather than adapted from training hardware. Groq hosts open-source models from multiple providers, including OpenAI, Meta, DeepSeek, Moonshot AI, and others.

Website: https://groq.com/

Getting an API Key

  1. Sign Up/Sign In: Visit Groq and create an account or sign in.
  2. Navigate to Console: Go to the Groq Console to access your dashboard.
  3. Create a Key: Open the API Keys section and generate a new API key. Give the key a descriptive name (e.g., "Sypha").
  4. Copy the Key: Copy the API key immediately; you won't be able to view it again. Store it securely.
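
Once you have a key, a common pattern is to keep it in an environment variable rather than hard-coding it. The snippet below is a minimal sketch, assuming the Python requests library and a GROQ_API_KEY environment variable; it calls Groq's OpenAI-compatible /models endpoint to confirm the key works.

  import os
  import requests

  # Read the key from the environment rather than hard-coding it.
  api_key = os.environ["GROQ_API_KEY"]

  # Groq exposes an OpenAI-compatible API under /openai/v1.
  resp = requests.get(
      "https://api.groq.com/openai/v1/models",
      headers={"Authorization": f"Bearer {api_key}"},
      timeout=10,
  )
  resp.raise_for_status()  # a 401 here usually means the key is invalid
  print(f"Key OK - {len(resp.json()['data'])} models available")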

Supported Models

Sypha supports the following Groq models:

  • llama-3.3-70b-versatile (Meta) - Balanced performance with 131K context
  • llama-3.1-8b-instant (Meta) - Fast inference with 131K context
  • openai/gpt-oss-120b (OpenAI) - Flagship model with 131K context
  • openai/gpt-oss-20b (OpenAI) - Compact model with 131K context
  • moonshotai/kimi-k2-instruct (Moonshot AI) - 1 trillion parameter model with prompt caching
  • deepseek-r1-distill-llama-70b (DeepSeek/Meta) - Reasoning-focused model
  • qwen/qwen3-32b (Alibaba Cloud) - Optimized for Q&A tasks
  • meta-llama/llama-4-maverick-17b-128e-instruct (Meta) - Latest Llama 4 variant
  • meta-llama/llama-4-scout-17b-16e-instruct (Meta) - Latest Llama 4 variant
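
Any of these model IDs can be passed straight to Groq's OpenAI-compatible chat completions endpoint. A minimal sketch, assuming the same requests setup and GROQ_API_KEY environment variable as above:

  import os
  import requests

  resp = requests.post(
      "https://api.groq.com/openai/v1/chat/completions",
      headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
      json={
          "model": "llama-3.3-70b-versatile",  # any model ID from the list above
          "messages": [{"role": "user", "content": "Say hello in one sentence."}],
      },
      timeout=30,
  )
  resp.raise_for_status()
  print(resp.json()["choices"][0]["message"]["content"])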

Configuration in Sypha

  1. Open Sypha Settings: Click the settings icon (⚙️) in the Sypha panel.
  2. Select Provider: Choose "Groq" from the "API Provider" dropdown.
  3. Enter API Key: Paste your Groq API key into the "Groq API Key" field.
  4. Select Model: Choose your desired model from the "Model" dropdown.

Groq's Speed Revolution

Groq's LPU architecture provides several key advantages over conventional GPU-based inference:

LPU Architecture

Unlike GPUs, which are adapted from training workloads, Groq's LPU is designed from the ground up for inference. This eliminates the architectural bottlenecks that introduce latency in conventional systems.

Unmatched Speed

  • Sub-millisecond latency that stays consistent across traffic, regions, and workloads
  • Static scheduling with pre-computed execution graphs eliminates runtime coordination delays
  • Tensor parallelism optimized for low-latency individual responses rather than high-throughput batching
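
Because the advantage shows up most clearly in single-request latency, it is easy to measure end to end. A rough sketch, assuming the same requests setup as earlier; network distance to Groq's servers will dominate what you observe, so treat the number as a ceiling, not the LPU's own latency:

  import os
  import time
  import requests

  start = time.perf_counter()
  resp = requests.post(
      "https://api.groq.com/openai/v1/chat/completions",
      headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
      json={
          "model": "llama-3.1-8b-instant",
          "messages": [{"role": "user", "content": "Reply with one word: pong"}],
          "max_tokens": 5,  # keep the response tiny so timing reflects latency
      },
      timeout=30,
  )
  resp.raise_for_status()
  elapsed = time.perf_counter() - start
  print(f"Round trip: {elapsed:.3f}s, usage: {resp.json().get('usage')}")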

Quality Without Tradeoffs

  • TruePoint numerics reduce precision only where it doesn't affect accuracy
  • 100-bit intermediate accumulation ensures lossless computation
  • Strategic precision control maintains quality while achieving a 2-4× speedup over BF16
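
Where accumulation happens matters more than it might seem. The toy demo below uses NumPy's float16 as a stand-in (it is not Groq's actual number format) to show how summing in a narrow type silently loses information that a wider accumulator preserves:

  import numpy as np

  # 100,000 small values; the true sum is about 100.
  values = np.full(100_000, 0.001, dtype=np.float16)

  # Accumulating in float16: once the running sum grows, each tiny
  # increment falls below half the representable spacing and is lost.
  narrow = np.float16(0.0)
  for v in values:
      narrow = np.float16(narrow + v)

  # Accumulating in float64 preserves the result.
  wide = values.astype(np.float64).sum()

  print(f"float16 accumulator: {float(narrow):.2f}")  # stalls far below 100
  print(f"float64 accumulator: {wide:.2f}")           # ~100, as expected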

Memory Architecture

  • SRAM as primary storage (not cache) with hundreds of megabytes on-chip
  • Eliminates the DRAM/HBM latency that hampers traditional accelerators
  • Enables true tensor parallelism by splitting layers across multiple chips

Learn more about Groq's technology in their LPU architecture blog post.

Special Features

Prompt Caching

The Kimi K2 model supports prompt caching, which can significantly reduce costs and latency for repeated prompts.
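
Caching is handled server-side, so no special client code is required; the benefit comes from keeping a long, stable prefix byte-identical across requests. A sketch, assuming the same requests setup as earlier and a hypothetical review helper:

  import os
  import requests

  # A long, stable prefix: keep it identical across calls so the
  # provider can reuse the cached prefix instead of reprocessing it.
  SYSTEM_PROMPT = "You are a code reviewer. " + "Follow the style guide. " * 200

  def review(snippet: str) -> str:
      resp = requests.post(
          "https://api.groq.com/openai/v1/chat/completions",
          headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
          json={
              "model": "moonshotai/kimi-k2-instruct",
              "messages": [
                  {"role": "system", "content": SYSTEM_PROMPT},  # cacheable prefix
                  {"role": "user", "content": snippet},          # varying suffix
              ],
          },
          timeout=60,
      )
      resp.raise_for_status()
      return resp.json()["choices"][0]["message"]["content"]

  # The second call can hit the cached prefix and come back faster/cheaper.
  print(review("def add(a, b): return a - b"))
  print(review("def mul(a, b): return a + b"))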

Vision Support

Some models support image inputs and vision capabilities. Check the model details in the Groq Console for specific capabilities.
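
For models that do accept images, Groq follows the OpenAI-style content-part format. A sketch, assuming meta-llama/llama-4-scout-17b-16e-instruct supports vision and using a placeholder image URL:

  import os
  import requests

  resp = requests.post(
      "https://api.groq.com/openai/v1/chat/completions",
      headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
      json={
          "model": "meta-llama/llama-4-scout-17b-16e-instruct",
          "messages": [{
              "role": "user",
              "content": [
                  {"type": "text", "text": "What is in this image?"},
                  {"type": "image_url",
                   "image_url": {"url": "https://example.com/photo.jpg"}},
              ],
          }],
      },
      timeout=60,
  )
  resp.raise_for_status()
  print(resp.json()["choices"][0]["message"]["content"])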

Reasoning Models

Some models, such as the DeepSeek variants, offer enhanced reasoning capabilities with step-by-step thought processes.
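
Depending on the model and request options, the chain of thought may come back embedded in the response text, often inside <think> tags. A minimal sketch, assuming default parameters and stripping the reasoning block client-side:

  import os
  import re
  import requests

  resp = requests.post(
      "https://api.groq.com/openai/v1/chat/completions",
      headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
      json={
          "model": "deepseek-r1-distill-llama-70b",
          "messages": [{"role": "user", "content": "Is 1001 prime? Answer briefly."}],
      },
      timeout=60,
  )
  resp.raise_for_status()
  text = resp.json()["choices"][0]["message"]["content"]

  # Reasoning models may interleave their thinking in <think>...</think>
  # tags; strip it if you only want the final answer.
  answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
  print(answer)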

Tips and Notes

  • Model Selection: Choose models based on your specific use case and performance requirements.
  • Speed Advantage: Groq excels at single-request latency rather than high-throughput batch processing.
  • OSS Model Provider: Groq hosts open-source models from multiple providers (OpenAI, Meta, DeepSeek, etc.) on its fast infrastructure.
  • Context Windows: Most models offer large context windows (up to 131K tokens), useful for including substantial code and context.
  • Pricing: Groq offers competitive pricing alongside its speed advantages. Check the Groq Pricing page for current rates.
  • Rate Limits: Groq has generous rate limits, but check their documentation for current limits based on your usage tier; a simple retry pattern is sketched below.
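
When you do hit a limit, the API returns HTTP 429. A simple backoff sketch; it assumes a standard Retry-After header may be present and falls back to exponential backoff when it is not:

  import os
  import time
  import requests

  def chat_with_retry(payload: dict, max_attempts: int = 5) -> dict:
      for attempt in range(max_attempts):
          resp = requests.post(
              "https://api.groq.com/openai/v1/chat/completions",
              headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
              json=payload,
              timeout=60,
          )
          if resp.status_code != 429:
              resp.raise_for_status()
              return resp.json()
          # Honor Retry-After if present; otherwise back off exponentially.
          time.sleep(float(resp.headers.get("retry-after", 2 ** attempt)))
      raise RuntimeError("still rate limited after retries")

  result = chat_with_retry({
      "model": "llama-3.1-8b-instant",
      "messages": [{"role": "user", "content": "ping"}],
  })
  print(result["choices"][0]["message"]["content"])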
