Groq
Learn how to configure and use Groq's lightning-fast inference with Sypha. Access models from OpenAI, Meta, DeepSeek, and more on Groq's purpose-built LPU architecture.
Groq delivers ultra-fast AI inference via its custom LPU™ (Language Processing Unit) architecture, purpose-built for inference rather than adapted from training hardware. Groq serves open models from multiple providers, including OpenAI, Meta, DeepSeek, Moonshot AI, and others.
Website: https://groq.com/
Getting an API Key
- Sign Up/Sign In: Visit Groq and create an account or sign in.
- Navigate to Console: Open the Groq Console to reach your dashboard.
- Create a Key: Go to the API Keys section and generate a new API key. Give the key a descriptive name (e.g., "Sypha").
- Copy the Key: Copy the API key immediately; you will not be able to view it again. Store it securely, for example as an environment variable, as sketched below.
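If you want to confirm the key works before wiring it into Sypha, a quick call against Groq's OpenAI-compatible REST endpoint is enough. The sketch below assumes the key is exported as a `GROQ_API_KEY` environment variable and uses `llama-3.1-8b-instant` purely as an example; any model from the list below should work.

```python
import os
import requests

# Smoke test: confirm the API key is valid by requesting a one-word reply.
API_KEY = os.environ["GROQ_API_KEY"]  # export GROQ_API_KEY=... before running

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b-instant",  # example model; swap for any supported one
        "messages": [{"role": "user", "content": "Reply with the word 'ok'."}],
        "max_tokens": 5,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```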
Supported Models
Sypha is compatible with the following Groq models:
- llama-3.3-70b-versatile (Meta) - Well-balanced performance with 131K context
- llama-3.1-8b-instant (Meta) - Rapid inference with 131K context
- openai/gpt-oss-120b (OpenAI) - Highlighted flagship model with 131K context
- openai/gpt-oss-20b (OpenAI) - Highlighted compact model with 131K context
- moonshotai/kimi-k2-instruct (Moonshot AI) - 1 trillion parameter model featuring prompt caching
- deepseek-r1-distill-llama-70b (DeepSeek/Meta) - Reasoning-enhanced model
- qwen/qwen3-32b (Alibaba Cloud) - Optimized for Q&A tasks
- meta-llama/llama-4-maverick-17b-128e-instruct (Meta) - Most recent Llama 4 variant
- meta-llama/llama-4-scout-17b-16e-instruct (Meta) - Most recent Llama 4 variant
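Model availability changes over time, so it can be worth checking what your key can actually access. A minimal sketch against Groq's OpenAI-compatible models endpoint, again assuming `GROQ_API_KEY` is set:

```python
import os
import requests

# List the model IDs currently available to this API key.
resp = requests.get(
    "https://api.groq.com/openai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```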
Configuration in Sypha
- Open Sypha Settings: Click the settings icon (⚙️) in the Sypha panel.
- Select Provider: Choose "Groq" from the "API Provider" dropdown.
- Enter API Key: Paste your Groq API key into the "Groq API Key" field.
- Select Model: Choose your preferred model from the "Model" dropdown.
Groq's Speed Revolution
Groq's LPU architecture provides several critical advantages over conventional GPU-based inference:
LPU Architecture
Unlike GPUs, which are adapted from training workloads, Groq's LPU is designed specifically for inference. This removes the architectural bottlenecks that introduce latency in conventional systems.
Unmatched Speed
- Sub-millisecond latency that remains consistent across traffic, regions, and workloads
- Static scheduling with pre-calculated execution graphs removes runtime coordination delays
- Tensor parallelism refined for low-latency single responses instead of high-throughput batching
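These latency claims are easy to check against your own workload by timing the first streamed token. The sketch below streams a short completion over the OpenAI-compatible endpoint and reports time to first token; it assumes `GROQ_API_KEY` is set and uses `llama-3.1-8b-instant` as an example model.

```python
import json
import os
import time

import requests

# Measure time-to-first-token for a streamed chat completion.
start = time.perf_counter()
resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "llama-3.1-8b-instant",
        "messages": [{"role": "user", "content": "Name three prime numbers."}],
        "stream": True,
    },
    stream=True,
    timeout=30,
)
resp.raise_for_status()

for line in resp.iter_lines():
    # Server-sent events arrive as lines prefixed with "data: ".
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        print(f"first token after {time.perf_counter() - start:.3f}s: {delta!r}")
        break
```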
Quality Without Tradeoffs
- TruePoint numerics reduce precision only where it does not affect accuracy
- 100-bit intermediate accumulation guarantees lossless computation
- Strategic precision control preserves quality while achieving 2-4× speedup over BF16
Memory Architecture
- SRAM as primary storage (not cache) featuring hundreds of megabytes on-chip
- Eliminates DRAM/HBM latency that affects traditional accelerators
- Enables true tensor parallelism by dividing layers across multiple chips
Learn more about Groq's technology in their LPU architecture blog post.
Special Features
Prompt Caching
The Kimi K2 model supports prompt caching, which can substantially reduce cost and latency for repeated prompts.
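Prompt caching typically keys on a repeated prompt prefix, so the main thing you control is keeping the long, stable part of the prompt (e.g., a system message) at the front and varying only the tail. A hedged sketch, assuming `GROQ_API_KEY` is set; depending on the model, the returned `usage` block may report cached prompt tokens:

```python
import os
import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}

# Keep the long, stable context first so repeated requests share the same prefix;
# only the trailing user message changes between calls.
stable_system_prompt = "You are a code reviewer. Project conventions: ..."  # imagine a long prefix here

def review(snippet: str) -> dict:
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        json={
            "model": "moonshotai/kimi-k2-instruct",
            "messages": [
                {"role": "system", "content": stable_system_prompt},
                {"role": "user", "content": snippet},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

first = review("def add(a, b): return a + b")
second = review("def sub(a, b): return a - b")
# Compare token usage between the two calls; cached prefix tokens may show up here.
print(first["usage"])
print(second["usage"])
```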
Vision Support
Certain models support image inputs and vision capabilities. Check the model details in the Groq Console for specific capabilities.
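If the model you pick supports vision, images are sent as OpenAI-style multimodal content parts. A minimal sketch, assuming `meta-llama/llama-4-scout-17b-16e-instruct` accepts image input on your account (verify in the Console) and `GROQ_API_KEY` is set:

```python
import os
import requests

# Send an image by URL alongside a text question, using OpenAI-style content parts.
resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",  # assumed vision-capable
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is shown in this image?"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
                ],
            }
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```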
Reasoning Models
Some models, such as the DeepSeek variants, offer enhanced reasoning capabilities with step-by-step thought processes.
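With the DeepSeek R1 distill, the raw response often interleaves the model's working with its final answer (commonly wrapped in `<think>` tags), so you may want to strip that before displaying results. A hedged sketch, assuming that tag format and `GROQ_API_KEY`:

```python
import os
import re

import requests

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": "Is 221 prime? Answer yes or no with a short reason."}],
    },
    timeout=60,
)
resp.raise_for_status()
content = resp.json()["choices"][0]["message"]["content"]

# The distilled R1 model often emits its reasoning inside <think>...</think> tags;
# strip it if you only want the final answer.
answer = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()
print(answer)
```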
Tips and Notes
- Model Selection: Choose a model based on your specific use case and performance needs.
- Speed Advantage: Groq excels at single-request latency rather than high-throughput batch processing.
- OSS Model Provider: Groq hosts open models from multiple providers (OpenAI, Meta, DeepSeek, etc.) on its fast infrastructure.
- Context Windows: Most models offer large context windows (up to 131K tokens) for including substantial code and context.
- Pricing: Groq offers competitive pricing alongside its speed benefits. See the Groq Pricing page for current rates.
- Rate Limits: Groq's rate limits are generous, but check their documentation for the current limits on your usage tier; a simple retry with backoff, as sketched below, handles occasional rate-limit responses.
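If you do hit a rate limit, the API returns HTTP 429. A small retry-with-exponential-backoff wrapper is usually enough; the sketch below is illustrative (the helper name and retry counts are arbitrary) and assumes `GROQ_API_KEY` is set.

```python
import os
import time

import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}

def chat_with_retry(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat completion, backing off exponentially on HTTP 429."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Rate limited: honor a Retry-After header if present, otherwise back off exponentially.
        wait = float(resp.headers.get("retry-after", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("still rate limited after retries")

result = chat_with_retry({
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
})
print(result["choices"][0]["message"]["content"])
```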