Groq
Learn how to configure and use Groq's lightning-fast inference with Sypha. Access models from OpenAI, Meta, DeepSeek, and more on Groq's purpose-built LPU architecture.
Groq delivers ultra-fast AI inference via its custom LPU™ (Language Processing Unit) architecture, purpose-built for inference rather than adapted from training hardware. Groq serves open models from multiple providers, including OpenAI, Meta, DeepSeek, Moonshot AI, and others.
Website: https://groq.com/
Getting an API Key
- Sign Up/Sign In: Visit Groq and create an account or sign in.
- Navigate to Console: Open the Groq Console to reach your dashboard.
- Create a Key: Go to the API Keys section and generate a new API key. Give the key a descriptive name (e.g., "Sypha").
- Copy the Key: Copy the API key immediately; you will not be able to view it again. Store it securely, for example as an environment variable, as sketched below.
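If you want to confirm the key works before wiring it into Sypha, a quick call against Groq's OpenAI-compatible REST endpoint is enough. The sketch below assumes the key is exported as a `GROQ_API_KEY` environment variable and uses `llama-3.1-8b-instant` purely as an example; any model from the list below should work.

```python
import os
import requests

# Smoke test: confirm the API key is valid by requesting a one-word reply.
API_KEY = os.environ["GROQ_API_KEY"]  # export GROQ_API_KEY=... before running

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-8b-instant",  # example model; swap for any supported one
        "messages": [{"role": "user", "content": "Reply with the word 'ok'."}],
        "max_tokens": 5,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```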
Supported Models
Sypha is compatible with the following Groq models:
- llama-3.3-70b-versatile (Meta) - Well-balanced performance with 131K context
- llama-3.1-8b-instant (Meta) - Rapid inference with 131K context
- openai/gpt-oss-120b (OpenAI) - Highlighted flagship model with 131K context
- openai/gpt-oss-20b (OpenAI) - Highlighted compact model with 131K context
- moonshotai/kimi-k2-instruct (Moonshot AI) - 1 trillion parameter model featuring prompt caching
- deepseek-r1-distill-llama-70b (DeepSeek/Meta) - Reasoning-enhanced model
- qwen/qwen3-32b (Alibaba Cloud) - Optimized for Q&A tasks
- meta-llama/llama-4-maverick-17b-128e-instruct (Meta) - Most recent Llama 4 variant
- meta-llama/llama-4-scout-17b-16e-instruct (Meta) - Most recent Llama 4 variant
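Model availability changes over time, so it can be worth checking what your key can actually access. A minimal sketch against Groq's OpenAI-compatible models endpoint, again assuming `GROQ_API_KEY` is set:

```python
import os
import requests

# List the model IDs currently available to this API key.
resp = requests.get(
    "https://api.groq.com/openai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```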
Configuration in Sypha
- Open Sypha Settings: Click the settings icon (⚙️) in the Sypha panel.
- Select Provider: Choose "Groq" from the "API Provider" dropdown.
- Enter API Key: Paste your Groq API key into the "Groq API Key" field.
- Select Model: Choose your preferred model from the "Model" dropdown.
Groq's Speed Revolution
Groq's LPU architecture provides several critical advantages over conventional GPU-based inference:
LPU Architecture
Unlike GPUs, which are adapted from training workloads, Groq's LPU is designed specifically for inference. This removes the architectural bottlenecks that introduce latency in conventional systems.
Unmatched Speed
- Sub-millisecond latency that remains consistent across traffic, regions, and workloads
- Static scheduling with pre-calculated execution graphs removes runtime coordination delays
- Tensor parallelism refined for low-latency single responses instead of high-throughput batching
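These latency claims are easy to check against your own workload by timing the first streamed token. The sketch below streams a short completion over the OpenAI-compatible endpoint and reports time to first token; it assumes `GROQ_API_KEY` is set and uses `llama-3.1-8b-instant` as an example model.

```python
import json
import os
import time

import requests

# Measure time-to-first-token for a streamed chat completion.
start = time.perf_counter()
resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "llama-3.1-8b-instant",
        "messages": [{"role": "user", "content": "Name three prime numbers."}],
        "stream": True,
    },
    stream=True,
    timeout=30,
)
resp.raise_for_status()

for line in resp.iter_lines():
    # Server-sent events arrive as lines prefixed with "data: ".
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        print(f"first token after {time.perf_counter() - start:.3f}s: {delta!r}")
        break
```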
Quality Without Tradeoffs
- TruePoint numerics reduce precision only where it does not affect accuracy
- 100-bit intermediate accumulation guarantees lossless computation
- Strategic precision control preserves quality while achieving 2-4× speedup over BF16
Memory Architecture
- SRAM as primary storage (not cache) featuring hundreds of megabytes on-chip
- Eliminates DRAM/HBM latency that affects traditional accelerators
- Enables true tensor parallelism by dividing layers across multiple chips
Learn more about Groq's technology in their LPU architecture blog post.
Special Features
Prompt Caching
The Kimi K2 model supports prompt caching, which can substantially reduce cost and latency for repeated prompts.
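Prompt caching typically keys on a repeated prompt prefix, so the main thing you control is keeping the long, stable part of the prompt (e.g., a system message) at the front and varying only the tail. A hedged sketch, assuming `GROQ_API_KEY` is set; depending on the model, the returned `usage` block may report cached prompt tokens:

```python
import os
import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}

# Keep the long, stable context first so repeated requests share the same prefix;
# only the trailing user message changes between calls.
stable_system_prompt = "You are a code reviewer. Project conventions: ..."  # imagine a long prefix here

def review(snippet: str) -> dict:
    resp = requests.post(
        API_URL,
        headers=HEADERS,
        json={
            "model": "moonshotai/kimi-k2-instruct",
            "messages": [
                {"role": "system", "content": stable_system_prompt},
                {"role": "user", "content": snippet},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

first = review("def add(a, b): return a + b")
second = review("def sub(a, b): return a - b")
# Compare token usage between the two calls; cached prefix tokens may show up here.
print(first["usage"])
print(second["usage"])
```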
Vision Support
Certain models support image inputs and vision capabilities. Check the model details in the Groq Console for specific capabilities.
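If the model you pick supports vision, images are sent as OpenAI-style multimodal content parts. A minimal sketch, assuming `meta-llama/llama-4-scout-17b-16e-instruct` accepts image input on your account (verify in the Console) and `GROQ_API_KEY` is set:

```python
import os
import requests

# Send an image by URL alongside a text question, using OpenAI-style content parts.
resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",  # assumed vision-capable
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is shown in this image?"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
                ],
            }
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```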
Reasoning Models
Some models, such as the DeepSeek variants, offer enhanced reasoning capabilities with step-by-step thought processes.
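With the DeepSeek R1 distill, the raw response often interleaves the model's working with its final answer (commonly wrapped in `<think>` tags), so you may want to strip that before displaying results. A hedged sketch, assuming that tag format and `GROQ_API_KEY`:

```python
import os
import re

import requests

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "deepseek-r1-distill-llama-70b",
        "messages": [{"role": "user", "content": "Is 221 prime? Answer yes or no with a short reason."}],
    },
    timeout=60,
)
resp.raise_for_status()
content = resp.json()["choices"][0]["message"]["content"]

# The distilled R1 model often emits its reasoning inside <think>...</think> tags;
# strip it if you only want the final answer.
answer = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()
print(answer)
```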
Tips and Notes
- Model Selection: Choose a model based on your specific use case and performance needs.
- Speed Advantage: Groq excels at single-request latency rather than high-throughput batch processing.
- OSS Model Provider: Groq hosts open models from multiple providers (OpenAI, Meta, DeepSeek, etc.) on its fast infrastructure.
- Context Windows: Most models offer large context windows (up to 131K tokens) for including substantial code and context.
- Pricing: Groq offers competitive pricing alongside its speed benefits. See the Groq Pricing page for current rates.
- Rate Limits: Groq's rate limits are generous, but check their documentation for the current limits on your usage tier; a simple retry with backoff, as sketched below, handles occasional rate-limit responses.
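If you do hit a rate limit, the API returns HTTP 429. A small retry-with-exponential-backoff wrapper is usually enough; the sketch below is illustrative (the helper name and retry counts are arbitrary) and assumes `GROQ_API_KEY` is set.

```python
import os
import time

import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}

def chat_with_retry(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat completion, backing off exponentially on HTTP 429."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Rate limited: honor a Retry-After header if present, otherwise back off exponentially.
        wait = float(resp.headers.get("retry-after", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("still rate limited after retries")

result = chat_with_retry({
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
})
print(result["choices"][0]["message"]["content"])
```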