Vercel AI Gateway
Use Vercel AI Gateway in Sypha to reach 100+ models from one endpoint with routing, retries, and spend observability.
With Vercel AI Gateway, you gain unified API access to models across multiple providers. To switch models, change only the model identifier; there is no need to swap SDKs or manage separate provider keys. Sypha connects directly to the Gateway, allowing you to choose any available model from the interface, interact with it as you would with any other provider, and monitor token consumption and cache metrics in real time.
Useful links:
- Team dashboard: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai
- Models catalog: https://vercel.com/ai-gateway/models
- Docs: https://vercel.com/docs/ai-gateway
Key benefits
- Access over 100 models through a unified endpoint using a single authentication key
- Built-in retry logic and fallback mechanisms that you control via the dashboard
- Comprehensive usage tracking including per-model requests, token consumption, cache statistics, latency distribution, and expense monitoring
- OpenAI-compatible API, so existing client libraries work without modification (see the sketch after this list)
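Because the Gateway speaks the OpenAI wire format, a standard client pointed at the Gateway's base URL is all you need. Here is a minimal sketch using the `openai` npm package, assuming the base URL `https://ai-gateway.vercel.sh/v1` documented for the Gateway and an `AI_GATEWAY_API_KEY` environment variable of your choosing:

```typescript
import OpenAI from "openai";

// Point a standard OpenAI client at the Gateway instead of api.openai.com.
// Confirm the base URL in the Vercel AI Gateway docs if requests fail.
const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY, // your Gateway key, not a provider key
  baseURL: "https://ai-gateway.vercel.sh/v1",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4", // any identifier from the catalog
  messages: [{ role: "user", content: "Hello from the Gateway!" }],
});
console.log(response.choices[0].message.content);
```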
Obtaining an API Key
- Navigate to https://vercel.com and authenticate
- Access Dashboard → AI Gateway → API Keys → Generate new key
- Retrieve and save the generated key
For additional information regarding authentication methods and OIDC integration, visit https://vercel.com/docs/ai-gateway/authentication
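To sanity-check a freshly generated key, you can hit the Gateway's OpenAI-style `/models` endpoint with the key as a Bearer token. A short sketch, assuming the same base URL as above:

```typescript
// A 200 here means the Gateway accepts the key; a 401 means it does not.
const res = await fetch("https://ai-gateway.vercel.sh/v1/models", {
  headers: { Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}` },
});
console.log(res.status);
```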
Setting up Sypha
- Open the Sypha configuration panel
- Choose Vercel AI Gateway from the API Provider options
- Enter your Gateway API key
- Select a model from the available options. Sypha retrieves the catalog automatically; you may also enter a specific model identifier
Notes:
- Model identifiers typically use the `provider/model` format. Use the exact identifier from the catalog. Examples: `openai/gpt-5`, `anthropic/claude-sonnet-4`, `google/gemini-2.5-pro`, `groq/llama-3.1-70b`, `deepseek/deepseek-v3`
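To see the identifier format in action, here is a sketch that reuses one client and swaps only the model string between providers (same assumptions as above about the base URL and key variable):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: "https://ai-gateway.vercel.sh/v1",
});

// The model string is the only thing that changes between providers.
for (const model of ["openai/gpt-5", "anthropic/claude-sonnet-4"]) {
  const res = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "Reply with one word." }],
  });
  console.log(model, "->", res.choices[0].message.content);
}
```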
Actionable insights
Key metrics to monitor:
- Per-model request volume - verify routing behavior and model adoption rates
- Token metrics - distinguish between input and output, with reasoning tokens when available
- Caching performance - track cached input tokens and cache creation activity
- Response time - monitor p75 response duration and p75 time-to-first-token (a p75 sketch follows the next list)
- Financial impact - break down costs by project and individual models
Leverage these metrics to:
- Evaluate output token efficiency when switching between models
- Verify caching effectiveness by monitoring cache hit rates and creation patterns
- Identify time-to-first-token degradation during testing phases
- Reconcile spend against actual token consumption
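If you export raw latency samples and want to reproduce the dashboard's percentile figures locally, here is a minimal sketch using the nearest-rank definition of p75 (the dashboard's exact method may differ):

```typescript
// Nearest-rank p75 over latency samples in milliseconds.
function p75(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil(0.75 * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

console.log(p75([120, 340, 95, 410, 230, 180])); // 340
```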
Available models
The platform provides access to an extensive and continuously evolving model collection. Sypha retrieves the current model list through the Gateway API and maintains a local cache. To view the complete catalog, visit https://vercel.com/ai-gateway/models
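Clients typically populate their model pickers from the Gateway's OpenAI-style `/models` endpoint. A sketch of what that retrieval looks like, assuming the response follows the usual `{ data: [...] }` list shape:

```typescript
// Fetch the live catalog and print each model identifier.
const res = await fetch("https://ai-gateway.vercel.sh/v1/models", {
  headers: { Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}` },
});
const { data } = await res.json();
for (const model of data) console.log(model.id); // e.g. "openai/gpt-5"
```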
Best practices
Maintain distinct gateway keys for each environment (development, staging, production). This approach ensures cleaner dashboard organization and isolated budget tracking.
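One way to wire that up, as a sketch with hypothetical environment-variable names:

```typescript
// Select a per-environment Gateway key so each environment's traffic and
// spend show up separately in the dashboard.
const keyByEnv: Record<string, string | undefined> = {
  development: process.env.AI_GATEWAY_KEY_DEV,
  staging: process.env.AI_GATEWAY_KEY_STAGING,
  production: process.env.AI_GATEWAY_KEY_PROD,
};
const apiKey = keyByEnv[process.env.NODE_ENV ?? "development"];
if (!apiKey) throw new Error("No Gateway key configured for this environment");
```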
The pricing model is transparent, passing through provider list rates directly. When using your own keys, there's zero markup. Standard provider fees and processing charges still apply.
Vercel doesn't impose rate limits of its own, but upstream providers still enforce theirs. New accounts receive $5 in credits each month until the first payment is processed.
Common issues and solutions
- 401 authentication error - ensure the Gateway key is sent to the Gateway endpoint rather than an upstream provider URL (see the sketch after this list)
- 404 model not found - verify you're using the precise identifier from the Vercel catalog
- Delayed initial token - examine p75 TTFT metrics in the dashboard and consider selecting a model optimized for streaming responses
- Unexpected cost increases - analyze cost breakdown per model in the dashboard and implement traffic caps or routing adjustments
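A sketch of how the first two errors surface in code, with the fixes above as error messages (same base-URL and key assumptions as earlier):

```typescript
const res = await fetch("https://ai-gateway.vercel.sh/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-5",
    messages: [{ role: "user", content: "ping" }],
  }),
});
if (res.status === 401) {
  throw new Error("401: send a Gateway key to the Gateway URL, not a provider key or URL");
}
if (res.status === 404) {
  throw new Error("404: model id not in the Vercel catalog; copy it exactly");
}
```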
Use case ideas
- Model comparison workflows - modify only the model identifier in Sypha to compare response latency and token output (see the TTFT sketch after this list)
- Gradual model migration - allocate a small traffic percentage to an experimental model through the dashboard and scale based on performance metrics
- Cost control implementation - establish per-project spending limits through configuration rather than code modifications
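For the comparison workflow in particular, time-to-first-token is easy to measure yourself. A sketch using the `openai` package's streaming mode (same base-URL and key assumptions as above):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: "https://ai-gateway.vercel.sh/v1",
});

// Time from request start until the first content token arrives.
async function ttft(model: string): Promise<number> {
  const start = performance.now();
  const stream = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "Say hi." }],
    stream: true,
  });
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) break;
  }
  return performance.now() - start;
}

console.log("openai/gpt-5 TTFT (ms):", await ttft("openai/gpt-5"));
```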
Related documentation
- OpenAI-Compatible setup: /provider-config/openai-compatible
- Model Selection Guide: /getting-started/model-selection-guide
- Understanding Context Management: /getting-started/understanding-context-management