Vercel AI Gateway
Use Vercel AI Gateway in Sypha to reach 100+ models from one endpoint with routing, retries, and spend observability.
With Vercel AI Gateway, you gain unified API access to models across multiple providers. To switch models, change only the model identifier; there is no need to swap SDKs or manage separate provider keys. Sypha connects directly to the Gateway, allowing you to choose any available model from the interface, interact with it as you would with any other provider, and monitor token consumption and cache metrics in real time.
Useful links:
- Team dashboard: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai
- Models catalog: https://vercel.com/ai-gateway/models
- Docs: https://vercel.com/docs/ai-gateway
Key benefits
- Access over 100 models through a unified endpoint using a single authentication key
- Built-in retry logic and fallback mechanisms that you control via the dashboard
- Comprehensive usage tracking including per-model requests, token consumption, cache statistics, latency distribution, and expense monitoring
- OpenAI-compatible API, so existing client libraries work without modification (see the sketch after this list)
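Because the Gateway speaks the OpenAI wire format, a standard client pointed at the Gateway's base URL is all you need. Here is a minimal sketch using the `openai` npm package, assuming the base URL `https://ai-gateway.vercel.sh/v1` documented for the Gateway and an `AI_GATEWAY_API_KEY` environment variable of your choosing:

```typescript
import OpenAI from "openai";

// Point a standard OpenAI client at the Gateway instead of api.openai.com.
// Confirm the base URL in the Vercel AI Gateway docs if requests fail.
const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY, // your Gateway key, not a provider key
  baseURL: "https://ai-gateway.vercel.sh/v1",
});

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4", // any identifier from the catalog
  messages: [{ role: "user", content: "Hello from the Gateway!" }],
});
console.log(response.choices[0].message.content);
```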
Obtaining an API Key
- Navigate to https://vercel.com and authenticate
- Access Dashboard → AI Gateway → API Keys → Generate new key
- Retrieve and save the generated key
For additional information regarding authentication methods and OIDC integration, visit https://vercel.com/docs/ai-gateway/authentication
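To sanity-check a freshly generated key, you can hit the Gateway's OpenAI-style `/models` endpoint with the key as a Bearer token. A short sketch, assuming the same base URL as above:

```typescript
// A 200 here means the Gateway accepts the key; a 401 means it does not.
const res = await fetch("https://ai-gateway.vercel.sh/v1/models", {
  headers: { Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}` },
});
console.log(res.status);
```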
Setting up Sypha
- Open the Sypha configuration panel
- Choose Vercel AI Gateway from the API Provider options
- Enter your Gateway API key
- Select a model from the available options. Sypha retrieves the catalog automatically; you may also enter a specific model identifier
Notes:
- Model identifiers typically use the `provider/model` format. Use the exact identifier from the catalog. Examples: `openai/gpt-5`, `anthropic/claude-sonnet-4`, `google/gemini-2.5-pro`, `groq/llama-3.1-70b`, `deepseek/deepseek-v3`
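To see the identifier format in action, here is a sketch that reuses one client and swaps only the model string between providers (same assumptions as above about the base URL and key variable):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: "https://ai-gateway.vercel.sh/v1",
});

// The model string is the only thing that changes between providers.
for (const model of ["openai/gpt-5", "anthropic/claude-sonnet-4"]) {
  const res = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "Reply with one word." }],
  });
  console.log(model, "->", res.choices[0].message.content);
}
```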
Actionable insights
Key metrics to monitor:
- Per-model request volume - verify routing behavior and model adoption rates
- Token metrics - distinguish between input and output, with reasoning tokens when available
- Caching performance - track cached input tokens and cache creation activity
- Response time - monitor p75 response duration and p75 time-to-first-token (a p75 sketch follows the next list)
- Financial impact - break down costs by project and individual models
Leverage these metrics to:
- Evaluate output token efficiency when switching between models
- Verify caching effectiveness by monitoring cache hit rates and creation patterns
- Identify time-to-first-token degradation during testing phases
- Reconcile spend against actual token consumption
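If you export raw latency samples and want to reproduce the dashboard's percentile figures locally, here is a minimal sketch using the nearest-rank definition of p75 (the dashboard's exact method may differ):

```typescript
// Nearest-rank p75 over latency samples in milliseconds.
function p75(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil(0.75 * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

console.log(p75([120, 340, 95, 410, 230, 180])); // 340
```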
Available models
The platform provides access to an extensive and continuously evolving model collection. Sypha retrieves the current model list through the Gateway API and maintains a local cache. To view the complete catalog, visit https://vercel.com/ai-gateway/models
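Clients typically populate their model pickers from the Gateway's OpenAI-style `/models` endpoint. A sketch of what that retrieval looks like, assuming the response follows the usual `{ data: [...] }` list shape:

```typescript
// Fetch the live catalog and print each model identifier.
const res = await fetch("https://ai-gateway.vercel.sh/v1/models", {
  headers: { Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}` },
});
const { data } = await res.json();
for (const model of data) console.log(model.id); // e.g. "openai/gpt-5"
```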
Best practices
Maintain distinct gateway keys for each environment (development, staging, production). This approach ensures cleaner dashboard organization and isolated budget tracking.
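One way to wire that up, as a sketch with hypothetical environment-variable names:

```typescript
// Select a per-environment Gateway key so each environment's traffic and
// spend show up separately in the dashboard.
const keyByEnv: Record<string, string | undefined> = {
  development: process.env.AI_GATEWAY_KEY_DEV,
  staging: process.env.AI_GATEWAY_KEY_STAGING,
  production: process.env.AI_GATEWAY_KEY_PROD,
};
const apiKey = keyByEnv[process.env.NODE_ENV ?? "development"];
if (!apiKey) throw new Error("No Gateway key configured for this environment");
```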
The pricing model is transparent, passing through provider list rates directly. When using your own keys, there's zero markup. Standard provider fees and processing charges still apply.
Vercel doesn't impose rate limits of its own, but upstream providers still enforce theirs. New accounts receive $5 in credits each month until the first payment is processed.
Common issues and solutions
- 401 authentication error - ensure the Gateway key is sent to the Gateway endpoint rather than an upstream provider URL (see the sketch after this list)
- 404 model not found - verify you're using the precise identifier from the Vercel catalog
- Delayed initial token - examine p75 TTFT metrics in the dashboard and consider selecting a model optimized for streaming responses
- Unexpected cost increases - analyze cost breakdown per model in the dashboard and implement traffic caps or routing adjustments
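A sketch of how the first two errors surface in code, with the fixes above as error messages (same base-URL and key assumptions as earlier):

```typescript
const res = await fetch("https://ai-gateway.vercel.sh/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.AI_GATEWAY_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-5",
    messages: [{ role: "user", content: "ping" }],
  }),
});
if (res.status === 401) {
  throw new Error("401: send a Gateway key to the Gateway URL, not a provider key or URL");
}
if (res.status === 404) {
  throw new Error("404: model id not in the Vercel catalog; copy it exactly");
}
```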
Use case ideas
- Model comparison workflows - modify only the model identifier in Sypha to compare response latency and token output (see the TTFT sketch after this list)
- Gradual model migration - allocate a small traffic percentage to an experimental model through the dashboard and scale based on performance metrics
- Cost control implementation - establish per-project spending limits through configuration rather than code modifications
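For the comparison workflow in particular, time-to-first-token is easy to measure yourself. A sketch using the `openai` package's streaming mode (same base-URL and key assumptions as above):

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: "https://ai-gateway.vercel.sh/v1",
});

// Time from request start until the first content token arrives.
async function ttft(model: string): Promise<number> {
  const start = performance.now();
  const stream = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: "Say hi." }],
    stream: true,
  });
  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) break;
  }
  return performance.now() - start;
}

console.log("openai/gpt-5 TTFT (ms):", await ttft("openai/gpt-5"));
```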
Related documentation
- OpenAI-Compatible setup: /provider-config/openai-compatible
- Model Selection Guide: /getting-started/model-selection-guide
- Understanding Context Management: /getting-started/understanding-context-management