
Vercel AI Gateway

Use Vercel AI Gateway in Sypha to reach 100+ models from one endpoint with routing, retries, and spend observability.

With Vercel AI Gateway, you gain unified API access to models across multiple providers. Switching between options is as simple as changing the model identifier—no need to swap SDKs or manage separate keys. Sypha connects directly to the Gateway, allowing you to choose any available model from the interface, interact with it as you would with any other provider, and monitor token consumption and cache metrics in real time.

Key benefits

  • Access over 100 models through a unified endpoint using a single authentication key
  • Built-in retry logic and fallback mechanisms that you control via the dashboard
  • Comprehensive usage tracking including per-model requests, token consumption, cache statistics, latency distribution, and expense monitoring
  • Compatible with OpenAI standards, ensuring existing client libraries work seamlessly
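Because the Gateway speaks the OpenAI wire format, any OpenAI-style client can talk to it. The sketch below builds a chat-completions request by hand with only the standard library; the base URL shown is an assumption for illustration, so confirm the current endpoint in the Vercel AI Gateway docs before use.

```python
import json

# Assumed endpoint for illustration only; verify against the Gateway docs.
GATEWAY_URL = "https://ai-gateway.vercel.sh/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str) -> tuple[dict, bytes]:
    """Return (headers, body) for an OpenAI-style chat completion request.

    Since the Gateway is OpenAI-compatible, switching providers only means
    changing the model identifier in the payload -- nothing else moves.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return headers, body

headers, body = build_chat_request("vck_example", "anthropic/claude-sonnet-4", "Hello")
```

The same two lines of payload work for every model in the catalog; only the `model` string changes.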

Obtaining an API Key

  1. Navigate to https://vercel.com and authenticate
  2. Access Dashboard → AI Gateway → API Keys → Generate new key
  3. Retrieve and save the generated key

For additional information regarding authentication methods and OIDC integration, visit https://vercel.com/docs/ai-gateway/authentication

Setting up Sypha

  1. Access Sypha configuration panel
  2. Choose Vercel AI Gateway from the API Provider options
  3. Insert your Gateway API Key
  4. Choose a model from the available options. Sypha retrieves the catalog automatically. You may also enter a specific model identifier

Notes:

  • Model identifiers typically use the provider/model format. Use the exact identifier from the catalog. Examples:
    • openai/gpt-5
    • anthropic/claude-sonnet-4
    • google/gemini-2.5-pro
    • groq/llama-3.1-70b
    • deepseek/deepseek-v3
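If you validate identifiers before sending a request, the provider/model convention above is easy to check. A minimal helper (the function name is illustrative, not part of Sypha or the Gateway API):

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split a Gateway model identifier into (provider, model).

    Identifiers follow the provider/model convention, e.g. "openai/gpt-5".
    """
    provider, _, model = model_id.partition("/")
    if not provider or not model:
        raise ValueError(f"expected provider/model, got {model_id!r}")
    return provider, model
```

Catching a malformed identifier client-side gives a clearer error than a 404 from the catalog.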

Actionable insights

The Vercel AI Gateway dashboard provides observability into requests by model, tokens, cache activity, latency, and cost.

Key metrics to monitor:

  • Per-model request volume - verify routing behavior and model adoption rates
  • Token metrics - distinguish between input and output, with reasoning tokens when available
  • Caching performance - track cached input tokens and cache creation activity
  • Response time - monitor p75 response duration and p75 time-to-first-token
  • Financial impact - break down costs by project and individual models
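The cache and cost metrics above are simple ratios over token counts. A sketch of the arithmetic, with placeholder per-million-token rates (the Gateway passes through each provider's list prices, so substitute the real rates for your model):

```python
def cache_hit_rate(cached_input_tokens: int, total_input_tokens: int) -> float:
    """Fraction of input tokens served from the prompt cache."""
    if total_input_tokens == 0:
        return 0.0
    return cached_input_tokens / total_input_tokens

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost in dollars, given per-million-token rates.

    The rates here are placeholders for illustration; use the list price
    of the specific model you are running.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

rate = cache_hit_rate(30_000, 120_000)        # 25% of input tokens were cached
cost = estimate_cost(120_000, 8_000, 3.00, 15.00)
```

Comparing this estimate against the dashboard's cost breakdown is a quick sanity check on your usage data.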

Leverage these metrics to:

  • Evaluate output token efficiency when switching between models
  • Verify caching effectiveness by monitoring cache hit rates and creation patterns
  • Identify time-to-first-token degradation during testing phases
  • Synchronize spending with actual resource consumption
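If you collect your own latency samples during testing, you can reproduce the dashboard's p75 figure locally. This sketch uses the nearest-rank method, which is one common percentile definition (the dashboard's exact method is not documented here):

```python
import math

def p75(samples: list[float]) -> float:
    """Nearest-rank 75th percentile: the smallest sample with at least
    75% of observations at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    k = math.ceil(0.75 * len(ordered)) - 1
    return ordered[k]

# e.g. time-to-first-token measurements in milliseconds
ttft_p75 = p75([120, 90, 300, 210])
```

Tracking this value across test runs makes TTFT degradation visible before it shows up in production.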

Available models

The platform provides access to an extensive and continuously evolving model collection. Sypha retrieves the current model list through the Gateway API and maintains a local cache. To view the complete catalog, visit https://vercel.com/ai-gateway/models

Best practices

Maintain distinct gateway keys for each environment (development, staging, production). This approach ensures cleaner dashboard organization and isolated budget tracking.
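One way to keep per-environment keys separated is to resolve them from environment variables at startup. The variable names below are illustrative, not a Sypha or Vercel convention:

```python
import os

# Illustrative variable names -- pick whatever fits your deployment setup.
_KEY_VARS = {
    "development": "AI_GATEWAY_KEY_DEV",
    "staging": "AI_GATEWAY_KEY_STAGING",
    "production": "AI_GATEWAY_KEY_PROD",
}

def gateway_key(env: str) -> str:
    """Look up the Gateway key for one environment, so usage and budgets
    stay isolated per key in the dashboard."""
    var = _KEY_VARS[env]
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"set {var} for the {env} environment")
    return key

os.environ["AI_GATEWAY_KEY_DEV"] = "vck_dev_example"  # demo value only
dev_key = gateway_key("development")
```

Failing fast on a missing variable prevents a staging job from silently burning the production budget.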

The pricing model is transparent, passing through provider list rates directly. When using your own keys, there's zero markup. Standard provider fees and processing charges still apply.

Vercel doesn't impose its own rate limitations. However, upstream providers maintain their own limits. Fresh accounts are granted $5 in credits monthly until the initial payment is processed.
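Since upstream 429s can still propagate through the Gateway, a client-side retry with exponential backoff is a reasonable safeguard. A minimal sketch (the exception class stands in for whatever rate-limit error your HTTP client raises):

```python
import time

class RateLimitError(Exception):
    """Stand-in for a 429 surfaced by an upstream provider."""

def with_backoff(call, max_retries: int = 3, base_delay: float = 0.01):
    """Retry a callable on rate-limit errors with exponential backoff.

    The Gateway adds no limits of its own, but upstream limits still apply.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ...

# Demo: a call that is rate-limited twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

result = with_backoff(flaky)
```

In production you would also honor a `Retry-After` header when the response provides one.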

Common issues and solutions

  • 401 authentication error - ensure the Gateway key is directed to the Gateway endpoint rather than upstream provider URLs
  • 404 model not found - verify you're using the precise identifier from the Vercel catalog
  • Delayed initial token - examine p75 TTFT metrics in the dashboard and consider selecting a model optimized for streaming responses
  • Unexpected cost increases - analyze cost breakdown per model in the dashboard and implement traffic caps or routing adjustments

Use case ideas

  • Model comparison workflows - modify only the model identifier in Sypha to compare response latency and token output
  • Gradual model migration - allocate a small traffic percentage to an experimental model through the dashboard and scale based on performance metrics
  • Cost control implementation - establish per-project spending limits through configuration rather than code modifications

Related pages

  • OpenAI-Compatible setup: /provider-config/openai-compatible
  • Model Selection Guide: /getting-started/model-selection-guide
  • Understanding Context Management: /getting-started/understanding-context-management
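The gradual-migration idea above (routing a small traffic percentage to an experimental model) can be configured in the dashboard, but a client-side sketch shows the principle. Hashing the request id keeps routing deterministic, so the same request always lands in the same bucket; the model identifiers are examples only:

```python
import hashlib

def route_model(request_id: str, experiment_pct: int,
                stable: str = "anthropic/claude-sonnet-4",
                experimental: str = "deepseek/deepseek-v3") -> str:
    """Send experiment_pct percent of requests to the experimental model.

    A hash of the request id, not random choice, decides the bucket, so
    routing is stable across retries of the same request.
    """
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return experimental if bucket < experiment_pct else stable
```

Raising `experiment_pct` in small steps while watching the per-model latency and cost metrics completes the migration loop.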
