Baseten
Learn how to configure and use Baseten's Model APIs with Sypha. Access frontier open-source models with enterprise-grade performance, reliability, and competitive pricing.
Baseten offers on-demand frontier model APIs built for production environments rather than mere experimentation. Powered by the Baseten Inference Stack, these APIs provide enterprise-level performance and dependability with optimized inference capabilities for premier open-source models from OpenAI, DeepSeek, Meta, Moonshot AI, and Alibaba Cloud.
Website: https://www.baseten.co/products/model-apis/
Getting an API Key
- Sign Up/Sign In: Visit Baseten and create an account, or sign in if you already have one.
- Navigate to API Keys: Open your dashboard and locate the API Keys section.
- Create a Key: Generate a new API key and give it a meaningful name (e.g., "Sypha").
- Copy the Key: Copy the API key immediately and store it in a secure location.
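Before configuring Sypha, you can sanity-check the key from a script. The minimal sketch below assumes the openai Python SDK (pip install openai) and that the key is exported as BASETEN_API_KEY (the variable name is just this example's convention); the base URL is Baseten's OpenAI-compatible endpoint, inference.baseten.co/v1.

```python
# Minimal sketch: confirm the key works before configuring Sypha.
# Assumes the openai Python SDK and that the key is exported as
# BASETEN_API_KEY (the variable name is this example's choice).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",  # Baseten's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # any model from the list below works
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```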
Supported Models
Sypha is compatible with all available models within Baseten Model APIs, including:

- zai-org/GLM-4.6 (Z AI) - Advanced open frontier model with sophisticated agentic, reasoning, and coding abilities (200K context) - $0.60/$2.20 per 1M tokens
- moonshotai/Kimi-K2-Instruct-0905 (Moonshot AI) - September release with improved features (262K context) - $0.60/$2.50 per 1M tokens
- openai/gpt-oss-120b (OpenAI) - 120B MoE with robust reasoning capabilities (128K context) - $0.10/$0.50 per 1M tokens
- Qwen/Qwen3-Coder-480B-A35B-Instruct - Sophisticated coding and reasoning (262K context) - $0.38/$1.53 per 1M tokens
- Qwen/Qwen3-235B-A22B-Instruct-2507 - Mathematics and reasoning specialist (262K context) - $0.22/$0.80 per 1M tokens
- deepseek-ai/DeepSeek-R1 - DeepSeek's first-generation reasoning model (163K context) - $2.55/$5.95 per 1M tokens
- deepseek-ai/DeepSeek-R1-0528 - Most recent iteration of DeepSeek's reasoning model (163K context) - $2.55/$5.95 per 1M tokens
- deepseek-ai/DeepSeek-V3.1 - Combined reasoning with sophisticated tool calling (163K context) - $0.50/$1.50 per 1M tokens
- deepseek-ai/DeepSeek-V3-0324 - Fast general-purpose model with improved reasoning (163K context) - $0.77/$0.77 per 1M tokens

For current pricing information, see: https://www.baseten.co/products/model-apis/

Note: The Kimi K2 0711, Llama 4 Maverick, and Llama 4 Scout Model APIs were deprecated at 5pm PT on October 8th. See: https://www.baseten.co/resources/changelog/model-api-deprecation-notice-kimi-k2-0711-scout-maverick/
Configuration in Sypha
- Open Sypha Settings: Select the settings icon (⚙️) within the Sypha panel.
- Select Provider: Pick "Baseten" from the "API Provider" dropdown menu.
- Enter API Key: Insert your Baseten API key into the "Baseten API Key" field.
- Select Model: Pick your preferred model from the "Model" dropdown menu.
Production-First Architecture
Baseten's Model APIs are engineered for production settings with multiple critical advantages:
Enterprise-Grade Reliability
- 99.99% uptime achieved via active-active redundancy
- Cloud-agnostic, multi-cluster autoscaling ensuring consistent availability
- SOC 2 Type II certification and HIPAA compliance meeting security standards
Optimized Performance
- Pre-optimized models delivered through the Baseten Inference Stack
- Latest-generation GPUs supported by multi-cloud infrastructure
- Ultra-fast inference, optimized end to end for production workloads
Cost Efficiency
- 5-10x more affordable than closed alternatives
- Optimized multi-cloud infrastructure enabling efficient resource usage
- Transparent pricing eliminating hidden fees or unexpected rate limit costs
Developer Experience
- OpenAI-compatible API - switch by changing a single URL
- Drop-in replacement for closed models, with comprehensive observability
- Effortless scaling from Model APIs to dedicated deployments
Special Features
Function Calling & Tool Use
Every Baseten model enables structured outputs, function calling, and tool usage through the Baseten Inference Stack, making them excellent for agentic applications.
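As a rough illustration, the sketch below requests a tool call through the OpenAI-compatible chat completions API; the get_weather tool and the chosen model are placeholders, and the same pattern applies to any model in the list above.

```python
# Sketch of tool use over the OpenAI-compatible chat completions API.
# The get_weather tool and the chosen model are placeholders for illustration.
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="zai-org/GLM-4.6",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```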
Reasoning Capabilities
DeepSeek models provide enhanced reasoning with step-by-step analytical processes while maintaining production-ready performance.
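A hedged sketch of streaming a reasoning model's output so tokens print as they arrive; how (and whether) the intermediate reasoning is surfaced can vary by model, so this example only prints streamed message content.

```python
# Sketch: streaming a reasoning model so tokens print as they are produced.
# How (and whether) intermediate reasoning is exposed can vary by model;
# this example only prints streamed message content.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. What is its average speed?",
    }],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```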
Long Context Support
- Up to 1 million tokens for Llama 4 models (Maverick and Scout, now deprecated; see the note under Supported Models)
- 262K tokens available for Qwen3 models
- 163K tokens available for DeepSeek models
- Ideal for code repositories and intricate multi-turn conversations
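To make the long-context workflow concrete, here is a rough sketch that packs a small Python repository into a single request; the directory path and *.py filter are hypothetical, and a real tool should count tokens and truncate to stay inside the chosen model's context window.

```python
# Rough sketch: pack a small Python repository into one long-context request.
# The directory path and *.py filter are hypothetical; a real tool should
# count tokens and truncate to stay inside the chosen model's context window.
import os
import pathlib

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",
)

repo = pathlib.Path("./my-project")
sources = [
    f"# File: {path}\n{path.read_text(errors='ignore')}"
    for path in sorted(repo.rglob("*.py"))
]

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # 262K-token context
    messages=[{
        "role": "user",
        "content": "Summarize this codebase:\n\n" + "\n\n".join(sources),
    }],
)
print(response.choices[0].message.content)
```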
Quantization Optimizations
Models are deployed utilizing advanced quantization methods (fp4, fp8, fp16) for peak performance while preserving quality.
Migration from Other Providers
Baseten's OpenAI compatibility simplifies migration:
From OpenAI:
- Replace api.openai.com with inference.baseten.co/v1 (see the sketch below)
- Preserve existing request/response structures
- Gain significant cost reductions
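For example, a minimal before/after sketch with the OpenAI Python SDK (the BASETEN_API_KEY variable name and the chosen model are illustrative):

```python
# Before: pointing the OpenAI SDK at OpenAI.
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After: same SDK and calls; only the base URL, key, and model name change.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],  # variable name chosen for this example
    base_url="https://inference.baseten.co/v1",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct-0905",  # any supported Baseten model ID
    messages=[{"role": "user", "content": "Hello from the migrated client!"}],
)
print(response.choices[0].message.content)
```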
From Other Providers:
- Employ standard OpenAI SDK format
- Keep current prompting approaches
- Obtain access to newer open-source models
Tips and Notes
- Model Selection: Select models according to your particular use case - reasoning models for intricate tasks, coding models for development activities, and flagship models for general purposes.
- Cost Optimization: Baseten provides some of the most attractive pricing available, particularly for open-source models.
- Context Windows: Leverage large context windows (up to 262K tokens on currently listed models) for incorporating extensive codebases and documentation.
- Enterprise Ready: Baseten is architected for production deployment with enterprise-level security, compliance, and reliability.
- Dynamic Model Updates: Sypha automatically retrieves the current model list from Baseten, guaranteeing access to new models upon release.
- Multi-Cloud Capacity Management (MCM): Baseten's multi-cloud infrastructure guarantees high availability and minimal latency worldwide.
- Support: Baseten delivers dedicated support for production deployments and can collaborate with you on dedicated resources during scaling.
Pricing Information
Current pricing is exceptionally competitive and clear. For the latest pricing details, visit the Baseten Model APIs page. Prices generally range from $0.10-$6.00 per million tokens, making Baseten substantially more economical than numerous closed-model alternatives while granting access to cutting-edge open-source models.
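As a back-of-the-envelope illustration, the sketch below estimates per-request cost from the per-1M-token prices listed earlier on this page, reading the two figures as input/output rates; actual prices may change, so treat the numbers as examples only.

```python
# Back-of-the-envelope cost estimate using the per-1M-token prices listed above.
# Prices change over time; check the Baseten Model APIs page for current rates.
PRICE_PER_1M_USD = {  # model: (input price, output price), as listed on this page
    "openai/gpt-oss-120b": (0.10, 0.50),
    "zai-org/GLM-4.6": (0.60, 2.20),
    "deepseek-ai/DeepSeek-V3.1": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one request."""
    price_in, price_out = PRICE_PER_1M_USD[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: a 20K-token prompt with a 2K-token completion on gpt-oss-120b.
print(f"${estimate_cost('openai/gpt-oss-120b', 20_000, 2_000):.4f}")  # $0.0030
```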