Baseten
Learn how to configure and use Baseten's Model APIs with Sypha. Access frontier open-source models with enterprise-grade performance, reliability, and competitive pricing.
Baseten offers on-demand frontier model APIs built for production environments rather than mere experimentation. Powered by the Baseten Inference Stack, these APIs provide enterprise-level performance and dependability with optimized inference capabilities for premier open-source models from OpenAI, DeepSeek, Meta, Moonshot AI, and Alibaba Cloud.
Website: https://www.baseten.co/products/model-apis/
Getting an API Key
- Sign Up/Sign In: Visit Baseten and create an account, or sign in if you already have one.
- Navigate to API Keys: Open your dashboard and locate the API Keys section.
- Create a Key: Generate a new API key and give it a meaningful name (e.g., "Sypha").
- Copy the Key: Copy the API key immediately and store it in a secure location.
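Before configuring Sypha, you can sanity-check the key from a script. The minimal sketch below assumes the openai Python SDK (pip install openai) and that the key is exported as BASETEN_API_KEY (the variable name is just this example's convention); the base URL is Baseten's OpenAI-compatible endpoint, inference.baseten.co/v1.

```python
# Minimal sketch: confirm the key works before configuring Sypha.
# Assumes the openai Python SDK and that the key is exported as
# BASETEN_API_KEY (the variable name is this example's choice).
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",  # Baseten's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # any model from the list below works
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=10,
)
print(response.choices[0].message.content)
```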
Supported Models
Sypha is compatible with all available models within Baseten Model APIs, including:

- zai-org/GLM-4.6 (Z AI) - Advanced open frontier model with sophisticated agentic, reasoning, and coding abilities (200K context) - $0.60/$2.20 per 1M tokens
- moonshotai/Kimi-K2-Instruct-0905 (Moonshot AI) - September release with improved features (262K context) - $0.60/$2.50 per 1M tokens
- openai/gpt-oss-120b (OpenAI) - 120B MoE with robust reasoning capabilities (128K context) - $0.10/$0.50 per 1M tokens
- Qwen/Qwen3-Coder-480B-A35B-Instruct - Sophisticated coding and reasoning (262K context) - $0.38/$1.53 per 1M tokens
- Qwen/Qwen3-235B-A22B-Instruct-2507 - Mathematics and reasoning specialist (262K context) - $0.22/$0.80 per 1M tokens
- deepseek-ai/DeepSeek-R1 - DeepSeek's first-generation reasoning model (163K context) - $2.55/$5.95 per 1M tokens
- deepseek-ai/DeepSeek-R1-0528 - Most recent iteration of DeepSeek's reasoning model (163K context) - $2.55/$5.95 per 1M tokens
- deepseek-ai/DeepSeek-V3.1 - Combined reasoning with sophisticated tool calling (163K context) - $0.50/$1.50 per 1M tokens
- deepseek-ai/DeepSeek-V3-0324 - Fast general-purpose model with improved reasoning (163K context) - $0.77/$0.77 per 1M tokens

For current pricing information, see: https://www.baseten.co/products/model-apis/

Note: The Kimi K2 0711, Llama 4 Maverick, and Llama 4 Scout Model APIs were deprecated at 5pm PT on October 8th. See: https://www.baseten.co/resources/changelog/model-api-deprecation-notice-kimi-k2-0711-scout-maverick/
Configuration in Sypha
- Open Sypha Settings: Select the settings icon (⚙️) within the Sypha panel.
- Select Provider: Pick "Baseten" from the "API Provider" dropdown menu.
- Enter API Key: Insert your Baseten API key into the "Baseten API Key" field.
- Select Model: Pick your preferred model from the "Model" dropdown menu.
Production-First Architecture
Baseten's Model APIs are engineered for production settings with multiple critical advantages:
Enterprise-Grade Reliability
- 99.99% uptime achieved via active-active redundancy
- Cloud-agnostic, multi-cluster autoscaling ensuring consistent availability
- SOC 2 Type II certification and HIPAA compliance meeting security standards
Optimized Performance
- Pre-optimized models delivered through the Baseten Inference Stack
- Latest-generation GPUs supported by multi-cloud infrastructure
- Ultra-fast inference, optimized end to end for production workloads
Cost Efficiency
- 5-10x more affordable than closed alternatives
- Optimized multi-cloud infrastructure enabling efficient resource usage
- Transparent pricing eliminating hidden fees or unexpected rate limit costs
Developer Experience
- OpenAI-compatible API - switch by changing a single URL
- Drop-in replacement for closed models, with comprehensive observability
- Effortless scaling from Model APIs to dedicated deployments
Special Features
Function Calling & Tool Use
Every Baseten model enables structured outputs, function calling, and tool usage through the Baseten Inference Stack, making them excellent for agentic applications.
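As a rough illustration, the sketch below requests a tool call through the OpenAI-compatible chat completions API; the get_weather tool and the chosen model are placeholders, and the same pattern applies to any model in the list above.

```python
# Sketch of tool use over the OpenAI-compatible chat completions API.
# The get_weather tool and the chosen model are placeholders for illustration.
import json
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="zai-org/GLM-4.6",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```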
Reasoning Capabilities
DeepSeek models provide enhanced reasoning with step-by-step analytical processes while maintaining production-ready performance.
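A hedged sketch of streaming a reasoning model's output so tokens print as they arrive; how (and whether) the intermediate reasoning is surfaced can vary by model, so this example only prints streamed message content.

```python
# Sketch: streaming a reasoning model so tokens print as they are produced.
# How (and whether) intermediate reasoning is exposed can vary by model;
# this example only prints streamed message content.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{
        "role": "user",
        "content": "A train travels 120 km in 1.5 hours. What is its average speed?",
    }],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```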
Long Context Support
- Up to 1 million tokens for Llama 4 models (Maverick and Scout, now deprecated; see the note under Supported Models)
- 262K tokens available for Qwen3 models
- 163K tokens available for DeepSeek models
- Ideal for code repositories and intricate multi-turn conversations
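To make the long-context workflow concrete, here is a rough sketch that packs a small Python repository into a single request; the directory path and *.py filter are hypothetical, and a real tool should count tokens and truncate to stay inside the chosen model's context window.

```python
# Rough sketch: pack a small Python repository into one long-context request.
# The directory path and *.py filter are hypothetical; a real tool should
# count tokens and truncate to stay inside the chosen model's context window.
import os
import pathlib

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://inference.baseten.co/v1",
)

repo = pathlib.Path("./my-project")
sources = [
    f"# File: {path}\n{path.read_text(errors='ignore')}"
    for path in sorted(repo.rglob("*.py"))
]

response = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",  # 262K-token context
    messages=[{
        "role": "user",
        "content": "Summarize this codebase:\n\n" + "\n\n".join(sources),
    }],
)
print(response.choices[0].message.content)
```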
Quantization Optimizations
Models are deployed utilizing advanced quantization methods (fp4, fp8, fp16) for peak performance while preserving quality.
Migration from Other Providers
Baseten's OpenAI compatibility simplifies migration:
From OpenAI:
- Replace api.openai.com with inference.baseten.co/v1 (see the sketch below)
- Preserve existing request/response structures
- Gain significant cost reductions
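For example, a minimal before/after sketch with the OpenAI Python SDK (the BASETEN_API_KEY variable name and the chosen model are illustrative):

```python
# Before: pointing the OpenAI SDK at OpenAI.
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After: same SDK and calls; only the base URL, key, and model name change.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],  # variable name chosen for this example
    base_url="https://inference.baseten.co/v1",
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct-0905",  # any supported Baseten model ID
    messages=[{"role": "user", "content": "Hello from the migrated client!"}],
)
print(response.choices[0].message.content)
```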
From Other Providers:
- Employ standard OpenAI SDK format
- Keep current prompting approaches
- Obtain access to newer open-source models
Tips and Notes
- Model Selection: Select models according to your particular use case - reasoning models for intricate tasks, coding models for development activities, and flagship models for general purposes.
- Cost Optimization: Baseten provides some of the most attractive pricing available, particularly for open-source models.
- Context Windows: Leverage large context windows (up to 262K tokens on currently listed models) for incorporating extensive codebases and documentation.
- Enterprise Ready: Baseten is architected for production deployment with enterprise-level security, compliance, and reliability.
- Dynamic Model Updates: Sypha automatically retrieves the current model list from Baseten, guaranteeing access to new models upon release.
- Multi-Cloud Capacity Management (MCM): Baseten's multi-cloud infrastructure guarantees high availability and minimal latency worldwide.
- Support: Baseten delivers dedicated support for production deployments and can collaborate with you on dedicated resources during scaling.
Pricing Information
Current pricing is exceptionally competitive and clear. For the latest pricing details, visit the Baseten Model APIs page. Prices generally range from $0.10-$6.00 per million tokens, making Baseten substantially more economical than numerous closed-model alternatives while granting access to cutting-edge open-source models.
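As a back-of-the-envelope illustration, the sketch below estimates per-request cost from the per-1M-token prices listed earlier on this page, reading the two figures as input/output rates; actual prices may change, so treat the numbers as examples only.

```python
# Back-of-the-envelope cost estimate using the per-1M-token prices listed above.
# Prices change over time; check the Baseten Model APIs page for current rates.
PRICE_PER_1M_USD = {  # model: (input price, output price), as listed on this page
    "openai/gpt-oss-120b": (0.10, 0.50),
    "zai-org/GLM-4.6": (0.60, 2.20),
    "deepseek-ai/DeepSeek-V3.1": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one request."""
    price_in, price_out = PRICE_PER_1M_USD[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: a 20K-token prompt with a 2K-token completion on gpt-oss-120b.
print(f"${estimate_cost('openai/gpt-oss-120b', 20_000, 2_000):.4f}")  # $0.0030
```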