Fireworks AI

Learn how to configure and use Fireworks AI's lightning-fast inference platform with Sypha. Experience up to 4x faster inference speeds with optimized models and competitive pricing.

Fireworks AI is a premier infrastructure platform for generative AI, focused on delivering exceptional performance through optimized inference. With up to 4x faster inference than alternative platforms and support for over 40 AI models, Fireworks removes the operational complexity of running AI models at scale.

Website: https://fireworks.ai/

Getting an API Key

  1. Sign Up/Sign In: Visit Fireworks AI and create an account or sign in.
  2. Navigate to API Keys: Open the API keys section of your dashboard.
  3. Create a Key: Generate a new API key and give it a meaningful name (e.g., "Sypha").
  4. Copy the Key: Copy the API key right away and store it securely.
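
Once you have a key, it is typically supplied to tooling through an environment variable rather than pasted into code. A minimal sketch, assuming the conventional variable name FIREWORKS_API_KEY (not a name Sypha mandates) and a placeholder key; Fireworks' OpenAI-compatible endpoints authenticate with a Bearer token:

```python
import os

# FIREWORKS_API_KEY is a conventional variable name, not one Sypha
# requires; the fallback value here is a placeholder, not a real key.
api_key = os.environ.get("FIREWORKS_API_KEY", "fw-placeholder-key")

# Fireworks' OpenAI-compatible API expects the key as a Bearer token.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

print(headers["Authorization"])
```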

Supported Models

Fireworks AI is compatible with an extensive variety of models across different categories. Popular models include:

Text Generation Models:

  • Llama 3.1 series (8B, 70B, 405B)
  • Mixtral 8x7B and 8x22B
  • Qwen 2.5 series
  • DeepSeek models featuring reasoning capabilities
  • Code Llama models for programming tasks

Vision Models:

  • Llama 3.2 Vision models
  • Qwen 2-VL models

Embedding Models:

  • Various text embedding models for semantic search

The platform curates, refines, and deploys models with custom kernels and inference optimizations for peak performance.

Configuration in Sypha

  1. Open Sypha Settings: Select the settings icon (⚙️) within the Sypha panel.
  2. Select Provider: Pick "Fireworks" from the "API Provider" dropdown menu.
  3. Enter API Key: Insert your Fireworks API key into the "Fireworks API Key" field.
  4. Enter Model ID: Designate the model you wish to use (e.g., "accounts/fireworks/models/llama-v3p1-70b-instruct").
  5. Configure Tokens: Optionally define max completion tokens and context window size.

Fireworks AI's Performance Focus

Fireworks AI's competitive strengths center on performance optimization and developer experience:

Lightning-Fast Inference

  • Up to 4x faster inference than alternative platforms
  • 250% higher throughput than open-source inference engines
  • 50% faster responses with substantially lower latency
  • 6x lower cost than HuggingFace Endpoints, with 2.5x faster generation

Advanced Optimization Technology

  • Custom kernels and inference optimizations boost throughput per GPU
  • Multi-LoRA architecture facilitates efficient resource sharing
  • Hundreds of fine-tuned model variants can operate on shared base model infrastructure
  • Asset-light model emphasizes optimization software instead of expensive GPU ownership

Comprehensive Model Support

  • 40+ different AI models curated and refined for performance
  • Multiple GPU types accommodated: A100, H100, H200, B200, AMD MI300X
  • Pay-per-GPU-second billing with no additional charges for start-up times
  • OpenAI API compatibility for effortless integration
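
Because the API is OpenAI-compatible, a chat-completion request is just an OpenAI-style JSON payload sent to Fireworks' serverless base URL. The sketch below only assembles the payload (no network call is made); `build_chat_request` is a hypothetical helper, and the model ID matches the example used elsewhere on this page:

```python
import json

# Base URL for Fireworks' OpenAI-compatible serverless inference API.
BASE_URL = "https://api.fireworks.ai/inference/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completion payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "accounts/fireworks/models/llama-v3p1-70b-instruct",
    "Summarize speculative decoding in one sentence.",
)

# POST this payload to f"{BASE_URL}/chat/completions" with your API key
# in an Authorization: Bearer header to receive a completion back.
print(json.dumps(payload, indent=2))
```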

Pricing Structure

Fireworks AI uses a usage-based pricing model with competitive rates:

Text and Vision Models (2025)

  Parameter Count             Price per 1M Input Tokens
  Less than 4B parameters     $0.10
  4B - 16B parameters         $0.20
  More than 16B parameters    $0.90
  MoE 0B - 56B parameters     $0.50
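
The tiered rates above make input-token costs easy to estimate. A minimal sketch, assuming the table's 2025 serverless rates; the tier keys are illustrative names, not Fireworks identifiers:

```python
# USD per 1M input tokens, taken from the tier table above;
# the tier keys are illustrative names, not Fireworks identifiers.
PRICE_PER_M_INPUT_TOKENS = {
    "under_4b": 0.10,
    "4b_to_16b": 0.20,
    "over_16b": 0.90,
    "moe_up_to_56b": 0.50,
}

def input_token_cost(tier: str, tokens: int) -> float:
    """USD cost for `tokens` input tokens at the given tier's rate."""
    return PRICE_PER_M_INPUT_TOKENS[tier] * tokens / 1_000_000

# e.g. 5M input tokens through a 70B-class (>16B parameter) model:
print(input_token_cost("over_16b", 5_000_000))  # → 4.5
```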

Fine-Tuning Services

  Base Model Size             Price per 1M Training Tokens
  Up to 16B parameters        $0.50
  16.1B - 80B parameters      $3.00
  DeepSeek R1 / V3            $10.00

Dedicated Deployments

  GPU Type        Price per Hour
  A100 80GB       $2.90
  H100 80GB       $5.80
  H200 141GB      $6.99
  B200 180GB      $11.99
  AMD MI300X      $4.99

Special Features

Fine-Tuning Capabilities

Fireworks provides sophisticated fine-tuning services via a CLI, supporting JSON-formatted data from sources such as MongoDB Atlas. Fine-tuned models cost the same as base models at inference time.
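
Fine-tuning uploads are commonly JSONL: one JSON object per line. The record below sketches a typical chat-format training example; the field layout is an assumption about a common schema, not Fireworks' exact requirement, so check the fine-tuning docs for the schema your job expects:

```python
import json

# An illustrative chat-format training record; the field layout is an
# assumed typical schema, not Fireworks' exact requirement.
record = {
    "messages": [
        {"role": "user", "content": "What port does Postgres listen on by default?"},
        {"role": "assistant", "content": "5432."},
    ]
}

# JSONL: serialize each record onto a single line of its own.
line = json.dumps(record)
print(line)
```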

Developer Experience

  • Browser playground for direct model interaction
  • REST API featuring OpenAI compatibility
  • Comprehensive cookbook with ready-to-deploy recipes
  • Multiple deployment options ranging from serverless to dedicated GPUs

Enterprise Features

  • HIPAA and SOC 2 Type II compliance for regulated industries
  • Self-serve onboarding for developers
  • Enterprise sales for larger deployments
  • Post-paid billing options and Business tier

Reasoning Model Support

Fireworks offers sophisticated support for reasoning models, including <think> tag processing and reasoning-content extraction, which makes intricate multi-step reasoning feasible for real-time applications.
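
As a rough sketch of what that processing can look like on the client side (simplified: real handling also covers streaming chunks, and `split_reasoning` is a hypothetical helper, not a Sypha or Fireworks API):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <think>…</think> reasoning from the final answer.

    Simplified sketch of the kind of processing a client performs for
    reasoning models; a real parser also handles streamed partial tags.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # → The answer is 4.
```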

Tips and Notes

  • Model Selection: Pick models according to your use case: smaller models for speed, larger models for complex reasoning.
  • Performance Focus: Fireworks specializes in making AI inference rapid and economical through advanced optimizations.
  • Fine-Tuning: Utilize fine-tuning capabilities to enhance model accuracy with your proprietary data.
  • Compliance: HIPAA and SOC 2 Type II compliance permits use in regulated industries.
  • Pricing Model: Usage-based pricing scales with your usage rather than charging per seat.
  • Developer Resources: Extensive documentation and cookbook recipes expedite implementation.
  • GPU Options: Multiple GPU types accessible for dedicated deployments according to performance needs.
