
Automatic Context Summarization

When your conversation nears the model's context window limit, Sypha automatically condenses it to free up space and continue working.

[Screenshot: the auto-compact feature condensing conversation context]

How It Works

Sypha tracks token usage throughout your conversation. As you approach the limit, it:

  1. Generates a comprehensive summary of everything that has occurred
  2. Retains all technical details, code changes, and decisions
  3. Replaces the conversation history with the summary
  4. Continues precisely where it left off

You'll see a summarization tool call when this happens, with its total cost displayed like any other API call in the chat view.
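
As a rough sketch of this loop (the function names and the 80% trigger point below are assumptions, not Sypha's actual code):

```typescript
// Illustrative sketch only: the names and the 80% trigger point are
// assumptions, not Sypha's actual implementation.
interface Message {
  role: "user" | "assistant";
  content: string;
}

async function maybeCompact(
  history: Message[],
  contextWindow: number,
  usedTokens: number,
  summarize: (history: Message[]) => Promise<Message[]>,
): Promise<Message[]> {
  // Nothing to do while usage is comfortably below the limit.
  const TRIGGER = 0.8; // assumed; the real threshold varies by model
  if (usedTokens < contextWindow * TRIGGER) return history;

  // Near the limit: generate a comprehensive summary and swap it in
  // for the full history, so the task continues where it left off.
  return summarize(history);
}
```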

Why This Matters

Previously, Sypha would truncate older messages when it hit the context limit, losing important context from earlier in the conversation.

With summarization in place:

  • All technical decisions and code patterns get preserved
  • File changes and project context stay intact
  • Sypha retains memory of everything it's done
  • You can work on substantially larger projects without interruption

Context Summarization pairs beautifully with Focus Chain. When Focus Chain is enabled, todo lists persist across summarizations. This allows Sypha to work on long-horizon tasks spanning multiple context windows while staying on track with the todo list guiding it through each reset.

Technical Details

Summarization runs through your configured API provider with the same model you're already using, and it relies on prompt caching to minimize costs.

  1. Sypha uses a summarization prompt to request a conversation summary.

  2. After the summary is generated, Sypha replaces the conversation history with a continuation prompt that instructs the model to continue working and supplies the summary as context.
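
A minimal sketch of that two-step flow, assuming hypothetical prompt wording and message shapes (the real prompts live in Sypha's source and are worded differently):

```typescript
// Hypothetical prompt shapes; not Sypha's actual prompts.
type Role = "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Step 1: append a summarization request to the existing (cached) history.
function buildSummaryRequest(history: ChatMessage[]): ChatMessage[] {
  return [
    ...history,
    {
      role: "user",
      content:
        "Summarize this conversation, retaining all technical details, " +
        "code changes, and decisions made so far.",
    },
  ];
}

// Step 2: the summary replaces the history entirely, wrapped in a
// continuation prompt that tells the model to keep working.
function buildContinuation(summary: string): ChatMessage[] {
  return [
    {
      role: "user",
      content: `Context from the previous conversation:\n\n${summary}\n\nContinue the task where you left off.`,
    },
  ];
}
```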

The threshold at which auto-summarization activates varies by model and context window size. You can see how thresholds are determined in context-window-utils.ts.
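
To illustrate the kind of logic involved, here is a sketch with made-up numbers; the real per-model values live in context-window-utils.ts:

```typescript
// Made-up thresholds for illustration only; consult
// context-window-utils.ts for the actual per-model logic.
function getAutoCompactThreshold(contextWindow: number): number {
  // Keep a buffer below the hard limit so the summarization request
  // itself still fits in the window.
  if (contextWindow >= 200_000) return contextWindow - 40_000;
  if (contextWindow >= 128_000) return contextWindow - 30_000;
  return Math.floor(contextWindow * 0.75);
}
```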

Cost Considerations

Summarization reuses the conversation's existing prompt cache, so it costs about the same as any other tool call.

Since most of the input tokens are already cached, you're primarily paying for the summary generation (output tokens), which makes it very cost-effective.
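
As a back-of-the-envelope example, with entirely hypothetical prices (check your provider's actual rates):

```typescript
// Hypothetical pricing, per million tokens.
const CACHE_READ_PER_MTOK = 0.3; // cached input tokens
const FRESH_INPUT_PER_MTOK = 3.0; // uncached input tokens
const OUTPUT_PER_MTOK = 15.0; // generated output tokens

// Summarizing a ~150k-token conversation that is almost fully cached,
// producing a ~2k-token summary:
const cost =
  (145_000 / 1e6) * CACHE_READ_PER_MTOK + // 0.0435
  (5_000 / 1e6) * FRESH_INPUT_PER_MTOK + // 0.0150
  (2_000 / 1e6) * OUTPUT_PER_MTOK; // 0.0300

console.log(`$${cost.toFixed(2)}`); // ≈ $0.09, versus ~$0.45 to
// resend the same 150k tokens uncached at these rates.
```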

Restoring Context with Checkpoints

You can use checkpoints to restore your task state from before a summarization occurred. This means you never truly lose context: you can always revert to previous versions of your conversation.

Editing a message that precedes a summarization tool call works like a checkpoint restore, letting you return the conversation to that point.

Next Generation Model Support

Auto Compact employs advanced LLM-based summarization which we've found performs significantly better for next-generation models. We currently support this feature for the following models:

  • Claude 4 series
  • Gemini 2.5 series
  • GPT-5
  • Grok 4

When using other models, Sypha automatically defaults to the standard rule-based context truncation method, even if Auto Compact is enabled in settings.
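
A minimal sketch of that gating, assuming a hypothetical supportsAutoCompact helper and illustrative model identifiers:

```typescript
// Hypothetical helper; the model identifiers and function names are
// illustrative, not Sypha's actual API.
const AUTO_COMPACT_FAMILIES = ["claude-4", "gemini-2.5", "gpt-5", "grok-4"];

function supportsAutoCompact(modelId: string): boolean {
  return AUTO_COMPACT_FAMILIES.some((f) => modelId.startsWith(f));
}

function pickContextStrategy(modelId: string, autoCompactEnabled: boolean) {
  return autoCompactEnabled && supportsAutoCompact(modelId)
    ? "llm-summarization" // next-generation models
    : "rule-based-truncation"; // fallback for everything else
}
```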
