Sypha AI Docs

Semantic Codebase Search

Utilise Sypha's semantic search to find relevant code patterns across your entire project using AI embeddings.

[!WARNING] Experimental Feature: The codebase_search tool requires an active embedding provider and a reachable Qdrant vector database.

The codebase_search tool enables deep semantic exploration of your entire repository. Unlike standard text-based searches that rely on literal characters, it leverages AI embeddings to understand the meaning of your request, finding relevant logic even when keywords don't match exactly.

Core Parameters

  • query (Required): A natural language description of the code or pattern you are seeking.
  • path (Optional): A directory to restrict the search to.
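
To make the two parameters concrete, the sketch below models a request as a small Python data class. Only the `query` and `path` fields come from the list above. The class name `CodebaseSearchRequest`, the blank-query check, and the example directory `src/auth` are hypothetical and are not part of Sypha's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodebaseSearchRequest:
    """Hypothetical model of a codebase_search call (not Sypha's real API)."""

    query: str                  # required: natural-language description
    path: Optional[str] = None  # optional: directory to restrict the search to

    def __post_init__(self) -> None:
        # Assumption: a blank query is rejected because the parameter is required.
        if not self.query.strip():
            raise ValueError("query is required and must not be blank")


# Search the whole index.
everywhere = CodebaseSearchRequest(query="Middleware for rate limiting API requests")

# Restrict the search to one (illustrative) directory.
scoped = CodebaseSearchRequest(query="Logic for handling JSON Web Tokens", path="src/auth")
```

Leaving `path` unset corresponds to searching the entire indexed project.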

Operational Logic

This tool scans your indexed codebase for conceptual similarity. It returns code snippets ranked by relevance, providing file paths, line numbers, and similarity scores.
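
One way to picture a result is as a record holding the fields named above. The following is a minimal sketch, not Sypha's actual output format: the `SearchHit` class, its field names, and the sample paths and scores are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SearchHit:
    """Hypothetical shape of one result; field names are assumptions."""

    file_path: str
    start_line: int
    end_line: int
    score: float   # similarity score; higher means more relevant
    snippet: str


def rank_hits(hits: List[SearchHit]) -> List[SearchHit]:
    """Return hits ordered from most to least relevant."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


def format_hit(hit: SearchHit) -> str:
    """Render a hit as 'path:start-end (score)'."""
    return f"{hit.file_path}:{hit.start_line}-{hit.end_line} ({hit.score:.2f})"


# Illustrative data only; these files and scores are invented.
hits = [
    SearchHit("src/db/pool.py", 10, 42, 0.71, "class ConnectionPool: ..."),
    SearchHit("src/auth/jwt.py", 5, 30, 0.86, "def verify_token(token): ..."),
]
ranked = rank_hits(hits)
formatted = [format_hit(h) for h in ranked]
```

After ranking, the hit with the higher score (`src/auth/jwt.py`) comes first.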

Strategic Use Cases

  • Pattern Discovery: Find how specific features (like authentication or error handling) are implemented across multiple modules.
  • Onboarding: Quickly understand an unfamiliar codebase by searching for high-level concepts like "database connection pooling."
  • Refactoring Impact: Identify all areas of the system that follow a particular logic pattern before making a global change.
  • Architectural Auditing: Ensure that new implementations align with existing architectural decisions.

Key Features

  • Meaning-Based Retrieval: Finds code by intent rather than exact string matching.
  • Global Visibility: Scans the entire indexed project, not just currently open buffers.
  • Relevance Ranking: Results are scored on a 0-1 similarity scale, and only results scoring above the 0.4 threshold are returned.
  • Navigation Optimized: Results include clickable links to specific files and line ranges.

Technical Requirements

This tool is active only when Experimental Codebase Indexing is configured:

  • Indexing Active: The feature must be enabled for the current workspace.
  • Embedding Engine: A valid OpenAI API key or a running Ollama service to generate embeddings.
  • Vector Storage: A reachable Qdrant instance for storage and retrieval.
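
The three requirements can be read as a simple precondition check. The sketch below assumes a hypothetical `IndexingConfig` object; its field names do not come from Sypha's settings, and the check looks only at whether values are present, not at whether the services actually respond.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexingConfig:
    """Hypothetical settings object; the field names are illustrative only."""

    indexing_enabled: bool = False
    openai_api_key: Optional[str] = None
    ollama_base_url: Optional[str] = None
    qdrant_url: Optional[str] = None


def missing_requirements(cfg: IndexingConfig) -> List[str]:
    """List unmet requirements; an empty list means all three are configured."""
    missing = []
    if not cfg.indexing_enabled:
        missing.append("codebase indexing is not enabled for this workspace")
    if not (cfg.openai_api_key or cfg.ollama_base_url):
        missing.append("no embedding provider (OpenAI API key or Ollama service)")
    if not cfg.qdrant_url:
        missing.append("no Qdrant instance configured")
    return missing


# Example: local Ollama and Qdrant on their default ports.
local_cfg = IndexingConfig(
    indexing_enabled=True,
    ollama_base_url="http://localhost:11434",
    qdrant_url="http://localhost:6333",
)
problems = missing_requirements(local_cfg)
```

An empty `problems` list here means only that the settings are filled in; confirming reachability would need a real connection attempt.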

Operational Cycle

  1. Validation: Sypha confirms the Indexing service is operational.
  2. Vectorization: Your natural language query is converted into a high-dimensional vector.
  3. Similarity Retrieval: The Qdrant database is scanned for the most similar code embeddings using cosine similarity.
  4. Context Injection: Up to 50 of the highest-scoring matches (above 0.4 similarity) are formatted for both the AI and the user interface.
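
The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not Sypha's implementation: an in-memory dictionary stands in for Qdrant, `toy_embed` is a word-count stand-in for a real embedding model (so it matches on shared words, not on meaning), and treating the 0.4 threshold as exclusive is a guess based on the word "above".

```python
import math
from typing import Dict, List, Tuple

SCORE_THRESHOLD = 0.4   # from the text: minimum similarity
MAX_RESULTS = 50        # from the text: at most 50 matches


def toy_embed(text: str, vocab: List[str]) -> List[float]:
    """Stand-in for an embedding model: counts occurrences of vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: str,
    index: Dict[str, List[float]],
    vocab: List[str],
    indexing_ready: bool = True,
) -> List[Tuple[float, str]]:
    # 1. Validation: refuse to run if indexing is not operational.
    if not indexing_ready:
        raise RuntimeError("indexing service is not operational")
    # 2. Vectorization: turn the query into a vector.
    query_vec = toy_embed(query, vocab)
    # 3. Similarity retrieval: score every stored embedding.
    scored = [(cosine_similarity(query_vec, vec), loc) for loc, vec in index.items()]
    # 4. Keep matches above the threshold, best first, capped at MAX_RESULTS.
    kept = [(score, loc) for score, loc in scored if score > SCORE_THRESHOLD]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:MAX_RESULTS]


# Invented index entries, keyed by "file:line-range".
vocab = ["token", "jwt", "rate", "limit", "pool"]
index = {
    "src/auth/jwt.py:5-30": toy_embed("verify jwt token and refresh token", vocab),
    "src/api/limiter.py:1-40": toy_embed("rate limit middleware", vocab),
    "src/db/pool.py:10-42": toy_embed("connection pool", vocab),
}
results = search("jwt token handling", index, vocab)
```

With this toy data, only the JWT entry clears the threshold; the other two entries share no vocabulary words with the query and score 0. A real vector database performs the same ranking with an approximate nearest-neighbour index instead of scoring every entry.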

Best Practices for Queries

Effective Patterns:

  • "Logic for handling JSON Web Tokens"
  • "Middleware for rate limiting API requests"
  • "Service layer for managing user roles"

Avoid Over-Generality:

  • "function"
  • "code"
  • "module"

For a complete guide on initializing the search index, refer to the Experimental Settings Guide.
