Voice Dictation User Guide

Complete user guide for the Sypha Voice Dictation feature.

Voice dictation lets you interact with Sypha by speaking instead of typing. Press the microphone button, speak your instructions, and Sypha transcribes them into the chat input, where you can review and send them.

Overview

Voice dictation provides:

  • Hands-free interaction with Sypha
  • Multi-language transcription
  • Real-time translation (Sarvam AI only)
  • A choice of transcription services (Sypha's built-in service or Sarvam AI)
  • Cross-platform support (macOS, Windows, Linux)

Prerequisites

macOS

  • macOS 10.15 or newer
  • FFmpeg (multimedia framework for audio capture)
  • Microphone (integrated or external)
  • Microphone permissions for VS Code

Windows

  • Windows 10 or newer
  • FFmpeg (multimedia framework for audio capture)
  • Microphone (integrated or external)
  • Microphone permissions for VS Code

Linux

  • Ubuntu 20.04+ or comparable distribution
  • FFmpeg with ALSA compatibility
  • Microphone (integrated or external)
  • Audio system (PulseAudio or ALSA)
  • Microphone permissions

Installation & Setup

Step 1: Install FFmpeg

FFmpeg is required for audio capture on all platforms.

macOS

Option 1: Using Homebrew (Recommended)

brew install ffmpeg

Option 2: Using MacPorts

sudo port install ffmpeg

Verify Installation:

ffmpeg -version

Windows

Option 1: Using winget (Windows 10+)

winget install Gyan.FFmpeg

Option 2: Manual Installation

  1. Download FFmpeg from ffmpeg.org
  2. Select "Windows builds from gyan.dev"
  3. Download the most recent release (full build)
  4. Extract to C:\ffmpeg
  5. Add to PATH:
    • Open System Properties → Environment Variables
    • Edit Path variable
    • Add C:\ffmpeg\bin
    • Click OK and restart your terminal

Verify Installation:

ffmpeg -version

Important: After installing FFmpeg, restart VS Code completely.


Linux (Ubuntu/Debian)

sudo apt-get update
sudo apt-get install -y ffmpeg

Linux (Fedora/RHEL)

sudo dnf install ffmpeg

Linux (Arch)

sudo pacman -S ffmpeg

Verify Installation:

ffmpeg -version

Verify Opus Codec (Required):

ffmpeg -codecs | grep opus

You should see libopus in the output.
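The two verification commands above can be combined into one preflight script. This is a minimal sketch, not part of Sypha: the helper names check_tool and has_opus are made up for illustration, and the script only reports status without changing anything.

```shell
#!/bin/sh
# Preflight sketch: report whether FFmpeg and its Opus support are available.
# check_tool and has_opus are hypothetical helper names, not part of Sypha.

check_tool() {
    # Succeeds if the named command is on PATH.
    command -v "$1" >/dev/null 2>&1
}

has_opus() {
    # Reads `ffmpeg -codecs` style output on stdin; succeeds if it mentions libopus.
    grep -q 'libopus'
}

if check_tool ffmpeg; then
    echo "ffmpeg: found"
    if ffmpeg -hide_banner -codecs 2>/dev/null | has_opus; then
        echo "opus: available"
    else
        echo "opus: missing"
    fi
else
    echo "ffmpeg: not found on PATH"
fi
```

If the script reports that FFmpeg is missing right after an install, open a new terminal first, since PATH changes usually apply only to new shells.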


Step 2: Configure Microphone Permissions

macOS

  1. Open System Settings
  2. Navigate to Privacy & Security → Microphone
  3. Enable microphone access for Visual Studio Code (or Code - Insiders if you use VS Code Insiders)
  4. If VS Code isn't listed, click the + button and add it
  5. Restart VS Code after granting permission

Alternative path (macOS 12 and earlier): System Preferences → Security & Privacy → Privacy tab → Microphone


Windows

  1. Open Settings
  2. Navigate to Privacy & Security → Microphone
  3. Enable "Microphone access"
  4. Enable "Let apps access your microphone"
  5. Scroll down and enable access for Visual Studio Code
  6. Restart VS Code after granting permission

Alternative path (Windows 10): Settings → Privacy → Microphone


Linux

Most Linux distributions don't require explicit per-app permissions, but confirm the following:

  1. Your user belongs to the audio group:

    groups $USER

    If audio isn't listed, add your user to the group:

    sudo usermod -a -G audio $USER

    Then log out and log back in.

  2. Test microphone access:

    arecord -l

    This should enumerate your audio recording devices.

  3. Test recording (3 seconds):

    ffmpeg -f alsa -i default -t 3 test.webm
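The group check in step 1 can be scripted so that it gives a clear yes or no answer. The following is a sketch under the assumption that your distribution uses an audio group as described above; in_group is a hypothetical helper name.

```shell
#!/bin/sh
# in_group is a hypothetical helper: succeeds if group $1 appears in the
# space-separated list $2 (the format printed by `id -nG`).
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if in_group audio "$(id -nG)"; then
    echo "current user is in the audio group"
else
    echo "current user is not in the audio group"
    echo "to add it: sudo usermod -a -G audio \"\$USER\" (then log out and back in)"
fi
```

Matching on whole words avoids false positives from group names that merely contain the string "audio".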

Step 3: Enable Dictation in Sypha

  1. Launch Sypha in VS Code
  2. Select the Settings (⚙️) icon in the Sypha sidebar
  3. Navigate to the Features tab
  4. Check the "Enable Dictation" checkbox
  5. The dictation settings section appears below

Step 4: Configure Your Transcription Provider

Using Sypha Transcription (Default)

  1. In the dictation settings, select "Sypha" as the transcription service
  2. Sign in to your Sypha account (required for Sypha transcription)
  3. Choose your preferred Transcription Language
  4. Select Save

Note: Sypha transcription requires an active Sypha account with available credits.

Using Sarvam AI

  1. In the dictation settings, select "Sarvam AI" as the transcription service
  2. Obtain your Sarvam AI API key from sarvam.ai
  3. Enter your Sarvam AI API Key in the designated field
  4. Choose your preferred Transcription Language
  5. (Optional) Enable Translation and select target language
  6. Select Save

Supported Languages (Sarvam AI):

  • English (en)
  • Hindi (hi)
  • Bengali (bn)
  • Gujarati (gu)
  • Kannada (kn)
  • Malayalam (ml)
  • Marathi (mr)
  • Odia (od)
  • Punjabi (pa)
  • Tamil (ta)
  • Telugu (te)

Using Voice Dictation

  1. Start Recording:

    • Press the microphone icon (🎤) in the Sypha chat input area
    • The icon turns red to indicate that recording is active
    • A timer will display the recording duration
  2. Speak Your Instructions:

    • Articulate clearly and naturally
    • Position yourself near your microphone
    • Minimize background noise where possible
  3. Stop Recording:

    • Press the stop button (⏹️) or the red microphone icon again
    • Sypha will process your audio and transcribe it
    • The transcribed text will display in the chat input
  4. Review & Send:

    • Review the transcribed text
    • Edit as necessary
    • Press Enter or select Send to submit
  5. Cancel Recording:

    • Press the cancel button (✖️) to discard the recording without transcribing

Transcription Providers

Sypha Transcription (Default)

  • Authentication: Requires Sypha account sign-in
  • Credits: Uses your Sypha account credits
  • Languages: Multiple languages supported
  • Translation: Not available
  • Best for: Existing Sypha users

Sarvam AI

  • Authentication: Requires a Sarvam AI API key
  • Credits: Uses your Sarvam AI credits
  • Languages: 10 Indian languages plus English (11 in total)
  • Translation: Real-time translation available
  • Best for: Indian language support and translation needs

Translation Feature

Available with: Sarvam AI only

The translation feature lets you speak in one language and have your speech automatically translated into another.

Example Use Cases:

  • Speak in Hindi, send English instructions to Sypha
  • Speak in Tamil, send Hindi instructions to Sypha
  • Speak in English, send Gujarati instructions to Sypha

How to Enable:

  1. Select Sarvam AI as your transcription service
  2. Check the "Enable Translation" checkbox
  3. Select your Transcription Language (the language you'll speak)
  4. Select your Translation Target Language (the language you want in the chat input)
  5. Save settings

Workflow:

  1. Speak in your selected transcription language
  2. Sarvam AI transcribes your speech
  3. Sarvam AI translates to your target language
  4. Translated text displays in the chat input

Troubleshooting

Microphone Button is Disabled/Grayed Out

Possible Causes:

  • Dictation feature not enabled
  • FFmpeg not installed
  • No microphone detected

Solutions:

  1. Navigate to Settings → Features and check "Enable Dictation"
  2. Install FFmpeg (see Step 1)
  3. Restart VS Code after installing FFmpeg
  4. Check that your microphone is connected and working
  5. Confirm microphone permissions (see Step 2)

"Enable Dictation" Option Not Visible

Possible Causes:

  • Using an older version of Sypha
  • Unsupported platform

Solutions:

  1. Confirm you're using the latest version of Sypha
  2. Verify that you're on a supported platform (macOS, Windows, Linux)
  3. Reload the VS Code window: Cmd/Ctrl + Shift + P → "Developer: Reload Window"

Recording Starts but Nothing Happens

Possible Causes:

  • FFmpeg process failed silently
  • No audio input being captured
  • Microphone not designated as default

Solutions:

  1. Test FFmpeg manually:

    macOS:

    ffmpeg -f avfoundation -i :default -t 3 test.webm

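    Windows (DirectShow; list device names with ffmpeg -list_devices true -f dshow -i dummy, then substitute yours for the placeholder):

    ffmpeg -f dshow -i audio="YOUR_MICROPHONE_NAME" -t 3 test.webm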

    Linux:

    ffmpeg -f alsa -i default -t 3 test.webm
  2. Check default microphone:

    • Open your system sound settings
    • Confirm your microphone is designated as the default input device
    • Test the microphone in system settings
  3. Check VS Code Output:

    • Open VS Code Output panel: View → Output
    • Select "Sypha" from the dropdown
    • Look for error messages related to recording
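For the manual FFmpeg test in solution 1, a small script can pick the matching input arguments for the current system. This sketch covers only macOS and Linux, because it relies on a POSIX shell and uname; ffmpeg_input_for is a hypothetical helper, and the script prints the command instead of running it.

```shell
#!/bin/sh
# ffmpeg_input_for is a hypothetical helper: given the kernel name printed by
# `uname -s`, it prints the FFmpeg input arguments used in the manual tests.
ffmpeg_input_for() {
    case "$1" in
        Darwin) echo "-f avfoundation -i :default" ;;
        Linux)  echo "-f alsa -i default" ;;
        *)      return 1 ;;
    esac
}

os=$(uname -s)
if input=$(ffmpeg_input_for "$os"); then
    # Print the command instead of running it, so nothing is recorded by accident.
    echo "ffmpeg $input -t 3 test.webm"
else
    echo "no test command known for: $os"
fi
```

Copy the printed command into a terminal when you are ready to record the 3-second test clip.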

"Recording file not found" Error

Possible Causes:

  • FFmpeg failed to generate the audio file
  • Microphone permissions not granted
  • FFmpeg missing opus codec
  • Audio device not available

Solutions:

  1. Verify FFmpeg installation:

    ffmpeg -version

    Should display version information.

  2. Check opus codec:

    ffmpeg -codecs | grep opus

    Should list the libopus encoder/decoder. On Windows, where grep is usually unavailable, use findstr opus in place of grep opus.

  3. Grant microphone permissions (see Step 2)

  4. Test audio recording manually (see "Recording Starts but Nothing Happens")

  5. Windows-specific:

    • Confirm FFmpeg is in your PATH
    • Open a new terminal and execute ffmpeg -version
    • If not located, add FFmpeg to PATH and restart VS Code
  6. Linux-specific:

    • Verify audio system is operational:
      systemctl --user status pulseaudio
    • List audio devices:
      arecord -l

"FFmpeg is required" Error

Cause: FFmpeg isn't installed or isn't on the system PATH

Solutions:

  1. Install FFmpeg (see Step 1)

  2. Verify installation:

    ffmpeg -version
  3. Add to PATH (if needed):

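    macOS/Linux: Add to ~/.bashrc or ~/.zshrc (Homebrew on Apple silicon Macs installs to /opt/homebrew/bin, so use that directory there):

    export PATH="/usr/local/bin:$PATH"

    Then reload: source ~/.bashrc (or source ~/.zshrc)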

    Windows:

    • System Properties → Environment Variables
    • Edit PATH variable
    • Add FFmpeg bin directory
    • Restart VS Code
  4. Restart VS Code entirely (close all windows)
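Before editing PATH on macOS or Linux, you can check which common install directories are already on it. This is a sketch only: path_contains is a hypothetical helper, and the three directories are typical locations (including /opt/homebrew/bin, the Homebrew prefix on Apple silicon), not a list that Sypha itself consults.

```shell
#!/bin/sh
# path_contains is a hypothetical helper: succeeds if directory $1 is an
# entry in the colon-separated list $2 (defaults to the current PATH).
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

for dir in /usr/local/bin /opt/homebrew/bin /usr/bin; do
    if path_contains "$dir"; then
        echo "$dir is on PATH"
    else
        echo "$dir is not on PATH"
    fi
done
```

Wrapping both sides in colons makes the match exact, so /usr/local does not count as a match for /usr/local/bin.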


Microphone Permission Issues

macOS:

Symptom: Recording initiates but no audio is captured

Solution:

  1. Open System Settings → Privacy & Security → Microphone
  2. If VS Code has a ❌ beside it, remove it and add it again
  3. Toggle the permission off and back on
  4. Fully restart VS Code (quit and reopen)
  5. If it still doesn't work, reset the permission:
    tccutil reset Microphone com.microsoft.VSCode
    Then grant permission again

Windows:

Symptom: "Access denied" or no audio captured

Solution:

  1. Windows Settings:

    • Settings → Privacy & Security → Microphone
    • Enable "Microphone access"
    • Enable "Let apps access your microphone"
    • Enable for "Visual Studio Code"
  2. Check Antivirus/Security Software:

    • Some antivirus software restricts microphone access
    • Add VS Code to its allowlist
  3. Run VS Code as Administrator (temporary test):

    • Right-click VS Code → "Run as administrator"
    • Try recording again
    • If recording works, the problem is a permission issue

Linux:

Symptom: "Device or resource busy" or permission errors

Solution:

  1. Add user to audio group:

    sudo usermod -a -G audio $USER

    Log out and log back in.

  2. Check PulseAudio:

    pulseaudio --check
    pulseaudio --start
  3. Check device permissions:

    ls -l /dev/snd/

    Devices should be accessible to your user.


Poor Transcription Quality

Possible Causes:

  • Background noise
  • Low-quality microphone
  • Speaking too quickly or too slowly
  • Wrong language selected

Solutions:

  1. Improve Recording Environment:

    • Record in a quiet environment
    • Minimize background noise
    • Articulate clearly at a moderate pace
    • Position microphone 6-12 inches from your mouth
  2. Check Microphone Settings:

    • Use a good-quality microphone
    • Test microphone in system settings
    • Adjust input volume (too high a level causes distortion)
  3. Verify Language Settings:

    • Confirm transcription language matches the language you're speaking
    • Settings → Features → Dictation Settings → Transcription Language
  4. Try Different Provider:

    • If you're using Sypha transcription, try Sarvam AI (or vice versa)
    • Different services may perform better for different languages

Transcription in Wrong Language

Cause: Language settings don't match your spoken language

Solution:

  1. Open Settings → Features → Dictation Settings
  2. Verify Transcription Language setting
  3. Select the language you're speaking
  4. Save settings
  5. Try recording again

Note: If you're using translation, confirm:

  • Transcription Language = the language you speak
  • Translation Target Language = the language you want in the chat

FAQ

Q: Do I need an internet connection for voice dictation?

A: Yes. Transcription requires an internet connection because it uses cloud-based AI services.


Q: Is my voice data stored or recorded?

A: Your voice data is sent to the transcription service (Sypha or Sarvam AI) for processing. Temporary audio files are created locally during recording and are automatically deleted after transcription. Refer to the privacy policies of Sypha and Sarvam AI for details on how they handle audio data.


Q: Can I use voice dictation offline?

A: No. Voice dictation requires an internet connection to reach the transcription services.


Q: Which transcription provider should I use?

A:

  • Use Sypha Transcription if you already have a Sypha account and want integrated billing
  • Use Sarvam AI if you need Indian language support or real-time translation

Q: How much does voice dictation cost?

A: Pricing depends on your selected provider:

  • Sypha Transcription: Uses your Sypha account credits
  • Sarvam AI: Requires separate Sarvam AI API credits

Check with each provider for current pricing.


Q: Can I switch between providers?

A: Yes, you can change your transcription service at any time in Settings → Features → Dictation Settings.


Q: Why does the microphone button sometimes stay red?

A: This can happen if the recording process doesn't terminate properly. Try the following:

  1. Press the stop button again
  2. Reload the VS Code window: Cmd/Ctrl + Shift + P → "Developer: Reload Window"

Q: Can I use a Bluetooth microphone?

A: Yes, but make sure it's set as your system's default input device before you start recording.


Q: Does voice dictation work with all VS Code themes?

A: Yes, the microphone button adapts to your theme's colors.


Q: How long can I record?

A: There's no hard limit, but for best results:

  • Keep recordings under 2 minutes
  • Longer recordings may take longer to transcribe
  • Split lengthy instructions into smaller segments

Q: Can I edit the transcribed text before sending?

A: Yes! The transcribed text appears in the chat input, where you can review and edit it before sending it to Sypha.


Q: What audio format is used?

A: Audio is captured in WebM format with the Opus codec (16 kHz, mono, 32 kbps) to balance quality and file size.
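To produce a test clip in the same format by hand, the standard FFmpeg output options for those parameters look like the sketch below. Whether Sypha passes exactly these flags internally is an assumption; opus_args is a hypothetical helper, and the example uses the Linux ALSA input from earlier in this guide.

```shell
#!/bin/sh
# opus_args is a hypothetical helper that prints FFmpeg output options matching
# the format described above: Opus audio, 16 kHz sample rate, mono, 32 kbps.
opus_args() {
    echo "-c:a libopus -ar 16000 -ac 1 -b:a 32k"
}

# Print a 3-second Linux (ALSA) test command instead of running it.
echo "ffmpeg -f alsa -i default -t 3 $(opus_args) test.webm"
```

On macOS, swap the input part for the avfoundation arguments shown in the troubleshooting section.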


Q: The translation feature isn't working. Why?

A: Translation is available only with the Sarvam AI provider. Confirm:

  1. You've selected "Sarvam AI" as the transcription service
  2. You've checked "Enable Translation"
  3. You've selected both transcription and target languages
  4. Your Sarvam AI API key is valid

Still Need Help?

If you're still having trouble:

  1. Check the Console:

    • Open VS Code Developer Tools: Help → Toggle Developer Tools
    • Check the Console tab for error messages
    • Look for errors related to recording or transcription
  2. Enable Logging:

    • Open VS Code Output panel: View → Output
    • Select "Sypha" from the dropdown
    • Try recording again and check for error messages
  3. Report an Issue:

    • Include your operating system and version
    • Include FFmpeg version (ffmpeg -version)
    • Include error messages from console/output
    • Describe the steps you've taken
    • Submit at: [Your support channel/GitHub issues]
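The system details requested for an issue report can be gathered with a short script on macOS or Linux. This is a sketch that only prints to the terminal and sends nothing anywhere; ffmpeg_version_line is a hypothetical helper name.

```shell
#!/bin/sh
# Collect basic details for an issue report. Output goes to stdout only.
# ffmpeg_version_line is a hypothetical helper, not part of Sypha.
ffmpeg_version_line() {
    # First line of `ffmpeg -version`, or a placeholder if FFmpeg is not on PATH.
    if command -v ffmpeg >/dev/null 2>&1; then
        ffmpeg -version 2>/dev/null | head -n 1
    else
        echo "not found"
    fi
}

echo "OS: $(uname -sr)"
echo "FFmpeg: $(ffmpeg_version_line)"
```

Paste the output into your report together with the error messages from the Console and Output panels.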

Last Updated: [Current Date]
Version: 2.0.0
