Voice Dictation - User Guide

Voice dictation lets you interact with Sypha by speaking instead of typing. Press the microphone button, speak your instructions, and Sypha transcribes them into the chat input for you to review and send.

Overview
Prerequisites
- macOS
- Windows
- Linux
Installation & Setup
Using Voice Dictation
Transcription Providers
- Sypha Transcription (Default)
- Sarvam AI
Translation Feature
Troubleshooting
FAQ

Overview

Voice dictation provides:

Hands-free interaction with Sypha
Multi-language transcription
Real-time translation (Sarvam AI only)
A choice of transcription services (built-in Sypha transcription or Sarvam AI)
Cross-platform functionality (macOS, Windows, Linux)

Prerequisites

macOS

macOS 10.15 or newer
FFmpeg (multimedia framework for audio capture)
Microphone (integrated or external)
Microphone permissions for VS Code

Windows

Windows 10 or newer
FFmpeg (multimedia framework for audio capture)
Microphone (integrated or external)
Microphone permissions for VS Code

Linux

Ubuntu 20.04+ or comparable distribution
FFmpeg with ALSA support
Microphone (integrated or external)
Audio system (PulseAudio or ALSA)
Microphone permissions

Installation & Setup

Step 1: Install FFmpeg

FFmpeg is required for audio capture on all platforms.

macOS

Option 1: Using Homebrew (Recommended)

brew install ffmpeg

Option 2: Using MacPorts

sudo port install ffmpeg

Verify Installation:

ffmpeg -version

Windows

Option 1: Using winget (Windows 10+)

winget install Gyan.FFmpeg

Option 2: Manual Installation

Download FFmpeg from ffmpeg.org
Select "Windows builds from gyan.dev"
Download the most recent release (full build)
Extract to C:\ffmpeg
Add to PATH:
- Open System Properties → Environment Variables
- Edit Path variable
- Add C:\ffmpeg\bin
- Click OK and restart your terminal

Verify Installation:

ffmpeg -version

Important: After installing FFmpeg, restart VS Code completely.

Linux (Ubuntu/Debian)

sudo apt-get update
sudo apt-get install -y ffmpeg

Linux (Fedora/RHEL)

sudo dnf install ffmpeg

Linux (Arch)

sudo pacman -S ffmpeg

Verify Installation:

ffmpeg -version

Verify Opus Codec (Required):

ffmpeg -codecs | grep opus

You should see libopus in the output. On Windows, where grep is not available by default, use ffmpeg -codecs | findstr opus instead.
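
The two verification commands above can be combined into one check. The following is a minimal sketch for macOS and Linux shells; the function names has_libopus and check_ffmpeg are hypothetical helpers for illustration and are not part of Sypha or FFmpeg.

```shell
#!/usr/bin/env bash
# Sketch: confirm FFmpeg is on PATH and that its codec list mentions libopus.
# has_libopus and check_ffmpeg are illustrative names, not Sypha commands.

# Reads `ffmpeg -codecs` output on stdin; succeeds if any line mentions libopus.
has_libopus() {
  grep -q 'libopus'
}

# Prints a one-line status and returns non-zero if a requirement is missing.
check_ffmpeg() {
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg: not found on PATH"
    return 1
  fi
  if ffmpeg -hide_banner -codecs 2>/dev/null | has_libopus; then
    echo "ffmpeg: found, libopus available"
  else
    echo "ffmpeg: found, but libopus is missing"
    return 1
  fi
}
```

After pasting these functions into a terminal, run check_ffmpeg. A non-zero exit status means one of the requirements is missing.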

Step 2: Configure Microphone Permissions

macOS

Open System Settings
Navigate to Privacy & Security → Microphone
Enable microphone access for Visual Studio Code (or Code - Insiders if you use VS Code Insiders)
If VS Code isn't listed, click the + button and add it
Restart VS Code after granting permission

Alternative Path (macOS 12 and earlier): System Preferences → Security & Privacy → Privacy tab → Microphone

Windows

Open Settings
Navigate to Privacy & Security → Microphone
Enable "Microphone access"
Enable "Let apps access your microphone"
Scroll down and enable access for Visual Studio Code
Restart VS Code after granting permission

Alternative Path (Windows 10): Settings → Privacy → Microphone

Linux

Most Linux distributions don't require explicit per-app microphone permissions, but confirm the following:

Your user belongs to the audio group:
```
groups $USER
```
If audio isn't listed, add your user to the group:
```
sudo usermod -a -G audio $USER
```
Then log out and log back in.
Test microphone access:
```
arecord -l
```
This should list your audio recording devices.

Test recording (3 seconds):

ffmpeg -f alsa -i default -t 3 test.webm
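
The Linux checks above can be scripted. This is a sketch only: in_group and check_audio_access are hypothetical helper names, and the script assumes that id and (optionally) arecord are available.

```shell
#!/usr/bin/env bash
# Sketch: check audio group membership and list capture devices on Linux.
# in_group and check_audio_access are illustrative names.

# in_group GROUP "SPACE SEPARATED GROUPS": succeeds if GROUP is in the list.
in_group() {
  local wanted=$1 g
  for g in $2; do
    [ "$g" = "$wanted" ] && return 0
  done
  return 1
}

check_audio_access() {
  if in_group audio "$(id -nG)"; then
    echo "audio group: ok"
  else
    echo "audio group: missing (run: sudo usermod -a -G audio ${USER:-$(id -un)}, then log out and back in)"
  fi
  if command -v arecord >/dev/null 2>&1; then
    arecord -l   # lists capture hardware devices
  else
    echo "arecord not installed (it is part of alsa-utils)"
  fi
}
```

Run check_audio_access after defining the functions. The group comparison matches whole names, so a group such as audiophile does not count as audio.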

Step 3: Enable Dictation in Sypha

Launch Sypha in VS Code
Select the Settings (⚙️) icon in the Sypha sidebar
Navigate to the Features tab
Check the "Enable Dictation" checkbox
The dictation settings section appears below

Step 4: Configure Your Transcription Provider

Using Sypha Transcription (Default)

In the dictation settings, select "Sypha" as the transcription service
Sign in to your Sypha account (required for Sypha transcription)
Choose your preferred Transcription Language
Select Save

Note: Sypha transcription requires an active Sypha account with available credits.

Using Sarvam AI

In the dictation settings, select "Sarvam AI" as the transcription service
Obtain your Sarvam AI API key from sarvam.ai
Enter your Sarvam AI API Key in the designated field
Choose your preferred Transcription Language
(Optional) Enable Translation and select target language
Select Save

Supported Languages (Sarvam AI):

English (en)
Hindi (hi)
Bengali (bn)
Gujarati (gu)
Kannada (kn)
Malayalam (ml)
Marathi (mr)
Odia (od)
Punjabi (pa)
Tamil (ta)
Telugu (te)

Using Voice Dictation

Start Recording:
- Press the microphone icon (🎤) in the Sypha chat input area
- The icon turns red to indicate that recording is active
- A timer shows the recording duration
Speak Your Instructions:
- Speak clearly and naturally
- Position yourself near your microphone
- Minimize background noise where possible
Stop Recording:
- Press the stop button (⏹️) or the red microphone icon again
- Sypha processes your audio and transcribes it
- The transcribed text appears in the chat input
Review & Send:
- Review the transcribed text
- Edit as necessary
- Press Enter or select Send to submit
Cancel Recording:
- Press the cancel button (✖️) to discard the recording without transcribing

Transcription Providers

Sypha Transcription (Default)

Authentication: Requires Sypha account sign-in
Credits: Uses your Sypha account credits
Languages: Multiple languages supported
Translation: Not available
Best for: Existing Sypha users

Sarvam AI

Authentication: Requires a Sarvam AI API key
Credits: Uses your Sarvam AI credits
Languages: 10 Indian languages + English
Translation: Real-time translation available
Best for: Indian language support and translation needs

Translation Feature

Available with: Sarvam AI only

The translation feature lets you speak in one language and have your speech automatically translated into another.

Example Use Cases:

Speak in Hindi, send English instructions to Sypha
Speak in Tamil, send Hindi instructions to Sypha
Speak in English, send Gujarati instructions to Sypha

How to Enable:

Select Sarvam AI as your transcription service
Check the "Enable Translation" checkbox
Select your Transcription Language (the language you'll speak)
Select your Translation Target Language (the language you want in the chat)
Save settings

Workflow:

Speak in your selected transcription language
Sarvam AI transcribes your speech
Sarvam AI translates the transcript into your target language
The translated text appears in the chat input

Troubleshooting

Microphone Button is Disabled/Grayed Out

Possible Causes:

Dictation feature not enabled
FFmpeg not installed
No microphone detected

Solutions:

Go to Settings → Features and check "Enable Dictation"
Install FFmpeg (see Step 1)
Restart VS Code after installing FFmpeg
Verify that your microphone is connected and working
Confirm microphone permissions (see Step 2)

"Enable Dictation" Option Not Visible

Possible Causes:

Using an older version of Sypha
Unsupported platform

Solutions:

Confirm you're using the latest version of Sypha
Verify that you're on a supported platform (macOS, Windows, Linux)
Reload the VS Code window: Cmd/Ctrl + Shift + P → "Developer: Reload Window"

Recording Starts but Nothing Happens

Possible Causes:

FFmpeg process failed silently
No audio input being captured
Microphone not set as the default input device

Solutions:

Test FFmpeg manually:

macOS:

ffmpeg -f avfoundation -i :default -t 3 test.webm

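Windows:

ffmpeg -f wasapi -i audio=default -t 3 test.webm

Note: FFmpeg builds for Windows may not include a wasapi input device. If the command above fails with an unknown input format error, list your DirectShow devices and record from your microphone by name instead:

ffmpeg -list_devices true -f dshow -i dummy
ffmpeg -f dshow -i audio="Your Microphone Name" -t 3 test.webm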

Linux:

ffmpeg -f alsa -i default -t 3 test.webm

Check default microphone:
- Open your system sound settings
- Confirm your microphone is set as the default input device
- Test the microphone in system settings
Check VS Code Output:
- Open VS Code Output panel: View → Output
- Select "Sypha" from the dropdown
- Look for error messages related to recording
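
The manual FFmpeg tests for macOS and Linux shown earlier in this section can be wrapped in one script that picks the input device by platform. This is a sketch under stated assumptions: capture_args and test_capture are hypothetical helper names, the input arguments are copied from the commands in this guide, and Windows is left out because its shell and device naming differ.

```shell
#!/usr/bin/env bash
# Sketch: run the guide's 3-second test recording on macOS or Linux.
# capture_args and test_capture are illustrative names.

# capture_args OS_NAME: print the FFmpeg input arguments this guide uses
# for the given `uname -s` value; fails for anything else.
capture_args() {
  case "$1" in
    Darwin) echo "-f avfoundation -i :default" ;;
    Linux)  echo "-f alsa -i default" ;;
    *)      return 1 ;;
  esac
}

test_capture() {
  local args
  args=$(capture_args "$(uname -s)") || {
    echo "unsupported OS for this sketch"
    return 1
  }
  # Word splitting of $args is intentional: each token is a separate argument.
  # -y overwrites an existing test.webm from an earlier run.
  ffmpeg -hide_banner -y $args -t 3 test.webm
}
```

Run test_capture, speak for three seconds, then play back test.webm. An empty or silent file points to a permission or default-device problem rather than a Sypha problem.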

"Recording file not found" Error

Possible Causes:

FFmpeg failed to generate the audio file
Microphone permissions not granted
FFmpeg missing opus codec
Audio device not available

Solutions:

Verify FFmpeg installation:
```
ffmpeg -version
```
This should display version information.
Check opus codec:
```
ffmpeg -codecs | grep opus
```
This should display the libopus encoder/decoder.
Grant microphone permissions:
- See Step 2: Configure Microphone Permissions
- Important: Restart VS Code after granting permission
Test audio recording manually:
- See the solutions under "Recording Starts but Nothing Happens"
Windows-specific:
- Confirm FFmpeg is in your PATH
- Open a new terminal and run ffmpeg -version
- If the command isn't found, add FFmpeg to PATH and restart VS Code
Linux-specific:
- Verify that the audio system is running:
```
systemctl --user status pulseaudio
```
- List audio devices:
```
arecord -l
```

"FFmpeg is required" Error

Cause: FFmpeg isn't installed or isn't in the system PATH

Solutions:

Install FFmpeg:
- See Step 1: Install FFmpeg for your platform
Verify installation:
```
ffmpeg -version
```
Add to PATH (if needed):

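macOS/Linux: Add the directory that contains ffmpeg to PATH in ~/.bashrc or ~/.zshrc. The example uses /usr/local/bin; Homebrew on Apple Silicon installs to /opt/homebrew/bin instead:
```
export PATH="/usr/local/bin:$PATH"
```
Then reload: source ~/.bashrc (or source ~/.zshrc)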

Windows:
- System Properties → Environment Variables
- Edit PATH variable
- Add FFmpeg bin directory
- Restart VS Code
Restart VS Code completely (close all windows)
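
Before editing your shell profile on macOS or Linux, it can help to check whether FFmpeg is installed in a directory that is simply missing from PATH. The sketch below assumes a few common install locations; path_contains is a hypothetical helper name and the directory list is an assumption, not an exhaustive one.

```shell
#!/usr/bin/env bash
# Sketch: find an ffmpeg binary whose directory is not on PATH.
# path_contains is an illustrative helper name.

# path_contains DIR [PATHSTRING]: succeeds if DIR is an exact entry in
# PATHSTRING (defaults to the current $PATH).
path_contains() {
  case ":${2-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Assumed common locations: /usr/local/bin (Intel Homebrew, manual installs),
# /opt/homebrew/bin (Apple Silicon Homebrew), /opt/local/bin (MacPorts),
# /usr/bin (Linux packages).
for dir in /usr/local/bin /opt/homebrew/bin /opt/local/bin /usr/bin; do
  if [ -x "$dir/ffmpeg" ] && ! path_contains "$dir"; then
    echo "ffmpeg exists in $dir, but $dir is not on PATH"
  fi
done
```

If the loop prints a directory, add that directory to PATH in your shell profile. If it prints nothing and ffmpeg -version still fails, FFmpeg is probably not installed in any of the assumed locations.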

Microphone Permission Issues

macOS:

Symptom: Recording starts but no audio is captured

Solution:

Open System Settings → Privacy & Security → Microphone
If VS Code has a ❌ beside it, remove it and add it again
Toggle the permission off and back on
Restart VS Code completely (quit and reopen)
If it still doesn't work, reset the permission:
```
tccutil reset Microphone com.microsoft.VSCode
```
Then grant permission again

Windows:

Symptom: "Access denied" or no audio captured

Solution:

Windows Settings:
- Settings → Privacy & Security → Microphone
- Enable "Microphone access"
- Enable "Let apps access your microphone"
- Enable for "Visual Studio Code"
Check Antivirus/Security Software:
- Some antivirus software blocks microphone access
- Add VS Code to its allowlist
Run VS Code as Administrator (temporary test):
- Right-click VS Code → "Run as administrator"
- Try recording again
- If it works, the problem is a permission issue

Linux:

Symptom: "Device or resource busy" or permission errors

Solution:

Add user to audio group:
```
sudo usermod -a -G audio $USER
```
Log out and log back in.
Check PulseAudio:
```
pulseaudio --check
pulseaudio --start
```
Check device permissions:
```
ls -l /dev/snd/
```
Devices should be accessible to your user.

Poor Transcription Quality

Possible Causes:

Background noise
Low-quality microphone
Speaking too quickly or too slowly
Wrong language selected

Solutions:

Improve Recording Environment:
- Record in a quiet environment
- Minimize background noise
- Speak clearly at a moderate pace
- Position microphone 6-12 inches from your mouth
Check Microphone Settings:
- Use a good-quality microphone
- Test microphone in system settings
- Adjust input volume (too high a level causes distortion)
Verify Language Settings:
- Confirm transcription language matches the language you're speaking
- Settings → Features → Dictation Settings → Transcription Language
Try a Different Provider:
- If you're using Sypha transcription, try Sarvam AI (or vice versa)
- Different services may perform better for different languages

Transcription in Wrong Language

Cause: Language settings don't match your spoken language

Solution:

Open Settings → Features → Dictation Settings
Check the Transcription Language setting
Select the language you're speaking
Save settings
Try recording again

Note: If you're using translation, confirm:

Transcription Language = the language you speak
Translation Target Language = the language you want in the chat

FAQ

Q: Do I need an internet connection for voice dictation?

A: Yes. Transcription requires an internet connection because it uses cloud-based AI services.

Q: Is my voice data stored or recorded?

A: Your voice data is sent to the transcription service (Sypha or Sarvam AI) for processing. Temporary audio files are created locally during recording and are automatically deleted after transcription. Refer to the privacy policies of Sypha and Sarvam AI for details on how they handle audio data.

Q: Can I use voice dictation offline?

A: No. Voice dictation requires an internet connection to reach the transcription services.

Q: Which transcription provider should I use?

A: It depends on what you need:

Use Sypha Transcription if you already have a Sypha account and want integrated billing
Use Sarvam AI if you need Indian language support or real-time translation

Q: How much does voice dictation cost?

A: Pricing depends on your selected provider:

Sypha Transcription: Uses your Sypha account credits
Sarvam AI: Requires separate Sarvam AI API credits

Check with each provider for current pricing.

Q: Can I switch between providers?

A: Yes, you can change your transcription service at any time in Settings → Features → Dictation Settings.

Q: Why does the microphone button sometimes stay red?

A: This can happen if the recording process doesn't terminate properly. Try the following:

Press the stop button again
Reload the VS Code window: Cmd/Ctrl + Shift + P → "Developer: Reload Window"

Q: Can I use a Bluetooth microphone?

A: Yes, but make sure it's set as your system's default input device before you start recording.

Q: Does voice dictation work with all VS Code themes?

A: Yes, the microphone button adapts to your theme's colors.

Q: How long can I record?

A: There's no hard limit, but for best results:

Keep recordings under 2 minutes
Longer recordings may take more time to transcribe
Split lengthy instructions into smaller segments

If translation isn't working, confirm that:

You've selected "Sarvam AI" as the transcription service
You've checked "Enable Translation"
You've selected both the transcription and target languages
Your Sarvam AI API key is valid

Still Need Help?

If you're still having trouble:

Check the Console:
- Open VS Code Developer Tools: Help → Toggle Developer Tools
- Check the Console tab for error messages
- Look for errors related to recording or transcription
Enable Logging:
- Open VS Code Output panel: View → Output
- Select "Sypha" from the dropdown
- Try recording again and check for error messages
Report an Issue:
- Include your operating system and version
- Include FFmpeg version (ffmpeg -version)
- Include error messages from console/output
- Describe the steps you've taken
- Submit at: [Your support channel/GitHub issues]
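
The environment details requested above can be gathered with a short script on macOS or Linux. This is a sketch: collect_report is a hypothetical helper name, and the VS Code line only appears if the code command-line launcher is installed on PATH.

```shell
#!/usr/bin/env bash
# Sketch: print OS, FFmpeg, and VS Code versions for an issue report.
# collect_report is an illustrative name.

collect_report() {
  echo "OS: $(uname -sr)"
  if command -v ffmpeg >/dev/null 2>&1; then
    # The first line of `ffmpeg -version` carries the version string.
    echo "FFmpeg: $(ffmpeg -version 2>/dev/null | head -n 1)"
  else
    echo "FFmpeg: not found on PATH"
  fi
  if command -v code >/dev/null 2>&1; then
    # `code --version` prints the version on its first line.
    echo "VS Code: $(code --version 2>/dev/null | head -n 1)"
  fi
}
```

Run collect_report and paste its output into your report together with the error messages from the Console or Output panel.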

Last Updated: [Current Date] Version: 2.0.0
