VocalFuse Documentation
Complete reference for installing, licensing, and using the VocalFuse desktop client — local dictation with OpenAI Whisper Large, batch transcription, and Pro AI note taking.
Download from Downloads, run the installer, allow microphone access.
Paste your product key, press Ctrl+Shift+Space, speak into any app.
Overview
VocalFuse is Fuse Intelligence's local-first speech-to-text desktop app. The VocalFuse.exe client transcribes audio on your machine with OpenAI Whisper Large through whisper.cpp. Subscriptions, billing, and installers are managed through this website.
- Basic ($10/mo): Unlimited local dictation, pill overlay, batch file transcription, product key & updates
- Pro ($15/mo): Everything in Basic plus AI note taker mode, 3-minute batch recording, email summaries
- Model: ggml-large-v3.bin (~3 GB), stored under C:\VocalFuse\models\
- Privacy: Voice audio is processed locally and is not uploaded for cloud transcription
Requirements
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit, macOS 12, Ubuntu 22.04 | Latest stable release with security patches |
| CPU | 4-core x64 | 8+ cores for smoother partials |
| RAM | 8 GB | 16 GB for OpenAI Whisper Large |
| GPU | Not required | NVIDIA with CUDA for <100ms dictation |
| Disk | ~3 GB free | SSD for faster model load |
| Network | HTTPS for license checks | Stable connection on first model download |
Grant microphone permission at the OS level. Some apps also require accessibility permissions for direct text injection; clipboard paste is the fallback.
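Before installing, you can sanity-check a machine against the CPU and disk minimums above. The sketch below is a hypothetical preflight script, not part of VocalFuse. It uses only the Python standard library, so it skips the RAM and GPU checks, which need platform-specific APIs. Note that os.cpu_count() reports logical CPUs, which can overstate physical cores on machines with SMT.

```python
import os
import shutil
from typing import List

# Minimums from the Requirements table; the constant names are this sketch's own.
MIN_CORES = 4
MIN_FREE_BYTES = 3 * 1024**3  # roughly the size of ggml-large-v3.bin


def preflight(path: str = ".", min_cores: int = MIN_CORES,
              min_free_bytes: int = MIN_FREE_BYTES) -> List[str]:
    """Return human-readable warnings; an empty list means both checks passed."""
    warnings: List[str] = []
    cores = os.cpu_count() or 0  # cpu_count() can return None
    if cores < min_cores:
        warnings.append(f"{cores} logical CPUs found, minimum is {min_cores}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        warnings.append(f"{free / 1024**3:.1f} GB free at {path}, need about 3 GB")
    return warnings


if __name__ == "__main__":
    for warning in preflight(os.path.expanduser("~")):
        print("warning:", warning)
```

Point path at the drive that will hold C:\VocalFuse\models\ so the disk check measures the right volume.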
Install
- Sign in at Login with an active VocalFuse subscription.
- Open Downloads and install the build for your platform.
- On first launch, VocalFuse downloads ggml-large-v3.bin into C:\VocalFuse\models\ if the file is missing.
- Launch VocalFuse.exe, sign in with your Fuse account, and enter your product key from Product Keys.
- Configure your dictation hotkey and test with the pill overlay visible.
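If the first-launch download is interrupted, the model file can be left missing or truncated. This sketch checks for ggml-large-v3.bin in the documented folder. The 2.5 GB "too small" threshold is an assumption derived from the ~3 GB size quoted above, not a published checksum.

```python
from pathlib import Path

MODEL_DIR = Path(r"C:\VocalFuse\models")
MODEL_NAME = "ggml-large-v3.bin"
# Assumed threshold: the file is ~3 GB, so anything well below that is treated
# as a truncated download rather than a usable model.
MIN_EXPECTED_BYTES = 2_500_000_000


def model_status(model_dir: Path = MODEL_DIR) -> str:
    """Return "missing", "incomplete", or "ok" for the Whisper Large model file."""
    model = model_dir / MODEL_NAME
    if not model.is_file():
        return "missing"
    if model.stat().st_size < MIN_EXPECTED_BYTES:
        return "incomplete"
    return "ok"


if __name__ == "__main__":
    print(model_status())
```

On "missing" or "incomplete", remove any partial file and either relaunch VocalFuse.exe to re-trigger the download or place the file manually.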
Product key & license API
Your product key ties VocalFuse to your subscription tier. Copy it from Product Keys and paste it inside the desktop app on first run or after reinstall.
The client validates the key against the Fuse API on startup. If HTTPS is temporarily unavailable, an offline grace period keeps the app usable.
/api/license/verify.php?key=YOUR_LICENSE_KEY&product=vocalfuse
Returns JSON confirming validity, product slug, and plan entitlements.
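The verify endpoint takes the key and the product slug as query parameters. A minimal Python client might look like the following. API_BASE is a placeholder because this page does not state the production host, and the response is returned as a plain dict because its exact field names are not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host: substitute the Fuse API base URL your installation uses.
API_BASE = "https://api.example.com"


def verify_url(key: str, base: str = API_BASE) -> str:
    """Build the documented verify URL; urlencode escapes unusual key characters."""
    query = urlencode({"key": key, "product": "vocalfuse"})
    return f"{base}/api/license/verify.php?{query}"


def verify(key: str, base: str = API_BASE, timeout: float = 10.0) -> dict:
    """GET the verify endpoint and decode its JSON body (field names not documented here)."""
    with urlopen(verify_url(key, base), timeout=timeout) as response:
        return json.load(response)
```

urlopen raises urllib.error.URLError when the host cannot be reached, which is the situation the offline grace period covers.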
If verification fails repeatedly, confirm your subscription is active and that firewalls allow outbound HTTPS to the Fuse API license endpoint (/api/license/verify.php).
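The offline grace period can be modelled as a three-way decision: the server confirmed the key, rejected it, or could not be reached. The seven-day window below is hypothetical; the real grace length is not documented on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical window; the real grace period length is not documented.
GRACE = timedelta(days=7)


def license_usable(server_says_valid: Optional[bool],
                   last_success: Optional[datetime],
                   now: Optional[datetime] = None,
                   grace: timedelta = GRACE) -> bool:
    """server_says_valid is True/False when the API answered, None when unreachable."""
    if server_says_valid is not None:
        return server_says_valid  # an explicit answer from the API always wins
    if last_success is None:
        return False  # never verified online, so there is nothing to extend
    now = now or datetime.now(timezone.utc)
    return now - last_success <= grace
```

Keeping "rejected" separate from "unreachable" means a cancelled subscription is not masked by the grace window.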
Dictation modes
| Mode | Plan | Description |
|---|---|---|
| Hold-to-record | Basic + Pro | Press and hold the global hotkey while speaking. Release to finalize and inject text into the focused application. |
| Toggle dictation | Basic + Pro | Optional setting: tap hotkey to start/stop continuous capture for longer passages. |
| Batch file drop | Basic + Pro | Drag audio files onto the app window for offline transcription and export. |
| AI note taker batch | Pro only | Record up to 3 minutes from the microphone, structure notes locally, optionally email a summary. |
For best live dictation latency, enable GPU acceleration and close competing GPU workloads. See Performance tuning.
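The difference between hold-to-record and toggle dictation comes down to how hotkey events map to "start capture" and "finalize". The controller below is an illustrative model, not VocalFuse's internal code. Note how hold mode ignores the repeated key-down events an OS typically sends while a key is held.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictationController:
    """Illustrative model of hold-to-record vs toggle dictation (not VocalFuse code)."""
    mode: str = "hold"  # "hold" or "toggle"
    recording: bool = False

    def key_down(self) -> Optional[str]:
        if self.mode == "hold":
            if self.recording:
                return None  # auto-repeat while the key is held: ignore
            self.recording = True
            return "start"
        # Toggle mode: each tap flips capture on or off.
        self.recording = not self.recording
        return "start" if self.recording else "finalize"

    def key_up(self) -> Optional[str]:
        # Only hold mode reacts to release; "finalize" is where text gets injected.
        if self.mode == "hold" and self.recording:
            self.recording = False
            return "finalize"
        return None
```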
Hotkeys & defaults
| Action | Default | Notes |
|---|---|---|
| Toggle / hold dictation | Ctrl+Shift+Space | Rebind in Settings → Hotkeys |
| Show / hide pill | Tray menu | Pill position is saved per monitor; drag the grip to reposition |
| Paste fallback | Ctrl+V | Used when injection into target app is blocked |
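The paste fallback is a simple ordering: try direct injection first, and touch the clipboard only when that fails. In this sketch the three callables are hypothetical stand-ins for platform-specific code (for example, synthesizing Ctrl+V); passing them in lets the logic be tested without a desktop session.

```python
from typing import Callable


def deliver_text(text: str,
                 inject: Callable[[str], bool],
                 set_clipboard: Callable[[str], None],
                 send_paste: Callable[[], None]) -> str:
    """Try direct injection; fall back to clipboard + paste if the target blocks it."""
    if inject(text):
        return "injected"
    set_clipboard(text)  # put the transcript on the clipboard
    send_paste()         # e.g. synthesize Ctrl+V in the focused app
    return "pasted"
```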
Pill overlay UI
The floating overlay uses the documented rv-pill component: draggable grip, logo, and level meters. Border color reflects capture state.
- Ready: press your hotkey to dictate.
- Warm border: capturing audio locally.
- Muted border: transcription or injection in flight.
Drag the grip to reposition. Placement is saved per monitor and restored on next launch.
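Per-monitor placement amounts to a small map from a monitor identifier to coordinates. The class below is a hypothetical sketch of such a store with JSON persistence. Monitor IDs like "DISPLAY1" are illustrative, and a monitor with no saved entry falls back to a default position.

```python
import json
from typing import Dict, Optional, Tuple


class PillPlacement:
    """Hypothetical per-monitor position store for the pill overlay."""

    def __init__(self, positions: Optional[Dict[str, Tuple[int, int]]] = None):
        self.positions: Dict[str, Tuple[int, int]] = dict(positions or {})

    def remember(self, monitor_id: str, x: int, y: int) -> None:
        self.positions[monitor_id] = (x, y)

    def recall(self, monitor_id: str,
               default: Tuple[int, int] = (0, 0)) -> Tuple[int, int]:
        return self.positions.get(monitor_id, default)

    def to_json(self) -> str:
        # JSON has no tuples, so store each position as {"x": ..., "y": ...}.
        return json.dumps({m: {"x": x, "y": y} for m, (x, y) in self.positions.items()})

    @classmethod
    def from_json(cls, text: str) -> "PillPlacement":
        raw = json.loads(text)
        return cls({m: (p["x"], p["y"]) for m, p in raw.items()})
```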
Batch transcription
Drop supported audio files (WAV, MP3, M4A, or FLAC, depending on the build) onto the VocalFuse window, or use the batch panel from the tray menu. Inference runs locally with the same OpenAI Whisper Large weights as live dictation.
- Add one or more files to the queue.
- Select output format (see Exports).
- Start processing — progress appears in the app shell; no upload step.
- Open the output folder or copy transcript text into your workflow.
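VocalFuse runs batch jobs itself, but if you want to reproduce a job with the standalone whisper.cpp command-line tool, the queue maps onto one invocation per file. This sketch assumes the newer binary name whisper-cli (older builds call it main) and uses whisper.cpp's -m, -f, -otxt, -osrt, -ovtt, and -of options. .docx is not a native whisper.cpp output, so it is left out. Depending on how whisper.cpp was built, the stock CLI may accept only WAV input.

```python
from pathlib import Path
from typing import Iterable, List

# whisper.cpp flags for formats it writes natively; .docx has no equivalent.
FORMAT_FLAGS = {"txt": "-otxt", "srt": "-osrt", "vtt": "-ovtt"}


def whisper_cmd(audio: Path, model: Path, out_dir: Path,
                formats: Iterable[str], binary: str = "whisper-cli") -> List[str]:
    """Build one whisper.cpp invocation; raises KeyError for unsupported formats."""
    cmd = [binary, "-m", str(model), "-f", str(audio)]
    cmd += [FORMAT_FLAGS[fmt] for fmt in formats]
    # -of takes an output path without extension; whisper.cpp appends .txt/.srt/.vtt.
    cmd += ["-of", str(out_dir / audio.stem)]
    return cmd


def queue_cmds(files: Iterable[str], model: Path, out_dir: Path,
               formats: List[str]) -> List[List[str]]:
    """One command per queued file, processed in order."""
    return [whisper_cmd(Path(f), model, out_dir, formats) for f in files]
```

Each command can then be run with subprocess.run(cmd, check=True). Running them one at a time means each file's outputs land on disk as soon as that file finishes.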
Export formats
| Format | Extension | Best for |
|---|---|---|
| Plain text | .txt | Quick paste into docs, tickets, chat |
| Word document | .docx | Formatted handoff to stakeholders |
| SubRip subtitles | .srt | Video editors, YouTube uploads |
| WebVTT | .vtt | HTML5 players, web captions |
Settings panel
In-app preferences are grouped into account, hotkeys, inference, and Pro features. Web-side theme and profile settings live at Account Settings.
| Setting | Location |
|---|---|
| Product key | Desktop → Account (synced from web) |
| Dictation hotkey | Desktop → Hotkeys |
| GPU / CPU threads | Desktop → Performance |
| AI note taker + SMTP | Desktop → Pro (requires Pro plan) |
| Theme & profile name | Web → Account Settings |
AI note taking (Pro)
VocalFuse Pro extends dictation into structured capture: batch microphone recording (up to 3 minutes), local note organization, and optional email summaries via SMTP — while keeping audio off third-party transcription clouds.
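The 3-minute limit is easier to reason about in samples. Whisper models consume 16 kHz mono audio, so, assuming the recorder captures at that rate, a full Pro batch is 16,000 × 180 = 2,880,000 samples, or 5.76 MB as 16-bit PCM. The helpers below are a hypothetical illustration of enforcing that cap.

```python
from typing import Sequence

SAMPLE_RATE = 16_000                     # Whisper models consume 16 kHz mono audio
MAX_SECONDS = 3 * 60                     # Pro batch recording limit
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS  # 2,880,000 samples
BYTES_PER_SAMPLE = 2                     # assuming 16-bit PCM


def clamp_recording(samples: Sequence[int]) -> Sequence[int]:
    """Drop anything captured past the 3-minute cap."""
    return samples[:MAX_SAMPLES]


def remaining_seconds(captured_samples: int) -> float:
    """Seconds left before the cap, never negative."""
    return max(0.0, (MAX_SAMPLES - captured_samples) / SAMPLE_RATE)
```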
- Confirm your plan includes Pro on Subscriptions.
- Enable Note taker in the desktop Settings panel.
- Start a batch capture session from the tray menu or Pro panel.
- Review structured notes, export, or send email summary if SMTP is configured.
- Organize running notes during calls; paste into Slack, Docs, or your IDE from the overlay.
- Prefer GPU acceleration during live narration for responsive partials.
- Use batch file mode for pre-recorded lectures when you already have audio files.
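Email summaries go out over SMTP. A minimal sketch with Python's standard email and smtplib modules follows. The subject line, the bullet formatting, and port 587 with STARTTLS are assumptions, so match them to the SMTP settings you enter under Desktop → Pro.

```python
import smtplib
import ssl
from email.message import EmailMessage
from typing import List, Optional


def build_summary(notes: List[str], sender: str, recipient: str,
                  subject: str = "VocalFuse note summary") -> EmailMessage:
    """Turn structured notes into a plain-text email, one bullet per note."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("\n".join(f"- {note}" for note in notes))
    return msg


def send_summary(msg: EmailMessage, host: str, port: int = 587,
                 username: Optional[str] = None,
                 password: Optional[str] = None) -> None:
    """Send over SMTP with STARTTLS (assumed; some servers use implicit TLS on 465)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=ssl.create_default_context())
        if username is not None:
            smtp.login(username, password or "")
        smtp.send_message(msg)
```

Building the message separately from sending it lets you preview the summary before anything leaves the machine.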
Learn more on the AI note taker landing page.
Performance tuning
| Symptom | Fix |
|---|---|
| Slow partials (>200ms) | Enable GPU in Settings → Performance; close other GPU apps |
| High RAM usage | Ensure ggml-large-v3.bin is the only loaded model; restart after long sessions |
| CPU-only laptops | Reduce background apps; expect higher latency than CUDA builds |
| Long batch jobs | Process overnight; exports write to disk as each file completes |
Deep dive: Local AI Guide · Architecture
Updates
Stable releases publish to Downloads when ready. User-visible changes are summarized on the changelog.
This documentation targets release 1.0.1 as served by the Fuse API. After updating, restart VocalFuse so license and model manifests refresh.
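To tell whether an installed build is older than the 1.0.1 release this documentation targets, compare numeric version components rather than raw strings (as strings, "1.0.10" sorts before "1.0.9"). This sketch assumes plain dotted versions and does not handle pre-release suffixes.

```python
from typing import Tuple

TARGET_RELEASE = "1.0.1"  # release this documentation targets


def parse_version(version: str) -> Tuple[int, ...]:
    """'v1.0.1' -> (1, 0, 1); assumes plain dotted integers."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def needs_update(installed: str, target: str = TARGET_RELEASE) -> bool:
    # Tuple comparison is numeric per component, so 1.0.10 > 1.0.9.
    return parse_version(installed) < parse_version(target)
```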
Troubleshooting
- No text appearing at cursor
  - Confirm the destination app is focused. On macOS, grant accessibility permissions. Try the clipboard paste fallback from the pill menu.
- High latency or stuttering partials
  - Enable GPU acceleration, verify your CUDA drivers, and reduce competing GPU workloads. See Performance tuning.
- “Invalid key” or license errors
  - Renew on Subscriptions, re-copy the key from Product Keys, and check HTTPS/firewall rules.
- Model missing or failed download
  - Ensure ggml-large-v3.bin exists in C:\VocalFuse\models\. Re-run the first-launch download or place the file manually, then restart VocalFuse.exe.
- Pro features greyed out
  - Verify Pro (or admin) entitlements in the license API response. Basic plans do not include the AI note taker or email summaries.
- Microphone not detected
  - Check OS privacy settings, confirm the correct input device in Sound settings, and make sure no other app holds exclusive microphone access.
VocalFuse FAQ
What AI model does VocalFuse use for transcription?
VocalFuse uses OpenAI Whisper Large (ggml-large-v3) running locally through whisper.cpp — not a small cloud API or distilled model.
Which platforms support VocalFuse AI transcription?
VocalFuse runs on Windows 10+, macOS 12+, and Ubuntu 22.04+. NVIDIA GPU acceleration is optional via CUDA.