DESKTOP APP · v1.0.1

VocalFuse

Local AI transcription and AI note taking powered by OpenAI Whisper Large — dictation, meetings, and batch exports without sending audio to the cloud.

Built on whisper.cpp · Model file ggml-large-v3.bin · From $10/mo

Avg dictation latency: <100ms
Whisper Large on disk: ~3 GB
Monthly plans: $10–15
Audio stays on-device: 100% local
Windows 10+ · macOS 12+ · Ubuntu 22.04+ · CUDA optional

Speech to text at desk speed

Hold your hotkey, speak, and watch text land in Slack, Word, VS Code, or any focused window. VocalFuse runs inference on your GPU or CPU — no round-trip to a cloud API, no per-minute billing surprises.

  • Always-on-top pill overlay with 16-bar live visualizer
  • Drag grip, note-mode switch, settings & close controls
  • Batch transcribe files to TXT, DOCX, SRT, VTT
  • Pro adds AI note taker + email summaries

Always-on-top pill · drag grip to reposition

The VocalFuse desktop UI

Pixel-accurate replicas of the Windows client from VocalFuse_cpp — 248×38 pill overlay, six capture states, settings dialog, and system tray menu.

Pill overlay — all states

Idle

Border #c4a04a · “Hold to speak”

Note ready

Note switch on · “Click for notes”

Recording

Border #e2c06a · 16-bar visualizer

Busy

Border #9a7a34 · “Working...”

Note taking

Pro batch capture · live meters

Note sending

SMTP summary dispatch · “Sending notes...”
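
The six states above can be modeled as a small state table. The sketch below is illustrative only: the border colors and labels are the ones listed on this page (left as None where the page gives none), while the event names and the transitions between states are assumptions, not taken from the VocalFuse source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PillState(Enum):
    IDLE = "idle"
    NOTE_READY = "note_ready"
    RECORDING = "recording"
    BUSY = "busy"
    NOTE_TAKING = "note_taking"
    NOTE_SENDING = "note_sending"


@dataclass(frozen=True)
class PillStyle:
    border: Optional[str]  # hex color from this page, None where none is given
    label: Optional[str]   # overlay text from this page, None where none is given


# Values copied from the state list above; missing ones stay None.
STYLES = {
    PillState.IDLE: PillStyle("#c4a04a", "Hold to speak"),
    PillState.NOTE_READY: PillStyle(None, "Click for notes"),
    PillState.RECORDING: PillStyle("#e2c06a", None),
    PillState.BUSY: PillStyle("#9a7a34", "Working..."),
    PillState.NOTE_TAKING: PillStyle(None, None),
    PillState.NOTE_SENDING: PillStyle(None, "Sending notes..."),
}

# Assumed transitions and event names (hypothetical, for illustration).
TRANSITIONS = {
    (PillState.IDLE, "hotkey_down"): PillState.RECORDING,
    (PillState.RECORDING, "hotkey_up"): PillState.BUSY,
    (PillState.BUSY, "text_ready"): PillState.IDLE,
    (PillState.IDLE, "note_switch_on"): PillState.NOTE_READY,
    (PillState.NOTE_READY, "note_switch_off"): PillState.IDLE,
    (PillState.NOTE_READY, "click"): PillState.NOTE_TAKING,
    (PillState.NOTE_TAKING, "click"): PillState.NOTE_SENDING,
    (PillState.NOTE_SENDING, "sent"): PillState.NOTE_READY,
}


def next_state(state, event):
    """Return the next state, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping styles and transitions in plain dictionaries makes it easy to check that every state has a style entry and that unknown events leave the pill where it is.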

Settings dialog

460px panel with Product License, Note Taking Mode (email SMTP), and Updates — matching settings.cpp.

Control layout

Right-side controls: 36×16 note switch, settings gear, close button. Hover states shown below.

Hover — settings
Hover — close

System tray

Tray context menu: Settings, Minimize, Show, Close — from tray.cpp.

Layout reference: Pill body #451218, drag grip 18px, logo 18px, Segoe UI labels, note switch 36×16. See the full wireframes in the VocalFuse docs.

How VocalFuse works

Subscribe on the web, install the desktop client, activate with your product key, and dictate anywhere.

Subscribe

Pick Basic ($10/mo) or Pro ($15/mo) on the pricing page. Stripe handles billing, and you can cancel anytime.

Download & install

Grab the installer from Downloads. On first launch, VocalFuse downloads ggml-large-v3.bin (~3 GB) if it is not already present.
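
A first-launch check of this kind might look like the sketch below. It is not the client's actual logic: the function name and the 2.5 GB "plausible size" floor are assumptions chosen for illustration, based only on the ~3 GB figure quoted on this page.

```python
from pathlib import Path

MODEL_NAME = "ggml-large-v3.bin"
# Assumption: a file far below the ~3 GB quoted on this page is treated as
# a partial download. The floor is illustrative, not a VocalFuse value.
MIN_PLAUSIBLE_BYTES = 2_500_000_000


def model_needs_download(models_dir, min_bytes=MIN_PLAUSIBLE_BYTES):
    """Return True when the model file is missing or implausibly small."""
    model_path = Path(models_dir) / MODEL_NAME
    if not model_path.is_file():
        return True
    return model_path.stat().st_size < min_bytes
```

On Windows this could be called with the model path listed under System requirements, for example `model_needs_download(r"C:\VocalFuse\models")`.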

Activate & configure

Paste your product key, set your hotkey, enable GPU acceleration, and position the pill overlay where you want it.

Dictate & export

Real-time speech to text into any app. Pro users batch-record meetings and export structured notes.

Built for professionals who can't send voice to the cloud

Real-time dictation

Global hotkey capture with sub-100ms partial results on capable hardware. Text is injected into the focused field or copied to the clipboard.

OpenAI Whisper Large

Full Large weights — not a distilled cloud API model. Studio-grade accuracy for accents, jargon, and long-form narration.

Batch transcription

Drop podcasts, interviews, and lectures. Export subtitles and documents without uploading audio anywhere.

AI note taker (Pro)

Capture up to 3 minutes per batch, structure meeting notes locally, and email summaries via SMTP.
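
The batching and email steps can be sketched as follows. This is a sketch under stated assumptions: splitting a longer recording into consecutive 3-minute batches is one reading of "up to 3 minutes per batch", and the function names, subject line, and bullet format are hypothetical rather than VocalFuse's actual output.

```python
from email.message import EmailMessage

MAX_BATCH_SECONDS = 180  # "up to 3 minutes per batch", as stated on this page


def split_into_batches(total_seconds, max_seconds=MAX_BATCH_SECONDS):
    """Return (start, end) second offsets covering the recording in order."""
    batches = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        batches.append((start, end))
        start = end
    return batches


def build_summary_email(sender, recipient, title, bullet_points):
    """Build (but do not send) a plain-text summary message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Meeting notes: {title}"  # hypothetical subject format
    message.set_content("\n".join(f"- {point}" for point in bullet_points))
    return message
```

Sending is left out on purpose. With Python's standard library it would typically go through `smtplib.SMTP`, calling `starttls()`, `login()`, and `send_message()` with the SMTP host and credentials entered in the settings dialog.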

Privacy by architecture

Microphone audio never leaves your machine. License checks go over HTTPS; voice data is never transmitted.

Developer-friendly

License verification API, architecture docs, and local AI guides for teams embedding Fuse Intelligence.
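
As a rough illustration of a license check that carries no voice data, the sketch below builds an HTTPS request containing only a product key and app version. The endpoint URL, field names, and function name are all hypothetical; consult the License verification API docs for the real contract.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and field names, for illustration only.
LICENSE_URL = "https://licensing.example.invalid/v1/verify"


def build_license_request(product_key, app_version):
    """Build (but do not send) a JSON POST carrying only license metadata."""
    payload = {"product_key": product_key, "app_version": app_version}
    body = json.dumps(payload).encode("utf-8")
    return Request(
        LICENSE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The point of the sketch is the payload shape: the request body holds license metadata and nothing derived from the microphone.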

Basic vs Pro

Feature                                          Basic ($10/mo)   Pro ($15/mo)
OpenAI Whisper Large local dictation             Yes              Yes
Always-on-top pill widget                        Yes              Yes
Hold-to-record hotkey                            Yes              Yes
Batch file transcription (TXT, SRT, VTT, DOCX)   Yes              Yes
AI note taker mode                               No               Yes
3-minute batch meeting capture                   No               Yes
Email summaries via SMTP                         No               Yes
Priority updates

Compare plans & subscribe

Who uses VocalFuse?

Developers & writers

Dictate commits, docs, and emails without leaving the IDE. Whisper Large handles technical vocabulary better than lightweight cloud models.

Legal & healthcare

Keep sensitive conversations on-device. No third-party transcription bot joins your calls — audio stays in your trust boundary.

Students & creators

Transcribe lectures and podcasts offline. Pro turns long sessions into structured notes you can export or email.

System requirements

Operating system: Windows 10/11 (64-bit), macOS 12+, Ubuntu 22.04+
Memory: 16 GB RAM recommended for OpenAI Whisper Large
GPU: Optional NVIDIA GPU with CUDA for lowest latency
Storage: ~3 GB for ggml-large-v3.bin
Model path: C:\VocalFuse\models\
Permissions: Microphone; optional accessibility APIs for text injection
Account: Active VocalFuse subscription for downloads and license validation

Need tuning advice? Read the Local AI Guide for GPU setup, or the full VocalFuse documentation for install and troubleshooting.

Start dictating in minutes

Create your account, subscribe, download VocalFuse.exe, and press your hotkey. The only online account you need is your Fuse Intelligence account.

VocalFuse FAQ

What AI model does VocalFuse use for transcription?

VocalFuse uses OpenAI Whisper Large (ggml-large-v3), running locally through whisper.cpp. It is not a cloud API, and it is not a smaller distilled model.

Which platforms support VocalFuse AI transcription?

VocalFuse runs on Windows 10+, macOS 12+, and Ubuntu 22.04+. NVIDIA GPU acceleration is optional via CUDA.