Local AI Guide
Model tiers
- Turbo — smallest model, <50ms latency, good for dictation
- Balanced — default; strong accuracy/speed tradeoff
- Studio — highest accuracy for batch transcription
GPU acceleration
NVIDIA CUDA and Apple Metal are auto-detected. AMD ROCm is supported on Linux but requires a manual driver install.
If latency exceeds 200ms, switch to the Turbo model or close other GPU-heavy applications.
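The 200ms rule above can be sketched as a small helper. This is only an illustration: the function name `recommend_tier`, the use of a median over recent samples, and the tier strings are assumptions made for the example, not part of any Rowton API.

```python
from statistics import median

# Threshold taken from the guide; everything else here is hypothetical.
LATENCY_BUDGET_MS = 200.0


def recommend_tier(current_tier, recent_latencies_ms, budget_ms=LATENCY_BUDGET_MS):
    """Return the tier to use given recent latency samples (in milliseconds).

    Assumption: the median of recent samples is used so that a single
    slow request does not trigger a downgrade.
    """
    if not recent_latencies_ms:
        # No measurements yet, so there is nothing to act on.
        return current_tier
    if median(recent_latencies_ms) > budget_ms and current_tier != "Turbo":
        # Over budget: fall back to the smallest, fastest tier.
        return "Turbo"
    return current_tier


print(recommend_tier("Balanced", [250, 300, 220]))
```

If the helper still returns Turbo while latency stays above the budget, the remaining remedy from the guide is to close other GPU-heavy applications.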
Building your own app
Embed the Rowton inference SDK (coming soon) or call the REST API for hybrid cloud/local deployments. See the API Reference.