Local AI Guide

Model tiers

  • Turbo — smallest model, <50ms latency, good for dictation
  • Balanced — default; strong accuracy/speed tradeoff
  • Studio — highest accuracy for batch transcription

GPU acceleration

NVIDIA CUDA and Apple Metal are auto-detected. AMD ROCm is supported on Linux but requires a manual driver install.

If latency exceeds 200ms, switch to the Turbo model or close other GPU-heavy applications.
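
The 200ms rule above can be sketched as a small check in Python. This is an illustrative helper only, not part of any Rowton API: the function name `recommend_tier`, the constant name, and the choice of the median over recent requests are all assumptions. Only the 200ms threshold and the tier names come from this guide.

```python
from statistics import median

# Threshold from this guide; the constant name is hypothetical.
LATENCY_THRESHOLD_MS = 200.0


def recommend_tier(latencies_ms, current_tier):
    """Suggest a model tier from recent per-request latencies (milliseconds).

    Hypothetical helper, not a Rowton function. Using the median is an
    assumption so that a single slow request does not trigger a switch.
    """
    if not latencies_ms:
        # Nothing measured yet, so keep whatever tier is in use.
        return current_tier
    if median(latencies_ms) > LATENCY_THRESHOLD_MS:
        # Typical latency is over the threshold: fall back to the fastest tier.
        return "Turbo"
    return current_tier


print(recommend_tier([250, 300, 220], "Balanced"))
print(recommend_tier([120, 90, 150], "Studio"))
```

If the suggestion is already Turbo and latency is still high, the remaining option from this guide is to close other GPU-heavy applications.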

Building your own app

Embed the Rowton inference SDK (coming soon) or call the REST API for hybrid cloud/local deployments. See the API Reference.