Private voice dictation Mac edition: how Loqua's hybrid voice typing stack keeps your data on your side

Most voice dictation depends on cloud paths. Loqua uses a hybrid architecture with local-first sensitive layers, optional cloud features, and visible boundaries.

TL;DR

If you're searching for private voice dictation options on Mac that aren't just "cloud transcription with a privacy policy," this is the architectural answer. Loqua is hybrid by design: the sensitive core path (speech recognition, local cleanup, named-entity handling, and screen/context reading) is designed to run on-device on Apple Silicon by default. We treat this as secure dictation on Mac because the layers that touch audio and screen content are local-first, not because the marketing copy says "private." Optional cloud processing is reserved for features such as longer rewrites or selected translations, and can be disabled. We do not train on user dictation data. The goal is a visible boundary between what stays on-device and what, if enabled, crosses the wire.

Loqua is a context-aware voice typing tool for Mac. The fact that it can use screen context makes the privacy story central. If a dictation product can see your code, your messages, and your half-drafted emails, the architecture around that data is not a marketing footnote - it is the product.

I'm Shuran, and I co-built this stack with a small team of algorithm researchers. We use Loqua for our own internal Slack, email, coding prompts, and code review. The standard we wanted was simple: keep the sensitive path local by default, make optional cloud use visible, and avoid training on user dictation data.

The cloud-default tradeoff

Many modern dictation products use cloud transcription. That can be a reasonable engineering choice: large models, centralized updates, cross-platform consistency, enterprise controls, and documented zero-data-retention modes can all live in that architecture.

The tradeoff is surface area. Once audio or context crosses the wire, there is now a server path between your microphone and your cursor: transport, queues, logs, model providers, operational metadata, and enterprise policy. Good vendors manage that surface carefully. But users still need to understand where the boundary is.

Loqua starts from a different default. The layers that touch audio and screen context are designed to run locally first. Optional cloud features are treated as explicit feature boundaries, not invisible plumbing.

Why pure on-device is still a tradeoff

Pure on-device AI is appealing, and for routine dictation it is the right default. But absolute claims get brittle. Some long-tail tasks - very long rewrites, distant-language translation, rare-domain transformation - can benefit from larger cloud models. Model updates, crash reporting, license checks, and feature delivery also create network touchpoints in many products.

So we avoid the slogan version of privacy. The useful answer is not "cloud bad" or "local magic." It is a hybrid architecture with clear defaults, explicit controls, and a product that keeps working when cloud features are disabled.

What hybrid means at Loqua

Here's the architecture, said plainly:

| Layer | Where it runs by default | Why |
| --- | --- | --- |
| Speech recognition (Layer 1) | On-device, Apple Neural Engine | Latency budget; audio sensitivity |
| Language intelligence (Layer 2): filler cleanup, NER, basic formatting | On-device | Latency; vocabulary is yours |
| Multimodal context (Layer 3): screen reading | On-device | Screen content never leaves your machine |
| Cloud post-processing, only when you opt in | Loqua-managed cloud, TLS-encrypted | Long-form rewrites, certain translations |

The three core layers — the ones that touch audio and screen content — are designed to run on-device by default. You can use Loqua in offline mode for the core dictation experience.

Cloud is reserved for specific, opt-in cases. When it's used: cloud traffic is TLS-encrypted; cloud processing is zero-retention (the request is processed and discarded); and the user can disable cloud entirely from Settings. We do not train on user data at any point — not on cloud traffic, not on on-device usage.
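The opt-in rule above can be pictured as a small gate in front of every cloud call. This is a minimal sketch, not Loqua's actual code: `CloudFeature`, `PrivacySettings`, and `route` are hypothetical names, and treating a disabled feature as "handled on-device" is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class CloudFeature(Enum):
    # Hypothetical identifiers for the two optional cloud features named in the post.
    LONG_FORM_REWRITE = "long_form_rewrite"
    TRANSLATION = "translation"


@dataclass
class PrivacySettings:
    # Illustrative settings model; not Loqua's real schema.
    offline_mode: bool = False
    # Opt-in: no cloud feature is enabled until the user turns it on.
    enabled_features: set = field(default_factory=set)

    def allows(self, feature: CloudFeature) -> bool:
        # Offline mode overrides every per-feature toggle.
        return not self.offline_mode and feature in self.enabled_features


def route(feature: CloudFeature, settings: PrivacySettings) -> str:
    """Return where a request for `feature` would be handled."""
    return "cloud" if settings.allows(feature) else "on-device"
```

The point of the design is that the default answer is always "on-device": a cloud route needs both a per-feature opt-in and offline mode being off.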

Every boundary, visible

The principle: if a piece of your data crosses a boundary, you should know about it without reading the EULA. Here's how we make every boundary visible:

  • Menu-bar indicator. When Loqua is recording, the menu-bar icon turns red. When cloud is being used for a particular utterance, the indicator visibly differs (a small cloud icon overlay). You see, in real time, whether anything is leaving your machine.
  • Settings → Privacy panel. Lists exactly which cloud calls are enabled, with a toggle for each. Translation can be on while long-form rewrite is off, or vice versa.
  • Audio handling. Audio is not sent to the cloud for the default core dictation path. Optional cloud features are explicit and can be disabled.
  • Screen content handling. Screen content read by the multimodal context layer never crosses the wire. Even if you enable cloud rewriting, only the text being rewritten goes — not the surrounding screen.
  • Logging. Local debug logs do not include dictated content. Cloud-side logs do not include audio or transcripts.
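Two of the boundaries above, the menu-bar indicator and the "only the text being rewritten goes" rule, lend themselves to a short illustration. The sketch below is hypothetical: `Indicator`, `Utterance`, `indicator_for`, and `cloud_rewrite_payload` are invented names, and the payload shape is an assumption, not Loqua's wire format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Indicator(Enum):
    IDLE = "idle"
    RECORDING_LOCAL = "recording (red icon)"
    RECORDING_CLOUD = "recording (red icon + cloud overlay)"


@dataclass(frozen=True)
class Utterance:
    text: str            # transcript produced on-device
    screen_context: str  # read locally; never serialized for the cloud
    uses_cloud: bool     # True only if an opted-in cloud feature is involved


def indicator_for(recording: bool, utterance: Optional[Utterance]) -> Indicator:
    # The indicator state is derived from what is actually happening,
    # so the user can see whether anything is leaving the machine.
    if not recording:
        return Indicator.IDLE
    if utterance is not None and utterance.uses_cloud:
        return Indicator.RECORDING_CLOUD
    return Indicator.RECORDING_LOCAL


def cloud_rewrite_payload(utterance: Utterance) -> dict:
    # Allow-list construction: only the text being rewritten is included.
    # Screen context is deliberately not copied into the payload.
    return {"text": utterance.text}
```

Building the payload from an allow-list, rather than stripping fields from a larger object, means a newly added local field cannot leak by accident.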

Audio event detection (AED) and multimodal context processing stay local under the same boundary. The prototype work described in "Audio event detection dictation: sounds with meaning beyond words" treats non-word audio as a local, opt-in signal, and the multimodal listener described in "Multimodal voice recognition: building a listener that sees what you see" uses screen context for the current utterance rather than creating a general screen log.

Algorithmic tradeoffs at low latency

Running the core layers on-device while keeping dictation responsive on consumer Macs is the hardest engineering work in this stack. Three things made it feasible:

  • Aggressive operator selection for the Neural Engine. Not every transformer operator runs efficiently on Apple's Neural Engine. We choose layer types, attention variants, and quantization schemes that stay on the fast path. Apple's Core ML documentation maps the supported operator set; falling off it can be expensive.
  • Streaming-first speech recognition. Output starts before the full utterance is finalized. Non-streaming variants can improve per-utterance accuracy but feel slower.
  • Parallel pipeline. The context layer runs in parallel with speech recognition. By the time the language layer is ready to format output, the destination context has already been read locally.

The tradeoff: parameter budgets are tight. Each local layer is smaller than a cloud model that is not constrained by laptop thermals. We compensate with task-specific training data, careful fine-tuning, and a narrow Mac-first scope. Internal benchmarks currently target a response time of roughly 200 ms, high recognition accuracy on technical vocabulary, and a low single-digit word error rate (WER) in supported conditions; these remain internal targets until a public benchmark page exists.
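The streaming-first and parallel-pipeline ideas described above can be sketched with `asyncio`. This is a toy model under stated assumptions: `read_context`, `stream_recognition`, and `format_for` are hypothetical stand-ins with simulated delays, not the real on-device models.

```python
import asyncio


async def read_context() -> str:
    # Stand-in for the on-device context layer (hypothetical).
    await asyncio.sleep(0.01)
    return "code-editor"


async def stream_recognition(chunks, partials):
    # Stand-in for streaming recognition: emit a partial result after
    # every audio chunk instead of waiting for the full utterance.
    words = []
    for chunk in chunks:
        await asyncio.sleep(0.005)  # simulated per-chunk inference time
        words.append(chunk)
        partials.append(" ".join(words))
    return " ".join(words)


def format_for(context: str, transcript: str) -> str:
    # Toy language-layer step: formatting depends on the destination.
    if context == "code-editor":
        return transcript
    return transcript.capitalize() + "."


async def dictate(chunks):
    partials = []
    # Start the context read first so it overlaps with recognition.
    context_task = asyncio.create_task(read_context())
    transcript = await stream_recognition(chunks, partials)
    context = await context_task  # typically already finished by now
    return format_for(context, transcript), partials


result, partials = asyncio.run(dictate(["rename", "the", "variable"]))
```

Because the context task is created before recognition starts, its latency is hidden behind the recognition loop rather than added to it.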

What we guarantee

The hard list:

  • No training on user data. Not on audio. Not on transcripts. Not on cloud-processed text. Not for any future model version.
  • No audio uploaded unless you opt in. Default: no cloud audio. Opt-in cloud features are explicit and per-feature.
  • Zero retention on cloud-processed data. The request is processed and immediately discarded. There is no "30-day soft-delete" — there is no copy to delete.
  • TLS for all cloud traffic. Standard practice but stated for completeness.
  • Offline mode. A single toggle in Settings disables every cloud call. Loqua continues to work using only the on-device layers.
  • No browser hooks. No cross-app tracking. Loqua reads the active app's context for the current dictation only. Between dictations, the multimodal context layer is idle.
  • Personal Dictionary stays local. Your custom vocabulary lives in a local file. It does not sync to any cloud and is not visible to us.

Your controls

Privacy is only useful if the user has controls that are easy to find. From the Settings → Privacy panel you can:

  • Disable optional cloud calls
  • Toggle long-form cloud rewriting on or off
  • Toggle cloud translation on or off
  • Exclude specific apps from Loqua entirely
  • Revoke microphone permission in macOS System Settings
  • Revoke Accessibility permission in macOS System Settings
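To illustrate how per-app exclusion and the "idle between dictations" guarantee could fit together, here is a minimal sketch. `ContextReader` and the bundle identifiers are hypothetical, and yielding `None` for an excluded app is an assumption about how a refusal might be modeled.

```python
from contextlib import contextmanager


class ContextReader:
    """Hypothetical context layer that is only active during a dictation."""

    def __init__(self, excluded_apps):
        self.excluded_apps = set(excluded_apps)
        self.active = False

    @contextmanager
    def dictation(self, app_id: str):
        # Excluded apps are never read: yield None and stay idle.
        if app_id in self.excluded_apps:
            yield None
            return
        self.active = True
        try:
            yield app_id
        finally:
            # Return to idle as soon as the dictation ends, even on error.
            self.active = False


reader = ContextReader(excluded_apps={"com.example.passwords"})
with reader.dictation("com.example.editor") as app:
    pass  # context for `app` would be read here, for this utterance only
```

Scoping the active state to a context manager makes "idle between dictations" a structural property rather than something each caller has to remember.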

For regulated or security-sensitive workflows, use full offline mode and run your own compliance review. This blog post is not legal or HIPAA compliance advice; the product boundary is technical, and formal compliance requirements should be evaluated through your organization's compliance process.

Further reading

If you have a specific voice typing privacy or security requirement we don't address here, email us. We're a small team and we'd rather answer your question directly than have you guess from a generic policy document. That's the short version of why Loqua is built as a private voice dictation product for Mac first and a cloud-feature product second.

Frequently asked questions

Is audio ever sent to the cloud?
Not by default. Speech recognition runs on-device on Apple Silicon. Audio is sent to the cloud only if you explicitly enable a cloud feature that requires it (currently: certain long-form rewrites and some translation pairs). You can disable all cloud calls in Settings → Privacy.

Does Loqua train on my dictation or my audio?
No. Not on audio, not on transcripts, not on cloud-processed text. Not for any future model version. We use carefully curated training data sets that don't include user content.

Can I run Loqua fully offline?
Yes. Toggle off all cloud calls in Settings → Privacy. The core dictation experience (speech recognition, multimodal context, NER, app-aware formatting) runs entirely on-device. You'll lose the optional cloud features (long-form rewrites, certain translations) and gain a stack with no network surface.

What gets logged?
Local debug logs include diagnostic information (model load time, latency measurements, error traces) but do not include your dictated content. Cloud-side logs do not include audio or transcripts, only opaque request metadata for service reliability.

What about GDPR / CCPA?
Loqua is designed to comply. Because most processing is on-device and cloud processing is zero-retention, there is typically no personal data to subject to access or deletion requests. For specifics relevant to your jurisdiction, see our privacy policy or email us.

Can I use Loqua in HIPAA-style regulated workflows?
Do not treat this blog post as legal or HIPAA compliance advice. Loqua can be run with optional cloud features disabled for sensitive workflows, but regulated deployments should be reviewed through your compliance process and any required agreements.

