Every vibe coder pays for dictation. I open-sourced mine.

Philipp Baldauf · 4 min read

Watch how people actually use AI coding tools now, and you'll notice something: almost nobody is typing prompts anymore. They're talking. Holding a key, dictating a paragraph into Cursor, releasing, and watching the words appear. It's faster than typing, and once you've done it for a week you can't go back.

And nearly all of them are paying for it.

There's a whole category of slick dictation apps now, with Wispr Flow being the one everyone seems to land on, and they're genuinely good: polished, fast, nicely designed. They also come with a monthly subscription, and they send your voice to a server for transcription.

Here's the part that always makes me smile: this is a crowd that will vibe-code an entire side project in a single afternoon, people who'll happily spin up a custom CLI, a Raycast extension, or a little menubar utility for some niche annoyance. And yet the one tool they use all day, every day (the one sitting between their mouth and their editor) is a paid subscription nobody thought to build themselves.

The dictation layer is a solved problem now

A few years ago, "build your own dictation tool" would have been a real project. You'd need a cloud speech API, you'd be paying per minute of audio, you'd be shipping people's voices off to someone else's server. Not anymore.

Whisper, OpenAI's open-source speech recognition model, runs comfortably on local hardware now. On Apple Silicon, WhisperKit does on-device transcription that is fast and accurate: no network round-trip, no API key, no per-minute cost. The hard part isn't the transcription anymore. The hard part is the glue: a global hotkey, a tiny menubar presence, capturing audio while the key is held, and pasting the result into whatever app has focus.
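To make that glue concrete, here is a minimal sketch of the hold-to-talk flow. It's written in Python as a language-neutral illustration, not ainstype's actual code (the app itself is a macOS menubar app). The class `HoldToTalk` and its `transcribe` and `paste` callbacks are hypothetical stand-ins for on-device WhisperKit transcription and for pasting into the focused app; global key monitoring, microphone capture, and pasteboard access are platform APIs and are deliberately left out.

```python
from typing import Callable, List


class HoldToTalk:
    """Hypothetical hold-to-talk controller: buffer audio while the hotkey is held,
    then transcribe it and paste the text when the key is released."""

    def __init__(self, transcribe: Callable[[List[float]], str],
                 paste: Callable[[str], None]) -> None:
        self._transcribe = transcribe  # stand-in for on-device Whisper transcription
        self._paste = paste            # stand-in for pasting into the frontmost app
        self._recording = False
        self._buffer: List[float] = []

    def key_down(self) -> None:
        # Ignore repeated key-down events so a duplicate can't wipe audio already captured.
        if not self._recording:
            self._recording = True
            self._buffer = []

    def audio_chunk(self, samples: List[float]) -> None:
        # Microphone samples only count while the key is held; everything else is dropped.
        if self._recording:
            self._buffer.extend(samples)

    def key_up(self) -> None:
        if not self._recording:
            return
        self._recording = False
        audio, self._buffer = self._buffer, []
        if not audio:
            return  # a quick tap with no captured audio pastes nothing
        text = self._transcribe(audio).strip()
        if text:  # skip empty transcriptions instead of pasting blank text
            self._paste(text)


# Usage with fake callbacks: the "transcriber" returns fixed text, "paste" just records it.
pasted: List[str] = []
ptt = HoldToTalk(transcribe=lambda audio: " hello world ", paste=pasted.append)
ptt.audio_chunk([0.5])       # ignored: the key isn't held yet
ptt.key_down()
ptt.audio_chunk([0.1, 0.2])
ptt.key_up()
print(pasted)  # → ['hello world']
```

Keeping transcription and pasting behind callbacks lets the state logic be exercised without a microphone or a keyboard hook. In a real app, transcription would presumably run off the main thread so the menubar stays responsive while Whisper works.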

That glue is exactly the kind of thing we build for fun. So I built it.

ainstype

ainstype is a macOS menubar app. You hold a hotkey — the right Cmd key by default — you speak, you let go, and your words are transcribed and pasted straight into whatever you're working in. Cursor, a terminal, an email, a Slack message. It doesn't care.

The whole thing runs on-device through WhisperKit. Your audio never leaves your Mac. There's no account to create, no subscription, no cloud. It ships with the model pre-bundled so there's nothing to download on first run, supports a custom dictionary for the domain-specific words Whisper always gets wrong, and lets you remap the hotkey and language in a simple config file.
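The post doesn't describe how ainstype applies its custom dictionary, so treat this as one possible approach rather than the app's own: a post-processing pass that swaps known misrecognitions for the spelling you want, matching whole words and ignoring case. The function `apply_dictionary`, the mapping shape, and the sample entries are all hypothetical, not ainstype's config format.

```python
import re
from typing import Dict


def apply_dictionary(text: str, corrections: Dict[str, str]) -> str:
    """Replace misheard phrases with preferred spellings (hypothetical post-processing step)."""
    # Longer phrases go first so a multi-word entry wins over a shorter entry it contains.
    for heard in sorted(corrections, key=len, reverse=True):
        replacement = corrections[heard]
        # \b boundaries assume each entry starts and ends with a letter or digit;
        # IGNORECASE tolerates Whisper capitalizing the start of a sentence.
        pattern = re.compile(r"\b" + re.escape(heard) + r"\b", re.IGNORECASE)
        # A function replacement inserts the text literally, so backslashes aren't treated as escapes.
        text = pattern.sub(lambda _match: replacement, text)
    return text


# Hypothetical entries: a misrecognition -> the term you actually meant.
corrections = {
    "cube cuddle": "kubectl",
    "super base": "Supabase",
    "next js": "Next.js",
}

print(apply_dictionary("Run cube cuddle apply, then check Super base.", corrections))
# → Run kubectl apply, then check Supabase.
```

Whisper can also be nudged toward unusual vocabulary with a text prompt before decoding. A replacement pass like this is simpler and fully predictable, but it only fixes mistakes you have already seen.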

That's it. It does one thing, and it does it without asking you for a credit card or your voice.

My first open-source project

This one's a small milestone for me. I've been shipping apps for over a decade — App Store apps, web apps, products with pricing pages and support inboxes. All of them closed source. ainstype is the first thing I've put out into the world under an open license (MIT), source and all.

It felt strangely vulnerable to do. There's no marketing page to hide behind and no polished onboarding: just the code as it is, for anyone to read, fork, or tell me what I got wrong. But that's also the point. This isn't a product I'm trying to grow. It's a tool I wanted for myself, and there's no reason it should cost anyone anything to have it too.

I'm not knocking the paid tools

To be clear: Wispr Flow and the others are good software, and if a polished, supported, cross-platform app is what you want, paying for it is a perfectly reasonable choice. Not everything needs to be a weekend build.

But if you're already comfortable on the command line — if you're the kind of person who reads "you could just build that yourself" as an invitation rather than a chore — then the build-your-own path for dictation is genuinely viable now. The result runs entirely on your machine, costs nothing, and is yours to change.

The code is on GitHub. If you try it, break it, or improve it, I'd love to hear about it.