Over several weekends we ran real recordings (interviews, lectures, voice notes, the occasional podcast) through five popular desktop transcription tools. Here's what we found, stated plainly. No ranking. No declared winner. Just five tools and the specific work each one actually handles well.
All five process audio directly on your computer. None of them require a subscription. Each one approaches the problem differently, which is precisely why this site exists.
Buzz
A bare-bones interface sitting on top of OpenAI's Whisper. Drop in a file, choose a model, receive text. The app most newcomers should reach for first if they've never run a local transcription before.
Subtitle Edit
It's technically a subtitle editor, but if you've ever needed to fix Whisper's punctuation, re-time captions, or export to a dozen different subtitle formats, nothing else competes. Heavier than Buzz, but it pays that back the moment you start polishing output.
Whisper Transcription
The refined one. Sandboxed, signed, and sits quietly in the menu bar. If you want something that actually feels Mac-native rather than a Python script dressed up in a window, this is the one.
Pyrenees
The newcomer. Built around Apple's MLX framework, it's the quickest of the group on M-series chips for the model sizes it covers, sometimes by an embarrassingly wide margin.
VoiceInk
A different category entirely: VoiceInk handles dictation, not file transcription. Press a hotkey, speak, and your text lands wherever the cursor sits. We included it because it answers a question the other four don't.
A condensed table: handy for narrowing options, not for making the final call. The full reviews dig deeper into where each tool stumbles.
| App | Platforms | Price | Best for | File mode | Live dictation | Subtitle export |
|---|---|---|---|---|---|---|
| Buzz | Mac · Win · Linux | Free (open source) | First-time users, straightforward batch jobs | Yes | Limited | SRT, VTT, TXT |
| Subtitle Edit | Win · Mac/Linux (Mono) | Free (open source) | Cleaning up transcripts & subtitle work | Yes | No | ~200 formats |
| Whisper Transcription | macOS | Free tier · paid model unlocks | Mac users who want refinement over tinkering | Yes | Microphone capture | SRT, VTT, TXT, DOCX |
| Pyrenees | macOS (Apple Silicon) | Free | Speed, batch jobs on M-series Macs | Yes | No | SRT, VTT, TXT |
| VoiceInk | macOS | Free (open source) | Dictation into any application | Secondary feature | Primary feature | N/A |
People keep asking us "what's the best transcription app?" That question doesn't really have a clean answer. Here's a more productive way to think about it.
Try Buzz. It's the lowest-friction way to find out whether local transcription is good enough for your needs. Five minutes in, you'll know.
Subtitle Edit, no contest. The waveform editor and format support outclass the rest of the field. Whisper integration is just the cherry on top.
Whisper Transcription if you want a curated experience and don't mind paying for higher-quality models; Pyrenees if you'd rather have speed at zero cost and don't need bells and whistles.
VoiceInk. It's the only one of the five built around the "I want to talk into my computer" workflow. The other four are the wrong tools for that job.
We're not journalists. We're a couple of people who got tired of "Top 10 AI Transcription Tools 2025" articles that all said the same five things in the same five blocks. There's nothing wrong with affiliate roundups (they keep the lights on for a lot of small sites), but they tend to flatten the differences between tools. We wanted somewhere that does the opposite, and explains why you'd pick one over another.
Read the reviews in any order. Most people don't read them all, and that's fine. More about us here, if you're curious.
Buzz is one of those tools that solves exactly one problem and then gets out of your way. The problem in question (running OpenAI's Whisper model on a file you don't want leaving your laptop) used to involve at least one virtual environment, a couple of pip install commands, and a 50-50 chance of an opaque ffmpeg error. Buzz turns all of that into a window with a button labeled "Transcribe". Drag in your audio, pick a model size, watch the progress bar. Output goes to a folder. Done.
That's the entire pitch. If it sounds modest, that's because the surface is designed to be modest. The compelling part is what it leaves out: no account creation, no server uploads, no upsell toward an AI-summary feature on a credits plan. It really is just a wrapper. And after a fair amount of time with the app across two laptops and three operating systems, that minimalism is the app's single greatest strength.
Buzz is an open-source desktop application written by Chidi Williams. The repository on GitHub has been around since 2022 and continues to receive updates. Under the hood, it bundles two transcription engines: the original OpenAI Whisper implementation in Python, and the much faster whisper.cpp port written in C++. You can pick which one you want at the model-loading screen, and which one you want depends on what kind of computer you're sitting at.
If you've used something like Audacity, Buzz will feel familiar in spirit: utilitarian, slightly dated in its widgets, clearly built by people more interested in function than aesthetics. No marketing-inflated empty space. No dashboard. The main window is a list of transcription jobs, each row showing whether it's queued, running, or done.
Buzz is not OpenAI's official Whisper application. OpenAI has never shipped one. Buzz is a community-made front-end that loads OpenAI's open-source model on your local machine. Everything happens on your computer; nothing is transmitted to OpenAI or any other server.
The first thing that struck me was how quickly I got from "I just downloaded this" to "I have a transcript." On a 2021 M1 MacBook Pro, the entire setup took about three minutes, and most of that was the initial model download (the medium-size model is about 1.5 GB). On a five-year-old Windows laptop without a discrete GPU, it took longer to run the transcription, but the setup was identical.
The interface isn't attractive. Better to say that upfront. It's built with PyQt and looks like every other PyQt application β functional in the way a utility knife is functional. You will not show it to someone to impress them.
I tested it on a 47-minute interview recorded for an unrelated project. The tiny model finished in around 90 seconds and got the gist. The medium model took four minutes and caught most proper nouns. The large model ran for about fourteen minutes and produced output I'd actually show someone.
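For a rough sense of scale, those timings can be turned into a speed-up over real time. This is a back-of-the-envelope sketch using the rounded figures quoted above; the function name is only for illustration, and the ratios will look different on other hardware.

```python
# Speed-up over real time for the timings quoted in this review.
AUDIO_SECONDS = 47 * 60  # the 47-minute interview

# Wall-clock times in seconds, rounded as reported above.
runs = {"tiny": 90, "medium": 4 * 60, "large": 14 * 60}


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of waiting."""
    return audio_seconds / wall_seconds


for model, wall in runs.items():
    factor = realtime_factor(AUDIO_SECONDS, wall)
    print(f"{model:>6}: {factor:.1f}x real time")
```

A factor above 1 means the transcript arrives faster than it would take to listen to the recording. Even the slowest run here stays comfortably above that line.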
The best thing I can say about Buzz is that I stopped noticing I was using it. It does the job, the job ends, the file is there.
Buzz lets you choose between several model sizes (tiny, base, small, medium, large) and several backends. The most important choice is between the original OpenAI Whisper Python implementation, the whisper.cpp backend, and Hugging Face's transformers-based implementation. There's also support for using OpenAI's hosted Whisper API if you'd prefer to send the file to OpenAI in exchange for faster results, but that defeats the privacy advantage, and almost no one I know who installs Buzz uses that mode.
Two practical observations from real-world use:
The whisper.cpp backend with Core ML acceleration is the fastest by a wide margin. You'll want to enable that.
Buzz also supports a "Live Recording" mode where it'll transcribe directly from your microphone as you speak. I've used this feature exactly twice, and both times I came away thinking that this is not what Buzz is for. The latency is wrong for it (you'll get text in chunks of several seconds), and it doesn't integrate with other apps. If you want dictation that drops text where your cursor is, look at VoiceInk instead. If you want live captions for a video call, look elsewhere entirely. Buzz is a file-based tool with a microphone option grafted on, and you can feel the seam.
If you've already tried Buzz and the transcripts come back with weird timing or punctuation issues, don't wrestle with the app: export to .srt or .vtt and clean up in Subtitle Edit. It's faster than fighting Buzz's text editor.
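For anyone curious what actually sits inside an .srt file, here is a minimal sketch of how timed segments become SRT cues. It is not Buzz's exporter; the function names and the (start, end, text) tuple layout are assumptions chosen for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}")
    # Cues are separated by one blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical segments, standing in for a transcriber's output.
example = [(0.0, 2.5, "Thanks for joining me."), (2.5, 4.0, " Happy to be here. ")]
print(to_srt(example))
```

The format is plain text: a cue number, a timing line, the words, then a blank line. That simplicity is why an .srt opens cleanly in a dedicated editor for cleanup.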
This is the section most write-ups skip, so here it is. The complete flow, from "I haven't installed anything" to "I have a clean SRT", without skipping the parts that actually trip people up.
On macOS it's a .dmg; on Windows it's an .exe installer; on Linux you've got AppImage and Snap options.
Test recording: a 47-minute interview, recorded into the iPhone Voice Memos app, exported as .m4a.
Result with the large model on M1 MacBook Pro: finished in 14 minutes 22 seconds. The transcript needed roughly 5 minutes of cleanup, mostly proper nouns the model didn't know, plus the usual punctuation around hesitations.
Staying within this site's shortlist, here's how Buzz stacks up against the others:
Versus Subtitle Edit: Subtitle Edit can also drive Whisper, but it does much more than transcribe: it has a full waveform editor and supports an absurd number of subtitle formats. If you're a translator or a captioner, Subtitle Edit is probably your daily driver and Buzz is redundant. If you just want a transcript, Buzz is faster to learn.
Versus Whisper Transcription (Mac): Whisper Transcription is more polished, prettier, and better integrated into macOS. It's also Mac-only and has a paid tier. Buzz is uglier, but free everywhere.
Versus Pyrenees: Pyrenees is faster on Apple Silicon, full stop, but only on Apple Silicon. If you're on an M-series Mac and you mostly transcribe shorter files, Pyrenees wins on speed. Buzz wins on cross-platform consistency and on having more backend options.
Versus VoiceInk: Different tool for a different job. VoiceInk is for live dictation (talking into apps as you'd talk into iOS dictation). Buzz is for files. They don't really compete.
If you've never run a local transcription before, install Buzz first, even if you move on to something else later. It's the lowest-resistance way to verify whether local transcription meets your needs at all.
If you already know you need subtitle editing, dictation, or maximum speed on Apple Silicon, you can likely skip Buzz and go straight to the more specialized tool.
Buzz itself is free under the MIT license. No sign-up, no trial period, no premium tier. The only expense would be if you choose to use OpenAI's hosted API as a backend, but the default local mode costs nothing beyond the electricity your machine uses.
Not by default. The local backends (whisper.cpp and the OpenAI open-source Whisper) do everything on your machine. The only mode that uploads anything is the explicit "OpenAI API" mode, and you have to provide your own API key to use it.
Whatever Whisper supports, which covers roughly 99 languages with varying accuracy. English, the major European languages, and Mandarin perform best. Smaller languages may be noticeably less reliable.
You can do basic text edits, but Buzz isn't a text editor or a subtitle editor. For any serious cleanup (re-timing, fixing punctuation, splitting cues), open the SRT in Subtitle Edit or another dedicated tool.
Yes, after the initial model download. You only need internet the first time you load each model. After that, transcription works entirely offline.
Whisper's large model is around 3 GB and benefits enormously from a GPU or Apple Silicon's Neural Engine. On an older CPU-only laptop, plan for a long wait; the medium model at that point is usually the better compromise.
I'll start with a confession. The first time I tried Subtitle Edit for transcription, I quit after fifteen minutes and returned to Buzz. The interface looked like a Windows XP control panel that had been dragged forward by sheer force of will, and I couldn't figure out how to get Whisper running inside it. I assumed it was broken. It wasn't. I'd simply misjudged what kind of program I was looking at.
Subtitle Edit is not a transcription tool with caption editing added on. It's a caption editor with transcription added on. The distinction matters enormously for how you approach learning it.
There are essentially two workflows the app supports, and people who try Subtitle Edit generally fall into one of two camps based on what they came for.
Workflow A: you have an audio or video file and you want clean, correctly-timed captions. You import the media, you tell Subtitle Edit to run Whisper on it, you wait. You get a populated cue list. Then you spend twenty or thirty minutes cleaning it up: fixing punctuation, merging short cues, splitting long ones, retiming the parts where the model got confused. The output goes out as .srt or whatever else you need. This is the workflow professional captioners use.
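To make "merging short cues" concrete, here is a small sketch of the idea. It is not how Subtitle Edit implements its merge commands; the `Cue` class and the two thresholds are assumptions picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_short_cues(cues, min_duration=1.0, max_gap=0.3):
    """Fold a cue shorter than min_duration into the previous cue
    when the pause between them is at most max_gap seconds."""
    merged = []
    for cue in cues:
        prev = merged[-1] if merged else None
        is_short = (cue.end - cue.start) < min_duration
        if prev and is_short and (cue.start - prev.end) <= max_gap:
            merged[-1] = Cue(prev.start, cue.end, f"{prev.text} {cue.text}")
        else:
            merged.append(Cue(cue.start, cue.end, cue.text))
    return merged


# Hypothetical cue list with one stray single-word cue.
cues = [
    Cue(0.0, 2.5, "So the first thing"),
    Cue(2.6, 3.0, "is"),
    Cue(4.5, 7.0, "getting the timing right."),
]
for cue in merge_short_cues(cues):
    print(cue)
```

The stray "is" joins the cue before it, while the third cue stays separate because it is long enough to stand alone. A real editor also weighs reading speed and line length, which this sketch ignores.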
Workflow B: you have a transcript already (from Buzz, from Whisper Transcription, from Otter, doesn't matter) and you want to fix it. You open the existing file, you bring in the audio so the waveform syncs with the cues, and you fix the obvious mistakes by listening and clicking. This is what I personally use it for, and I think it's the underrated use case. Even if your primary transcriber is something else entirely, Subtitle Edit makes a phenomenal "second tool".
I now run almost everything through Buzz first and then open the resulting .srt in Subtitle Edit for cleanup. It's not the workflow the developer intended, but it's faster than trying to do everything in one app, and the keyboard-driven editing in Subtitle Edit is genuinely better than anything else I've tried.
The interface is dense. There's no softer way to put it. Every pixel earns its place. On first launch you see a toolbar with maybe a hundred icons, a waveform panel occupying the bottom third of the screen, and a cue list between them. It's a lot.
Spend an afternoon with it and the density starts feeling like a feature. The reason every command has a keyboard shortcut is that captioners work at pace: they need to jump from a timestamp correction to a spell-check to an export without lifting their hands from the keyboard.
Whisper integration sits under Video → Audio to text (Whisper). From there you choose the engine: Subtitle Edit supports the original Python Whisper, whisper.cpp, Const-me's GPU implementation, Purfview's Whisper Faster, and a couple of others depending on which version you have installed. Each engine has its own strengths. On a Windows laptop without a GPU, Purfview's implementation gave me the best balance of speed and accuracy. On a machine with an NVIDIA card, Const-me's GPU build was faster than anything else by a wide margin.
This is the part nobody talks about, and the thing that makes Subtitle Edit irreplaceable in some workflows. The app reads and writes well over two hundred subtitle formats. If you've ever stared at a file with a strange extension and wondered how to convert it to .srt without losing timing or styling, Subtitle Edit is almost certainly the answer.
A partial list of what it handles:
If your transcription job ends with "and then we hand it to a broadcaster," Subtitle Edit may be the only free tool capable of producing a file that will actually pass.
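To give a feel for what a format conversion involves, here is about the simplest case there is: SRT to WebVTT. This is a minimal sketch, not Subtitle Edit's converter; the function name is made up, and it ignores styling tags, positioning, and malformed input.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Add the WEBVTT header and switch the millisecond separator
    from a comma to a dot. Cue numbers are kept as cue identifiers."""
    converted = [
        TIMING.sub(r"\1.\2 --> \3.\4", line)
        for line in srt_text.strip().splitlines()
    ]
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


sample = "1\n00:00:01,000 --> 00:00:04,200\nHello there.\n"
print(srt_to_vtt(sample))
```

Two text formats this close together are easy. The broadcast formats are where a dedicated tool earns its place, because they carry timing bases, styling, and metadata that a regex will not get you.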
If you have a video file whose subtitles are baked in as images (DVD/Blu-ray rips, some MKV files), Subtitle Edit's built-in OCR can extract them as editable text. Set the language correctly and clean up the output in the cue list. Faster than retyping.
Honest section. Subtitle Edit is not for everyone, and there are genuine friction points beyond the dense interface.
The platform story is uneven. The Windows version is a proper, signed, native application that has had two decades of polish. The Mac version is a newer port that runs natively on both Intel and Apple Silicon, but feels less mature: keyboard shortcuts that work flawlessly on Windows occasionally do nothing on Mac, certain dialogs appear off-screen, and waveform extraction sometimes fails on file types that work fine on the Windows build. On Linux, you're typically running it through Mono, which works but has its own assortment of papercuts. If you're not on Windows, expect rougher edges.
It's not a transcription-first app. If your goal is to get a clean .txt transcript and you don't care about timing, you'll find yourself fighting a UI that wants you to care about timing. You can absolutely use it for plain transcripts (just export to TXT after the cues are populated), but you'll spend a lot of attention on widgets you didn't need.
The translation features are uneven. There are translation integrations (Google, DeepL, libretranslate, ChatGPT API, others), but the quality varies and the UX of running them feels grafted on. For pure translation work, you're better off elsewhere.
The learning curve is real. Out of every tool we cover on this site, Subtitle Edit has the steepest first-week curve. Plan for it.
Because Subtitle Edit covers more than one workflow, a single step-by-step guide doesn't quite fit. Here are three focused ones instead.
Measured against the rest of the shortlist:
Compared to Buzz, Subtitle Edit is the heavy tool. Buzz is for "I have a recording, I want a transcript". Subtitle Edit is for "I have a recording, I want broadcast-ready captions, and I'm willing to spend an afternoon getting them right." Both are free; they're answers to different questions.
Compared to Whisper Transcription, Subtitle Edit is dramatically uglier and dramatically more capable. Whisper Transcription will get you a clean transcript faster on a Mac. Subtitle Edit will let you actually shape it.
Compared to Pyrenees, the comparison doesn't really hold: Pyrenees is a transcription engine optimized for speed, Subtitle Edit is an editing environment. They could even live alongside each other: Pyrenees produces, Subtitle Edit edits.
Compared to VoiceInk, they share no overlap at all. Different jobs.
Subtitle Edit is the answer once you've moved past the "can I get Whisper running at all?" stage and you're now asking "how do I make this output actually usable?" Most people will install Buzz first and discover Subtitle Edit a few months later, and that order is probably right. For translators, captioners, and anyone whose work involves the phrase "broadcast-safe," it's the most important free tool you can have.
Yes. It's released under the GNU General Public License v3 and can be used commercially without restriction. The one nuance: if you bundle and redistribute Subtitle Edit, you have to comply with the GPL. Just using it on commercial work is unrestricted.
It runs, with caveats. There's now a native macOS build supporting both Intel and Apple Silicon, and it's improving steadily, but the Windows version still has the most polish. On Linux you'll typically run it through Mono. If you need Subtitle Edit's full capability, plan on using Windows or a Windows VM.
For most Windows users without a GPU: Purfview's Whisper Faster build is the most reliable balance of speed and accuracy. With an NVIDIA GPU: Const-me's GPU implementation tends to be the fastest. On macOS: whisper.cpp through the bundled integration. The differences are smaller than the choice of model size, so don't agonize.
No. Subtitle Edit is strictly file-based. For live transcription or dictation, look elsewhere.
Not natively in any clean, automatic way. Whisper itself doesn't reliably perform diarization, and Subtitle Edit doesn't add a separate diarization step. If you need speaker labels, you'll do that work manually in the cue list, or run the audio through a separate diarization tool first.
Surprisingly good, for European languages and cleaner DVD subtitles in particular. For Blu-ray SUPs, accuracy tends to land at 90% or higher before corrections. For non-Latin scripts, results vary: Tesseract handles the heavy lifting, and you'll need the correct language pack installed.
This is one of the rare apps in this space where someone clearly designed it rather than simply shipped it. You can tell immediately. The icon doesn't look like a Python logo with a microphone grafted on. The window has the right corner radius. The settings panel uses the macOS sheet style that actually lets you find what you need. When you import a file, the app shows you metadata (sample rate, channels, duration) that most transcription tools simply ignore.
None of this changes the underlying transcription quality. Whisper is Whisper, regardless of which app calls it. So the question Whisper Transcription has to answer is: given that the model is the same, what does this app give me that the free options don't?
The honest answer, after two weeks of regular use: a collection of small things, none individually decisive, that together add up to "this is the app I'd hand to someone who doesn't want to think about it."
The core flow matches every other tool in this category. Drop in a file, select a model, press a button, receive text. Where Whisper Transcription sets itself apart is in the details.
The transcript view is interactive. Click a sentence, the audio jumps to that timestamp. Edit the sentence in place. Highlight a span and you get inline tools to merge cues, split them, change capitalization, mark a speaker. It's not Subtitle Edit's level of cue-editing power, but for working with prose-style transcripts, it's genuinely faster than re-opening your output in another app.
It can capture system audio, not just microphone. A small but uncommon feature. If you want to transcribe a YouTube video, a podcast you're listening to, or a Zoom call (with appropriate permissions), Whisper Transcription can pipe the system's audio output directly in. Most of the free alternatives only see the microphone.
Export is well thought through. SRT, VTT, plain text, and DOCX are all one click away. The DOCX export in particular is more polished than what you'll get from running Whisper through a script: it preserves paragraph breaks at sensible points, includes timestamps as headers if you want them, and doesn't dump everything into a single block of unreadable prose.
There's a menu-bar mode. If you click the menu-bar icon, a small palette appears that lets you start a recording, drop in a file, or pull up your recent transcripts without opening the main app. It's the kind of detail a tinkerer never builds and a designer always insists on.
I recorded a 12-minute podcast intro the same day a new model unlock went live. Imported the M4A. Transcription took 2 minutes 40 seconds with the medium model on an M2 MacBook Air. The interactive transcript caught two proper nouns I'd mispronounced, and clicking each one to hear the audio play back was, and I mean this, genuinely satisfying. No find function, no waveform scrubbing.
This is where we have to discuss money, because it's the main thing separating Whisper Transcription from the free alternatives.
The app is a free download from the Mac App Store. The free tier includes the smaller Whisper models (typically tiny and base), which are adequate for casual notes but noticeably weaker than what you'd want for professional work. Unlocking the larger models (medium, large, and various distilled variants) requires a one-time in-app purchase. Since pricing shifts over time and varies by region, check the App Store listing rather than relying on a figure from this review.
Worth noting: the pricing model is a one-time unlock, not a subscription. Pay once and the larger models are yours. No monthly fee, no per-minute charge, no credits. That alone makes it cheaper than most cloud-based transcription services if you transcribe more than a few hours per month.
Free Whisper exists. You can run it through Buzz or Pyrenees and get the same model output for nothing. So the question isn't "should I pay for transcription?"; it's "should I pay an indie Mac developer for a polished front-end?" If you transcribe regularly and value your time, yes. If you transcribe rarely or genuinely enjoy command-line flags, no. Both answers are reasonable.
I want to be direct about the limitations here, because every "the polished one" review I've ever read tends to gloss over them.
Mac only. Obvious but worth saying. If you ever switch to Windows or Linux, your purchase doesn't follow you and your workflow doesn't follow you.
Less flexible than open-source alternatives. The app picks reasonable defaults and hides most of the tuning knobs. If you want to set custom Whisper parameters, run a fine-tuned model, or experiment with non-standard backends, you'll outgrow Whisper Transcription quickly. Buzz lets you switch backends; this doesn't.
Speed is good but not the best. On Apple Silicon, Pyrenees is faster (sometimes substantially faster) for the same model size. Whisper Transcription uses solid acceleration but isn't the speed champion of the field.
No deep subtitle editing. The interactive editor is a pleasure for prose, but it's not pretending to be Subtitle Edit. If your job involves cue-by-cue caption work, you'll still be exporting to .srt and finishing the job elsewhere.
App Store review constraints. Because it's distributed through the App Store, it lives inside Apple's sandbox rules. That has security upsides (the app can't quietly access files you didn't grant it access to) but also the occasional UX papercut: for instance, you'll be re-asked for microphone permission after some macOS updates.
The workflow is shorter than for most tools we've reviewed. Here's the condensed version.
If you're going to do any serious cue editing, export to SRT and open it in Subtitle Edit. Whisper Transcription's editor is great for prose; it's not designed for the cue-by-cue work captioners do.
Quick reference points across the rest of the shortlist:
Versus Buzz: Buzz is free everywhere; Whisper Transcription is a paid Mac app. If you're disciplined enough to set up Buzz and don't mind its plain UI, you get the same transcription quality without spending anything. If you want it to feel like a Mac app and you transcribe regularly enough that the time savings matter, the purchase pays itself back.
Versus Pyrenees: Pyrenees is faster and free, but barer-bones. No interactive editor, no DOCX export, no system audio capture. If raw speed and zero cost are your priorities, Pyrenees. If polish is your priority, this.
Versus Subtitle Edit: Different category. Whisper Transcription is for getting transcripts; Subtitle Edit is for grooming captions. If you do both, you'll likely use both.
Versus VoiceInk: Different again. VoiceInk is for live dictation into other apps. Whisper Transcription is for files (with optional recording). They cover different problems.
For casual use (voice memos, meeting notes, short interviews you'll edit anyway), yes. The smaller models are more capable than you'd expect. For longer, professional work, the medium and large models are noticeably better, and the gap matters most when audio quality is uneven.
The hosted API is faster and defaults to the large model, but every minute transcribed is a minute of audio sent to OpenAI's servers at a per-minute charge. Whisper Transcription does everything on your Mac, charges nothing per minute, and keeps your audio local. For privacy-sensitive work, the answer is clear. For one-off use of large amounts of public-domain audio, the hosted API might be cheaper.
Yes. App Store purchases are tied to your Apple ID. Buy a new Mac, sign in with the same account, and your unlock carries over. Family Sharing configurations may extend access to family members as well.
No. The model runs on-device. The app needs internet only for the initial model download and App Store updates. If you've already downloaded the models, you can transcribe entirely offline.
In our testing, files of two to three hours worked without issue on M-series Macs with the medium model. Beyond that, you may occasionally hit memory warnings. Splitting very long recordings into segments is good practice regardless of which app you use.
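If you do split a long recording, it helps to plan the cut points before opening an audio tool. The sketch below only computes the boundaries. The function name, the 30-minute default, and the small overlap (so a word sitting on a cut is not lost) are all assumptions, not settings from any of the apps reviewed here.

```python
def segment_plan(total_seconds, chunk_seconds=1800.0, overlap_seconds=5.0):
    """Return (start, end) pairs covering the whole recording.
    Each segment is at most chunk_seconds long, and consecutive
    segments overlap by overlap_seconds."""
    if chunk_seconds <= overlap_seconds:
        raise ValueError("chunk_seconds must be larger than overlap_seconds")
    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        segments.append((start, end))
        if end >= total_seconds:
            break
        # Step back slightly so the next segment repeats the seam.
        start = end - overlap_seconds
    return segments


# A hypothetical three-hour recording cut into one-hour pieces.
for start, end in segment_plan(3 * 3600, chunk_seconds=3600):
    print(f"{start:>8.0f}s to {end:>8.0f}s")
```

The overlap produces a few duplicated words at each seam, which are easy to delete during cleanup. A word chopped in half at a hard cut is much harder to recover.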
The interactive editor lets you assign speaker labels to text spans manually, which works well for short interviews. There's no automatic diarization; if that's essential, you'll need a separate tool for it.
Not directly. Whisper Transcription works with the official Whisper model family and certain distilled variants. If you need a custom or domain-adapted model, a more flexible tool like Buzz or a command-line setup is the right path.
Pyrenees is a free macOS transcription app for Apple Silicon Macs. It's built around MLX, Apple's open-source machine-learning framework released in late 2023. MLX is designed specifically for the unified-memory architecture of M-series chips: it runs models on the GPU without copying tensors back and forth across separate VRAM and system RAM the way frameworks designed for NVIDIA cards have to. For models like Whisper, that translates into noticeably faster inference than running the same model through plain PyTorch or even whisper.cpp's Core ML path.
The app itself is small, quiet, and does essentially nothing except transcription. Import a file, choose a model, receive a transcript. What makes it worth a dedicated review is how it performs that one function on Apple Silicon hardware.
If you can't find Pyrenees in the App Store, that's because it isn't there: it's distributed directly as a notarized .dmg. You're meant to download it, accept the security prompt once, and run it. This is normal for indie Mac apps; it's not a sign that anything's wrong.
The standard caveats apply. Speed comparisons across transcription apps are highly hardware-dependent, and any specific numbers will be outdated by the time you read this. With that in mind, here's what we observed.
On every Apple Silicon Mac we tested (an M1 MacBook Air, an M2 MacBook Air, and a Mac Studio with M2 Max), Pyrenees ran the same transcription jobs faster than any other tool in our comparison. Not by a small margin.
The qualitative difference is more interesting than the raw numbers. Where transcription on the same hardware in Buzz used to feel like "start it and go make tea," Pyrenees on the same machine feels like "start it and wait a moment."
Pyrenees doesn't feel faster the way an upgraded computer feels faster. It feels faster the way switching from email-based file sharing to AirDrop feels faster: a category shift, not a speed boost.
Most Mac transcription apps fall into one of two technical camps. They either ship the original PyTorch Whisper implementation with whatever GPU acceleration they can scrape together, or they bundle whisper.cpp, the C++ port that Georgi Gerganov maintains. Both are perfectly good options.
Pyrenees sits in a third camp. It uses MLX-converted Whisper weights and runs them through the MLX runtime. Because MLX is built specifically for Apple Silicon's unified memory architecture, it can keep the model and audio in the same memory pool the GPU and CPU share, which is why the speed gap is so pronounced.
The practical consequences:
If you're on a Mac with 8 GB of unified memory, start with the 4-bit medium model. The quality sits closer to the full medium than you'd expect from quantization, and the speed is excellent.
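The reason a 4-bit model suits an 8 GB machine comes down to simple arithmetic. The sketch below estimates the size of the weights alone, assuming roughly 769 million parameters for Whisper medium (the figure OpenAI publishes for that model). Real quantized files come out somewhat larger, because they also store scaling factors and often keep some layers at higher precision.

```python
# Approximate parameter count for Whisper medium, per OpenAI's model table.
MEDIUM_PARAMS = 769_000_000


def weight_gigabytes(param_count: int, bits_per_weight: int) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return param_count * bits_per_weight / 8 / 1e9


fp16_gb = weight_gigabytes(MEDIUM_PARAMS, 16)
q4_gb = weight_gigabytes(MEDIUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f" 4-bit weights: {q4_gb:.2f} GB")
```

The 16-bit estimate lines up with the roughly 1.5 GB medium download mentioned in the Buzz review. At 4 bits the weights shrink to about a quarter of that, which leaves far more of a small unified-memory pool for the audio and everything else on the machine.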
This is the section where the case for a different tool gets made. Pyrenees is deliberate about its scope, and the things it won't do are worth knowing before you commit to it.
It doesn't run on Intel Macs. Apple Silicon only. If you're still on a 2019 16-inch MacBook Pro, this app is not for you, and you'll have to wait for that next upgrade or fall back on Buzz.
It doesn't run on Windows or Linux. Obvious from the platform note, but worth saying if you're considering setting up a multi-OS workflow.
It doesn't have an interactive transcript editor. The output is a finished transcript. You can fix obvious typos in the export, but there's no click-to-play, no inline cue editing, no speaker labels.
It doesn't capture system audio. Microphone input works for ad-hoc recording, but it can't pull audio from another app the way Whisper Transcription can.
It doesn't do dictation. File-based only. For dictation into other apps, you want VoiceInk.
It doesn't have professional subtitle exports. SRT, VTT, and TXT cover the common cases. If you need TTML, EBU-STL, or any of the broadcast formats, you'll need to take the SRT into Subtitle Edit for conversion.
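To give a sense of why SRT and VTT count as the easy cases, here is a minimal Python sketch that converts one into the other. The function name `srt_to_vtt` is ours, not part of Pyrenees or Subtitle Edit, and the sketch assumes well-formed SRT input with no styling tags or positioning data.

```python
import re

# SRT timestamps put a comma before the milliseconds; WebVTT uses a period.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple, well-formed SRT text to WebVTT (illustrative sketch)."""
    # Cues are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", srt_text.strip())
    cues = []
    for block in blocks:
        lines = block.splitlines()
        # Drop the numeric cue index that SRT puts on the first line.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        # The first remaining line is the timing line; swap comma for period.
        lines[0] = _SRT_TIME.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    # A WebVTT file starts with the "WEBVTT" header followed by a blank line.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Broadcast formats such as TTML (XML-based) and EBU-STL (binary) carry far more structure than this, which is why a dedicated tool like Subtitle Edit is the sensible route for those.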
This is the shortest how-to of any tool we cover, because the app genuinely is that simple.
Here's how I use it personally. I record voice memos for note-taking on my iPhone, AirDrop them to my MacBook Air at the end of the day, drop them into Pyrenees, and have text in the time it takes to open my notes app.
Versus Buzz: Pyrenees is faster and prettier. Buzz is more flexible (Linux, Windows, multiple backends, batch queueing, OpenAI API support). For Mac-only users who don't need Buzz's flexibility, Pyrenees wins. For anyone who works across platforms or needs the optionality, Buzz still has a place.
Versus Whisper Transcription: Pyrenees is faster and free; Whisper Transcription is more polished and has features (interactive editor, system audio, DOCX export) that Pyrenees doesn't. It's a real tradeoff. Try Pyrenees first since it costs nothing. If you need the extra features after a week of use, Whisper Transcription's purchase makes sense.
Versus Subtitle Edit: Different jobs. Pyrenees produces, Subtitle Edit edits. The natural workflow is to use both.
Versus VoiceInk: Different jobs again. Pyrenees is for files, VoiceInk is for live dictation.
No. Pyrenees requires an Apple Silicon chip (M1 or later). MLX is designed around that architecture and won't run on Intel hardware.
It's free. No paid tier, no subscription, no premium model lock. Some niche features may be donation-encouraged, but the core functionality is unrestricted.
Transcription is entirely local. Pyrenees has no "send to server" mode at all, which is one of its clear advantages for sensitive audio.
Sometimes, but not reliably. Pyrenees uses MLX-format weights, which are different from the .bin files whisper.cpp uses or the .pt checkpoints from PyTorch. You can re-download them through Pyrenees; the storage cost is the same.
Apple Silicon Macs at the lower end can run the tiny and base models without issue. The medium model in 4-bit form typically works on 8 GB machines. The full large model on 8 GB is a stretch; the quantized variant is the better choice there.
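As a rough back-of-the-envelope check on those recommendations, the sketch below estimates the size of the weights alone from the approximate parameter counts published in the OpenAI Whisper README. It is an assumption-laden estimate, not a measurement of Pyrenees: it ignores quantization metadata, activations, audio buffers, and whatever else macOS is doing with the same memory pool.

```python
# Approximate parameter counts (in millions) from the OpenAI Whisper README.
WHISPER_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def weight_size_gb(model: str, bits_per_weight: int) -> float:
    """Estimate the size of the model weights alone, in decimal gigabytes."""
    params = WHISPER_PARAMS_M[model] * 1_000_000
    # bits -> bytes -> gigabytes
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name in ("medium", "large"):
        for bits in (16, 4):
            print(f"{name:>6} @ {bits:>2}-bit: ~{weight_size_gb(name, bits):.2f} GB")
```

By this estimate the 4-bit medium weights come to roughly 0.4 GB and the 16-bit large weights to roughly 3.1 GB. On a machine where 8 GB is shared between the system, your other apps, and the model, that difference is the reason the quantized variants are the safer starting point.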
It supports recording from the microphone and transcribing what you record, but not in true real-time the way iOS dictation works. For low-latency dictation, look at VoiceInk.
App Store rules around bundling models and how apps use on-device acceleration can be restrictive for ML-heavy software. Distribution outside the App Store gives the developer more flexibility. The app is open source, so you can inspect it.
Apple has bundled dictation as a macOS system feature for over a decade. Press a hotkey, speak into any text field, and it works. It's been there long enough that most people have forgotten about it. VoiceInk exists because the built-in version has some persistent limitations: it routes your voice to Apple's servers (or did; recent builds can run on-device for English on Apple Silicon, though the implementation remains opaque), doesn't handle technical vocabulary well, and offers essentially zero customization.
VoiceInk replaces that with something more capable. It's an open-source app that runs Whisper locally and maps a hotkey to dictation. Your text appears wherever the cursor is. The model stays on your machine. The customization is yours. It's free, the code is on GitHub, and after a few days with it, going back to system dictation feels like an unnecessary step backward.
This is the only review where the gesture itself matters more than the underlying technology, so it's worth describing carefully.
You set a hotkey in VoiceInk's preferences. Let's say fn, the function key, since it's already on your keyboard and it's not bound to anything most people use regularly. From then on, you can be in any application (a browser, a terminal, an email client, a code editor) and:
That's the full interaction. Hold, speak, release. Once it becomes muscle memory, it changes how you handle a lot of short writing tasks: Slack replies, email responses, code comments, search queries. I've watched people try it for the first time and stop reaching for their keyboard for short messages within an afternoon.
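The hold, speak, release gesture maps onto a very small state machine. The sketch below is our own illustration in Python, not VoiceInk's code: `transcribe` and `type_text` are hypothetical callbacks standing in for the local Whisper model and the keystroke insertion, and a real app would also need OS-level hotkey and microphone handling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalk:
    """Minimal hold-to-dictate state machine (illustrative only)."""

    transcribe: Callable[[bytes], str]  # hypothetical: audio bytes -> text
    type_text: Callable[[str], None]    # hypothetical: insert text at the cursor
    recording: bool = field(default=False, init=False)
    chunks: List[bytes] = field(default_factory=list, init=False)

    def key_down(self) -> None:
        # Ignore key auto-repeat while the hotkey is already held.
        if not self.recording:
            self.recording = True
            self.chunks = []

    def audio_chunk(self, chunk: bytes) -> None:
        # Only keep microphone audio that arrives while the key is held.
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self) -> None:
        if not self.recording:
            return
        self.recording = False
        audio = b"".join(self.chunks)
        if audio:
            text = self.transcribe(audio)
            if text:
                self.type_text(text)
```

Keeping the two callbacks separate mirrors the interaction itself: nothing is transcribed or typed until the key is released, so a stray tap with no audio does nothing.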
The first week I used VoiceInk, I didn't enjoy it. Talking to my computer felt strange, and I was rewriting dictated text more often than I rewrote typed text. Then around ten days in, the rewrites stopped, partly because I'd learned to think before speaking, and partly because I'd added the words the model kept getting wrong to the vocabulary list. By week two I was reaching for the hotkey for any message longer than a sentence. Don't judge VoiceInk on day one.
Like the other tools on this site, VoiceInk runs Whisper locally. It supports multiple model sizes, and the choice of model is a speed-versus-quality tradeoff that matters more for dictation than for file transcription. With dictation, you don't want to wait fifteen seconds for text to appear; you want it now. So most VoiceInk users settle on the base or small model, which is fast enough to feel responsive.
That tradeoff has a downside. Smaller Whisper models are noticeably less accurate than the large variants, especially with proper nouns, technical vocabulary, or non-mainstream accents. VoiceInk has a vocabulary feature where you can teach it specific words (names of colleagues, project names, technical terms), and that helps considerably. But the accuracy gap is real, and you should expect light corrections after dictating anything important.
VoiceInk benefits significantly from Apple Silicon's Neural Engine. On M1 or later, the small model is fast enough to feel essentially instant, and the medium model is usable. On Intel Macs, you're limited to smaller models if you want responsiveness, and the experience suffers noticeably.
I want to be specific about this rather than general, because vague reviews don't help when you're deciding whether to install something.
Quick replies. Slack messages, email responses, "yeah looks good", "let me get back to you tomorrow on that." Things that take three seconds to say and ten seconds to type. The hotkey workflow shaves real time off a real day.
First drafts. Talking out a paragraph and then editing it on the keyboard is genuinely faster than writing the paragraph from scratch for many people. VoiceInk fits this workflow especially well because the text lands directly in your editor of choice, with no copy-paste step.
Note-taking. Quick thought, capture it before it goes away. The hotkey-anywhere model means the friction of "where is my notes app, where is the cursor, what was I about to say" disappears.
Code comments and commit messages. Anywhere a thought is more important than its phrasing. The fact that you can be in your terminal or your editor makes this work without breaking flow.
Accessibility. For people with RSI, hand injuries, or other reasons keyboard input is painful, a fast on-device dictation tool is genuinely valuable. VoiceInk's open-source nature also means you can audit it for any concerns about where your voice goes.
Long-form writing. An hour of dictation is exhausting in a way an hour of typing isn't. People who try to dictate entire essays usually go back to keyboards within a week.
Anything you don't want misheard. If accuracy is critical (a legal document, an academic citation, a medical reference), dictation will let you down at a small but inconvenient rate. Always read it back.
File transcription. Said it once, saying it again. VoiceInk doesn't accept input audio files. It records from your microphone, processes it, and types the result. If you have a file, use Buzz or Pyrenees.
Multi-speaker situations. The mic captures whoever is loudest. A meeting recording is the wrong input for VoiceInk.
Hotkey: fn. Model: medium (M1 MacBook Pro, 16 GB). Vocabulary: about thirty entries (names of people I message often, two project codenames, three technical terms my model kept misspelling). Result: I dictate maybe twenty short messages a day and rewrite about one in twenty.
VoiceInk is genuinely orthogonal to every other tool on this site. The other four answer "I have audio, give me text." VoiceInk answers "I want to talk, give me text in whatever I'm typing in." They don't compete.
If you've installed VoiceInk and you find yourself wanting to transcribe a meeting recording, you're using the wrong app β go install Buzz or Pyrenees for that. Conversely, if you've installed Buzz and you find yourself thinking "I wish I could dictate into Slack the same way," go install VoiceInk. It's normal to have both.
Better in the ways that matter to people who care about dictation: customization, model choice, predictable behavior, and the ability to audit where your voice goes.
Any app that accepts text input, yes. It works by simulating keystrokes after transcription, which is why it needs accessibility permissions. Apps that use non-standard text fields may occasionally not respond. This is rare, but it happens.
Yes, but the practical experience is much weaker than on Apple Silicon. Smaller models are usable but accuracy suffers; the medium model is likely too slow to be practical for most Intel machines.
No. That's not what it does. Use Buzz, Pyrenees, or Whisper Transcription for files.
No. Transcription is fully on-device. This is part of the appeal: a privacy-respecting alternative to system dictation in cases where you can't or don't want audio leaving your machine.
Whisper supports many languages, and VoiceInk inherits that. Major European languages, Mandarin, Japanese, and others tend to work well. Smaller languages may be noticeably less reliable, and for dictation specifically, where the base or small model is usually running, accuracy will be lower than what you'd get with the large model on a file.
No. Whisper is not a personalized model: it doesn't adapt to individual speakers. The way you "improve" it is by filling in the custom vocabulary list with words it consistently gets wrong. That's the main lever.
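One simple way to picture a custom vocabulary is as a find-and-replace pass over the transcribed text. This is an assumption for illustration only; we haven't confirmed how VoiceInk applies its list internally, and the function and example entries below are hypothetical.

```python
import re
from typing import Mapping


def apply_vocabulary(text: str, vocabulary: Mapping[str, str]) -> str:
    """Replace known mis-transcriptions with the preferred spelling.

    Matching is case-insensitive and limited to whole words or phrases,
    so an entry for "cat" will not rewrite "category".
    """
    for wrong, right in vocabulary.items():
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function replacement avoids treating backslashes in `right`
        # as regex escape sequences.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text


if __name__ == "__main__":
    vocab = {"cube cuddle": "kubectl", "pyrenees": "Pyrenees"}
    print(apply_vocabulary("ask pyrenees about cube cuddle", vocab))
```

Whatever the real mechanism, the practical point stands: the list corrects specific, recurring mistakes and does nothing for errors you haven't told it about.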
WhisperDesktop is a small, independent website that reviews desktop software for converting audio into text. We're not venture-backed, not affiliated with any developer, and not part of a larger media company. It's two people writing about software they actually use.
Search for "best transcription app" in 2026 and you'll encounter dozens of articles that essentially recycle the same five recommendations in the same format. Some of them are useful. Many of them were written by someone who watched a YouTube video rather than used the software.
We wanted to build something different: actually install the software, actually run files through it, and try to honestly communicate what it's like in practice, including the parts that are annoying.
For every app we review, we install it ourselves on a real machine (or several, where platform support varies). We use the same set of test recordings across every tool so comparisons are grounded:
The same files across every app. We note the time taken, the obvious accuracy issues, and the friction points along the way. We try to write in terms of trade-offs rather than scores, because software doesn't behave the same way for every person.
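When comparing timings across tools, a handy normalization is the real-time factor: processing time divided by audio duration. The sketch below shows the arithmetic; the function names are ours and the numbers in the usage note are made up for illustration, not results from our testing.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate finished than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds
```

For example, `real_time_factor(90, 1800)` describes a 30-minute recording processed in 90 seconds, which works out to 0.05, or 20 times faster than real time. Because the factor is relative to the length of the audio, it lets you compare runs on recordings of different lengths, though it still only holds for the hardware it was measured on.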
We're a small team (currently two people) who got tired of "Top 10" articles and decided to do something different. We're not professional reviewers. We're people who use this software regularly and noticed a gap in genuinely useful information about it.
We don't think our names matter very much for the credibility of what we write. Anyone who reads enough reviews can tell whether the writer has actually used the software, and the reviews on this site try hard to make that obvious. If you'd like to contact us, the contact page has an email address that's read by an actual person.
The site is small enough to run on a modest budget. We pay for the domain and a basic hosting plan; that's most of it. To offset those costs, some links on this site are affiliate links.
Two things to understand about that:
If we ever publish a sponsored review, it will be labeled clearly at the top of the article. We haven't published any sponsored content to date.
A few things we want to be explicit about, because the rest of the internet sometimes blurs them:
If you find a factual error, please tell us. If a feature has changed since we wrote a review, please tell us. If you think we missed something important about an app, please tell us. The address is on the contact page and we read everything that comes in, even if we don't always have the bandwidth to reply.
Beyond that, we hope something here is useful. There are hundreds of tools in this space and the differences between them are genuinely meaningful. We try to make those differences legible.
This site is an independent informational resource and is not affiliated with any of the official services covered (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk).
Found an error in one of our reviews, want to suggest an app we haven't covered, or just have a question about something you read here? Write to us.
contact@WhisperDesktop
A real person reads everything that arrives in this inbox. We aim to respond within a week, though sometimes that stretches longer.
The more specific your message, the more useful our response can be. A few things we appreciate when relevant:
The site is run by a small team in our spare time. There's no ticketing system, no help desk. In practice:
If you've written and haven't heard back within two weeks, feel free to send a follow-up. The original may have landed in spam or gotten buried.
A few things to save everyone's time:
Anything you send us stays with us. We don't sell email addresses, we don't pass them to advertisers, we don't sign people up to newsletters they didn't ask for. If you want your message deleted from our inbox after we've read it, just say so and we'll delete it. The full details are in our privacy policy.
This policy explains what information WhisperDesktop collects when you visit, why it's collected, and what happens with it. We've tried to write it in plain English.
We don't run ads, we don't sell data, and we don't track you across the web. We collect only the standard server-side information needed to keep the site running and understand which content is useful.
This site is operated by the small editorial team behind WhisperDesktop. For privacy-related questions or requests, you can reach us at contact@WhisperDesktop.
Like virtually every website, our hosting provider maintains standard server logs. When your browser requests a page from us, those logs capture:
This data is part of how the web works; it's not something we ask for or collect deliberately. We use it to keep the site running, spot errors, and defend against abuse.
We may use a privacy-respecting analytics service to understand which pages are read and which aren't. Any service we use must meet a minimum standard:
Examples of services that meet this standard include Plausible and similar self-hostable tools. If we ever move to a service that uses cookies or broader tracking, we'll update this page.
If you write to us at contact@WhisperDesktop, we receive your email address, your name (if your client sends one), and the content of your message. We use it to reply. We don't add you to any list or share your details with third parties.
The site itself does not set tracking cookies. Some technical cookies may be set by our hosting platform for purposes like load balancing. These are strictly operational.
We try to keep external dependencies minimal, but a few are unavoidable:
Some links on this site are affiliate links. When you click one and subsequently make a purchase from the linked seller, we may receive a small commission at no additional cost to you. Affiliate links are labeled wherever they appear.
We don't control what affiliate networks or destination sites do with that information. We do clearly identify links that are affiliate links (see our about page for more on this). You can choose not to click those links, and you'll lose nothing on our end if you'd rather visit those products directly.
This site is not directed at children under 13, and we don't knowingly collect personal information from children. If you're a parent and believe your child has provided personal information to this site, please contact us and we'll address it promptly.
Depending on your location, you may have legal rights regarding personal data we hold. These typically include the right to:
In practice, the personal information we hold about most readers is simply "an email you sent us." If you want it deleted, write to us and we'll take care of it.
We take reasonable steps to protect the information we hold: using HTTPS site-wide, keeping software updated, and limiting the number of people with access to any data we store.
If we change anything material in this policy, we'll update the "Last updated" date at the top and, for significant changes, post a notice on the site for a reasonable period.
For privacy questions or requests, write to contact@WhisperDesktop with "Privacy" in the subject line. We'll get back to you as quickly as we can.
This site is an independent informational resource and is not affiliated with any of the official services covered (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk). All trademarks and product names are the property of their respective owners.
By visiting WhisperDesktop, you accept the terms below. They're intended to be straightforward rather than adversarial, but please read them so we're on the same page.
WhisperDesktop is an independent informational website whose purpose is to help readers compare and choose desktop software for converting audio to text. We are not the developer, publisher, or official representative of any app reviewed here. References to third-party products (Buzz, Subtitle Edit, Whisper Transcription, Pyrenees, VoiceInk, or others) are made as outside reviewers. Their official sources remain their respective developers.
Everything on this site is provided for general informational purposes. We try to be accurate. We test the software ourselves. But software changes (sometimes frequently), and a feature working as described at time of writing may behave differently by the time you read it. Always consult the official documentation of the relevant app before relying on it for anything important.
Nothing on this site constitutes professional, legal, financial, or technical advice. If a transcription is going into a courtroom, a medical record, or any other situation with meaningful consequences for accuracy, please verify the output yourself or have a qualified professional do so.
This site and all content on it are provided "as is" and "as available," without any warranty of any kind, express or implied. We make no representations about the accuracy, reliability, completeness, or timeliness of the information presented. To the fullest extent permitted by applicable law, we disclaim all warranties, including implied warranties of merchantability, fitness for a particular purpose, and non-infringement.
To the maximum extent allowed by law, WhisperDesktop and the people who run it will not be liable for any direct, indirect, incidental, consequential, special, or punitive damages arising from your use of the site, including any decision made based on content you read here, any loss of data, or any issues caused by software you installed after reading about it on this site.
If you live somewhere that doesn't allow some of these limitations, the provisions that are unenforceable in your jurisdiction simply don't apply, and the remainder still does.
All product names, logos, and brands referenced on this site are the property of their respective owners. Mention of a product or company name does not imply endorsement, sponsorship, or partnership unless explicitly stated. We use these names solely to identify the products we're writing about, which is sometimes called nominative fair use.
Screenshots of third-party software, where included, are used for commentary, criticism, and review. If you are the owner of a product we cover and believe we've represented something inaccurately, please write to us at contact@WhisperDesktop and we'll review the concern.
Some links on this site may be affiliate links. If you click such a link and subsequently make a purchase, we may receive a small commission from the linked seller, at no additional cost to you. This is disclosed wherever it applies. Affiliate relationships do not influence our editorial judgment about what we cover or how we cover it. See our about page for the longer version.
This site contains links to external websites we don't control. We include them because they're useful, for example links to the developers of apps we review. We are not responsible for the content, privacy practices, or availability of any external site. Following an external link is at your own discretion.
The original written content on this site (articles, reviews, comparisons, and the way they're worded) is the work of the team behind WhisperDesktop and is protected by copyright. You're welcome to:
What we'd prefer you not do:
If you'd like to do something not covered here (translate an article, syndicate a piece, use a passage in a publication), just write to us. We're usually willing to work something out.
Please don't:
If we identify abusive activity, we may block the responsible IP addresses or take other reasonable measures.
We may update these terms periodically. When we do, we'll change the "Last updated" date at the top. For significant changes, we'll post a notice on the site for a reasonable period. Continued use of the site following a change indicates acceptance of the updated terms; if you don't agree, you're free to stop using the site.
If any provision of these terms is found unenforceable in your jurisdiction, the remainder continues to apply. The unenforceable provision is treated as removed, narrowed to the extent needed to make it enforceable, or replaced with a comparable provision that is enforceable.
For questions about these terms, write to contact@WhisperDesktop with "Terms" in the subject line.