Just Talk to It · Jay Shah

I stopped typing to my LLMs. Now I talk to them. I open the agent, fire up Superwhisper, and just say what I want for a minute or two straight. The quality of what comes back went up across the board: more accuracy, more clarity, less back-and-forth. Here’s why.

Talking removes the friction of giving context

Talking lets me cover far more context than typing in the same amount of time. It’s about three times faster: a Stanford and University of Washington study of text entry on smartphones clocked English speech at 153 words per minute against 52 on the touchscreen keyboard. But speed isn’t the point on its own. It’s that the friction is gone. When I type, I compress: I trim the backstory, drop half the constraints, and skip the “here’s what I already tried,” because writing it all out is tedious. The model gets a thin version of what’s in my head. When I talk, I say all of it, because saying all of it costs nothing, and a model with the full picture gives a better answer.
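To put numbers on that, here is a quick back-of-the-envelope calculation in Python. It uses only the two rates quoted above. The two-minute window is my own illustrative choice, matching how long I tend to talk, and the helper name is made up for this sketch.

```python
# Rates quoted in the text, in words per minute.
SPEECH_WPM = 153
TYPING_WPM = 52


def words_in(minutes: float, wpm: int) -> int:
    """Approximate number of words produced in `minutes` at a rate of `wpm`."""
    return round(minutes * wpm)


speedup = SPEECH_WPM / TYPING_WPM
spoken = words_in(2, SPEECH_WPM)
typed = words_in(2, TYPING_WPM)

print(f"speedup: {speedup:.1f}x")  # speedup: 2.9x
print(f"two minutes: {spoken} words spoken vs {typed} typed")
# two minutes: 306 words spoken vs 104 typed
```

Roughly two hundred extra words in the same two minutes is the room where the backstory and the constraints fit.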

More relevant context, better output

LLMs reward context. The prompting guides from the major labs say the same thing: give the model the details it needs instead of assuming it can infer them. Two well-studied techniques are really just specific kinds of “more context,” and both are far easier to do out loud.

Few-shot prompting means showing the model a couple of examples of what you want before asking. It works well, but typing out examples is a chore, so I usually skip them. Talking, I throw in an example without thinking about it: “I want it formatted like when I said X earlier.”

Chain-of-thought means walking through the reasoning step by step instead of jumping to the answer, which sharply improves results on hard, multi-step problems. When I talk, I narrate exactly that: here’s how I’d approach this, here’s the case I’m worried about, here’s what I’d check first.

Both hand the model the examples and the reasoning it does better with, and talking is what makes giving them effortless.
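If it helps to see the shape of such a prompt, here is a minimal sketch in Python of what a spoken ramble amounts to once it is written down: a couple of examples, some notes on how I’d approach the problem, and then the request itself. The function name, section labels, and sample strings are all hypothetical; nothing here is a real library’s API, and it only assembles text without calling any model.

```python
def build_prompt(request, examples=(), reasoning_notes=()):
    """Assemble one prompt from a request, optional examples, and optional reasoning notes.

    examples: (input, desired_output) pairs, i.e. the few-shot part.
    reasoning_notes: the steps you would narrate out loud, in order.
    """
    parts = []
    if examples:
        parts.append("Examples of what I want:")
        for source, wanted in examples:
            parts.append(f"Input: {source}\nOutput: {wanted}")
    if reasoning_notes:
        numbered = "\n".join(
            f"{i}. {note}" for i, note in enumerate(reasoning_notes, start=1)
        )
        parts.append("How I would approach it:\n" + numbered)
    parts.append(f"Request: {request}")
    # Blank lines between sections keep the prompt easy to scan.
    return "\n\n".join(parts)


prompt = build_prompt(
    "Write a commit message for the attached diff.",
    examples=[("renamed a config flag", "config: rename verbose flag to debug")],
    reasoning_notes=[
        "Work out which module the change touches.",
        "Check whether behavior changes or it is only a rename.",
    ],
)
print(prompt)
```

Typed, each of those arguments is a chore to fill in. Spoken, they are just the next sentence.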

There’s a second reason talking works. These models are trained on a vast corpus of human language, so natural, conversational phrasing is exactly what they parse best. Typing pushes me toward terse, robotic keyword fragments. Talking makes me explain things the way I would to a person, which is far closer to the language the model actually learned from.

The catch: relevant, not just more

More words is not the goal. More relevant context is. Stuffing a prompt with everything degrades output: models get “lost in the middle” of long inputs, recall decays as context grows, and irrelevant detail distracts them. So this isn’t a license to talk endlessly. But dictation apps help here too: they don’t just transcribe, they clean up the ramble, drop the “ums” and false starts, and turn what I said into concise text before it reaches the model. I get the ease of talking without dumping a messy stream of speech into the prompt.
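As a toy illustration of that cleanup step, here is a small Python sketch that strips a few filler words and collapses immediately repeated words. This is an assumption-laden stand-in, not how any of these apps work: I don’t know their internals, the filler list is my own, and a regex pass like this would also wrongly collapse a legitimate repeat such as “had had.”

```python
import re

# Hypothetical filler list for illustration; real tools are presumably far more capable.
FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b,?\s*", re.IGNORECASE)
# A word followed by one or more repeats of itself, e.g. "the the".
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Drop filler words, collapse stuttered repeats, and normalize whitespace."""
    without_fillers = FILLERS.sub("", text)
    deduped = REPEATS.sub(r"\1", without_fillers)
    return re.sub(r"\s+", " ", deduped).strip()


print(clean_transcript("So um I want uh the the report to be short"))
# So I want the report to be short
```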

Try it

When Andrej Karpathy coined “vibe coding,” one detail was that he talks to his editor and “barely even touches the keyboard.” A handful of tools make this trivial: Wispr Flow, Aqua, and Superwhisper, which I use mostly for the lifetime license instead of yet another subscription. Setup takes five minutes.

Then say the whole thought out loud, the way you’d explain it to a colleague, and watch what comes back. For me it’s been one of the higher-leverage changes to how I work with these tools all year.
