xAI shipped grok-voice-think-fast-1.0 on April 23, 2026, and the headline number is hard to ignore: 67.3% on τ-voice Bench. The next-best model on the same leaderboard is Gemini 3.1 Flash Live at 43.8%. That is a 23-point gap. In a benchmark race that usually moves in two-to-five-point increments, this is the kind of jump that resets the field.
If you build voice agents for support, sales, or any high-volume phone workflow, this is the moment where cost-per-minute, tool-use reliability, and latency all changed at the same time.
The Leaderboard at a Glance
τ-voice is the voice extension of Sierra's τ-bench. It evaluates full-duplex voice agents under realistic conditions: noise, accents, interruptions, turn-taking, and tool orchestration. Here are the headline scores xAI published with the launch:
| Model | τ-voice overall | Retail | Airline | Telecom |
|---|---|---|---|---|
| Grok Voice Think Fast 1.0 | 67.3% | 62.3% | 66.0% | 73.7% |
| Gemini 3.1 Flash Live | 43.8% | 45.6% | 64.0% | 40.4% |
| Grok Voice Fast 1.0 | 38.3% | 44.7% | 40.0% | 21.9% |
| GPT Realtime 1.5 | 35.3% | 38.6% | 36.0% | 21.1% |
The most interesting cell in that table is Telecom: 73.7% vs 40.4%. Telecom is the brutal column because it is where tool-calling, plan changes, billing disputes, and hardware troubleshooting collide with bad audio quality. It is where you find out whether the model can actually orchestrate work under pressure or just sounds like it can.
A 33-point gap there is not a tuning win. That is an architecture win.
What "Think Fast" Actually Means
The name is not marketing fluff. It refers to a real architectural choice: background reasoning runs in parallel with speech generation, so the model can think through a multi-step query without making the user sit through dead air.
Traditional voice models force you to pick one of two failure modes:
- Snappy and dumb. Reply fast, skip the reasoning. Sounds smooth, gets things wrong with confidence.
- Smart and laggy. Reply with proper reasoning, but the user hears two to four seconds of silence first. Feels broken.
xAI's claim is that Grok Voice Think Fast keeps the snappiness while doing the reasoning off-camera. Their illustrative example in the announcement is a classic LLM gotcha:
> Q: Which months of the year are spelled with the letter X?
>
> Other voice models: "Only one month is spelled with the letter X. It's February."
>
> Grok Voice Think Fast 1.0: "None of the months are spelled with the letter X."
Voice models confidently hallucinating the wrong answer because they were optimized to sound fast is the bug Think Fast was built to kill.
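One way to picture the think-while-speaking pattern is a concurrency sketch: start the slow reasoning pass immediately, keep the conversation moving, and join the result once it is ready. This is purely illustrative asyncio code; it does not reflect xAI's actual implementation, and every function name here is made up.

```python
import asyncio

async def background_reasoning(query: str) -> str:
    # Stand-in for a multi-step reasoning pass (planning, tool calls, etc.)
    await asyncio.sleep(0.2)  # simulated thinking latency
    return f"reasoned answer to: {query!r}"

async def speak(text: str) -> None:
    # Stand-in for streaming TTS output
    print(text)

async def handle_turn(query: str) -> str:
    # Kick off reasoning right away, but don't block the conversation on it.
    reasoning = asyncio.create_task(background_reasoning(query))
    await speak("Let me check that for you...")  # fills the dead air
    answer = await reasoning  # join once reasoning completes
    await speak(answer)
    return answer

result = asyncio.run(handle_turn("Which months contain the letter X?"))
```

The point of the pattern is that latency-to-first-word and depth-of-reasoning stop being a trade-off: the user hears something immediately while the real answer is computed off-camera.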
Pricing: $0.05 Per Minute
This is where the announcement stops being a benchmark story and starts being an economics story:
- Conversation rate: $0.05 per minute
- Tool calls: roughly $0.005 each
- Worked example from xAI: a 10-minute support call with 20 tool calls totals $0.60
For reference, OpenAI's older GPT Realtime API ran roughly $0.60 to $0.90 per minute depending on the input/output mix. Grok Voice landed at around 12x to 18x cheaper for the same workload at launch.
That math changes who can afford to ship voice agents. At the older Realtime prices, you needed a $50-plus ticket value to break even on a 10-minute call. At Grok Voice prices, you can put a voice agent on a $9 per month SaaS support flow and still come out ahead.
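The cost math above is simple enough to sanity-check in a few lines. The rates are the launch pricing from the announcement, treated here as assumptions rather than a contract:

```python
# Launch pricing from xAI's announcement (assumptions, not a contract).
PER_MINUTE = 0.05      # conversation rate, USD
PER_TOOL_CALL = 0.005  # approximate per-tool-call rate, USD

def call_cost(minutes: float, tool_calls: int) -> float:
    """Total cost of one voice-agent call at launch pricing."""
    return minutes * PER_MINUTE + tool_calls * PER_TOOL_CALL

# xAI's worked example: a 10-minute support call with 20 tool calls.
print(f"${call_cost(10, 20):.2f}")  # → $0.60

# Same 10 minutes at the older GPT Realtime rates ($0.60-$0.90/min):
print(f"${10 * 0.60:.2f} to ${10 * 0.90:.2f}")  # → $6.00 to $9.00
```

At $6.00 to $9.00 per call, a voice agent only pencils out on high-value tickets; at $0.60, it pencils out almost anywhere.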
The Starlink Production Case
xAI ran this model in production at Starlink's customer line (+1 (888) GO STARLINK) before announcing it publicly. The numbers they shared:
- 20% conversion rate on sales calls. One in five callers buys Starlink service while still on the phone.
- 70% autonomous resolution rate for support cases, with no human in the loop.
- 28 distinct tools orchestrated by the same agent across hundreds of workflows: hardware troubleshooting, replacement orders, service credits.
- 25+ languages supported natively for global rollout.
The 70% autonomous resolution figure is the one to focus on. This is not a contained-domain benchmark; it is a real call center where the agent issues replacement hardware and grants service credits without escalation. That is a high-trust deployment, and Starlink is not a forgiving environment to be wrong in.
What This Changes for Your Voice Stack
Three practical questions, depending on where you are today:
1. Already on GPT Realtime or Gemini Live?
Switching to Grok Voice probably cuts your per-minute cost significantly while giving you a measurably smarter agent. The migration tax is API-surface differences and prompt tuning. For high-volume deployments the payback is days, not months.
2. Building from scratch?
Default to Grok Voice Think Fast unless you have a specific reason not to. The benchmark gap and the price gap both point the same direction.
3. On a voice-infra layer like Vapi, Retell, or Bland?
Watch for those platforms to add grok-voice-think-fast-1.0 as a model option. Their value is the orchestration, not the model, so the model they should expose is the one that wins on both price and quality. If they drag their feet, that tells you something.
Honest Caveats
Three things worth keeping in mind before you build a roadmap around this:
- The benchmark numbers come from xAI's announcement. τ-voice is Sierra's benchmark and it is a respected one, but xAI is the party publishing these specific scores. Independent reproduction across more domains (healthcare, legal, education) has not happened yet.
- Background reasoning is not free. Parallel reasoning still consumes compute. If xAI repositions or reprices later, the economics shift. The $0.05/min number is the launch price, not a contract.
- Voice models still hallucinate. A 67.3% benchmark means roughly one in three queries goes wrong. For high-stakes flows (medical, legal, financial), you still need guardrails, confirmation steps, and a human escalation path. The Starlink deployment includes all of those.
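The guardrail point is worth making concrete. A minimal sketch of the pattern, assuming a tool-call dispatch layer; all tool names and the function itself are hypothetical, not any real API:

```python
# Hypothetical guardrail layer for a voice agent's tool calls: high-stakes
# actions require an explicit caller confirmation, and anything outside the
# known tool set escalates to a human instead of guessing.
HIGH_STAKES = {"issue_refund", "grant_service_credit", "change_plan"}
KNOWN_TOOLS = HIGH_STAKES | {"check_order_status", "run_diagnostics"}

def execute_tool_call(tool: str, args: dict, confirmed: bool = False) -> str:
    if tool not in KNOWN_TOOLS:
        # Unknown or out-of-policy action: route to a human agent.
        return "ESCALATE: routing to a human agent"
    if tool in HIGH_STAKES and not confirmed:
        # Read the action back to the caller and wait for an explicit yes.
        return f"CONFIRM: about to run {tool}({args}). Proceed?"
    return f"EXECUTED: {tool}"
```

With a one-in-three error rate on hard queries, the cheap insurance is making the model say what it is about to do before it does it.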
Bottom Line
Grok Voice Think Fast 1.0 is the first voice model where price, latency, and quality all line up at once. If you have been waiting for voice agents to get good enough and cheap enough to put in front of real customers, this is the moment.
The voice-agent race was not supposed to reset this fast. xAI did it anyway.