Parallax Visual Explainers · Est. 2026
Issue 18 · Live JUN 04, 2026
parallax/tech Build Log · № 18
main@c0e8a1 · v0.18.0 · Published JUN 04, 2026 · 7 min read
LLM PRICING

The Bill Came Due in April

AI keeps getting cheaper per token, and that is exactly why the bill goes up. The price per unit fell, the units per task exploded, and Uber burned its whole 2026 AI budget in four months.

What you need to know

Every headline says artificial intelligence is getting cheaper, and the price of running it has fallen sharply for three years. But the newest coding tools do not chat, they work: they read whole codebases, loop, and reason, burning far more of the billable units called tokens per task. So the price per unit fell while the bill went up. Here is how the math broke.

— 01
THE SCISSOR

Two blades, closing.

Everyone agrees on the sentence: AI is getting cheaper. Look closer and it is true at the unit level and false at the invoice level, and that gap is the whole story. A scissor has two blades. Watch both.

The blade everyone watches
Price per token keeps falling.
Frontier capability that cost dollars per million tokens in 2023 now costs cents. Claude 3 Haiku launched at $0.25 / $1.25 per million input/output tokens; today's small models sit lower still: GPT-5 nano at $0.05 / $0.40, Grok 4 Fast at $0.20 / $0.50. The rate card has fallen more than an order of magnitude in three years. The corollary feels obvious: an AI budget should be shrinking.
The blade that cuts the budget
Tokens consumed per task climb faster.
The tools stopped being things you talk to and became things that work for you. A coding agent re-reads the whole codebase on every step, calls tools, and runs reasoning loops, so one task can burn well over a million tokens where a chat reply burns a few thousand. Multiply a unit price that falls by half or more each year by a unit count that climbs by an order of magnitude or more, and the product, the actual invoice, goes up. Uber budgeted 2026 against last year's cheaper-per-token mental model and exhausted the entire annual figure in four months.
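
A minimal sketch of that multiplication, in Python. The prices and token counts are illustrative assumptions chosen to match the orders of magnitude in the text, not measured figures, and `invoice_usd` is a hypothetical helper that bills every token at a single blended rate.

```python
def invoice_usd(price_per_million_usd: float, tokens: int) -> float:
    """Cost of one task: unit price times units consumed."""
    return price_per_million_usd * tokens / 1_000_000

# Hypothetical numbers, chosen only to show the shape of the argument.
chat_then = invoice_usd(price_per_million_usd=10.0, tokens=3_000)
agent_now = invoice_usd(price_per_million_usd=1.25, tokens=1_000_000)

price_change = 1.25 / 10.0            # unit price after three halvings
token_change = 1_000_000 / 3_000      # growth in tokens consumed per task
bill_change = agent_now / chat_then   # net effect on the per-task invoice

print(f"price x{price_change:.3f}, tokens x{token_change:.0f}, bill x{bill_change:.1f}")
```

Under these assumed numbers the unit price falls to an eighth while the per-task bill rises roughly fortyfold.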
— 02
THE CROSSOVER

The month the chatbot started working.

Here's the thing the bill doesn't tell you: at first, nobody raised prices. The token bill started climbing because, around November 2025, the software quietly changed jobs.

For most of 2025, OpenAI and Anthropic were running what Simon Willison sums up as "Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models." Basically: teaching the models to write code that passes the tests. The payoff landed in a single fortnight. GPT-5.1 Codex Max shipped on 19 November 2025 and Claude Opus 4.5 on 24 November. Paired with their coding harnesses, Willison writes, these releases took coding agents "from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done."

A chatbot answers and stops. An agent keeps going. It reads the repository, plans, edits, runs the tests, reads the failures, and tries again. And here's the part that costs money: to hold its place, it replays the entire conversation back to the model on every single step. Willison's own guide puts the mechanics plainly. Coding agents "maintain state by replaying entire conversations with each new prompt," so "as a conversation gets longer, each prompt becomes more expensive since the number of input tokens grows every time." The work that used to be a question is now a loop, and the loop is metered.
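
The replay mechanics can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `replayed_input_tokens` is a hypothetical helper, every step adds the same number of tokens to the history, and nothing is trimmed, summarised, or cached.

```python
def replayed_input_tokens(steps: int, tokens_added_per_step: int,
                          system_prompt_tokens: int = 0) -> int:
    """Total input tokens sent when every step resends the full history."""
    total = 0
    history = system_prompt_tokens
    for _ in range(steps):
        history += tokens_added_per_step  # new prompt, tool output, or reply joins the history
        total += history                  # the whole history goes back in as input
    return total

short_run = replayed_input_tokens(steps=10, tokens_added_per_step=2_000)
long_run = replayed_input_tokens(steps=100, tokens_added_per_step=2_000)
print(short_run, long_run)
```

With these assumed numbers, ten times the steps sends about ninety times the input tokens (110,000 versus 10,100,000), because the total grows with the square of the step count. The sketch ignores prompt caching, which can discount replayed tokens.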

And that's the part the cheaper-per-token headlines skip right over. The price of generating a line of code fell to almost nothing. The cost of getting an agent to deliver good code, across a long autonomous run, didn't fall with it. The tools the labs finally found a market for, Willison writes, are exactly the ones that "burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals." The capability showed up, and the consumption showed up welded to it.

Around November 2025, coding agents crossed from often-work to mostly-work and became daily drivers for highly paid engineers. The shift from a tool you talk to into a worker that runs for you is what changed the token math, because an agent that works autonomously burns vastly more tokens than a chat that answers a question.

— 03
TOKENS PER TASK

Why more tokens beats cheaper tokens.

The other blade, plotted. One of Willison's GPT-5 Codex tasks consumed 169,818 input, 17,112 output, and 1,176,320 cached tokens: about 1.36 million in all for a single task, and the only verified point on this chart, sitting at the top. The vertical axis is logarithmic because that anchor sits three orders of magnitude above the bottom. The lower points trace the shape from a single chat turn upward and are illustrative of task class, not measured.

[Figure: Tokens per task, from chat turn to agentic session. Vertical axis: tokens consumed per task (log scale, ticks from 3,192 to 2e6). Horizontal axis: task autonomy, increasing. Points, in order: one chat reply*, short Q&A thread*, single-file edit*, multi-step refactor*, GPT-5 Codex task (measured). * = illustrative.]
Source · Top point: real GPT-5 Codex task (Willison, llm-pricing). Lower points illustrative of task class, not measured.
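
For the one measured point, the breakdown is simple arithmetic on the three counts quoted above. A short Python sketch, reading "cached" as previously seen context served from the prompt cache (an interpretation, not something the source spells out):

```python
# Token counts reported for one GPT-5 Codex task, as quoted in the text.
usage = {"input": 169_818, "output": 17_112, "cached": 1_176_320}

total = sum(usage.values())              # all tokens touched by the task
cached_share = usage["cached"] / total   # fraction served from cache
output_share = usage["output"] / total   # fraction that was newly generated

print(f"total tokens: {total:,}")
print(f"cached share: {cached_share:.1%}")
print(f"output share: {output_share:.1%}")
```

The total comes to 1,363,250 tokens, of which about 86% are cached and a little over 1% are output: most of what the task consumed was context, not generated code.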
— 04
HOW THE BILL CAME DUE

The chronology is the argument.

Read top to bottom, the sequence explains itself. The tools got good, so the pricing flipped to metered usage; once usage was metered, the frontier price started rising; and an enterprise budget set against the old mental model ran out a third of the way into the year.

Aug 7 2025
GPT-5 sets the frontier price.
Ships at $1.25 input / $10 output per million tokens — the price point the coming agent wave is built on.
Nov 24 2025
The November inflection.
GPT-5.1 Codex Max (Nov 19) and Claude Opus 4.5 (Nov 24), plus their harnesses, take coding agents from often-work to mostly-work. The daily-driver era begins.
Apr 2 2026
OpenAI moves Codex to token metering.
Codex pricing realigned "to align with API token usage, instead of per-message pricing." The flat-rate mental model is retired.
Apr 14 2026
Anthropic meters the enterprise seat.
Enterprise terms change to roughly $20 per seat per month plus API pricing. Usage is now billed, not bundled.
Apr 16 2026
Opus 4.7 ships — same rate card, more tokens.
At an unchanged $5 / $25 rate card, the tokenizer counts more tokens for identical input. Effective cost rises about 1.4× versus Opus 4.6.
Apr 23 2026
GPT-5.5 ships at 2× the price of GPT-5.4.
At the frontier, the per-token price is now rising, not falling. The cheaper-AI corollary fails exactly where the heaviest work happens.
~Apr 2026
Uber's 2026 AI budget is exhausted.
Four months into the year, the entire annual figure is gone, set against a cheaper-per-token reality that no longer exists.
Jun 2 2026
Uber caps every engineer at $1,500 / month per tool.
Bloomberg reports separate monthly token budgets per agentic tool. Willison calls the cap a rational policy response to over-spending.
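
The Opus 4.7 entry can be checked with a few lines of Python. This is a sketch under stated assumptions: the workload of 1,000,000 input and 200,000 output tokens is hypothetical, `cost_usd` is an illustrative helper, and the roughly 1.4× token inflation is applied uniformly to input and output, which the text does not specify.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float) -> float:
    """Bill for a workload, with prices in USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

INPUT_PRICE, OUTPUT_PRICE = 5.0, 25.0  # rate card, unchanged between versions
TOKEN_INFLATION = 1.4                  # approximate figure from the text (assumed uniform)

# Hypothetical workload as counted by the older tokenizer.
old_in, old_out = 1_000_000, 200_000
# The same text, counted by a tokenizer that emits about 1.4x as many tokens.
new_in, new_out = round(old_in * TOKEN_INFLATION), round(old_out * TOKEN_INFLATION)

old_cost = cost_usd(old_in, old_out, INPUT_PRICE, OUTPUT_PRICE)
new_cost = cost_usd(new_in, new_out, INPUT_PRICE, OUTPUT_PRICE)
print(f"{old_cost:.2f} -> {new_cost:.2f} USD ({new_cost / old_cost:.2f}x)")
```

The rate card never moves, yet the same work costs 1.4 times as much, because the bill is price times count and only the count changed.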
— 05
THE REAL PRICE OF A SESSION

Sticker price versus what it actually rang up.

The rate card is one number; the realised cost is another. Over thirty days, at standard API rates, Willison ran up more than two thousand dollars of tokens. He paid two hundred. That gap is what keeps the true cost off the individual's own statement, which is precisely why finance is the last to find out. The dashed line marks what he actually paid.

One heavy user · 30-day token cost at API rates vs. subscription price paid · USD, 30 days
Tokens billed at API rates (total): 2,180.16
Anthropic Claude Code: 1,199.79
OpenAI Codex: 980.37
Actually paid (Max + Pro subscriptions): 200
Source · Simon Willison, 27 May 2026 — self-reported 30-day usage
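
The gap between the two numbers is easy to reproduce. A small Python sketch using only the figures in the chart above; the variable names are illustrative, and `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

# 30-day token usage priced at API rates, per tool (USD), from the chart.
api_cost = {
    "Anthropic Claude Code": Decimal("1199.79"),
    "OpenAI Codex": Decimal("980.37"),
}
subscriptions_paid = Decimal("200")  # what was actually paid (Max + Pro)

api_total = sum(api_cost.values(), Decimal("0"))
multiple = api_total / subscriptions_paid   # metered cost per dollar paid
hidden = api_total - subscriptions_paid     # cost that never reaches the user

print(f"API-rate total: {api_total}  multiple: {multiple}  gap: {hidden}")
```

At API rates the usage is worth about 10.9 times the subscription price, which is the gap the section describes.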
— 06
$1,500 / MONTH

The first institution to hit the wall.

Uber is the first named company to run into the scissor in public. The cap is a ceiling, drawn around a line item that, left uncapped, ate a year's budget by April. The figures, set side by side.

UBER · AI CODING SPEND · 2026
$1,500 /mo
Per engineer, per agentic tool
Separate budgets per tool. Applies to Cursor and Claude Code — agents, not chat assistants.
4 months
To exhaust the entire 2026 AI budget
Set against last year's cheaper-per-token, lighter-usage reality.
~11 %
Of a median engineer's total comp
Two tools ≈ $36,000/yr against ~$330,000 comp. Willison's arithmetic on Levels.fyi data, not an Uber disclosure.
$2,180
Tokens one heavy user burned in 30 days
At API rates, for a $200 subscription. The cost the individual never feels.
2×
GPT-5.5 API price vs. GPT-5.4
Shipped 23 Apr 2026 — a price increase at the frontier.
~1.4×
Effective cost rise, Opus 4.7 vs 4.6
Same $5 / $25 rate card; the tokenizer counts more tokens for the same input.
Sources · Bloomberg (Natalie Lung) via Simon Willison, 2–3 Jun 2026 · Willison, 27 May & 20 Apr 2026
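
The roughly 11% figure follows directly from the cap. A short Python sketch of that arithmetic, using the numbers quoted above; the constant names are illustrative, and the compensation figure is the approximate one cited in the text, not an Uber disclosure.

```python
CAP_PER_TOOL_PER_MONTH = 1_500  # USD, per engineer, per agentic tool
TOOLS = 2                       # the "two tools" case in the text
MEDIAN_TOTAL_COMP = 330_000     # USD per year, approximate figure cited above

annual_ceiling = CAP_PER_TOOL_PER_MONTH * TOOLS * 12
share_of_comp = annual_ceiling / MEDIAN_TOTAL_COMP

print(f"{annual_ceiling:,} USD/yr = {share_of_comp:.1%} of comp")
```

An engineer who hits the cap on both tools every month spends $36,000 a year on tokens, or about 10.9% of the cited compensation.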
— 07
THE WALL EVERYONE'S WALKING TOWARD

The cost didn't fall. It moved.

The mistake was never in the rate card, but in reading a per-unit price as if it were the bill.

Both blades of the scissor are real. The price of a token did fall, by more than an order of magnitude, and it is still the headline. What the headline leaves out is that the work changed shape underneath it. A chatbot was something you priced like a phone call; an agent is something you meter like electricity, and it runs for hours. The cheaper-AI story was true about the cost of talking to a model, and silent about the cost of one working for you.

Uber is the first named company to say the number out loud, and the unit economics suggest it will not be the last. The work only grows more token-hungry as agents take on more of it, and the budgets are still being drawn from the old reading of the price.

So the cap at Uber is the first visible price of a quiet substitution. We stopped buying answers and started renting workers, and the meter that used to tick in thousands now turns in millions. The bill came due in April because that is when the arithmetic finally caught up with the language.

Uber is the first named company to hit the wall, not the last. The cheaper-AI story was true about the price of talking to a model and silent about the price of one working for you, and the business pays per task, not per token.


Sources & further reading

  1. Uber Caps Usage of AI Tools Like Claude Code to Manage Costs — Simon Willison's Weblog (relaying Bloomberg / Natalie Lung). simonwillison.net · primary
  2. I think Anthropic and OpenAI have found product-market fit — Simon Willison's Weblog. simonwillison.net · primary
  3. The last six months in LLMs in five minutes — Simon Willison's Weblog. simonwillison.net · primary
  4. Simon Willison — llm-pricing tag index — Simon Willison's Weblog. simonwillison.net · primary
  5. Claude Token Counter, now with model comparisons — Simon Willison's Weblog. simonwillison.net · primary
  6. How coding agents work (Agentic Engineering Patterns) — Simon Willison's Weblog. simonwillison.net · analysis
  7. Writing code is cheap now (Agentic Engineering Patterns) — Simon Willison's Weblog. simonwillison.net · analysis