The Bill Came Due in April

— 01

THE SCISSOR

Two blades, closing.

Everyone agrees on the sentence: AI is getting cheaper. Look closer and it is true at the unit level and false at the invoice level, and that gap is the whole story. A scissor has two blades. Watch both.

The blade everyone watches

Price per token keeps falling.

Frontier capability that cost dollars per million tokens in 2023 now costs cents. Claude 3 Haiku launched at $0.25 / $1.25 per million input/output tokens; today's small models sit lower still — GPT-5-mini at $0.05 / $0.40, Grok 4 Fast at $0.20 / $0.50. The rate card has fallen more than an order of magnitude in three years. The corollary feels obvious: an AI budget should be shrinking.

The blade that cuts the budget

Tokens consumed per task climb faster.

The tools stopped being things you talk to and became things that work for you. A coding agent re-reads the whole codebase on every step, calls tools, and runs reasoning loops, so one task can burn well over a million tokens where a chat reply burns a few thousand. Multiply a unit price that halves each year by a unit count that climbs an order of magnitude, and the product, the actual invoice, goes up. Uber budgeted 2026 against last year's cheaper-per-token mental model and exhausted the entire annual figure in four months.

— 02

THE CROSSOVER

The month the chatbot started working.

Here's the thing the bill doesn't tell you: nobody raised prices. The token bill climbed because, around November 2025, the software quietly changed jobs.

For most of 2025, OpenAI and Anthropic were running what Simon Willison sums up as "Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models." Basically: teaching the models to write code that passes the tests. The payoff landed in a single fortnight. GPT-5.1 Codex Max shipped on 19 November 2025, Claude Opus 4.5 on 24 November, and paired with their coding harnesses, Willison dates this to the moment coding agents went "from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done."

A chatbot answers and stops. An agent keeps going. It reads the repository, plans, edits, runs the tests, reads the failures, and tries again. And here's the part that costs money: to hold its place, it replays the entire conversation back to the model on every single step. Willison's own guide puts the mechanics plainly. Coding agents "maintain state by replaying entire conversations with each new prompt," so "as a conversation gets longer, each prompt becomes more expensive since the number of input tokens grows every time." The work that used to be a question is now a loop, and the loop is metered.

And that's the part the cheaper-per-token headlines skip right over. The price of generating a line of code fell to almost nothing. The cost of getting an agent to deliver good code, across a long autonomous run, didn't fall with it. The tools the labs finally found a market for, Willison writes, are exactly the ones that "burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals." The capability showed up, and the consumption showed up welded to it.

Around November 2025, coding agents crossed from often-work to mostly-work and became daily drivers for highly paid engineers. The shift from a tool you talk to into a worker that runs for you is what changed the token math, because an agent that works autonomously burns vastly more tokens than a chat that answers a question.

— 03

TOKENS PER TASK

Why more tokens beats cheaper tokens.

The other blade, plotted. One of Willison's GPT-5 Codex tasks consumed 169,818 input, 17,112 output, and 1,176,320 cached tokens: well over a million for a single task, and the only verified point on this chart, sitting at the top. The vertical axis is logarithmic because that anchor sits three orders of magnitude above the bottom. The lower points trace the shape from a single chat turn upward and are illustrative of task class, not measured.

Tokens per task · chat turn → agentic session · log scale tokens consumed (one task) · log

Source · Top point: real GPT-5 Codex task (Willison, llm-pricing). Lower points illustrative of task class, not measured.

— 04

HOW THE BILL CAME DUE

The chronology is the argument.

Read top to bottom, the sequence explains itself. The tools got good, so the pricing flipped to metered usage; once usage was metered, the frontier price started rising; and an enterprise budget set against the old mental model ran out a third of the way into the year.

Aug 7 2025

GPT-5 sets the frontier price.

Ships at $1.25 input / $10 output per million tokens — the price point the coming agent wave is built on.

Nov 24 2025

The November inflection.

GPT-5.1 Codex Max (Nov 19) and Claude Opus 4.5 (Nov 24), plus their harnesses, take coding agents from often-work to mostly-work. The daily-driver era begins.

Apr 2 2026

OpenAI moves Codex to token metering.

Codex pricing realigned "to align with API token usage, instead of per-message pricing." The flat-rate mental model is retired.

Apr 14 2026

Anthropic meters the enterprise seat.

Enterprise terms change to roughly $20 per seat per month plus API pricing. Usage is now billed, not bundled.

Apr 16 2026

Opus 4.7 ships — same rate card, more tokens.

At an unchanged $5 / $25 price card, the tokenizer counts more tokens for identical input. Effective cost rises about 1.4× versus Opus 4.6.

Apr 23 2026

GPT-5.5 ships at 2× the price of GPT-5.4.

At the frontier, the per-token price is now rising, not falling. The cheaper-AI corollary fails exactly where the heaviest work happens.

~Apr 2026

Uber's 2026 AI budget is exhausted.

Four months into the year, the entire annual figure is gone, set against a cheaper-per-token reality that no longer exists.

Jun 2 2026

Uber caps every engineer at $1,500 / month per tool.

Bloomberg reports separate monthly token budgets per agentic tool. Willison calls the cap a rational policy response to over-spending.

— 05

THE REAL PRICE OF A SESSION

Sticker price versus what it actually rang up.

The rate card is one number; the realised cost is another. Over thirty days, at standard interface rates, Willison ran up more than two thousand dollars of tokens. He paid two hundred. That gap is what keeps the true cost off the individual's own statement, which is precisely why finance is the last to find out. The dashed line marks what he actually paid.

One heavy user · 30-day token cost at API rates vs. subscription price paidUSD · 30 days

what he actually paid (subscriptions) · 200

Tokens billed at API rates (total) 2180.16

Anthropic Claude Code 1199.79

OpenAI Codex 980.37

Actually paid (Max + Pro) 200

Source · Simon Willison, 27 May 2026 — self-reported 30-day usage

— 06

$1,500 / MONTH

The first institution to hit the wall.

Uber is the first named company to run into the scissor in public. The cap is a ceiling, drawn around a line item that, left uncapped, ate a year's budget by April. The figures, set side by side.

UBER · AI CODING SPEND · 2026

$1,500 /mo

Per engineer, per agentic tool

Separate budgets per tool. Applies to Cursor and Claude Code — agents, not chat assistants.

4 months

To exhaust the entire 2026 AI budget

Set against last year's cheaper-per-token, lighter-usage reality.

~11 %

Of a median engineer's total comp

Two tools ≈ $36,000/yr against ~$330,000 comp. Willison's arithmetic on Levels.fyi data, not an Uber disclosure.

$2,180

Tokens one heavy user burned in 30 days

At API rates, for a $200 subscription. The cost the individual never feels.

2×

GPT-5.5 API price vs. GPT-5.4

Shipped 23 Apr 2026 — a price increase at the frontier.

~1.4×

Effective cost rise, Opus 4.7 vs 4.6

Same $5 / $25 rate card; the tokenizer counts more tokens for the same input.

Sources · Bloomberg (Natalie Lung) via Simon Willison, 2–3 Jun 2026 · Willison, 27 May & 20 Apr 2026

— 07

THE WALL EVERYONE'S WALKING TOWARD

The cost didn't fall. It moved.

The mistake was never in the rate card, but in reading a per-unit price as if it were the bill.

Every blade of the scissor is real. The price of a token did fall, by more than an order of magnitude, and it is still the headline. What the headline leaves out is that the work changed shape underneath it. A chatbot was something you priced like a phone call; an agent is something you meter like electricity, and it runs for hours. The cheaper-AI story was true about the cost of talking to a model, and silent about the cost of one working for you.

Uber is the first named company to say the number out loud, and the unit economics suggest it will not be the last. The work only grows more token-hungry as agents take on more of it, and the budgets are still being drawn from the old reading of the price.

So the cap at Uber is the first visible price of a quiet substitution. We stopped buying answers and started renting workers, and the meter that used to tick in thousands now turns in millions. The bill came due in April because that is when the arithmetic finally caught up with the language.

Uber is the first named company to hit the wall, not the last. The cheaper-AI story was true about the price of talking to a model and silent about the price of one working for you, and the business pays per task, not per token.

The Bill Came Due in April

Two blades, closing.

The month the chatbot started working.

Why more tokens beats cheaper tokens.

The chronology is the argument.

Sticker price versus what it actually rang up.

The first institution to hit the wall.

The cost didn't fall. It moved.

Margin notes

How did this land?

Letters

Sources & further reading

The Bill Came Due in April

Two blades, closing.

The month the chatbot started working.

Why more tokens beats cheaper tokens.

The chronology is the argument.

Sticker price versus what it actually rang up.

The first institution to hit the wall.

The cost didn't fall. It moved.

Margin notes

How did this land?

Letters

Sources & further reading

Stories, in your inbox.