Can A Token Sale Actually Be Better Revenue?

a newsletter about VC syndicates

Tokens are getting a lot of heat right now. The criticism goes something like this: the same token gets sold multiple times over and counted as revenue at every layer, inflating the size of the AI economy. Anthropic creates the token. Cursor buys API access from Anthropic, marks it up, and bundles it into their product. A consultancy uses Cursor to build something for a client. A wrapper sits on top of another wrapper. By the time you trace the dollar from end user to model lab, the same underlying unit of compute has been recognized as revenue four or five times — or so the story goes.

It's a tidy critique. It's also mostly wrong.

When you actually map how tokens move from producer to consumer and compare it to how goods and software have historically moved through supply chains, the AI token supply chain isn't unusually long. It's unusually short. And that compression has real implications for who captures value, where margin actually lives, and why the "double-counting" narrative misunderstands what's happening.

How long is a typical supply chain?

Start with a t-shirt at Target. Cotton farmer sells to a textile mill. Mill sells to a garment manufacturer. Manufacturer sells to a brand like Hanes. Hanes sells to a distributor. Distributor sells to Target. Target sells to you. That's six sales for a single physical good, and it's not unusual — most consumer goods you buy have passed through 4 to 6 layers of middlemen before reaching your hands.

Pre-cloud software was shorter but still layered. A developer built the product, a publisher packaged and marketed it, a distributor moved boxes to retail, a retailer like Best Buy or CompUSA put it on a shelf, and you bought it. Three to four sales, with each layer taking a meaningful margin to justify its existence in the chain.

SaaS compressed this further. Software company sells directly to a customer, sometimes through a reseller or systems integrator. Two to three sales, max.

The token economy sits at the extreme end of this compression: typically two sales. Anthropic (or OpenAI, or Google) sells API access to a developer or company. That company embeds it in a product and sells access to an end user. That's it for the overwhelming majority of token usage. Three-sale chains show up occasionally — Poe aggregating multiple models, or a wrapper-on-a-wrapper SaaS — but four or more is rare enough to be a curiosity, not a pattern.

Why tokens don't get resold the way critics imagine

The instinct to count consultancy work as a third or fourth sale is where the analysis usually breaks down. If an agency uses Cursor to write code for a client and bills the client for labor, the tokens were a production input — not a product. The client isn't paying for tokens; they're paying for a finished website, the same way they'd pay for a logo designed in Photoshop. Adobe doesn't get to count that as a third sale of Photoshop.

What actually keeps the chain short

Two forces keep the chain short, and they work at different layers.

Pure commodity resale is hard but not impossible. Aggregators like OpenRouter and cloud-bundled offerings like AWS Bedrock and Azure OpenAI exist, but they survive on coordination value — multi-model routing, compliance wrappers, unified enterprise billing — rather than markup. The margins are thin, and crucially, they don't stack. You don't see resellers of resellers, because there's no second coordination problem to solve once the first one is handled.

Product layers work, but they don't stack deeply either. Cursor, Replit, and Perplexity wrap tokens in genuine product value — an IDE, a dev environment, a search interface — and capture meaningful margin doing it. But these layers also resist stacking, because end customers can see the per-token economics underneath and start asking what each additional wrapper is actually adding. A Cursor-on-Cursor doesn't survive contact with the market.

The chain stays short because both forces are real at the same time: thin margins prevent stacking at the commodity layer, and visible economics prevent stacking at the product layer. The closest historical analog isn't software distribution — it's electricity or telecom bandwidth. Utilities go almost directly from producer to end user with maybe one service layer in between, and tokens follow that pattern more than they follow traditional goods.

Here's where it gets interesting for the people building and investing in this stack.

In a traditional six-layer chain, each middleman takes 20 to 50% margin. By the time a t-shirt that cost $5 to produce reaches you at $25, the cotton farmer who grew the raw input has captured maybe one or two percent of the final retail price. Most of the value is captured by intermediaries who solve logistics, not by the entity that actually created the thing.

In a two-layer token chain, the producer captures a far larger share of end-user spend. If Cursor charges $20 a month and pays Anthropic $8 in API costs for that user, Anthropic captures 40% of consumer spend. That's a wildly different value distribution than physical goods or even traditional software. The "creator" of the underlying unit is getting paid like a brand, not like a raw material supplier.
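
To make the comparison concrete, here is a small back-of-the-envelope sketch in Python. The per-layer margins, the ~$0.60 of cotton, and the $8-in-$20 split are illustrative assumptions built from the figures above, not reported financials from Target, Hanes, Anthropic, or Cursor.

```python
# Back-of-the-envelope comparison of value capture in a long physical
# supply chain versus a short token chain. All numbers are illustrative
# assumptions, not reported financials.

def walk_chain(producer_price, layer_margins):
    """Walk a price up a supply chain.

    producer_price: what the original producer sells its output for.
    layer_margins:  gross margin each downstream layer takes on its own
                    selling price (0.28 means price = cost / 0.72).
    Returns (final retail price, producer's share of that price).
    """
    price = producer_price
    for margin in layer_margins:
        price /= (1 - margin)
    return price, producer_price / price

# Six-layer t-shirt chain: five resale layers at ~28% margin each turn a
# $5 manufactured shirt into a ~$25 retail price. The cotton itself
# (assume ~$0.60 of that $5) ends up around 2% of retail.
retail_price, _ = walk_chain(5.00, [0.28] * 5)
cotton_share = 0.60 / retail_price

# Two-layer token chain: the lab sells ~$8 of API usage, the product
# layer resells it inside a $20/month subscription (a 60% gross margin).
product_price, lab_share = walk_chain(8.00, [0.60])

print(f"t-shirt retail ≈ ${retail_price:.2f}, cotton farmer's share ≈ {cotton_share:.0%}")
print(f"token product ≈ ${product_price:.2f}/month, model lab's share ≈ {lab_share:.0%}")
```

Same exercise, two chains: the producer's share of the end-user dollar goes from roughly 2% to roughly 40% purely because the chain is shorter.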

This is why the framing of "tokens being sold multiple times" gets the economics backward. The shorter the chain, the more of the end-user dollar flows back to the model lab. If tokens were getting resold five times the way the critique implies, Anthropic would be capturing a much smaller slice of consumer spend, not a larger one. The compression is what gives producers their leverage.

Why this isn't automatically a great business

Shorter chains favor producers, but they come with their own structural problems, and the AI labs are running into all of them simultaneously.

Costs are front-loaded and enormous. Anthropic spent billions on training, compute, and talent before selling a single token. T-shirt manufacturers have low fixed costs and amortize across high volume. Token producers have astronomical fixed costs and need extraordinary volume to justify them.

Inference margins are thin. Running the model costs real money in GPU time, and most credible estimates suggest frontier labs are barely breaking even or losing money on inference at current prices. Strong revenue capture only translates into strong profit if usage scales massively or model costs fall faster than prices; a rough sketch of that break-even math appears below these points.

The middle layer gets squeezed. Cursor, Perplexity, and the wrapper economy are sandwiched between upstream price increases and downstream user resistance to paying more. Some have great businesses because they add real product value — an editor, a search interface, an agent framework. Pure wrappers are getting culled.

Commoditization is the existential risk. If Claude, GPT, Gemini, and the best open-source models become functionally interchangeable, token prices collapse toward marginal cost. Producers go from capturing 40% of consumer spend to capturing little more than inference cost plus a thin margin — the electricity model, but without the regulated monopoly that makes utilities profitable.
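
To see how punishing the combination of front-loaded costs and thin inference margins can be, here is a minimal break-even sketch. Every figure in it is a hypothetical placeholder chosen to show the shape of the math, not an estimate of any lab's actual training spend, pricing, or GPU costs.

```python
# Minimal break-even sketch for a token producer. All inputs are
# hypothetical placeholders, not estimates of any real lab's economics.

fixed_costs = 4e9               # hypothetical training + talent spend ($)
price_per_mtok = 10.00          # hypothetical revenue per million tokens ($)
inference_cost_per_mtok = 8.00  # hypothetical GPU cost per million tokens ($)

gross_margin_per_mtok = price_per_mtok - inference_cost_per_mtok
breakeven_mtok = fixed_costs / gross_margin_per_mtok   # in millions of tokens
breakeven_tokens = breakeven_mtok * 1_000_000

print(f"gross margin per million tokens: ${gross_margin_per_mtok:.2f}")
print(f"tokens needed to cover fixed costs: {breakeven_tokens:,.0f}")

# With these placeholders the answer is 2 quadrillion tokens. Halve the
# inference cost and break-even volume drops by two-thirds; raise it by
# $1 and the required volume doubles. The business lives or dies on that spread.
```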

The takeaway

The "tokens get sold five times" narrative is the kind of critique that sounds sophisticated but falls apart on contact with how supply chains actually work. The AI economy isn't suspiciously long. It's suspiciously short. That compression is structurally favorable to whoever creates the underlying token — until it isn't, because the same forces that compressed the chain also make commoditization a faster and more brutal threat than it was for physical goods.

The right question isn't whether tokens are getting double-counted. It's whether the producers at the top of this very short chain can hold their position long enough to convert revenue capture into durable profit. That's a much harder question.

✍️ Written by Zachary and Alex