Author: Xiaojing
A new word is catching on in Silicon Valley: Tokenmaxxing (maximizing Token consumption).
Inside Meta and OpenAI, engineers have begun competing on internal AI usage leaderboards. According to foreign media reports, one engineer consumed 210 billion Tokens in a single week, roughly the amount of text in 33 Wikipedias. Some people's monthly AI bills alone run as high as $150,000.
An Ericsson engineer in Stockholm spent more on Claude than he earns in salary, but the company footed the bill. A Token budget is becoming a new perk for engineers, “just like free snacks or free lunches once were.”
Shopify CEO Tobi Lütke issued an internal memo as early as April 2025 announcing that "AI use is Shopify's baseline expectation", requiring every team to show that AI cannot do the job before requesting additional headcount, and making AI use part of performance appraisals. Meta later announced that it would formally incorporate “AI-driven impact” into the performance evaluation of all employees starting in 2026.
Once Token consumption shows up in KPIs, it has become a signal of organizational behavior.
Signals at the industry level are coming just as fast. On March 16, Jensen Huang described the Token as "the cornerstone of the AI era" at the NVIDIA GTC conference, saying that it will become "the most valuable commodity." The next day, Alibaba announced the establishment of the Alibaba Token Hub business group, which reports directly to CEO Wu Yongming. Its mission is to "create Tokens, transport Tokens, and apply Tokens."

Picture: In his GTC speech, Jensen Huang showed a chart of the relationship between Token cost and revenue, divided the data center's computing power into service tiers ranging from a free tier up to a premium tier, and presented a forecast that the Vera Rubin chip will bring a 5-fold increase in revenue compared to Grace Blackwell.
A year ago, the Token was just a technical unit of measurement that only developers cared about. Now it is the language chip companies use to define product value, the reason Internet giants are reorganizing their business groups, and a new kind of perk and core KPI in engineers' job offers.
However, the Tokenmaxxing ranking only records consumption, and no one records how many effective tasks these tokens have completed.
This happens to be the biggest blind spot in the entire Token economy today.

210 billion Tokens sounds like an amazing number. But to understand what it really means, you first need to give up one assumption: that the Token is a standardized product.

Picture: Tokscale global token consumption rankings. Tokscale is an open source token usage tracking and ranking tool that supports multiple platforms such as Claude Code, Cursor, OpenCode, Codex, etc. Users can submit data to participate in global rankings
Two years ago, the pricing of large models was relatively simple, usually with only two basic prices: one for input Tokens and one for output Tokens. Today, the pricing systems of mainstream providers are clearly stratified, and the same "Token" is often charged at completely different rates depending on how the model is called.
Take Anthropic as an example. The standard input price of Claude Opus 4.6 is US$5 per million Tokens, and the output price is US$25. With prompt caching enabled, a cache write with a 5-minute lifetime costs US$6.25, a cache write with a 1-hour lifetime costs US$10, and a cache read costs US$0.50. The Batch API discounts input and output prices by a further 50%; restricting inference to the United States raises the price of the affected Tokens by 10%; and in Fast Mode, the input and output prices of Opus 4.6 rise to 6 times the standard price.
In other words, with the same provider and the same model, a billing unit that is always called a "Token" can differ in price by several times, or even by more than ten times, depending on conditions such as caching, batch processing, regional inference, and speed tier.
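To make the spread concrete, here is a minimal sketch in Python that tabulates the Opus 4.6 input-side prices quoted above. It assumes each condition applies on its own. How the modifiers stack with one another is not stated in this article, so no combinations are computed, and the dictionary labels are just illustrative names.

```python
# Prices in US$ per million input tokens, taken from the figures quoted above
# for Claude Opus 4.6. Each condition is treated independently (assumption:
# no stacking of modifiers is modeled here).
STANDARD_INPUT = 5.00

input_prices = {
    "cache read": 0.50,
    "batch (50% off)": STANDARD_INPUT * 0.5,
    "standard": STANDARD_INPUT,
    "US-only inference (+10%)": STANDARD_INPUT * 1.10,
    "cache write, 5 min": 6.25,
    "cache write, 1 hour": 10.00,
    "fast mode (6x)": STANDARD_INPUT * 6,
}

cheapest = min(input_prices.values())
priciest = max(input_prices.values())
spread = priciest / cheapest

# Print the conditions from cheapest to most expensive.
for label, price in sorted(input_prices.items(), key=lambda kv: kv[1]):
    multiple = price / STANDARD_INPUT
    print(f"{label:<26} ${price:>6.2f}  ({multiple:.2f}x standard)")

print(f"cheapest vs. priciest input token: {spread:.0f}x apart")
```

The two extremes are a cached read and a Fast Mode call, so the widest gap compares very different usage patterns rather than two prices for identical work.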
What really drives up the cost is not just the fee for calling the model itself. OpenAI's current price list shows that Web Search charges vary by model type: web searches for models such as GPT-4.1 and GPT-4o cost US$10 per thousand calls, while web searches for reasoning models such as GPT-5 cost US$25 per thousand calls.
File Search costs $2.50 per thousand calls, plus $0.10 per GB per day for vector storage, with the first 1GB free. Code containers have also become a separate billing item: currently, a 1GB container costs $0.03, with 4GB, 16GB, and 64GB containers priced progressively higher; from March 31, 2026, these prices will also switch to being billed per container for each 20-minute session.
Beyond the model itself, search, retrieval, storage, and execution environments, which used to be regarded as "auxiliary capabilities", have now been split out into independent cost centers.
Google is moving in the same direction. The official Vertex AI pricing page shows that since February 11, 2026, Code Execution, Sessions and Memory Bank in Agent Engine have been formally billed. These prices are no longer bundled together but are set separately, by vCPU-hour and by GiB-hour of memory.
So when we talk about "large model prices" today, we can no longer focus only on the unit prices of input and output Tokens. What has really changed is the billing logic. What large model providers now sell is a complete set of basic AI capabilities that can be run, stored, searched, called, and executed continuously.
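As a rough illustration of how these non-Token items add up, the following sketch estimates a bill from the OpenAI per-unit prices quoted above. The function name `tool_costs` and the monthly workload are hypothetical, chosen only for the example. Container charges are left out because the article does not give enough detail on their billing basis to model them.

```python
def tool_costs(web_searches, file_searches, stored_gb, days, reasoning_model=False):
    """Estimate non-token charges in US$ from the per-unit prices quoted above."""
    web_rate = 25.0 if reasoning_model else 10.0  # US$ per 1,000 web searches
    web = web_searches / 1000 * web_rate
    files = file_searches / 1000 * 2.50           # US$ per 1,000 file-search calls
    billable_gb = max(0.0, stored_gb - 1.0)       # first 1 GB of storage is free
    storage = billable_gb * 0.10 * days           # US$ per GB per day
    return {
        "web_search": web,
        "file_search": files,
        "storage": storage,
        "total": web + files + storage,
    }


# Hypothetical monthly workload, for illustration only.
bill = tool_costs(
    web_searches=20_000,
    file_searches=50_000,
    stored_gb=11,
    days=30,
    reasoning_model=True,
)
for item, usd in bill.items():
    print(f"{item:<12} ${usd:,.2f}")
```

None of these charges appears in a per-million-Token price, which is why a Token-only comparison can understate what an agent workload actually costs.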

Picture: Screenshot of OpenAI pricing page, multi-layer charging structure besides Token (independent billing items such as Web Search, File Search, Container, etc.)
If you look only at model API prices, Tokens are indeed becoming dirt cheap. Anthropic's Opus has dropped from US$15 per million Tokens in the previous generation to US$5, a two-thirds cut. DeepSeek V3.2 has come down to $0.28. Google Gemini 2.5 Flash Lite is available for as little as about $0.10.
The price advantage of Chinese models is even more pronounced. OpenRouter data shows that the Token unit price of Chinese models is about one-sixth to one-tenth that of overseas competitors. Even after Tencent Cloud's Hunyuan HY2.0 Instruct ended its public beta subsidy and raised its price by more than 460%, its input price is equivalent to approximately US$0.62 per million Tokens, still lower than Anthropic's cheapest model, Haiku 4.5 (US$1), and roughly one-fifth of Sonnet 4.6.

Picture: Artificial Analysis maintains a real-time updated LLM ranking list, with huge price gradients between different models
But the total cost of using AI has not declined accordingly. Three mechanisms are at work simultaneously.
First, models have become smarter at the cost of becoming "talkative". A report from Artificial Analysis points out that reasoning models use on average about 5.5 times as many output Tokens as non-reasoning models. Both Anthropic and OpenAI bill extended thinking Tokens at the output Token rate. The deeper the model thinks, the longer the bill. The unit price has dropped, but the total number of Tokens used to complete the same task has multiplied.
Second, agents turn Token use from "one-time consumption" into "continuous consumption". This is the deeper driver of Tokenmaxxing. Engineers are not burning Tokens by hand; their AI coding agents run around the clock, automatically splitting up tasks, calling tools, and iterating on their own. According to Alibaba Cloud data, a single agent consumes 100 to 1,000 times the computing power of a traditional chatbot. China's overall average daily Token consumption exceeded 30 trillion in mid-2025 and had jumped to 180 trillion by February 2026.
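A back-of-the-envelope sketch of this first mechanism, in Python. It combines two separate figures from the article: the 5.5x output-Token multiple and the Opus unit price cut from US$15 to US$5. It assumes, purely for illustration, that the output price fell by the same proportion and that a task moved from a non-reasoning to a reasoning model.

```python
# Illustrative assumption: the unit price falls to one third (US$15 -> US$5),
# while a reasoning model emits about 5.5x as many output tokens per task.
price_ratio = 5.0 / 15.0   # new unit price / old unit price
token_ratio = 5.5          # reasoning vs. non-reasoning output tokens

# Cost per task scales with (price per token) x (tokens per task).
cost_ratio = price_ratio * token_ratio

# Token multiple at which the price cut is exactly cancelled out.
break_even_token_ratio = 1.0 / price_ratio

print(f"cost per task changes by a factor of {cost_ratio:.2f}")
print(f"price cut is cancelled once token use grows {break_even_token_ratio:.1f}x")
```

Under these assumptions the cost of a task rises by about 1.8 times even though each Token is two-thirds cheaper.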
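The scale of the headline number supports the point that this consumption is automated. A quick sketch in Python, using only figures quoted in this article (the variable names are illustrative), converts 210 billion Tokens per week into a sustained rate and computes the growth in China's daily consumption:

```python
TOKENS_PER_WEEK = 210e9                 # the record week cited in the article
SECONDS_PER_WEEK = 7 * 24 * 60 * 60     # 604,800 seconds

# Average throughput needed to burn that many tokens in one week.
tokens_per_second = TOKENS_PER_WEEK / SECONDS_PER_WEEK

# Daily consumption in China: just over 30 trillion (mid-2025) vs. 180 trillion
# (February 2026). The earlier figure is a floor, so this ratio is approximate.
growth = 180e12 / 30e12

print(f"sustained rate: about {tokens_per_second:,.0f} tokens per second")
print(f"daily consumption grew roughly {growth:.0f}x")
```

A rate in the hundreds of thousands of Tokens per second, held for a whole week, is only plausible with many agents running in parallel and unattended.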
Third, the underlying cost of producing Tokens is rising. On March 18, 2026, Alibaba Cloud and Baidu Smart Cloud announced on the same day that they would increase the prices of AI computing power and storage products, with an increase of up to 34%. AWS increased the price of machine learning capacity blocks by approximately 15% in January, and Google Cloud announced that it will increase AI infrastructure fees starting in May.
A cloud computing industry expert said: "This price adjustment in the cloud market is mainly determined by supply and demand and driven by cost. Subsequent prices are also mainly determined by the price trend of the entire supply chain."
Model prices are falling, but everything that Token production relies on (GPUs, parallel storage, high-speed networking, data center power) is getting more expensive. When Anthropic released Opus 4.6, it specifically emphasized that "the price remains unchanged". The implication is that the provider is able to absorb the rising costs itself.
In other words, the model is the engine, but gas costs, parking fees and highway tolls are all going up.
The three mechanisms stack on top of one another, and the result is a widening gap between the Token price and the real cost of a task.
Back to Tokenmaxxing. The leaderboards record Token consumption, but not the quality of the output. An engineer burning through 33 Wikipedias' worth of Tokens in a week has not necessarily completed 33 Wikipedias' worth of work.
Big tech companies are writing Token consumption into KPIs or handing it out as a perk. Is this a genuine leap in productivity, or a kind of "productivity theater"?
This touches on the core structural flaw of Token economics. The industry has not yet established an effective way to map Token consumption to task completion. Tokens measure input, not output. Suppose one agent spends 1 million Tokens to complete a task and another spends 100,000 Tokens on the same task. The Tokenmaxxing leaderboard ranks them in exactly the wrong order, with the former coming out on top.
Lütke's memo offers a telling detail: he claimed that some colleagues are contributing "10 times the output that was previously thought impossible," but he gave no specific measurements.
A new kind of professional anxiety has been born: if you don't demonstrate AI productivity through high Token consumption, you may be seen as falling behind. It follows the same logic that made every company rush to build a website in the early 2000s and every brand rush to build an app in the 2010s: technology adoption itself becomes a signal, consumption becomes a proxy metric, and the measurement of real value is postponed.
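The inversion described here can be shown in a few lines of Python. This is only a sketch: the `AgentRun` record and the `tokens_per_task` metric are hypothetical names for illustration, not something the leaderboards or the article define, and the two runs use the article's example numbers.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    name: str
    tokens_used: int
    tasks_completed: int

    @property
    def tokens_per_task(self) -> float:
        # A run that completed nothing is infinitely expensive per task.
        if self.tasks_completed == 0:
            return float("inf")
        return self.tokens_used / self.tasks_completed


# The article's example: same task, ten times the token spend.
runs = [
    AgentRun("agent_a", tokens_used=1_000_000, tasks_completed=1),
    AgentRun("agent_b", tokens_used=100_000, tasks_completed=1),
]

# Leaderboard view: whoever burns the most tokens ranks first.
by_consumption = sorted(runs, key=lambda r: r.tokens_used, reverse=True)

# Efficiency view: whoever needs the fewest tokens per completed task ranks first.
by_efficiency = sorted(runs, key=lambda r: r.tokens_per_task)

print("by consumption:", [r.name for r in by_consumption])
print("by efficiency: ", [r.name for r in by_efficiency])
```

The two orderings are mirror images. A consumption leaderboard is only informative if it also records what was completed.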
But unlike before, the cost of this round is real. With a monthly AI bill of US$150,000, a consumption of 210 billion Tokens per week, and the underlying computing power and storage that continues to increase in price, Tokenmaxxing is not free. When the cost is high enough, the difference between "burning tokens" and "using tokens to create value" changes from a philosophical issue to a financial issue.
There is no doubt that the unit price of Token will continue to decline.
The real anxiety is over who can turn Tokens into completed tasks most efficiently. For every programmer, every company, and every ordinary user, the way to measure the cost of AI is to look not at the price per million Tokens, but at how many Tokens it takes to get one thing done.
The gap between these two numbers is the biggest business opportunity in the next stage of the "intelligent era, with the Token as the new unit of weights and measures", and it is also the deepest cost trap.