Token usage is increasing but the link I shared is comparing price per token which is the reason they are excluding reasoning models.
Reasoning models have also gotten cheaper.
Generally cost to “achieve a certain task” using whatever model, even reasoning has drastically reduced.
The best example is arc agi.
https://arcprize.org/leaderboard
This measures cost to achieve certain percentage of score. Fix a certain accuracy and see how much price reduces over time. It’s more than 100x.