These new reasoning models are taking things in a new direction basically by adding search (inference time compute) on top of the basic LLM. So, the capabilities of the models are still improving, but the new variable is how deep of a search you want to do (how much compute to throw at it at inference time). Do you want your chess engine to do a 10 ply search or 20 ply? What kind of real world business problems will benefit from this?