Most of the context is unstructured fluff, much of it is distracting or even plain wrong. Especially the „thinking“ tokens are often completely disjoint halucinations that don’t make any sense.
I think what will have to happen is that context looks less like a long chat and action log and more like a structured, short, schema validated state description, plus a short log trace that only grows until a checkpoint is reached, which produces a new state.