In the section "Programs into weights & training beyond gradient descent", near the end, they say:
[...] **the compilation machinery we built for generating those weights** can go further. In principle, arbitrary programs can be compiled directly into the transformer weights, bypassing the need to represent them as token sequences at all. [...] [my emphasis]
In the same section, they continue:
Weights become a deployment target: instead of learning software-like behavior, models contain compiled program logic.
If logic can be compiled into weights, then gradient descent is no longer the only way to modify a model. Weight compilation provides another route for inserting structure, algorithms, and guarantees directly into a network.
So they admit, almost in passing, that they compile programs into the weights, and the later sentences make it clear this was the intention all along.
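To make the "weights as a deployment target" idea concrete, here is a toy sketch (my own, not the authors' compiler): the XOR function written directly into the weights of a tiny ReLU network, with no gradient descent involved. The weight values are chosen by hand so that the network computes `relu(a+b) - 2*relu(a+b-1)`, which equals XOR on binary inputs.

```python
import numpy as np

# Weights are *compiled*, not learned: each value is chosen by hand
# so the network computes XOR(a, b) exactly.
W1 = np.array([[1.0, 1.0],    # hidden unit h1 = a + b
               [1.0, 1.0]])   # hidden unit h2 = a + b - 1 (via bias below)
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])    # output = h1 - 2*h2

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # ReLU hidden layer
    return float(W2 @ h)

for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", forward(np.array([a, b], dtype=float)))
```

The same logic compiled into weights would, per the quoted passage, also be a way to insert guarantees: the behavior here is exact by construction, not approximate by training.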