Nevertheless, consider, roughly, a dataset D for which we have an approximate stochastic model of its conditional frequency associations: P(next | previous..., D), etc.
Then, if your prompt really got that reply from this model, it would have done so roughly like this:
"Construct" is first projected to an encoding that effectively replaces it with a set of related words (Construct, Make, Create, Write...), each weighted by how often it co-occurs with "Construct".
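A toy sketch of that projection step, assuming plain document-level co-occurrence counts over a made-up corpus. CORPUS, STOPWORDS and word_set are hypothetical names for illustration only; real models learn dense embeddings rather than counting like this.

```python
from collections import Counter

# Hypothetical toy corpus: each string stands in for one conversation in D.
CORPUS = [
    "construct or make a sentence",
    "create and construct a phrase",
    "write a sentence then construct another",
    "make a cake",
    "the river is wide",
]

# Hypothetical stopword list so function words don't dominate the weights.
STOPWORDS = {"a", "and", "or", "then", "the", "is"}


def word_set(word, corpus):
    """Map `word` to a weighted set of words: each word is weighted by how
    many documents it shares with `word` (the word itself included),
    normalized so the weights sum to 1."""
    counts = Counter()
    for doc in corpus:
        tokens = set(doc.split()) - STOPWORDS
        if word in tokens:
            counts.update(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


if __name__ == "__main__":
    ranked = sorted(word_set("construct", CORPUS).items(), key=lambda kv: -kv[1])
    for w, p in ranked:
        print(f"{w}: {p:.2f}")
```

Here "construct" ends up weighted 0.3, "sentence" 0.2, and "make", "create" and "write" (among others) 0.1 each, while "river" gets nothing because it never shares a conversation with "construct".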
Then we sample from D based on this word set, obtaining, roughly, all conversations in which these related words were used; call this Dc.
Next, take "a sentence" and replace it with its word set, say (Sentence, Phrase, Words, ...), and sample the conversations from Dc in which these occur; call this Dcs.
And so on. Since each token in your prompt actually corresponds to basically every possible word, just weighted by association, each "filtering operation" still selects a vast region of the training-data space.
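The successive filtering (D to Dc to Dcs and so on) can be sketched as soft reweighting rather than hard selection. This is only an illustration under my own assumptions: each prompt word multiplies a conversation's weight by the association mass it contains for that word's set, and the hypothetical `floor` parameter keeps every conversation alive, which is why each filter still covers a huge part of the data. The corpus and all names are made up.

```python
from collections import Counter

# Hypothetical toy corpus standing in for D.
CORPUS = [
    "construct or make a sentence",
    "create and construct a phrase",
    "write a sentence then construct another",
    "make a cake",
    "the river is wide",
]
STOPWORDS = {"a", "and", "or", "then", "the", "is"}


def word_set(word, corpus):
    """Weight each word by how many documents it shares with `word`."""
    counts = Counter()
    for doc in corpus:
        tokens = set(doc.split()) - STOPWORDS
        if word in tokens:
            counts.update(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def soft_filter(prompt_words, corpus, floor=0.01):
    """Apply one soft "filtering operation" per prompt word.

    Each document's weight is multiplied by the association mass it contains
    for that word's weighted set, plus a small hypothetical `floor`, so no
    document is ever discarded outright. Returns normalized weights."""
    weights = [1.0] * len(corpus)
    for word in prompt_words:
        ws = word_set(word, corpus)
        for i, doc in enumerate(corpus):
            mass = sum(ws.get(t, 0.0) for t in set(doc.split()))
            weights[i] *= mass + floor
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for doc, w in zip(CORPUS, soft_filter(["construct", "sentence"], CORPUS)):
        print(f"{w:.4f}  {doc}")
```

With the prompt words "construct" and "sentence", the conversations mentioning both rank highest, but the one about the river keeps a small nonzero weight rather than being thrown away.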
Finally, consider the reverse problem: what words could this process possibly produce that weren't relevant to your prompt? Given enough data (PBs of text from all possible digitized conversations, books, etc.), a sensible-seeming answer becomes the only plausible one to generate.
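For the reverse problem, here is a minimal sketch assuming the filtered data comes with made-up weights and that P(next | previous, D) is estimated from weight-scaled bigram counts with greedy decoding. Both choices are my assumptions for illustration, not how any particular LLM works, and WEIGHTED_DOCS, next_word_distribution and generate are hypothetical names. Words that never follow the context in the weighted data simply get no probability.

```python
from collections import Counter

# Hypothetical filtered data: (conversation, weight) pairs, as if produced by
# filtering D against a prompt. The weights are invented for illustration.
WEIGHTED_DOCS = [
    ("here is a sentence about cats", 0.6),
    ("here is a sentence about dogs", 0.3),
    ("here is a river", 0.1),
]


def next_word_distribution(context, weighted_docs):
    """Estimate P(next | context) from weight-scaled bigram counts."""
    counts = Counter()
    for doc, weight in weighted_docs:
        tokens = doc.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            if prev == context:
                counts[nxt] += weight
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def generate(start, weighted_docs, max_len=10):
    """Greedily append the most probable next word until none is available."""
    out = [start]
    while len(out) < max_len:
        dist = next_word_distribution(out[-1], weighted_docs)
        if not dist:
            break
        out.append(max(dist, key=dist.get))
    return " ".join(out)


if __name__ == "__main__":
    print(generate("here", WEIGHTED_DOCS))
```

Starting from "here", the only continuations available are ones that appear in the weighted data, and the heavier conversations win at each step, so the output is "here is a sentence about cats".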
Now, I do think PBs wouldn't be enough here to produce a single statistical model that behaved this way, so you need a mixture of them (i.e., ChatGPT), and I suspect you also need a system for regulating discrete constraints such as quantities. I suspect many deployed LLMs have improved in this area thanks to models trained to be specifically sensitive to quantities.