You turn the crank and you get a probability distribution for the next token in the sequence. You then sample the distribution to get the next token, append it to the vector, and do it again and again.
Thus the typical LLM has no memory as such; it infers what it was "thinking" by looking at what it has already said and uses that to figure out what to say next, so to speak.
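To make that loop concrete, here is a minimal sketch of the sampling loop in Python. The "model" is a toy stand-in that returns random logits (a real forward pass would go there), and the vocabulary size and end-of-output token id are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 32          # toy vocabulary; a real model has tens of thousands of tokens
END_OF_OUTPUT = 0        # hypothetical id for the end-of-output token

def next_token_logits(tokens):
    """Stand-in for a real forward pass: a real LLM maps the whole
    token vector so far to one logit per vocabulary entry."""
    return rng.normal(size=VOCAB_SIZE)

def sample(logits, temperature=0.8):
    """Turn the logits into a probability distribution and sample from it."""
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))

tokens = [5, 17, 3]      # the tokenized prompt
while True:
    tok = sample(next_token_logits(tokens))
    if tok == END_OF_OUTPUT:
        break            # the model "says" it is done talking
    tokens.append(tok)   # turn the crank again with the longer vector

print(tokens)
```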
The characters in the input prompt are converted to these tokens, but there are also special tokens such as start-of-input, end-of-input, start-of-output and end-of-output. The end-of-output token is how the LLM "tells you" it's done talking.
Normally, in a chat scenario, these special tokens are inserted by the LLM front-end (Ollama/llama.cpp in this case).
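Roughly, the front-end builds something like the string below before tokenization. The exact marker strings are model-dependent; the ChatML-style tokens here are only an illustration of the idea.

```python
def render_chat(user_message: str) -> str:
    """What a chat front-end effectively assembles from your message."""
    return (
        "<|im_start|>user\n"        # start-of-input marker
        f"{user_message}\n"
        "<|im_end|>\n"              # end-of-input marker
        "<|im_start|>assistant\n"   # start-of-output marker; the reply ends
    )                               # when the model emits its end-of-output token

print(render_chat("Write me a bubble sort in Python."))
```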
However, if you interface with the model more directly you have to add these tokens yourself, which also means you can pre-fill the start of the output before feeding the vector to the LLM for the first time. The LLM will then "think" it has already started writing code, say, and is likely to continue doing so.
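A minimal sketch of that pre-fill trick, assuming the llama-cpp-python bindings and the same hypothetical ChatML-style markers as above; the model path is a placeholder. By ending the prompt partway into the assistant turn, inside a code fence, the model is nudged to keep writing code.

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # placeholder path to a local GGUF model

prompt = (
    "<|im_start|>user\n"
    "Write me a bubble sort in Python.\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
    "```python\n"                     # the pre-filled start of the "answer"
    "def bubble_sort(items):\n"       # the model continues from here
)

out = llm(prompt, max_tokens=256, stop=["```"])
print(out["choices"][0]["text"])
```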