They go beyond merely "return something that looks as close to the thing I’ve asked for as it can find". E.g., say I asked for "A todo app that has 4 buttons on the right that each play a different animal sound effect for no good reason and also you can spin a wheel and pick a random task to do". That isn't something that already exists, so in order to build it, the LLM has to break the request down, look for appropriate libraries and source code, decide on a framework to use, and then glue those pieces together cohesively. That didn't come from a single repo on GitHub. The machine had to write new code in order to fulfill my request. Yeah, some of it existed in the training data somewhere, but not arranged exactly like that. The LLM had to do something to glue those pieces together in that way.
Some people can't see past how the trick is done (take training data and do a bunch of math/statistics on it), but the fact that LLMs are able to build the thing is in and of itself interesting and useful (and fun!).