I wouldn't measure how good/fast/performant a library is by looking at the results of the very first LLM attempt at a trivial task using that library. If you don't know the library well enough to spot the improvements the LLM missed, the only thing you're judging is either how sane the defaults are or how good the LLM is at writing performant code with that library, neither of which is the same as how good the library is.
Also, performing well in a prototype scenario is very different from performing well in a production scenario with a non-trivial number of templates and complex operations. Even the slowest SSGs feel fast when you feed them three Markdown posts and one layout, but after a few years of real-world usage you can end up in a scenario where a full build takes half an hour.
Kinda cool that you can do that in an afternoon, but useless as a benchmark of anything.