The rest of the analysis is straightforward except for the topic classification. If I were running on all cylinders with nights-and-weekends hacking, I'd expect to spend a week or so gathering training data for it (maybe by pulling down category assignments from Tildes) and probably another week coding up a classifier with a BERT-family embedding and classifiers from scikit-learn. You could attempt the same with prompt engineering in a day or two and quickly get something working that is expensive to run and probably won't perform as well as my classifier. (Their classifier gives the empty string as a category, which is a bad sign.) If you want to improve on the prompt-based version, you need the same training set I do.
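For concreteness, here is a rough sketch of the "embedding plus scikit-learn classifier" shape I have in mind. It is not a finished design: the toy titles and topics are made up, `embed`, `train_topic_classifier`, and `classify` are hypothetical names, and the embedder is a hashed bag-of-words stand-in so the snippet runs without downloading a model. In practice you would swap `embed` for a BERT-family encoder. The `"unknown"` fallback is my own addition, so that low-confidence cases get an explicit label instead of an empty string.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

# Stand-in embedder (assumption): hashed bag-of-words, NOT a BERT-family model.
# Replace embed() with a real sentence encoder to get the setup described above.
_hasher = HashingVectorizer(n_features=2**12, alternate_sign=False, norm="l2")


def embed(texts):
    """Map a list of strings to fixed-width feature vectors."""
    return _hasher.transform(texts)


def train_topic_classifier(titles, topics):
    """Fit a plain scikit-learn classifier on top of the embeddings."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embed(titles), topics)
    return clf


def classify(clf, title, min_confidence=0.0):
    """Return the most likely topic, or "unknown" below the confidence floor."""
    probs = clf.predict_proba(embed([title]))[0]
    best = probs.argmax()
    if probs[best] < min_confidence:
        return "unknown"  # explicit abstention rather than an empty string
    return str(clf.classes_[best])


# Made-up toy data; a real training set would come from labeled posts.
TITLES = [
    "New Rust compiler release speeds up builds",
    "Python packaging gets a faster resolver",
    "Telescope captures images of a distant galaxy",
    "Probe returns samples from an asteroid",
    "Central bank raises interest rates again",
    "Stock markets fall on inflation report",
]
TOPICS = ["programming", "programming", "space", "space", "finance", "finance"]

if __name__ == "__main__":
    model = train_topic_classifier(TITLES, TOPICS)
    print(classify(model, "Rust compiler builds get faster", min_confidence=0.4))
```

The classifier itself is the easy part; the quality of the result depends almost entirely on the labeled training set, which is where the first week goes.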