The problem was not that I did not understand the code. I understood it just fine; it wasn't complicated, it was just bad and old. All it did was fetch some data from an API, transform it a bit, and store it in a database. Then a scheduled job would call a method that fetched that data, transformed it a bit more, and returned it to a caller that transformed it yet again and stored it as pages for the main web app.
There was no reason all these transformations couldn't have lived in one place instead of being scattered across the pipeline, and no reason to store the data in one database only to read it back out and store it in another. Someone said the third-party API was slow and unreliable, but I don't see how that's relevant: if the API is down, you don't get updated data, and it doesn't matter whether there is outdated data sitting in an intermediate database. We already have that same outdated data in our main database, and we'll get updates when the API starts working again.

During testing I had absolutely no issues with the API's performance. It transferred all the data we needed in a completely acceptable amount of time, and since this was just a nightly scheduled job anyway, even a full minute would have been fine. It didn't take a minute; it responded in milliseconds. I never noticed any unreliability on their side either, but if the API had been unreliable, that would have been fine too: the app just wouldn't have gotten updates until it started responding again. Nothing can solve that problem.
I honestly can't remember what the actual problem was or how I fixed it in the end. The code had been in production for years and had received only the minimum necessary changes. Some dependency probably broke after years of nobody wanting to touch that huge piece of crap.
But that's not why I say it was bad and pointless. It was bad because whoever wrote it didn't know XML parsing libraries existed and had implemented all of the parsing from scratch with string operations. We're not talking about real parsing here, with lexers and tokenizers and all that; we're talking about what you might expect if you gave a mediocre first-year CS student the task of parsing one specific XML format. The database interaction was similarly overcomplicated and outdated, and the code itself was sloppy and full of old messes nobody had bothered to clean up.
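For contrast, this is roughly what that entire parsing layer collapses to once you use a library. A minimal sketch in Python with the standard library's xml.etree.ElementTree — the feed shape and field names here (item, title, body) are invented for illustration, not the actual data:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<feed>
  <item><title>First</title><body>Hello</body></item>
  <item><title>Second</title><body>World</body></item>
</feed>"""

def parse_items(xml_text):
    """Turn the feed into a list of dicts. No string surgery,
    and escaping/nesting are handled by the parser for free."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "body": item.findtext("body")}
        for item in root.iter("item")
    ]

print(parse_items(SAMPLE))
# [{'title': 'First', 'body': 'Hello'}, {'title': 'Second', 'body': 'World'}]
```

A dozen lines instead of tens of thousands, and it correctly handles the edge cases (entities, attributes, whitespace) that a hand-rolled string chopper inevitably gets wrong.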
All it did was fetch some data, store it, and make it available through a few method calls, and for that there were something like 50k LOC. Most of it was dead, and most of the code still in use was that monstrosity of a home-rolled XML parser.
The things left to do on my new solution were trivial. Some of the columns had HTML tags and the like in them that needed to be cleaned out where necessary, and some other things needed minor modification. I didn't skip that work because it was hard; I skipped it because it was tedious and I didn't want to spend the effort before I got the green light. That turned out to be a great decision, because it never got the green light. And they probably still, to this day, waste man-hours keeping that piece of crap running.
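That cleanup step really is trivial, for what it's worth. A sketch of stripping tags from a column value with Python's standard html.parser — assuming, hypothetically, the column just holds markup-laced text and only the visible text needs to survive:

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects text content and drops every tag it encounters."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called only for text between tags, never for the tags themselves.
        self.parts.append(data)

def strip_tags(value):
    stripper = TagStripper()
    stripper.feed(value)
    stripper.close()
    return "".join(stripper.parts)

print(strip_tags("<p>Some <b>bold</b> text</p>"))  # Some bold text
```

Tedious to apply column by column, maybe, but hardly the kind of work that justifies keeping a 50k-line legacy system alive.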