The solution, rather than investing time in fixing the memory leak, was to add a cron job that would kill/reset the process every three days. This was easier and more foolproof than adding any sort of intelligent monitoring around it. I think an engineer added the cron job in the middle of the night after getting paged, and it stuck around forever... at least for the 6 years I was there, and it was still running when I left.
We couldn't fix the leak because the team that made it had been let go and we were understaffed, so nobody had the time to go and learn how it worked to fix it. It wasn't a critical enough piece of infrastructure to rewrite, but it was needed for a few features that we had.