Once every 5m is not "rapidly". It isn't uncommon for configuration systems to do it every few seconds [0].
> While it’s certainly useful to examine the root cause in the code.
Believe the issue is as much an output from a periodic run (clickhouse query) caused by (on the surface, an unrelated change) causing this failure. That is, the system that validated the configuration (FL2) was different to the one that generated it (ML Bot Management DB).
Ideally, it is the system that vends a complex configuration that also vends & tests the library to consume it, or the system that consumes it, does so as if it was "tasting" the configuration first before devouring it unconditionally [1].
Of course, as with all distributed system failures, this is all easier said and done in hindsight.
[0] Avoiding overload in distributed systems by putting the smaller service in control (pg 4), https://d1.awsstatic.com/builderslibrary/pdfs/Avoiding%20ove...
[1] Lessons from CloudFront (2016), https://youtube.com/watch?v=n8qQGLJeUYA&t=1050