I’m not at Google anymore, but I was a core contributor to the Firebase emulators project when I was. I can think of many flaws with the emulators, but flaky is a new one to me.
It often just crashed with an error. Now, I am a Windows user, so MMMV, and this might be the reason. In some places the behaviour was slightly different and I had to work around that. I don't recall the specifics. And the idea of a test suite that starts the emulator, runs the tests, gives a result, and can do so reliably... well, I gave up on that.