Unfortunately I didn’t really get the point of the article after being bombarded with stats, except that the authors have an AI tool to sell.
I thought the repetition of these statistics was a little tired, but overall that's an impressive solution. Also totally get that the hardest part is log ingestion and indexing.
I guess they’re missing whatever Google has to make their monorepo scale
In my experience, multiple small repos don’t even have better CI reliability than a mono repo as less is invested because it affects fewer people. 10 person repos regularly have flaky tests that never get addressed because “we’ll deal with it later”. The tolerance for flakiness goes up when you can attribute it to a close teammate you know is heads down on something critical instead of it feeling like a random test you don’t even care about.
Not the problems, but the part where broken CI causes everything to stop.
Fractured repos have their own downsides, but the chance of literally everyone sitting and waiting is greatly reduced
In my experience the only repos that never get stuck are ones with no checkin gates.
There doesn't seem to be any upside to having it only for flaky tests, because the workflow is really agnostic to the context.
Even so, at what point do we consider the LLM-ification of all of tech a hazard? I've seen Claude go and lazily fix a test by loosening invariants. AI writes your code, AI writes your tests. Where is your human judgment?
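To make "loosening invariants" concrete, here's a hypothetical sketch (function and values invented for illustration) of a strict test and the kind of "fix" I'm describing:

```python
# Hypothetical example: a strict test vs. a "lazily fixed" version.
# apply_discount and the values are made up for illustration.

def apply_discount(price: float, rate: float) -> float:
    """Apply a percentage discount to a price."""
    return round(price * (1 - rate), 2)

# Original test: pins down the exact contract.
def test_discount_strict():
    assert apply_discount(100.0, 0.15) == 85.0

# The "fixed" test: the invariant is loosened until it can no
# longer catch the bug it was supposed to guard against.
def test_discount_loosened():
    result = apply_discount(100.0, 0.15)
    assert result is not None   # always true
    assert result <= 100.0      # passes even if the discount math is wrong

test_discount_strict()
test_discount_loosened()
```

The loosened version still goes green in CI, which is exactly why it slips past a rubber-stamp review.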
Someone is going to lose money or get hurt by this level of automation. If the humans on your team cannot keep track of the code being committed, then I would prefer not to use your product.
He does pull a sneaky on you from time to time, even nowadays, in v4.6, doesn't he?
To me it's analogous to the current situation at the strait of Hormuz - it's an enormous crisis but since almost everyone has a buffer of oil stockpiles, we can pretend it's not there.
I’d challenge you to identify where in my post I said I wouldn’t use software that employs automation?
It is pretty clear I am not talking about running CI for automated and predictable signals or cron jobs. I am talking about using AI to write code and also fix tests.
It is exceedingly clear in practice that the volume of code produced by LLMs is too much for the humans using these tools to read and understand. We are collectively throwing decades of best practices out of the window in service of “velocity.” Even the FAANG shops I know of who previously had good engineering cultures seem to be endorsing the cult of AI-generated everything with rubber-stamp approval.
Please no AI slop, write your own bloody blog posts.
Jesus, this is why Bazel was invented.
1. A test pass rate of 99.98% is not good - the only acceptable rate is 100%.
2. Tests should not be quarantined or disabled. Every flaky test deserves attention.
In the rare case that one is flaky, it's addressed. During the days when there is a flaky test, of course you don't have 100% pass rate, but on those days it's a top priority to fix.
But importantly: this is library and thick client code. It should be deterministic. There are no DB locks, docker containers, network timeouts or similar involved. I imagine that in tiered application tests you always run the risk of various layers not cooperating. Even worse if you involve any automation/ui in the mix.
Obviously there are systems it depends on (Source control, package servers) which can fail, failing the build. But that's not a _test_ failure.
If the build fails, it should be because a CI machine or a service the build depends on failed, not because an individual test randomly failed due to a race condition, timeout, test run order issue or similar.
I always assumed the purpose was leadership wanting an indicator that implied that someone had at least looked at every failing test.