Yeah, that is why software engineering and system operations are hard.
For example, the article doesn't get to a root cause in an absolute sense. There is no definitive SEGFAULT in the OS that explains the misbehavior. However, they nail the crash down to a gif: if the gif is in, it crashes, and if the gif is out, it doesn't. If the gif is loaded some other way, it crashes too. At that level, to me, that would be enough, because we're only users of the browser's rendering there.
Finding a solid cause that reliably demonstrates and reproduces a problem, and building a workaround on it at a boundary you're unwilling to cross, can be fine. If the boundary is within the company, it absolutely is fine, as long as you escalate the problem to whoever owns the other side of it.
However, I have seen enough teams who are like "Oh, we set all values to 25 one by one, and when we got to flum-value at 25 it stopped crashing. Fixed." Why 25? Who knows. Why flum? Who knows. Maybe the other value that changed at the same time fixed it? Who knows. Do we use 26 once it starts crashing again? Fuck knows. Maybe 24 is better?
We have no explanation for 25, so why would 25 be a good fix?
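
To make the "maybe the other value fixed it" worry concrete, here is a minimal sketch of the check I'd want to see before anyone says "Fixed": start from the known-crashing config and apply each changed setting on its own. Everything here is made up for illustration: `settings_that_matter`, `toy_crashes`, and the `flum`/`other` values are hypothetical stand-ins, and `crashes` would in reality be whatever reproduces the problem.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]


def settings_that_matter(
    baseline: Config,
    fixed: Config,
    crashes: Callable[[Config], bool],
) -> List[str]:
    """Return the settings whose change, applied alone, stops the crash."""
    # Without a reproducible crash and a reproducible non-crash,
    # there is nothing to compare against.
    if not crashes(baseline) or crashes(fixed):
        raise ValueError("need a crashing baseline and a non-crashing fixed config")

    culprits = []
    for name in baseline:
        if baseline[name] == fixed[name]:
            continue  # this setting was never changed
        trial = dict(baseline)      # start from the crashing config
        trial[name] = fixed[name]   # change exactly one setting
        if not crashes(trial):
            culprits.append(name)
    return culprits


# Hypothetical stand-in for "run the system and see if it crashes".
# In this toy model the crash depends only on `other`, not on `flum`.
def toy_crashes(cfg: Config) -> bool:
    return cfg["other"] < 20


baseline = {"flum": 10, "other": 10}
fixed = {"flum": 25, "other": 25}
print(settings_that_matter(baseline, fixed, toy_crashes))
```

In this toy setup the team would have credited flum, while the isolated runs point at the other value. Note that this only tells you which setting matters, not why 25 works, so it narrows the question without answering it.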