It doesn't work that way. If you're dead set on measuring a single passing test run as a millibyte's worth of information, you have to realize that every subsequent passing run is worth less and less. Otherwise running a thousand passing tests would give you a full byte of information, and yet a thousand passing tests do not give you 100% confidence that the enabled optimizations are not buggy. The only way to get a full byte's worth of information is to actually hit upon a failing test run.
And in fact even this is mistaken. You don't have a millibyte's worth of information. You have a lot more. The number of passing test runs at a given optimization gives you a probability that the optimization is buggy. Claiming that all this information you have about probabilities of the bugginess of various optimization passes is really just a millibyte is misleading; you have a lot of information, you're just ignoring it and saying you only care about a millibyte's worth of it.
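The "worth of information" framing can be made concrete with Shannon self-information: an outcome with probability q carries −lb(q) bits. A small sketch with illustrative numbers (the 0.1% failure rate is an assumption for the example, not from the article):

```python
from math import log2

def self_information(p_event: float) -> float:
    """Shannon self-information of an event with probability p_event, in bits."""
    return -log2(p_event)

# Illustrative numbers: suppose a buggy optimization makes the test fail
# 0.1% of the time, so a single run passes with probability 0.999.
print(self_information(0.999))  # ~0.0014 bits: a passing run tells you little
print(self_information(0.001))  # ~9.97 bits: a failing run tells you a lot
```

This is why each additional green run is worth less: the more greens you expect, the less surprising (and less informative) the next one is.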
Flip the scenario around: the code is fine but the test has a concurrency bug in it. It fails about 20% of the time. You ask someone to fix it. After six green test runs they declare the bug fixed. But then build 8 fails and so does build 10.
Why? Because probabilities aren’t additive, they’re multiplicative. You didn’t test 6p > 1, you tested (1 − p)^6 = 0, which will never be true.
In this case there was a 26% chance the test was still broken after 6 green tests. You need 11 runs just to get the probability into single digits.
(Another consequence of this is that if you have 5 flaky tests that each fail 20% of the time, the probability of a green build is only about one in three, and the likelihood of getting multiple consecutive red builds is high enough that it happens every week or two.)
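The arithmetic above can be checked directly (p = 0.2 is the per-run failure rate from the example; the function name is mine):

```python
# p: the flaky test's failure rate per run when it is still broken.
def prob_still_broken(p: float, n: int) -> float:
    """Chance of n consecutive green runs even though the test is still broken."""
    return (1 - p) ** n

print(prob_still_broken(0.2, 6))   # 0.8^6  ~ 0.262: 26% after six greens
print(prob_still_broken(0.2, 11))  # 0.8^11 ~ 0.086: first time under 10%

# Five flaky tests, each green 80% of the time, must all pass at once:
print(0.8 ** 5)  # ~0.33: closer to one green build in three than in two
```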
This doesn't work for the case where the failure(s) might be caused by a combination of a subset of the optimisations. That is,
1000 0100 0010 0001
all succeed, but
0101
fails.
A binary search is not possible for these more complex bugs. We had such cases in the reproducible builds project.
For the case where you can assume (subset A exhibits a failure => any superset of A exhibits a failure), then you can just go through all the options (in the OP's case 256) and switch them on, one at a time. Then the running time is 256, not 8.
For more complex cases where you can't assume even that, then you're stuck indeed with 2^256 test cases.
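The one-at-a-time scan from the monotone case can be sketched like this. `build_passes` is a hypothetical callback standing in for a real compile-and-test cycle, and the failing pair {7, 12} is borrowed from the 16-optimization example elsewhere in the thread:

```python
# Assumes monotonicity: if a subset of optimizations fails,
# every superset of it fails too.

def find_culprit(opts, build_passes):
    """Switch optimizations on one at a time; return the one whose
    addition first turns the build red, plus everything enabled so far."""
    enabled = []
    for opt in opts:
        enabled.append(opt)
        if not build_passes(enabled):
            return opt, list(enabled)
    return None, enabled

opts = list(range(16))
bad = {7, 12}                                    # hypothetical interacting pair
passes = lambda enabled: not bad <= set(enabled)

first, _ = find_culprit(opts, passes)
print(first)  # 12: the addition that first turned the build red

# A second scan with the culprit pinned on finds its partner:
second, _ = find_culprit([o for o in opts if o != first],
                         lambda e: passes(e + [first]))
print(second)  # 7
```

Each scan costs at most one test per option, so finding a failing pair is linear in the number of optimizations rather than exponential.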
A trinary or quaternary search should be able to find two bugs at once (and intuition tells me a quad search should be able to handle three).
Say you have 16 optimizations and 7 and 12 don’t work together. Eliminate the first half, the second half, the odd, or the even opts, and the build goes green. But if you remove only the first third or the first quarter, the tests still fail.
So in a trinary search you might see 1-16 -> 6-16 -> 6-13 -> 6-8,12-13 -> 6,7,8,12 -> 6,7,12 -> 7,12. Including the first and last sanity checks you’re in for about 8-14 test passes for 16 tests, which is a lot better than 2^16.
But there’s still a binary search you can do, and that’s a bisect of the commit history. Before some commit this bug didn’t happen. And that’s going to contain one half of the bug. Now you run a second binary search to find its complement.
[1] http://blog.ezyang.com/2011/06/debugging-compilers-with-opti...
Base85! 85 different ASCII characters give lb(85) ≈ 6.4094 bits each. With 5 of them you get slightly more than 32 bits. Each of the 5 characters carries a surplus of roughly 0.4 bit over 6 bits, and those surpluses add up to the extra 2 bits (5 × 6 = 30, plus 2 = 32).
To encode to Base85, treat the 32-bit entity as a 32-bit unsigned integer, then divide by 85 five times, taking the remainders (0 to 84) as indices into a string of all 85 Base85 characters.
To decode, view the characters as a number written in base 85. Since 85^5 > 2^32, there are a few Base85 representations which are more than 2^32 − 1; these indicate a decoding error.
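A minimal sketch of that scheme. The '!'..'u' alphabet here follows the Ascii85 convention; any fixed string of 85 distinct ASCII characters would work the same way:

```python
# 85 consecutive ASCII characters, '!' (33) through 'u' (117), Ascii85-style.
BASE85_ALPHABET = "".join(chr(c) for c in range(33, 118))

def encode32(value: int) -> str:
    """Encode a 32-bit unsigned integer as 5 base-85 digits, least significant first."""
    assert 0 <= value < 2 ** 32
    digits = []
    for _ in range(5):
        value, rem = divmod(value, 85)       # peel off one base-85 digit
        digits.append(BASE85_ALPHABET[rem])
    return "".join(digits)

def decode32(text: str) -> int:
    """Inverse of encode32; 85^5 > 2^32, so some 5-character strings overflow."""
    value = 0
    for ch in reversed(text):                # most significant digit first
        value = value * 85 + BASE85_ALPHABET.index(ch)
    if value >= 2 ** 32:
        raise ValueError("not a valid 32-bit Base85 encoding")
    return value

assert decode32(encode32(0xDEADBEEF)) == 0xDEADBEEF
```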
There's a link to the PDF on https://en.m.wikipedia.org/wiki/A_Mathematical_Theory_of_Com... .
Is this explanation a little odd?
Each step in a binary search halves the number of suspects; you need 8 steps because 8 halvings take you from 256 suspects down to one.
I can't see it the way the article describes it.
Is it just me? Has anyone else found the article's reasoning for 8 steps a little odd?
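For what it's worth, the halving does come out to 8 steps. A sketch, assuming a single buggy optimization out of 256 and a hypothetical `build_passes` callback:

```python
# Assumes exactly one optimization is buggy and the build fails
# whenever that optimization is enabled.

def bisect_opts(opts, build_passes):
    suspects = list(opts)
    steps = 0
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if build_passes(half):               # first half alone is clean...
            suspects = suspects[len(half):]  # ...so the bug is in the rest
        else:
            suspects = half
        steps += 1
    return suspects[0], steps

buggy = 137  # hypothetical culprit
culprit, steps = bisect_opts(range(256), lambda enabled: buggy not in enabled)
print(culprit, steps)  # 137 8: halving 256 suspects takes exactly 8 steps
```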
It's not 100% wrong, because bit shifting and bit packing are real-world techniques used in memory all the time.
But these are not the only possible values:
00000001
00000010
00000100
00001000
00010000
00100000
01000000
10000000
It's not incorrect to make a decision to limit one's input to those values, but it is incorrect to presume that binary data can only be recorded that way. Treating those 8 bit positions as eight lightbulb circuits that each need an individual test isn't the right mindset for someone performing compiler optimization.