How can we, as software developers, find these cases in our multi-megabyte code bases, and how can we be sure we haven't missed any?
For example, if you're paranoid, make your compiler conservative, in the same way that you might address buffer overflows by using a language/compiler that inserts bounds checks everywhere.
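To make the "conservative compiler" idea concrete, here is a minimal sketch of the kind of hardening such a compiler might insert after every bounds check: the index is clamped with a branch-free mask, so that even a mispredicted check cannot feed an out-of-range index into the load. The names index_mask and load_clamped are made up for illustration, and the mask arithmetic assumes both values are at most SIZE_MAX/2. An optimizer is free to turn this back into a branch, so real mitigations rely on inline assembly or compiler support rather than plain C like this.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: all-ones if index < size, zero otherwise.
 * Assumes index and size are both <= SIZE_MAX / 2, so that the
 * wrapped difference has its top bit set exactly when index < size. */
static inline size_t index_mask(size_t index, size_t size)
{
    size_t in_bounds = (index - size) >> (sizeof(size_t) * CHAR_BIT - 1);
    return (size_t)0 - in_bounds; /* 1 -> all ones, 0 -> zero */
}

/* Hypothetical hardened load: the architectural bounds check stays,
 * and the mask forces the index to 0 on any path (including a
 * speculative one) where the check should have failed. */
uint8_t load_clamped(const uint8_t *table, size_t table_len,
                     size_t untrusted_index)
{
    if (untrusted_index >= table_len)
        return 0;
    untrusted_index &= index_mask(untrusted_index, table_len);
    return table[untrusted_index];
}
```

The cost is a few extra ALU instructions on every checked access, which is exactly the trade-off being argued about: cheap per access, but it adds up if applied everywhere.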
If you're less paranoid and/or more worried about performance, invest in static analysis tools or languages with augmented type systems. After all, you only have to worry about Spectre variant 1 when handling attacker-controlled data. Tracking which values are attacker-controlled (taint tracking) is something existing static analysis tools already do.
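As a rough illustration of the type-system route, even plain C can approximate taint tracking by wrapping attacker-controlled values in a distinct struct type, so that they cannot be used as an array index without going through one audited function. Everything here (untrusted_size, from_attacker, checked_index, lookup) is a hypothetical name invented for this sketch, not an existing API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper type: a size_t that came from an untrusted
 * source. Being a struct, it cannot be used as an array index by
 * accident; the compiler rejects table[u]. */
typedef struct { size_t raw; } untrusted_size;

/* Mark a value as attacker-controlled at the point where it enters. */
static inline untrusted_size from_attacker(size_t v)
{
    untrusted_size u = { v };
    return u;
}

/* The single sanctioned way to obtain a usable index. Any Spectre
 * variant 1 hardening (index masking, a speculation barrier) would
 * live here, and only here. */
static inline bool checked_index(untrusted_size u, size_t len, size_t *out)
{
    if (u.raw >= len)
        return false;
    *out = u.raw;
    return true;
}

/* Example consumer: code that never sees an untrusted_size pays
 * nothing, which is the point of tracking taint in the types. */
int lookup(const int *table, size_t len, untrusted_size u, int fallback)
{
    size_t i;
    return checked_index(u, len, &i) ? table[i] : fallback;
}
```

A real tool or type system would propagate the taint through arithmetic and function calls automatically; this sketch only shows the boundary.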
Finally, if you're not handling attacker-controlled data at all - which is the case for a lot of performance-sensitive code - you really don't want to (and don't have to) do anything about Spectre variant 1.
By the way, this is the big difference between the two Spectre variants, and why it's a shame that they fall under the same name: variant 2 affects all code with indirect jumps/calls, even code that never touches attacker-controlled data.
Anyway, the bottom line is that you shouldn't punish the performance of all code over a class of security bugs that a lot of code isn't affected by. Buffer overflows haven't stopped us, and shouldn't stop us, from writing performance-sensitive but non-security-critical code in unsafe languages either.
No. The problem here is that the code isn't wrong. The CPU is wrong. Whether or not the CPU will leak data depends on the make and model of CPU. Most MIPS CPUs and many ARM CPUs don't have this problem. Some AMD x86 CPUs may not have it either. It has to be fixed on the CPU side.
This could introduce Intel to a world auto manufacturers know well: recalls. Intel has been there before, with the Pentium FDIV floating-point bug.
Look: When other side channel leaks were found, e.g. people recovering RSA or AES keys from plain cache timing without speculative execution, maybe there were people similarly arguing that it's the CPU's fault. They lost that fight, too. Today, the uncontested consensus is that cache timing leaks are the code's fault, for good reason.
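For anyone who hasn't seen what "the code's fault" means in practice for cache timing, here is a minimal sketch of the usual software-side fix: instead of indexing a table directly with a secret, touch every entry and select the wanted one with a mask, so the memory access pattern doesn't depend on the secret. The function name ct_lookup is made up for illustration, and a compiler could in principle undo the branch-free selection, so production crypto code verifies the generated assembly.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant-time lookup: reads all len entries on every
 * call, regardless of secret_index. Returns 0 if secret_index >= len. */
uint8_t ct_lookup(const uint8_t *table, size_t len, size_t secret_index)
{
    uint8_t result = 0;
    for (size_t i = 0; i < len; i++) {
        size_t diff = i ^ secret_index;            /* 0 only on a match */
        /* Top bit of (diff | -diff) is set exactly when diff != 0. */
        size_t nonzero = (diff | ((size_t)0 - diff))
                         >> (sizeof(size_t) * CHAR_BIT - 1);
        uint8_t mask = (uint8_t)(nonzero - 1);     /* 0xFF on match, else 0x00 */
        result |= table[i] & mask;
    }
    return result;
}
```

This is slower than a direct table[secret_index], and that cost is accepted only in code that handles secrets, which mirrors the argument above about where Spectre variant 1 mitigations belong.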
Because what are you going to do, stop building caches? Obviously not, they exist for very good reasons. The same is true for speculative execution. What do you expect CPU people to do? Rip that out entirely? Be real. (Please, seriously think about that: what is it that you actually want CPU people to do? Don't just handwave!)
This kind of discussion is why Linus Torvalds regularly flames security people.