The globally optimal point of whatever we were optimizing might indeed be the highest peak on the graph, but if it is a sharp peak, any deviation in the voltage, temperature, or real-world component values would put the operating point far down the slope of that peak.
It was much better to find a reasonable operating point that had low sensitivity to voltage/temperature/component values but had acceptable behavior (gain, noise, whatever was important).
The surprising thing I learned from that class is that even though the resistor and capacitor values and the gains of individual transistors in IC op-amps are an order of magnitude worse than in discrete designs, the matching of those terrible components is an order of magnitude better than between discrete components. Designers came up with many clever ways to take advantage of that to wring terrific performance from terrible components.
For example, the nominal value of a given resistor in the design might be 4K ohms. In the discrete design it might be 2%, 1%, or 0.5% off (the tighter tolerances get ever more expensive), while in the monolithic design the tolerance might be +/- 20%. But all the resistors on the die would be off by the same amount and would match each other to a fraction of a percent, even across temperature and voltage variations.
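A quick Monte Carlo sketch makes the point. Everything here is invented for illustration (a hypothetical 4K/1K pair, the assumed 0.2% on-die matching), but it shows why a ratio of badly matched 20% parts can still beat independently drawn 1% parts:

```python
import random

random.seed(0)
R1_NOM, R2_NOM = 4000.0, 1000.0   # hypothetical resistor pair, nominal ratio 4.0

def discrete_pair(tol=0.01):
    # discrete parts: each resistor's error is drawn independently within +/-1%
    return (R1_NOM * (1 + random.uniform(-tol, tol)),
            R2_NOM * (1 + random.uniform(-tol, tol)))

def monolithic_pair(abs_tol=0.20, match_tol=0.002):
    # monolithic parts: one large shared process error for the whole die,
    # plus a tiny independent mismatch per resistor (assumed 0.2%)
    common = random.uniform(-abs_tol, abs_tol)
    return (R1_NOM * (1 + common) * (1 + random.uniform(-match_tol, match_tol)),
            R2_NOM * (1 + common) * (1 + random.uniform(-match_tol, match_tol)))

def worst_ratio_error(pairs):
    # worst deviation of R1/R2 from its nominal ratio, as a fraction
    nominal_ratio = R1_NOM / R2_NOM
    return max(abs(r1 / r2 / nominal_ratio - 1) for r1, r2 in pairs)

discrete = [discrete_pair() for _ in range(10_000)]
mono = [monolithic_pair() for _ in range(10_000)]
print(f"discrete 1% parts: ratio off by up to {worst_ratio_error(discrete):.2%}")
print(f"monolithic 20% parts: ratio off by up to {worst_ratio_error(mono):.2%}")
```

The shared process error cancels out of the ratio entirely; only the mismatch term survives, which is exactly why circuits designed around ratios (dividers, current mirrors, differential pairs) work so well on-die.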
The other funny effect is that when you buy a discrete 2% tolerance resistor, the distribution isn't Gaussian around the mean. That is because the manufacturers have measured all of them: the ones within 0.5% get marked up and put in the 0.5% bin, and the remaining ones within 1% tolerance get marked up less and put in the 1% bin. As a result, the 2% distribution is bimodal, sitting on either side of the "hole" in the middle.
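You can see that "hole" in a toy simulation: assume a Gaussian production spread (the 1% sigma here is made up) and skim the tighter bins off first. Whatever lands in the 2% bin is, by construction, never within 1% of nominal:

```python
import random

random.seed(1)
NOMINAL = 4000.0
# hypothetical production line: Gaussian spread, sigma of 1% of nominal
parts = [random.gauss(NOMINAL, 0.01 * NOMINAL) for _ in range(100_000)]

def bin_parts(parts):
    bins = {"0.5%": [], "1%": [], "2%": [], "reject": []}
    for r in parts:
        err = abs(r / NOMINAL - 1)
        if err <= 0.005:
            bins["0.5%"].append(r)      # tightest parts skimmed off first
        elif err <= 0.01:
            bins["1%"].append(r)        # next-tightest bin
        elif err <= 0.02:
            bins["2%"].append(r)        # what's left: bimodal around a hole
        else:
            bins["reject"].append(r)
    return bins

bins = bin_parts(parts)
closest = min(abs(r / NOMINAL - 1) for r in bins["2%"])
print(f"closest part to nominal in the 2% bin: {closest:.2%} off")
```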
In the past, I have looked at using optimizers to solve for component values of complex analog circuits. I only looked at optimizing accuracy at one corner, but it would be interesting to see what people have done to optimize for multiple objectives, such as noise, across multiple corners. I think I've seen Monte Carlo simulation mentioned once as a way to suggest fuzzy solutions within some specification. I would be curious to hear if others know more about this.
You might even do this in order to find out if wider tolerance parts would be good enough and save a little money.
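A minimal sketch of the robust-vs-nominal idea from upthread, using Monte Carlo sampling on a made-up 1-D objective. The objective, peak widths, and jitter magnitude are all invented: the point is only that averaging performance under parameter spread makes a sharp peak lose to a broad plateau.

```python
import math
import random

random.seed(2)

def performance(x):
    # invented objective: tall narrow peak at x=2, lower broad plateau at x=6
    sharp = 1.0 * math.exp(-((x - 2.0) / 0.05) ** 2)
    broad = 0.8 * math.exp(-((x - 6.0) / 1.0) ** 2)
    return sharp + broad

def robust_score(x, jitter=0.1, n=200):
    # Monte Carlo: average performance when the operating point drifts
    # (voltage/temperature/component spread modeled as Gaussian jitter)
    return sum(performance(x + random.gauss(0, jitter)) for _ in range(n)) / n

xs = [i * 0.01 for i in range(1001)]           # sweep x over [0, 10]
best_nominal = max(xs, key=performance)        # lands on the sharp peak near x=2
best_robust = max(xs, key=robust_score)        # lands on the broad plateau near x=6
print(f"nominal optimum: x={best_nominal:.2f}, robust optimum: x={best_robust:.2f}")
```

The same structure extends to real circuits by replacing `performance` with a simulator call and `jitter` with sampled corners and tolerances, which is essentially what the Monte Carlo / fuzzy-solution approaches mentioned above do.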
That's a great explanation of how iterative tolerance filtering works and why that happens!
This really just explains externalities. Competition creates apparent short-range existential risk to a business. Real existential risk (to people or things damaged by your defective product because of "precarious perfection") lands somewhere else, usually where its impact is much larger.
This is why I remain an engineer and nothing more. If this sentence ever became meaningful I guess I would have been over-promoted.
How about "prioritizing robustness and other kinds of long-termism needs to be at least somewhat protected by the system of incentives it operates in to succeed"? I invoked fractal self-similarity there because I think you best protect long-termism by prioritizing long-termism in decision-making at scopes further out.
Until we can move past that silly winner-takes-all incentive we can't have nice things. Most of the genuinely good stuff will be stillborn. We'll always have a top 5% vying for perfection in an ever-escalating, unrealistic Red Queen's race, while the bottom 95% suffer a dearth of the simply good-enough. How many objectively better search engines than Google died in the ditch of obscurity between 1998 and 2020?
HTTP/2 gets you this behavior less often, but when you do get unlucky with packet loss, the single TCP connection's head-of-line blocking stalls every segment instead of just one, so the player doesn't have any segments to skip forward to.
The only way to optimize well is to incorporate the uncertainty of your world model into the model itself.
For the travelling salesman problem, you obviously want to model that certain roads take longer to travel at different times of day. No tweak of a static loss function would get you realistic/robust solutions to TSP.
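To make that concrete, here is a toy sketch. All of it is invented (the four cities, the base times, the rush-hour window on one road, the noise model): instead of fixed edge weights, each leg's travel time depends on when you depart, and tours are compared by Monte Carlo expected time rather than a single static cost.

```python
import itertools
import random

random.seed(3)

def travel_time(a, b, depart):
    # toy model: fixed base time per edge (minutes), plus rush-hour
    # congestion on the A-B road between 8:00 and 10:00 (t in minutes)
    base = {frozenset("AB"): 30, frozenset("AC"): 45, frozenset("AD"): 40,
            frozenset("BC"): 35, frozenset("BD"): 50, frozenset("CD"): 25}
    edge = frozenset((a, b))
    congestion = 40 if edge == frozenset("AB") and 480 <= depart <= 600 else 0
    return base[edge] + congestion + random.uniform(0, 10)  # random variability

def expected_tour_time(order, start=480, samples=200):
    # Monte Carlo estimate of the expected round-trip time, starting at 8:00
    total = 0.0
    for _ in range(samples):
        t = start
        route = ["A"] + list(order) + ["A"]
        for a, b in zip(route, route[1:]):
            t += travel_time(a, b, t)
        total += t - start
    return total / samples

# brute-force the 3! orderings of the non-home cities by *expected* time
best = min(itertools.permutations(["B", "C", "D"]), key=expected_tour_time)
print("best expected tour: A ->", " -> ".join(best), "-> A")
```

The interesting outcome is that the best route defers the A-B road until after rush hour, something a static weight matrix can't express.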
For example, we in infra-operations are responsible for storing the data customers upload into our systems. This data has to be considered irreproducible, especially once it's older than a few days. If we lose it, we lose it for good, and then people are disappointed and turn angry.
As such, large-scale data wipes are handled very carefully with manual approvals from several different teams. The full deletion of a customer goes through us, account management, contracts, and us again. And this is fine. Even with the GDPR and such, it is entirely fine that deleting a customer takes 1-2 weeks. Especially because the process has caught errors in other internal processes, and errors in our customers' processes. Suddenly you're the hero vendor when the customer goes "Oh fuck, noooooo".
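As a sketch of the shape of that gate (the team names and the class are invented, not our actual tooling), the key property is simply that the destructive step refuses to run until every sign-off is recorded:

```python
from dataclasses import dataclass, field

# hypothetical sign-off chain, mirroring the process described above
REQUIRED_APPROVALS = ["infra-ops", "account-management", "contracts", "infra-ops-final"]

@dataclass
class CustomerDeletion:
    customer_id: str
    approvals: set = field(default_factory=set)

    def approve(self, team: str) -> None:
        if team not in REQUIRED_APPROVALS:
            raise ValueError(f"unknown approver: {team}")
        self.approvals.add(team)

    def execute(self) -> str:
        missing = [t for t in REQUIRED_APPROVALS if t not in self.approvals]
        if missing:
            # refuse to wipe anything until every team has signed off
            raise PermissionError(f"missing approvals: {missing}")
        return f"customer {self.customer_id} deleted"

req = CustomerDeletion("acme-corp")
for team in REQUIRED_APPROVALS:
    req.approve(team)
print(req.execute())
```

The slowness is the feature: every raised `PermissionError` is a chance for someone to notice a mistake before it becomes permanent.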
On the other hand, stateless code updates without persistence changes are supposed to be able to move as fast as the build server allows. If something goes wrong, just ship a fix with the next build or roll back. And sure, you can construct situations in which code changes cause big, persistent, stateful issues, but these tend to be rare with a decent dev team.
We as infra-ops and central services need to be robust and reliable, and are fine shedding speed (outside of standard requests) for this. A dev team with a good understanding of stateful and stateless changes should totally be able to run into a wall at full speed, since they can stand back up just as quickly. We're easily looking at hours of backup restores for a hosed database. And no, there is no way to speed that up without hardware changes.