I'm afraid the level of paranoia at Intel has decreased since then.
Now they are pivoting everywhere, but theirs is the only market with sufficient margins. And the prospect is that their margins will shrink because of competition and software emulation (which they are keeping under control by patent trolling).
Competitive pressure can make a company's new product worse than (in this case, less stable than) their previous products, e.g. the Samsung phone explosions. I still remember the story being that Samsung wanted to release their phone ahead of the iPhone, and I would imagine the testing went through a similarly stressful time as Intel's.
Of course, not all cases of taking such risks lead to disaster - imagine Intel rushing out new chips ahead of the competition and, 99 times out of 100, they end up performing well. But a unique characteristic of Intel's case is that these bugs, unlike a faulty battery design, are cumulative and carry forward into future product development, which means a few small wins in catching up with your competitor can also set up a massive failure in some later major battle.
Now imagine Intel's competitors are going through the exact same scenario. One possible outcome is that both Intel's and its competitors' products become less stable and more buggy over time, and until everyone's stuff is visibly broken, nobody ever finds the time to fix it.
There is a valid point there though - if you are testing for testing's sake and not finding anything extra through the extra effort, then you are wasting time and, potentially worse, lulling yourself into a false sense of security. Testing should be done for utility, not just in response to fear - you need to test intelligently, not just test a lot. Like TDD in software, good testing processes make life much easier and quality much higher; bad testing processes can be worse than useless.
Processor bugs are always a thing and always have been a thing - look at the list of bugs the Linux kernel scans for and works around, many of which pre-date the FDIV debacle.
What made FDIV special isn't that it was a bad bug, it was the recent change in marketing. Before then, processors were sold to manufacturers, who might tell the customer what was used; unless you were a hobbyist you didn't much care about the specifics. But the Pentium line was the first time a processor had been particularly marketed directly at the end user. It had started with the 486 lines a couple of years earlier, when "Intel Inside" was first a thing, but there was a huge push in that direction with the release of the first Pentium lines. Suddenly Joe Public was more aware of that detail, but was blissfully unaware that CPUs are complex beasts and generally not 100% perfect.
It didn't help that the bug was very easy to demonstrate in common applications like Excel, so Joseph & Josephine Public could see and understand the problem where they wouldn't have with, for example, the F00F bug, and it was easy to joke about ("We are Pentium of Borg. Division is futile. You will be approximated."), which fanned the rapid spread of the news. The fact that the bug only significantly affected fairly rare input combinations was lost in the mass discussion about how such a bug could happen at all.
I look at a statement like "Our competition is moving much faster than we are" as craven and lacking vision. At that point a wizened old Zen-master type figure should've stepped forward.
Competition isn't about imitating the competitor anyway, is it? It's about differentiation, right? Maybe not. But it's not like you can't easily market literally any reasonable decision you make. Paul Masson wineries bragged about selling "no wine before its time" and turned their lack of "velocity" into marketing cachet. (Even though they weren't even unique in that regard.) There's no theoretical reason why Intel couldn't market itself as "the accurate chipmaker," keep on validating "lavishly"(1) and let AMD rush headlong into this kind of bug.
(1) Obviously not... but unfortunately you never know it's not enough validation until it's not enough validation.
Well, that depends on the specific attributes on which there is competitive pressure. When it's on time to market, yes, quality will suffer. When it's on quality, products will be slower/more expensive, etc. Kind of similar to the oft-repeated quality triangle in software dev.
Which was disproved in practice.
The Skylake/Kaby hyperthread bug has been fixed in microcode and is no longer applicable. It's perfectly safe to run HT on these processors now.
The AMD Ryzen segfault remains unmitigated at this point in time. Phoronix rushed to declare the bug fixed because they got a binned RMA replacement but there are plenty of reports of it occurring in current-production processors to at least a moderate degree, roughly proportionate with ASIC/litho quality. It's unclear what the scope is w/r/t Epyc since Epyc is on a different stepping but also hasn't really ramped yet either. The early Epyc processors were essentially engineering samples (on the order of hundreds to single-digit thousands of samples) with no real (public) visibility into any binning that might be taking place.
The Ryzen high-address bug is no big deal, that's the kind of thing that gets patched all the time (like the Skylake HT bug). That's one thing Dan is glossing over here - there are tons of these bugs all the time and as long as there is an effective mitigation available it's no big deal.
The PTI patch can be viewed as making syscalls take somewhat longer (about double, iirc). Gamers and compute-oriented workloads will hardly be hurt at all. The average mixed workload sees about a 5% performance loss - not ideal, but not critical either. Losing 30% is real bad though, and that's what you will get on IO-heavy workloads that context-switch into the kernel a lot.
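If you want a rough sense of the raw syscall cost on your own machine, a throwaway microbenchmark is enough. This is a minimal sketch, assuming Linux and gcc; it times about the cheapest kernel entry there is (getpid via syscall(), to bypass any libc caching), and the absolute numbers are illustrative only:

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const long iters = 10 * 1000 * 1000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);   /* cheapest possible kernel round-trip */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per syscall\n", ns / iters);
        return 0;
    }

Run it once with PTI enabled and once with it disabled; the delta is roughly the per-entry tax that the 30% figure for IO-heavy workloads is built on.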
The only real mitigation right now appears to be to give up hyperconvergence for the time being and harden those DB/NAS servers that are going to be pushing a lot of IO, so that you know there won't be hostile code running on them. That will allow you to safely disable PTI and sidestep the performance hit.
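For reference, PTI can be switched off at boot on patched x86-64 kernels; a minimal example (a kernel command-line parameter, e.g. in your GRUB config - double-check against your kernel's documentation):

    pti=off    # or equivalently: nopti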
Of course, Epyc was not that good at running databases in the first place, so you still might be better off sucking it up and running Intel even with the PTI patch. It will probably depend on your actual workload and the relative amount of IO vs processing.
Only if you can actually get the fix. My main home PC has this bug and the motherboard manufacturer (ASUS) has yet to ship a BIOS update with the fix.
Actually, KPTI doesn't only affect syscalls but also interrupts. It makes interrupts slower, which affects every workload.
Does this mean they could take a hit due to this bug?
(Edit: I'm assuming the USA, and I'm assuming bugs that were not known to the vendor at the time of the sale.)
A 30% performance reduction (like the page table isolation fixes) probably would be considered material.
Interesting - so if you need that particular product (say it has something specific you need, e.g. a program that only runs well on Intel) and no competitor offers that particular feature (e.g. the AMD CPU runs the program poorly), then they can sell you as defective a product as they want and you cannot recover damages?
Or to put it another way, there is no notion of "I would have still bought it because I needed it but knowledge of the defect would have lowered its market value"?
When Intel issues a microcode update to slow down aging Skylake processors so that everyone goes out to buy Cannonlake, you might be able to draw a comparison.
Unfortunately, this is incorrect on many levels.
First, under EU warranty laws, it's not just any bug that is covered, but defects whose absence was assured or could reasonably be expected. I'd expect disclaimers to allow for certain errata, for example. EDIT: user ta_wh posted an example of such a disclaimer in a sibling comment.
Second, the vendor is usually not the manufacturer, and therefore seldom in the position to fix the defect themselves.
Third, depending on the nature of the defect, the vendor might have other options besides fixing it/getting it fixed, e.g. a discount or a return.
If you're a company buying from more qualified vendors, then it might be a different story; however, at that point consumer law does not apply to you.
Is this true for software as well?
https://www.postgresql.org/message-id/20180102222354.qikjmf7...
Of course, this depends on workload — gaming will see different results than computationally heavy tasks.
It is likely that games using Vulkan, DX12, or OpenGL's AZDO functions will see a much lower performance impact (because they usually only do a handful of syscalls per frame) than games using older APIs, or even OpenGL's immediate mode (which, in the worst case, does one syscall per emitted vertex).
Perhaps with drivers written in the 90s for hardware from the 90s. Any OpenGL implementation worth its salt will buffer those requests on the client side until they need to be observed. Indeed, this was a big advantage in the heyday of DirectX 9, where D3D programmers had to count their draw calls, whereas with OpenGL you had way more leeway since the driver tended to be smarter and cached that stuff.
In theory with a modern driver using OpenGL's immediate mode API shouldn't need any more syscalls than building the vertex buffers in your program, setting up the necessary state and issuing a buffer draw command.
The only time where you'd need a syscall per emitted vertex would be if the GPU had OpenGL-like commands and your OpenGL implementation was a thin wrapper over that. I think one of ATI's very early GPUs worked like that (although the commands were per primitive, not per vertex).
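To make the buffering point concrete, here's a minimal sketch in C (classic fixed-function GL, illustrative only): despite three glVertex* calls, a sane modern driver just appends them to a client-side command buffer, and the actual kernel transition (e.g. an ioctl submitting that buffer) happens once at flush/swap time, not once per call.

    #include <GL/gl.h>

    /* Immediate-mode triangle: each call below writes into a
     * client-side buffer maintained by the driver; nothing here
     * needs to enter the kernel. The driver submits the batched
     * commands later, typically when the frame is flushed. */
    void draw_triangle(void) {
        glBegin(GL_TRIANGLES);
        glVertex2f(-0.5f, -0.5f);
        glVertex2f( 0.5f, -0.5f);
        glVertex2f( 0.0f,  0.5f);
        glEnd();
    }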
Remember, a lot of the Zen arch was developed by Jim Keller, who is the brains behind the Athlon 64.
It would be great if the page displayed the date that the article was posted/updated. It is not in the URL nor the sources. The only way to see the dates is in the RSS feed and even that is only for new articles.
Why?
Let me set the scene: It’s late in 2013. Intel is frantic about losing the mobile CPU wars to ARM. Meetings with all the validation groups. Head honcho in charge of Validation says something to the effect of: “We need to move faster. Validation at Intel is taking much longer than it does for our competition. We need to do whatever we can to reduce those times… we can’t live forever in the shadow of the early 90’s FDIV bug, we need to move on. Our competition is moving much faster than we are” - I’m paraphrasing.
Many of the engineers in the room could remember the FDIV bug and the ensuing problems caused for Intel 20 years prior. Many of us were aghast that someone highly placed would suggest we needed to cut corners in validation - that wasn’t explicitly said, of course, but that was the implicit message. That meeting there in late 2013 signaled a sea change at Intel to many of us who were there. And it didn’t seem like it was going to be a good kind of sea change. Some of us chose to get out while the getting was good. As someone who worked in an Intel Validation group for SOCs until mid-2014 or so I can tell you, yes, you will see more CPU bugs from Intel than you have in the past from the post-FDIV-bug era until recently.
So this is why Krzanich sold his stock. He knows the bug is his fault. Whoops. I think someone may "quit for personal reasons" soon.
https://www.fool.com/investing/2017/12/19/intels-ceo-just-so...
Edit: Looks like ARM64 was affected, but it has an architectural feature that makes the mitigation much easier: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-N...
I guess I'm glad now that Apple put a two-year-old CPU in the early 2015 MacBook Pro! Besides my 2012 Mac Pro, that is the most expensive machine in the house!
What do you think, is this realistic?