Why does it matter? It doesn't idle that high; it only goes that high if you're using it flat out, in which case the extra power usage is justified because it's giving that much more performance over a 100 W TDP CPU. Now I totally get it if you don't want to go Threadripper just for ECC because it's more expensive, but max power draw, which you don't even have to use? I've never seen anyone shop for a desktop CPU by TDP rather than by performance and price.
Oh oh, me! Back in the day I bought a 65W CPU for a system that could handle a 90W one. I wanted quiet and figured that would keep fan noise down at a modest performance penalty. It should also last longer, being the same design but running cooler. I ran that from 2005 until a few years ago (it still runs fine but is in storage).
Planning to continue this strategy. I suspect it's common among SFF enthusiasts.
IMO, shopping by performance/watt makes sense. Shopping by TDP doesn't. (Especially since there is no comparing the AMD and Intel TDP numbers as they're defined differently; neither is the maximum the processor can draw, and Intel significantly exceeds the specified TDP on normal workloads).
My passive-cooled desktop is also running a slightly throttled 65W CPU.
So yes, there are people who choose their hardware by TDP.
Get a huge cooler like the Noctua NH-D14, and your PC becomes silent. It lasts forever, requires no maintenance, and is a good investment.
If you are adventurous, water cooling is even better, but it's a can of worms I decided I'd rather live without: the possibility of leaks and the cost make it harder to justify.
That's me. When I start to plan for a new system, I select the processor first and read its thermal design guidelines (Intel used to have nice load vs. max temp graphs in their docs) and select every component around it for sustained max load.
This results in a more silent system for idle and peace of mind for loading it for extended duration.
You can passively cool Threadrippers if you underclock them enough and have good ventilation in the case.
ECC isn't validated by AMD for AM4 Ryzen models, but it's present and supported if the motherboard also supports it. Many motherboards have ECC support (the manual will say for sure), and a handful of models even explicitly advertise it as a feature.
I have a Ryzen 9 3900X on an ASRock B450M Pro4 and 64 GB of ECC DRAM, and ECC functionality is active and working.
Secondly, it probably also means that they do not include tests for this functionality when they perform the final tests against each fully assembled chip. I'd expect that a JTAG boundary scan does verify that the bond wires are in place and work, but no functional tests of ECC are run on each processor in the consumer configuration.
The net result is that with a compatible motherboard and memory, ECC almost certainly works (since the memory controller is the same as in the supported model) but AMD does not officially guarantee it. It is much like overclocking. The functionality is present, and it should work, and most likely does, but AMD accepts no responsibility if it does not, since they don't formally test for it.
Found some here -- bottom of the EPYC product line starts at $2849 ...!
What specific configurations (CPU, MB, RAM) are known to work?
Let's say I have a Ryzen system, how can I check if ECC really works? Like, can I see how many bit flips got corrected in, say, last 24h?
You must check the specifications of the motherboard to see if ECC memory is supported.
As a rule, all ASRock MBs support ECC, and some ASUS MBs do as well, e.g. all ASUS workstation motherboards.
I have no experience with Windows and Ryzen, but I assume that ECC should work also there.
With Linux, you must use a kernel with all the relevant EDAC options enabled, including CONFIG_EDAC_AMD64.
For the new Zen 3 CPUs, i.e. Ryzen 5xxx, you must use a kernel 5.10 or later, for ECC support.
On Linux, there are various programs, e.g. edac-utils, to monitor the ECC errors.
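Under the hood, tools like edac-utils just read the counters that the EDAC drivers expose in sysfs, so you can also poll them directly. A minimal sketch, assuming the standard /sys/devices/system/edac/mc layout (the base path is a parameter so it can point elsewhere, e.g. for testing):

```python
import glob
import os

def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Read corrected (ce_count) and uncorrected (ue_count) error
    totals for every memory controller the EDAC driver registered."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(base, "mc*"))):
        mc = os.path.basename(mc_dir)
        try:
            with open(os.path.join(mc_dir, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as f:
                ue = int(f.read().strip())
        except OSError:
            continue  # controller unreadable; skip it
        counts[mc] = {"corrected": ce, "uncorrected": ue}
    return counts

if __name__ == "__main__":
    for mc, c in read_edac_counts().items():
        print(f"{mc}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

If the EDAC driver isn't loaded (or ECC isn't enabled), the mc* directories simply won't exist and this returns an empty dict, which is itself a useful signal.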
To be more certain that the ECC error reporting really works, the easiest way is to change the BIOS settings to overclock the memory, until memory errors appear.
Looking back at my notes, the output of journalctl -b should say something like, "Node 0: DRAM ECC enabled."
Then 'edac-ctl --status' should tell you that drivers are loaded.
Then you run 'edac-util -v' to report on what it has seen,
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
mc0: csrow2: mc#0csrow#2channel#1: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: mc#0csrow#3channel#0: 0 Corrected Errors
mc0: csrow3: mc#0csrow#3channel#1: 0 Corrected Errors
edac-util: No errors to report.

The consumer CPUs aren't tested for it, so it's possible to get a dud, but the chance is so minuscule it isn't worth worrying about.
Now, to actual issues you can encounter: motherboards
The problem is that ECC means you need, IIRC, 8 more data lines between the CPU and each memory module, which of course means more physical connections on the board. Those also need to be properly routed and tested, and you might encounter a motherboard where that wasn't done. Not sure how common that is, unfortunately.
Another issue is motherboard firmware. Even though AMD supplies the memory init code, the configuration can be tweaked by the motherboard vendor, and they might simply break ECC support accidentally (even by something as simple as making a toggle default to false and then forgetting to expose it in the configuration menu).
Those are the two issues you can encounter.
The difference with Threadripper PRO and EPYC, AFAIK, is that AMD includes ECC in its test and certification programs for those parts, which effectively enforces support.
C:\> wmic memphysical get memoryerrorcorrection
MemoryErrorCorrection
6
SuperUser has a convenient decoder[1], but modern systems will report "6" here if ECC is working.

When Windows detects a memory error, it will record it in the system event log under the WHEA source. As a side note, this is also how memory errors within the CPU's caches are reported under Windows.
[1] https://superuser.com/questions/893560/how-do-i-tell-if-my-m...
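For reference, that value comes from the SMBIOS "Physical Memory Array" structure, which WMI surfaces as Win32_PhysicalMemoryArray; the defined codes can be decoded with a simple table:

```python
# MemoryErrorCorrection codes from the SMBIOS "Physical Memory Array"
# structure, as surfaced by WMI's Win32_PhysicalMemoryArray class.
MEMORY_ERROR_CORRECTION = {
    0: "Reserved",
    1: "Other",
    2: "Unknown",
    3: "None",
    4: "Parity",
    5: "Single-bit ECC",
    6: "Multi-bit ECC",
    7: "CRC",
}

def decode_memory_error_correction(value):
    """Map the wmic/WMI integer to its meaning; 6 means ECC is active."""
    return MEMORY_ERROR_CORRECTION.get(value, f"Unrecognized value {value}")
```

So the "6" in the output above decodes to "Multi-bit ECC", i.e. ECC is enabled; a non-ECC desktop will typically report 3 ("None").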
*not officially, and the memory controller provides no report for 'fixed' errors.
Edit: as detaro mentioned in the reply, there is, and here's the source [0] -- that's what they mean by "RAS" on promotional pages [1]. That indeed looks like a nice option.
[0] https://www.amd.com/system/files/documents/updated-3000-fami...
[1] https://www.amd.com/en/products/embedded-epyc-3000-series
I don't know where you live, but around here, (if you buy new?), the vendor MUST take back items up to 15 days after they were delivered, for ANY reason.
So, as long as you synchronize your buying of CPU, RAM, (motherboard), you should be fine.
Any apples-to-apples comparable Intel CPU will have comparable power use. The difficulty is that Intel didn't really have anything like Threadripper — their i9 series was the most comparable (high clocks and moderate core counts), but i9 explicitly did not support ECC memory, nullifying the comparison.
You're looking at 2950X, probably? That's a Zen+ (previous gen) model. 16 core / 32 thread, 3.5 GHz base clock, launched August 2018.
The comparable Intel Xeon timeline is Coffee Lake at the latest, Kaby Lake before that. As far as I can tell, no Kaby Lake or Coffee Lake Xeons even have 16 cores.
The closest Skylake I've found is an (OEM) Xeon Gold 6149: 16/32 core/thread, 3.1 GHz base clock, 205W nominal TDP (and it's a special OEM part, not available for you). The closest buyable part is probably Xeon Gold 6154 with 18/36 core/threads, 3GHz clock, and 200W nominal TDP.
Looking at i9s from around that time, you had Skylake-X and a single Coffee Lake-S part (i9-9900K). The 9900K only has 8 cores. The Skylake i9-9960X has 16/32 cores/threads, a base clock of 3.1 GHz, and a nominal TDP of 165W. That's somewhat comparable to the AMD 2950X, ignoring ECC support.
Another note that might interest you: you could run the Threadripper part at substantially lower power by sacrificing a small amount of performance, if thermals are the most important factor and you are unwilling to trust Ryzen ECC: http://apollo.backplane.com/DFlyMisc/threadripper.txt
Or just buy an EPYC, if you want a low-TDP ECC-definitely-supported part: EPYC 7302P has 16/32 cores, 3 GHz base clock, and 155W nominal TDP. EPYC 7282 has 16/32 cores, 2.8 GHz base, and 120W nominal TDP. These are all Zen 2 (vs the 2950X's Zen+) and will outperform Zen+ on a clock-for-clock basis.
> And though ECC is not disabled in Ryzen CPUs, AFAIK it's not tested in (or advertised for) those, so one won't be able to return/replace a CPU if it doesn't work with ECC memory, AIUI, making it risky.
If your vendor won't accept defective CPU returns, buy somewhere else.
> Though I don't know how common it is for ECC to not be handled properly in an otherwise functioning CPU; are there any statistics or estimates around?
ECC support requires motherboard support; that's the main thing to be aware of shopping for Ryzen ECC setups. If the board doesn't have the traces, there's nothing the CPU can do.
One consequence of using a moving average is that if the CPU has been idle for a long time and then starts running a high-power workload, instantaneous power consumption can momentarily exceed the TDP while the average catches up. This is often misleadingly referred to as "turbo mode" by hardware review sites. It's not a mode, and there's no state machine at work here; it's just a natural result of using a moving average. The use of EWMA is meant to model the heat capacity of the cooling solution. When the CPU has been idle for a while and the heatsink is cool, the CPU can afford to use more power while the heatsink warms up.
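A toy simulation of that effect (all numbers hypothetical, and a simple single-pole EWMA standing in for whatever filter the firmware actually uses):

```python
def simulate(tdp, boost_power, idle_power, alpha, steps):
    """EWMA-limited power draw: the CPU may run at boost_power while
    its exponentially weighted average draw is under the TDP, then
    falls back to the TDP once the average catches up."""
    ewma = idle_power  # heatsink starts cool: average primed at idle draw
    trace = []
    for _ in range(steps):
        draw = boost_power if ewma < tdp else tdp
        ewma = alpha * draw + (1 - alpha) * ewma
        trace.append(draw)
    return trace

trace = simulate(tdp=105, boost_power=145, idle_power=10,
                 alpha=0.2, steps=50)
```

The first few samples come out at the full 145W while the average is still below 105W, then draw settles at the TDP: exactly the "burst above TDP, then throttle back" pattern reviewers observe after an idle period, with no mode switch involved.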
Another factor which confuses things is motherboard firmware disabling power limits without the user's knowledge. Motherboards marketed to enthusiasts often do this to make the boards look better in review benchmarks. This is where a lot of the "Intel is lying" comes from, but it's really the motherboard manufacturers being underhanded.
The situation on the AMD side is of course a bit different. AMD's power and frequency scaling is both more complex and much less documented than Intel's so it's hard to say exactly what the CPU is doing. What is known is that none of the actual power limits programmed into the CPU align with the TDP listed in the spec. In practice the steady state power consumption of AMD CPUs under load is typically about 1.35x the TDP.
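That 1.35x figure is consistent with the stock package power tracking (PPT) limits widely reported for AM4 parts; as a quick sanity check (these PPT values are the commonly cited ones, not official spec-sheet numbers):

```python
def ppt_from_tdp(tdp_watts):
    """Estimate AMD's stock package power tracking (PPT) limit as
    ~1.35x the marketing TDP, per widely reported AM4 behavior."""
    return round(tdp_watts * 1.35)

# The commonly cited stock limits for the two main Ryzen power tiers:
#   65W TDP  -> 88W PPT
#   105W TDP -> 142W PPT
```

Both reported tiers land on the 1.35x multiplier almost exactly, which is why sustained load power on these parts sits well above the number on the box.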
Unlike Intel, firmware for AMD motherboards does not mess with the CPU's power limit settings unless the user does so explicitly. Presumably this is because AMD's CPU warranty is voided by changing those settings, while Intel's is not.
https://www.intel.com/content/www/us/en/architecture-and-tec...
And on the flip side, if you're building a desktop PC with a more high-end Intel processor, you will usually have to change a lot of motherboard firmware settings to get the behavior to resemble Intel's own recommendations that their TDP numbers are supposedly based on. Without those changes, lots of consumer retail motherboards default to having most or all of the power limits effectively disabled. So out of the box, a "65W" i7-10700 and a "125W" i7-10700K will both hit 190-200W when all 8 cores/16 threads are loaded.
If a metric can in practice be off by a factor of three in either direction, it's really quite useless and should not be quantified with a scientific unit like Watts.
Maybe we should use a new term for it, something like iTDP.
If they gave it some other name, it would be only misleading. Calling it TDP is a lie.
Intel mobile processors actually obey TDP better than AMD processors do. Tiger Lake has a hard limit: when you configure a 15W TDP, it really is 15W steady-state once boost expires, while AMD mobile products will pull up to 50% more than configured in steady-state operation (the gap is larger than on desktop).
https://images.anandtech.com/doci/16084/Power%20-%2015W%20Co...
"the brands measure it differently" is sort of theoretically true but not in the sense people think, and in practice it is not true.
On AMD it is literally just a number they pick that goes into the boost algorithm. Robert Hallock did some dumb handwavy shit about how it's measured with some delta-t above ambient on a reference cooler, but the fact is that the chip itself basically determines how high it'll boost based on the number they configure, so that is a self-fulfilling prophecy: the delta-t above ambient depends on the number they configure the chip to run at.
In practice: what's the difference between a 3600 and a 3600X? One is configured with a TDP of 65W and one is configured with a TDP of 95W, the latter lets you boost higher and therefore it clocks higher. Configure them both to a 65W PPT limit and they will boost to pretty much the same place.
Intel nominally states that it's measured as a worst-case load at base clocks, something like Prime95 that absolutely nukes the processor (and even then many processors do not actually hit it). But really it is also just a number that they pick. The number has shifted over time, previously they used to undershoot a lot, now they tend to match the official TDP. It's not an actual measurement, it's just a "power category" that they classify the processors as, it's informed by real numbers but it's ultimately a human decision which tier they put them in.
So in practice, for both brands, it is just a number they pick. They have different theoretical methods for getting there but ultimately the marketing department looks at where the clocks would put them and pick a power number that they think represents that. It is not, in practice, a pure measurement from either brand, it is just a "category" they use.
Real-world, you will always boost above base clocks on both brands at stock TDP, at least on real-world loads. You won't hit full boost on either brand without exceeding TDP; the "AMD measures at full boost" claim is categorically false despite being commonly repeated. AMD's PPT lets them boost above the official TDP for an unlimited period of time; they cannot run full boost when limited to the official TDP.