They blogged everything needed to recreate the setup, including the hunch and the test code, but the actual results are missing. That's a little suspect. How much faster is ARM?
I left the exact numbers out for two reasons:
1) They’d distract from the main point (I wasn’t aiming to write a benchmarking post), and
2) They can be misleading, since results will vary across ARM hardware and even between Snapdragon X Elite variants.
Instead, I included the PowerShell snippets so anyone interested can reproduce the results themselves.
For a rough sense of the outcome: the Snapdragon VM outperformed the Intel VM by ~20–80%, depending on the test (DNS ~20%, IIS ~50%, all others closer to ~80%).
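For anyone who wants a starting point without opening the article: below is a rough sketch of the shape those snippets take, using the DNS test as an example. This is illustrative, not the exact code from the post; the server address and iteration count are placeholders.

    # Illustrative sketch only (not the article's actual snippet):
    # time repeated DNS lookups and report the average latency.
    $server  = "127.0.0.1"   # placeholder: DNS server under test
    $runs    = 1000          # placeholder: sample count
    $samples = 1..$runs | ForEach-Object {
        (Measure-Command { Resolve-DnsName "example.com" -Server $server }).TotalMilliseconds
    }
    "Average lookup: {0:N2} ms over $runs runs" -f ($samples | Measure-Object -Average).Average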
Weird reasoning.
You already caught our attention with your article. But not everyone has the time or means to go and re-do the tests.
However, such information is really important to surface when making infra decisions. And when one of the reader's brain cells later pops up and recalls "20-80% perf improvement" versus "there were some perf improvements", which is more likely to convince them to dig into the topic when the time comes to benefit from your research?
I haven't seen ARM outperform X86 by a margin that large anywhere else.
You're testing "variability" and latency, and you even mention that "modern Intel CPUs tend to ramp frequency..." but entirely neglect to mention which specific Windows Power Profile you were using.
Fundamentally, you're benchmarking a server operating system on laptops and/or desktop-class hardware, and not the same spec either. I.e.: you're not controlling for differences in memory bandwidth, SSD performance, etc...
Even on server hardware the power profiles matter! A lot more than you think!
One of my gimmicks in my consulting gig is to change Intel server power settings from "Balanced" to "Maximum Performance" and gloat as the customer makes the Shocked Pikachu face because their $$$ "enterprise grade server" instantly triples in performance for the cost of a button press.
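If you want to try that button press yourself, it's a one-liner from an elevated prompt. SCHEME_MIN is the built-in alias for the High performance plan ("minimum power savings"); note that OEM server images sometimes ship their own plans, so your GUIDs may differ.

    # Show the active power plan, then switch to High performance.
    powercfg /getactivescheme
    powercfg /setactive SCHEME_MIN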
Not to mention that by testing this in VMs, you're benchmarking three layers: The outer OS (and its power management), the hypervisor stack, and the inner guest OS.
A bit of backstory: there are two totally independent implementations behind the Windows heap allocation APIs (i.e. the implementation code behind RtlHeapAlloc and RtlHeapFree, which are called by malloc/free). The older of the two, developed during the Dave Cutler era, is known as the "NT heap". The newer implementation, developed in the 2010s, is known as "segment heap". This is all documented online if anyone wants to read more.

When development on segment heap was completed, it was known to be superior to the NT heap in many ways. In particular, it was more efficient in terms of memory footprint, due to lower fragmentation-related waste. Segment heap was smarter about reusing small allocation slots that were recently freed.

But, as ever, Windows was very serious about legacy app compat. Joel Spolsky calls this the "Raymond Chen camp". So they didn't want to turn segment heap on universally. It was known that a small portion of legacy software would misbehave and do things like rely on a bit of use-after-free as a treat, or worse, take dependencies on casting addresses to internal NT heap data structures.

So the decision at the time was to make segment heap the default for packaged executables. At that time, Windows Phone still existed, and Microsoft was pushing super hard on the Universal platform being the new, recommended way to make apps on Windows. The thinking was that we'd see a gradual transition from unpackaged executables to packaged, and thus a gradual transition from NT heap to segment heap. The dream of UWP died, and the Windows framework landscape is more fragmented than ever. Most important software on Windows is still unpackaged, and most of it runs on x64.
Why does this matter? Because segment heap is also enabled by default on ARM64, by the same logic as the packaged-vs-unpackaged decision: ARM64 binaries on Windows are guaranteed not to be ancient, unmaintained legacy code. ARM64 Windows devices have been a big success, and users widely report that they feel more responsive than x64 devices.
A not-insignificant part of why Windows feels better on ARM is that segment heap is enabled by default there.
I'd be interested to see how this test turns out if you force segment heap on x64. You can do it on a per-executable basis by creating a DWORD value named FrontEndHeapDebugOptions under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<myExeName>.exe and giving it a value of 8.
You can turn it on globally for all processes by creating a DWORD value named "Enabled" under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Segment Heap, and giving it a value of 3. I do this on my dev machine and have encountered zero problems. The memory footprint savings are pretty crazy. About 15% in my testing.
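Spelled out as PowerShell (run elevated; myApp.exe is a placeholder for whatever binary you want to opt in), both tweaks look roughly like this:

    # Per-executable opt-in via Image File Execution Options.
    $ifeo = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\' +
            'Image File Execution Options\myApp.exe'   # placeholder exe name
    New-Item -Path $ifeo -Force | Out-Null
    Set-ItemProperty -Path $ifeo -Name FrontEndHeapDebugOptions -Type DWord -Value 8

    # Global opt-in for all processes (value 3, as described above).
    $seg = 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Segment Heap'
    New-Item -Path $seg -Force | Out-Null
    Set-ItemProperty -Path $seg -Name Enabled -Type DWord -Value 3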
P.S. In the Windows 95 to Windows Vista era, there was a good tradition of "Compatible with Windows XXX" certifications for apps. If MS did something like that for Windows 10/11 and included a segment-heap tick mark in it, a considerably larger number of apps and their users would benefit from the increased performance. Think better energy consumption and eco-friendliness as additional bonuses.
P.S. 2: The problem with UWP was not the technology itself; it was the stubborn insistence on having it packaged and tied to the Store, all of which contradicts the very nature of Windows as an OS.
I can't really complain, though. If UWP would've broken through, the Steam Deck would've probably been a much more massive undertaking to get working right.
As long as developers can opt into the new system (which they can with the manifest approach), I don't think it matters whether you're doing UWP or traditional Windows applications.
Microsoft has added a mishmash of flags to the app manifest and transparently supports manifest-less applications, so developers never really have a need to bother including a manifest either.
It'd annoy a lot of people, but if Windows showed a "this app was written for an older version of Windows and may be slower than modern applications" warning for old .exes (or maybe one of those popups they're now fond of about which apps are slower than they could be), developers would have an incentive to add a manifest to their applications, and Microsoft could enable a lot more of these optimisations for a lot more applications.
Could you please tell me, where are all these manifest flags documented? I asked about it a decade and a half ago at stackoverflow (https://stackoverflow.com/questions/5733085/application-mani...), and the only answer was "there isn't".
First, regarding application compatibility: the heap was already changed once prior to the segment heap. The Low Fragmentation Heap (LFH) was added in XP and made default in Vista, with applications no longer having to opt into it:
https://learn.microsoft.com/en-us/windows/win32/memory/low-f...
Second, the segment heap has different tradeoffs that make it not a guaranteed win to swap in: it trades off performance for working set.
Side note on the Chromium topic: Google Chrome decided the NT heap is still best for their usage, but Microsoft Edge, which is also built on Chromium, uses segment heap. Not sure what Firefox uses. You can check by attaching WinDbg and running !heap. Note that not every heap will be a segment heap even if you globally opt in; some code paths explicitly create their own heaps as NT heaps.
At the very least, using fewer pages to allocate the same amount of data improves memory locality slightly. Folks should test and see what works best in their applications.
Another benefit of segment heap that we haven't discussed yet is that it's stricter and more proactive about detecting problems and terminating. From what I understand, heap metadata is now stored separately from heap data, and guard pages are used, so heap buffer overruns don't overwrite the heap manager's bookkeeping.

With NT heap, crashes due to use-after-free might manifest much later and more indirectly: maybe the bug overwrote the free list, or clobbered some newer allocation that landed at the same address. So the crash usually lands in some unlucky 'innocent bystander' call stack that touched the corrupted region. With segment heap, you tend to get earlier, more actionable, more specific crashing call stacks, closer to the site of the original bug. So if you're an engineer who looks at a lot of difficult Windows crash dumps involving memory corruption, segment heap makes the challenge slightly more surmountable.
I definitely noticed that in my tests. Under load, machines with flaky RAM show higher memory-access-violation rates with segment heap than with the NT heap.
I had previously seen this described as 0 vs non-zero. Since you have some inside experience :), anything special about 3 instead? What about 2? How would I find these value meanings out on my own (if that's even possible)?
Thanks!
Using the application manifest approach is the right way to ship software that opts into segment heap. The registry thing is just a convenience for local testing.
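For reference, the manifest setting is the heapType element in the SMI/2020 WindowsSettings namespace (documented on learn.microsoft.com under "Application manifests"):

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
      <application xmlns="urn:schemas-microsoft-com:asm.v3">
        <windowsSettings xmlns="http://schemas.microsoft.com/SMI/2020/WindowsSettings">
          <!-- Opt this executable into segment heap -->
          <heapType>SegmentHeap</heapType>
        </windowsSettings>
      </application>
    </assembly>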
Does that global registry key require a reboot, or does it just take effect on executable launch?
Also, is it safe to assume that most Microsoft first-party applications in Windows Server (DNS, etc.) are all optimised for segment heap?
It is a crime that segment heap is over a decade old and still so underutilized. Gamers in particular go to such great lengths to tweak and optimize their windows machines for perf, but I still haven't seen that crowd discussing segment heap anywhere. It's more important than ever with the recent explosion in RAM cost.
Maybe not boost clocks, but every arm system I've used supports some form of frequency scaling and behaves the same as any x86 machine I've used in comparison. The only difference is how high can you go... /shrug
This could be tested even on existing desktop hardware by disabling "Turbo" in the BIOS settings, so that the Intel CPU runs at its base clock frequency, providing lower but stable and predictable performance.
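If you can't get at the BIOS (common on a locked-down lab box), a software-side approximation is capping the maximum processor state just below 100%, which disables Turbo Boost on most Intel platforms; exact behavior can vary with the OEM power driver.

    # Cap the AC "maximum processor state" at 99% to suppress Turbo Boost.
    powercfg /setacvalueindex SCHEME_CURRENT SUB_PROCESSOR PROCTHROTTLEMAX 99
    powercfg /setactive SCHEME_CURRENT   # re-apply the plan so the change takes effect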
It's possible ARM is a better architecture. But a lot of benchmarks end up stressing one part of the system more than any other, and if that's the case, faster RAM, faster syscalls, faster SSDs, or something else could be what's really driving this performance difference.
The biggest reason I still keep a Xeon and Threadripper server around is NVidia support.
But you’re not going to do that in a lab/personal machine, usually.
1. ARM64 is actually less "smart" than x64. While Intel's Core i9 tries to be clever with aggressive boosting and throttling, Snapdragon just delivers steady, consistent performance. This lack of variability makes it easier for the OS to schedule tasks.
2. It is possible that the ARM build is more efficient than the x64 build, because Windows has less historical clutter on ARM than x64.
So, has CPU throttling become too smart to the point it hurts?
The x86 server CPUs, like AMD Epyc or Intel Xeon, have a narrower range within which the clock frequency may vary, and their policies for changing the clock frequency are less aggressive than on desktop CPUs. They therefore provide more constant and predictable performance, which favors multi-threaded workloads. Desktop CPUs, by contrast, tune their clock-control algorithms for the best single-thread performance, even when that hurts multi-threaded performance.
> The x86 server CPUs, like AMD Epyc or Intel Xeon, have a narrower range within which the clock frequency may vary, and their policies for changing the clock frequency are less aggressive than on desktop CPUs
Probably we need to compare Xeon/EPYC with something like AWS Graviton or Ampere Altra to get an accurate picture here. That said, I think "Windows Server works fast on Snapdragon" is both crazy and fascinating; I wasn't even sure if that was possible.
It's not clear how both AMD and Intel not only lost the smartphone fight but are also losing on their own turf (servers, laptops, desktops).
15 years ago if I told you that windows would be running better on ARM you would call me crazy.
Apple A18 Pro (Q1 2026): Multithread 11977, Single Thread 4043
Intel Core i5-1235U (Q1 2022): Multithread 12605, Single Thread 3084
--
On the high end we have the i9-13900KS at about 60k; the M5 Max scores about the same. But when you move on to server CPUs like Threadripper and EPYC, things are about 3x faster.
Let's see if the brand new Arm AGI changes this situation in a few months.
browserbench speedometer 3.0 on A18 pro - 33, Intel Core i5-1235U - 22
i9-13900KS gets about 33
M4 Pro - 44-50
Try this:
https://www.cpubenchmark.net/compare/6693vs7115vs7229vs7232/...
There are still applications where the ISA matters, like technical/scientific computing, where performance can be dominated by array operations or big-number arithmetic. For such workloads, x86 CPUs with AVX-512 can provide performance per watt and per dollar that current ARM-based CPUs cannot reach.
However, reading the summary left me confused, as if you don't understand what's happening at Microsoft.
> Hopefully Microsoft will spend more time in the future on their server product strategy and less on Copilot ;-)
The future product strategy is clear: it's Linux for servers. .NET runs on Linux, generally with much better performance. Microsoft uses Linux a ton internally on Azure; Windows Server is legacy and, hell, MSSQL is legacy. Sure, they'll continue to sell it, because if you want to give them thousands of dollars they'd be idiots to turn it down, but it's no longer a focus.
It's not a dominant database anywhere on the outside.
However since we now got the tools for running on both, and experience migrating, we might be moving to PostgreSQL at some point in not too distant future. Managed MSSQL in Azure is not cheap.
Knowing nothing about this, I wonder if they're getting ready to retire Windows Server, and wanted to get their server products off it?
Edit: How they did it is also quite fascinating:
https://www.microsoft.com/en-us/sql-server/blog/2016/12/16/s...
https://www.microsoft.com/en-us/research/project/drawbridge/
>a key contribution of Drawbridge is a version of Windows that has been enlightened to run within a single Drawbridge picoprocess.
MSSQL on Linux only seems to use parts of that project (a smaller abstraction layer), but that's still super cool.
I'd be curious what a better/non-legacy solution is! (as I do this stuff haha, and don't see much else other than full cloud options, sf etc)
Azure networking is Linux.
EDIT: Marvel at the NT4 style Task Manager [0].
[0] https://techcommunity.microsoft.com/blog/windowsosplatform/a...
Windows Server is actually kind of awesome for when you need a Windows machine. Linux is great for servers, but Windows Server is the real Windows Pro: rock solid and none of the crap.
The worst part of Windows server is knowing that Microsoft can make a good operating system and chooses not to.
Even Apple and Google run AD.
Gotta support all those CAD workstations running Windows. Is Apple hardware still designed on a Windows box?
It's a beast in terms of complexity, in my opinion. But the vendor only supports running it on specific configurations.
(I know, I know. That question might be a bit too loaded. I'm really very sorry. No, there's no need that; I'll see myself out.)
mild \s
I know a big company that runs its core on Windows Server 2012; I've no idea how they manage the software assurance and compliance.