"Graviton3 processors also include a new pointer authentication feature that is designed to improve security. Before return addresses are pushed on to the stack, they are first signed with a secret key and additional context information, including the current value of the stack pointer. When the signed addresses are popped off the stack, they are validated before being used. An exception is raised if the address is not valid, thereby blocking attacks that work by overwriting the stack contents with the address of harmful code. We are working with operating system and compiler developers to add additional support for this feature, so please get in touch if this is of interest to you"
(it's not impossible to bypass, I'm vaguely aware it's been done on Apple's new chips that implement a similar (the same?) ARM extension, but there's no perfect security)
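To make the sign-then-validate flow in the quote concrete, here is a toy model of it. This is only a sketch under loud assumptions: the 48-bit address width, the 16-bit signature field, and the truncated HMAC-SHA256 are all stand-ins I picked for illustration (real hardware uses its own cipher and an implementation-defined number of bits), and the function names are hypothetical.

```python
import hashlib
import hmac

ADDR_BITS = 48                     # assumed virtual-address width (illustrative)
ADDR_MASK = (1 << ADDR_BITS) - 1   # low bits hold the real address


def compute_pac(address: int, context: int, key: bytes) -> int:
    """Toy signature: a 16-bit truncation of HMAC-SHA256 (NOT the hardware algorithm)."""
    msg = address.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


def sign_pointer(address: int, stack_pointer: int, key: bytes) -> int:
    """Place the signature in the unused upper bits before the address is stored."""
    address &= ADDR_MASK
    return (compute_pac(address, stack_pointer, key) << ADDR_BITS) | address


def authenticate_pointer(signed: int, stack_pointer: int, key: bytes) -> int:
    """Recompute the signature; raise if it does not match, else return the bare address."""
    address = signed & ADDR_MASK
    stored_pac = signed >> ADDR_BITS
    if stored_pac != compute_pac(address, stack_pointer, key):
        raise ValueError("pointer authentication failed")
    return address
```

An attacker who overwrites the stored value without knowing the key has to guess the signature bits, which in this toy is a 1-in-65536 shot per attempt. Using the stack pointer as context is what stops a validly signed address from being replayed at a different stack depth.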
What's the trade-off?
Here are a couple "real world" examples--
Project Zero had a blogpost about some of the weaknesses in the original Pointer Auth spec [0], and even had a follow up [1].
Here is an example of what some mitigation might look like, showing how gets(), which is a classically trivially vulnerable primitive, becomes not-so-trivial (but still feasible enough to do in a blogpost, obviously) [2].
Cost-wise, in terms of both hardware and software, it's rather cheap. The hardware to support this isn't too expensive, about on par with a multiplier. On the software end, like I said, it's taken some time to mature and gotten to a pretty good state IMO, with basically all compilers providing simple usage since 2019-- just turn on a flag!
ARM also did a performance vs. ROP gadget reduction analysis [3]. The takeaway is, as others have mentioned, that while it doesn't completely mitigate these attacks, it does heavily increase their complexity at a rather low cost.
In fact, I'm rather annoyed Amazon didn't include this feature on Graviton2, and to claim it as new or innovative on their end feels just like marketing speak. Any CPU that claims to be ARMv8.5-a compliant *must* have this feature, and that's been around for quite a few years now.
[0]: https://googleprojectzero.blogspot.com/2019/02/examining-poi...
[1]: https://bazad.github.io/presentations/BlackHat-USA-2020-iOS_...
[2]: https://blog.ret2.io/2021/06/16/intro-to-pac-arm64/
[3]: https://developer.arm.com/documentation/102433/0100/Applying...
Arm-designed licensable cores didn't have it back then, and that's what AWS uses.
Graviton2 used the Neoverse N1 core.
[0] https://en.wikipedia.org/wiki/Shadow_stack
[1] https://www.intel.com/content/www/us/en/developer/articles/t...
> While we are still optimizing these instances, it is clear that the Graviton3 is going to deliver amazing performance. In comparison to the Graviton2, the Graviton3 will deliver up to 25% more compute performance and up to twice as much floating point & cryptographic performance. On the machine learning side, Graviton3 includes support for bfloat16 data and will be able to deliver up to 3x better performance.
This means nothing to me. Why is there more floating point and cryptographic performance? Did Amazon change the Neoverse core? Are these still N1 cores? Did they tweak the L1 caches?
I don't think Amazon has the ability to change the core design unfortunately. This suggests to me that maybe Amazon is using N2 cores now?
But it'd be better if Amazon actually said what the core design changes are. Even just saying "updated to Neoverse N2" would go a long way to our collective understanding.
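For anyone wondering what the bfloat16 support in the quoted announcement refers to: bfloat16 is the top 16 bits of an IEEE float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), so it keeps float32's range and gives up precision. A minimal sketch of the format follows. Note the assumptions: it converts by plain truncation, whereas hardware conversion instructions typically round, and the function names are made up for illustration.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16            # drop the low 16 mantissa bits (no rounding)


def bfloat16_bits_to_float(bits16: int) -> float:
    """Widen a bfloat16 bit pattern back to a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x
```

Round-tripping 3.14159 through these two functions gives 3.140625, which shows how little mantissa is left. ML workloads tend to tolerate that, and halving the element size doubles how many values fit in a vector register or cache line.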
You can infer that this is N2, for which ARM published its own results [1]; N2 uses SVE2 256bit.
[1] https://community.arm.com/arm-community-blogs/b/architecture...
I would guess price/performance matters more than peak performance for a lot of use cases. With prior Graviton releases, AWS has made it so they are better price/performance. Keep in mind that a vCPU on Graviton is a full core rather than SMT/Hyperthread (half a core).
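The full-core vs. hyperthread point changes the arithmetic when you compare list prices per vCPU. A rough sketch of the normalisation, where the function name is hypothetical and every number in the usage note is made up for illustration, not a real AWS price:

```python
def cost_per_physical_core_hour(price_per_hour: float, vcpus: int,
                                threads_per_core: int) -> float:
    """Normalise an instance's hourly price to the physical cores behind its vCPUs.

    threads_per_core is 1 when each vCPU is a whole core and 2 when each vCPU
    is one SMT/hyperthread of a shared core.
    """
    physical_cores = vcpus / threads_per_core
    return price_per_hour / physical_cores
```

With an invented price of 1.00 per hour for a 4 vCPU instance, `cost_per_physical_core_hour(1.00, 4, 1)` is 0.25 per core-hour when each vCPU is a full core, while `cost_per_physical_core_hour(1.00, 4, 2)` is 0.50 when the 4 vCPUs are 2 cores with SMT. Whether a full core actually delivers twice the work of a hyperthread depends on the workload, so treat this as a starting point for a benchmark rather than a conclusion.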
Irrelevant.
The vast majority of applications running in the cloud are business applications that struggle to saturate the CPU and waste most of their CPU cycles idling in epoll/select loops. Unless you need HPC, you do not need the fastest CPU, either.
> create further lock-in opportunities
Don't like AWS/Graviton? Take your workload to the Oracle cloud and run it on Oracle ARM.
Don't like ARM? If your app is interpreted/JIT'd (e.g. Python/NodeJs) or byte code compiled (JVM), lift and shift it to the IBM cloud and run it on a POWER cloud instance – as long as IBM offers a price-performance ratio comparable to that of AWS/Graviton or you are willing to pay for it.
I am pretty sure Zen 3 doesn't bring 25% ST performance improvement compared to Zen 2.
If anything, this is just another data point that shows how truly commoditized tech is. I just worry what happens when Amazon decides to “differentiate” after they lock you in.
I mean, it's great that the threshold to produce ICs is now lower, but is this really the way forward? Shouldn't we have separate CPU companies, so that everybody can benefit from progress, not only the mega corporations?
It is still just ARM. You can buy ARM chips everywhere. There is no walled garden.
> Shouldn't we have separate CPU companies, so that everybody can benefit from progress,
You are benefiting from the same CPU design from ARM, and the same fab improvements from TSMC, amortised across the whole industry. Doesn't get any better than that.
Only large companies can build CPUs based on ARM. Also, now companies might rely on everything in vanilla ARM, but soon they will be adding parts of their own ISA, improvements to the memory hierarchy, a GPU, or perhaps even their own management engine to keep an eye on things or to keep things locked down.
> There is no walled garden.
There is huge potential for walled gardens, just look at Apple.
Compare RaspberryPi or Snapdragon with Apple M1.
Compare for instance the Zen 3 upset vs the M1 upset. Zen 3 allowed the market to pick what they thought was the best CPU, the M1 allowed the market to pick if they wanted to buy an entire computer, OS, and software set because the CPU was good. Similarly with Graviton and Amazon, you can't just say Amazon is competing the same as Via, their interest is in selling the AWS ecosystem not in providing the best individual components. Same with Google and their custom chips and Microsoft with theirs now. Yes many are "just ARM" but due to custom extensions/chips and (in some cases) lack of standard ARM features that doesn't mean they are the same ARM.
Of course that's not to argue it's wrong because it's vertical integration, many will think that's the better way to make complicated products, but that's not the point - the way big companies are competing on chip design is very different than if one acted like an AMD/Intel/Via competitor to actually compete in the chip space instead of a larger space.
Blobs can also be reverse engineered.
Remember Amazon bought Annapurna Labs in 2015 and only released their first Graviton instances in 2018. The lead time for a server CPU product is at least a year even when you have blueprints. That is ignoring fab capacity booking and many other things like testing. And without scale (AWS is bigger than GCP and Azure combined) it is hard to gain a competitive advantage (which often delays management decision making).
I think you should see Azure and GCP ARM offerings in late 2022. Marvell's exit statement on ARM server SoCs all but confirmed that Google and Microsoft are working on their own ARM offerings.
Azure cloud is bigger than AWS
https://cloudwars.co/microsoft/microsoft-q2-cloud-revenue-st...
In terms of physical servers, I believe Chinese cloud providers and their monster CDNs (China has terribly slow Internet, and superlocal CDNs are the only way out there) overtook AWS quite a few years ago.
I think they are bound by long term supply agreements with Intel. They will just bargain for better prices with Intel.
Not an easy task it will be, given that Intel is capacity jammed.
"The second phase will see these entities develop custom integrated chips and System on a Chip (SoC) with "lower power consumption, improved performance, reduced physical size, and improved reliability for application in DoD systems."
Maybe it's coming?
https://aws.amazon.com/blogs/aws/new-amazon-ec2-m6a-instance...
"Up to 35 percent higher price performance per vCPU versus comparable M5a instances, up to 50 Gbps of networking speed, and up to 40 Gbps bandwidth of Amazon EBS, more than twice that of M5a instances."
"Larger instance size with 48xlarge with up to 192 vCPUs and 768 GiB of memory, enabling you to consolidate more workloads on a single instance. M6a also offers Elastic Fabric Adapter (EFA) support for workloads that benefit from lower network latency and highly scalable inter-node communication, such as HPC and video processing."
"Always-on memory encryption and support for new AVX2 instructions for accelerating encryption and decryption algorithms"
https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-ec...
>First in the cloud industry to be equipped with DDR5 memory.
It is quite hard to tell whether this is Neoverse V1 or N2, since the description fits both. But these SVE extensions will move over a lot of workloads that previously weren't suitable for Graviton 2.
Edit: Judging from the doubled floating point performance it should be N2 with SVE2, which also means Graviton 3 will be ARMv9 and on 5nm. No wonder TSMC doubled their 5nm expansion spending. It will be interesting to see how they price G3 and G2. Much lower priced G2 instances would be very attractive.
SIMD instructions; in the Intel x86 world that is basically like SSE4.
> Can’t be done on G2
Probably close to zero? Assuming your code compiles and runs on ARM, it is just a matter of whether that operation is fast or slow, or in AWS terms whether it is cost effective, since those EC2 instances are priced differently. And that cost includes porting and testing your software on ARM. For a lot of web server workloads, G2 offers nearly a 50% reduction in cost at the same or better performance. At the scale of Twitter it absolutely makes sense to move those operations over. There are some workloads that don't do well, things like 3D renderers, or software that has too many x86-specific optimisations and takes too much manpower to port. So yes, in that sense there will be a long tail of x86 instances. (Assuming that is what you mean by long tail.)
The features listed here match the core: https://developer.arm.com/ip-products/processors/neoverse/ne...
The N2 misses the bfloat, but it could be that the ARM marketing named it differently: https://developer.arm.com/ip-products/processors/neoverse/ne...
Page 50 of 92 shows off BFCVTN, BFDOT, BFMMLA (matrix multiply and accumulate), BFCVT, and other BF16 instructions on the N2.
I'd assume this Graviton 3 is an N2 core. But that's just me assuming.
Yes, N2 is more likely than V1. N2 has the better PPA ratio. Their own CPU core is very unlikely, as I am not aware of any rumors, and we would have noticed those before now.
N2 also supports ARMv9 which is nice.