On x86/x86_64 the major advantage 64-bit mode brings to the table is support for more than 4 GiB of RAM. You also get more registers and (IIRC) better support for position-independent code. OTOH, pointers get bigger, which may cause worse cache utilization. My personal experience after running both a 32-bit and a 64-bit system on the same machine was that performance was pretty much the same: either performance was limited by other factors (cough I/O cough), or the advantages and disadvantages of 64-bit mode canceled each other out (or the difference was too small for me to notice).
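To make the pointer-size point concrete, here is a rough sketch of how many linked-list nodes fit in one cache line with 4-byte versus 8-byte pointers. The node layout (one `next` pointer plus a 4-byte payload), the padding rule and the 64-byte cache line are all assumptions for illustration, not measurements from any particular machine.

```python
CACHE_LINE_BYTES = 64  # assumed line size; check your core's documentation


def node_size(pointer_bytes, payload_bytes=4):
    """Size of a hypothetical list node: one 'next' pointer plus a payload,
    padded up to a multiple of the pointer size (typical C struct alignment)."""
    raw = pointer_bytes + payload_bytes
    # Round up to the next multiple of the pointer size.
    return -(-raw // pointer_bytes) * pointer_bytes


def nodes_per_line(pointer_bytes, payload_bytes=4):
    """How many whole nodes fit in one cache line."""
    return CACHE_LINE_BYTES // node_size(pointer_bytes, payload_bytes)


if __name__ == "__main__":
    for name, ptr in (("32-bit", 4), ("64-bit", 8)):
        print(name, node_size(ptr), "bytes per node,",
              nodes_per_line(ptr), "nodes per cache line")
```

Under these assumptions the node doubles from 8 to 16 bytes, so a pointer-heavy structure gets half as many nodes per cache line in 64-bit mode.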
On a device like the Raspberry Pi 3 (I happen to have one at hand), where the amount of RAM is a) fixed and b) less than 4 GiB, what advantages might a 64-bit OS offer?
EDIT: Thanks for all the answers!
1. For DSP / image processing, 64-bit SIMD allows you to operate on twice as many elements as 32-bit SIMD
2. For normal computing, you get 31 64-bit registers instead of 15 32-bit registers (essentially 4x register memory)
3. 128-bit floating point
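> 1. For DSP / image processing, 64-bit SIMD allows you to operate on twice as many elements as 32-bit SIMD
I think this is very much wrong. You do get more SIMD registers, but both 32-bit and 64-bit ARM have 128-bit-wide SIMD registers. Same number of elements per instruction.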
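A quick arithmetic sketch of that point: with a 128-bit register, the number of elements per instruction depends only on the element width, not on whether the CPU runs in 32-bit or 64-bit mode. The register counts used in the example (16 quadword registers for 32-bit NEON, 32 for AArch64) are what changes; the function names are made up for illustration.

```python
SIMD_REGISTER_BITS = 128  # same width for 32-bit NEON (Q regs) and AArch64 (V regs)


def lanes(element_bits, register_bits=SIMD_REGISTER_BITS):
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // element_bits


def register_file_bytes(num_registers, register_bits=SIMD_REGISTER_BITS):
    """Total size of the SIMD register file in bytes."""
    return num_registers * register_bits // 8


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(f"{bits}-bit elements: {lanes(bits)} per instruction")
    print("32-bit ARM NEON register file:", register_file_bytes(16), "bytes")
    print("AArch64 SIMD register file:   ", register_file_bytes(32), "bytes")
```

So the per-instruction throughput is the same, while the 64-bit register file is twice as large, which mostly helps by reducing spills in register-hungry kernels.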
> 2. For normal computing, you get 31 64-bit registers instead of 15 32-bit registers (essentially 4x register memory)
Almost right. 14 vs 31. On 32-bit ARM, R13 is the stack pointer and R15 is the PC (program counter). 64-bit ARM no longer maps the PC into the register file.
> 3. 128-bit floating point
As far as I know, AArch64 does not have 128-bit floating point. Nor would it really be useful except in very rare circumstances.
* Vastly larger address space -> better ASLR
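As a back-of-the-envelope sketch of why a larger address space helps ASLR: the number of page-aligned places a mapping could land grows with the virtual address width. This is only an upper bound under assumed values (4 KiB pages, a 48-bit virtual address space on the 64-bit side); real kernels randomize fewer bits than this.

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages


def page_aligned_slots(va_bits, page_shift=PAGE_SHIFT):
    """Upper bound on distinct page-aligned base addresses in the address space."""
    return 1 << (va_bits - page_shift)


if __name__ == "__main__":
    slots_32 = page_aligned_slots(32)
    slots_48 = page_aligned_slots(48)  # 48-bit VA is an assumption, not a given
    print("32-bit:", slots_32, "possible page-aligned bases")
    print("48-bit:", slots_48, "possible page-aligned bases")
    print("ratio: ", slots_48 // slots_32)
```

With these numbers an attacker guessing a base address faces 2^16 times as many candidates in the 64-bit case.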
While we're at it, I'm wondering if there is something like x32, but for AArch64? AFAIR AArch64 is not to 32-bit ARM what amd64 is to x86, but it could still be done: just 32-bit pointers, but all the registers.
I wish I still had the performance data for that, so this would be more than an anecdotal comment.
Also, while I can't speak for low-end ISA implementations, the high-end ones are all substantially faster than the old 32-bit archs. That goes for both the reference microarchs from ARM itself and the custom-designed ones.
32-bit ARM (like Cortex A7, A15, etc.) already supported 1 TB addressable physical RAM.
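The 1 TB figure falls out of the address width. A minimal sketch, assuming those cores use 40-bit physical addresses (as with the LPAE extension) while virtual addresses stay at 32 bits:

```python
GIB = 1 << 30
TIB = 1 << 40


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


if __name__ == "__main__":
    print("32-bit addresses:", addressable_bytes(32) // GIB, "GiB")
    print("40-bit addresses:", addressable_bytes(40) // TIB, "TiB")
```

Note that this only raises the physical limit: each 32-bit process still sees at most 4 GiB of virtual address space.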
So you wouldn't necessarily see all the benefits in a gate-constrained in-order core like the one in the RPi.
This is a 64-bit UEFI firmware for RPi3 that uses ATF for PSCI, and has USB, HDMI and SD card support. It has been successfully booting FreeBSD, SUSE Leap 42.3 and Ubuntu 18.04.
https://www.openbsd.org/arm64.html
Of course I understand why that is. I wouldn't be surprised at all if this port was the result of fewer man-hours than its peers, but the man-hours put into NetBSD are simply spread pretty thin.