It's not supposed to be just for servers; the Arm Base Boot Requirements (BBR) require it for any non-embedded use: https://developer.arm.com/documentation/den0044/latest
The new ARM64 laptops follow BBR so they have UEFI+ACPI, which means they can run standard Windows or Linux distributions from Red Hat / Canonical. Using device tree instead means other operating systems won't be able to run out of the box without some modification. I can totally understand and respect the decision not to touch ACPI, I just think it might not be a great long-term decision.
In any case I've done plenty of hairy "embedded" development stuff (kernels, uboot, weird hardware drivers, etc) but I draw the line at device trees. I may have some kind of PTSD or something but I just refuse to deal with them in any way. Of course passing a path to a compiled binary device tree is fine but I refuse to do anything else.
I've met plenty of people that have no problem with them, make a living working with them, etc but other than the niche that is device tree expertise (and income) they may be experiencing some kind of Stockholm Syndrome.
I truly hate them and while they'll likely always have a place in the (hopefully) deep deep deep "embedded" world the sooner UEFI + ACPI generally takes over in the ARM ecosystem the better.
Like when you need to write support for a new board or if you need to patch your ACPI tables, because of bugs in there: https://www.kernel.org/doc/html/latest/admin-guide/acpi/init...
`PTSD` sounds more like an ACPI table name ;)
The latter might happen at some point; m1n1 is already going to become a minimal hypervisor for experimentation, and that could easily grow a GIC-to-AIC mapper and an HVC-based PSCI implementation some day. At that point, sure, throw some ACPI tables on top and Windows will boot, with decently near-native performance (assuming someone writes all the actual peripheral drivers, of course). Just not really natively. Incidentally, we're making all of our bespoke driver code dual-licensed (MIT/GPL), so people are free to take it and port it to BSD, Windows, or what have you.
Of course, Windows already works in QEMU today under M1 macOS (with emulation of the rest of the hardware as variants that Windows supports), and it should just work on Linux/KVM modulo a silly patch that's still missing related to a smaller than typical physical address space on M1. So full-fat VMs are not a problem, but obviously that has nothing to do with the bare metal boot chain.
As a community project, we can't exactly throw stuff at the UEFI Forum to get them to go down that road and specify representations for all this Apple-specific hardware in ACPI. However, Apple is a member of the Forum, so perhaps they should be the ones to do this :-)
Mainstream Linux distributions should work fine once support trickles upstream, as we will provide UEFI via U-Boot, and any reasonably generic ARM64 kernel should support devicetree and our devices (if they don't, someone should file a bug with RH/Canonical, as it would be completely silly if they don't turn on those config options in their kernels). OpenBSD also supports DT, and is already booting on m1n1.
Edit: I've just been reminded that RH/CentOS deliberately build their kernels without devicetree support, to force vendors to implement ACPI. This is, unfortunately, also mooted by the fact that they build their kernels configured for 64K page sizes, which the M1 does not support. Those kernels will never work on M1 machines, not even in a VM; effectively they target a stricter platform subset / standard than general ARMv8-A. Presumably they do this for performance on large systems, at the cost of less efficient memory usage. Maybe if Linux ever gains multi-page-size support on ARM64...
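To put rough numbers on the "less efficient memory usage" point, here is a minimal sketch that rounds a mapping up to whole pages for the three ARMv8-A granule sizes. The 5000-byte mapping size is an arbitrary illustration, not a measurement from any real workload.

```python
def mapped_bytes(size: int, page_size: int) -> int:
    """Round a mapping of `size` bytes up to a whole number of pages."""
    pages = -(-size // page_size)  # ceiling division
    return pages * page_size


# Hypothetical small mapping, chosen only for illustration.
SIZE = 5000

for page_size in (4 * 1024, 16 * 1024, 64 * 1024):
    used = mapped_bytes(SIZE, page_size)
    print(f"{page_size // 1024}K pages: {used} bytes mapped, {used - SIZE} wasted")
```

The same 5000 bytes occupy 8192 bytes with 4K pages, 16384 with 16K pages and 65536 with 64K pages, which is the trade-off a 64K kernel accepts in exchange for fewer, larger translations.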
Incidentally, there is one other platform in the same boat, that deviates from the ACPI standards (no GIC): the Raspberry Pi 2 and 3. Microsoft are using proprietary ACPI extensions and patches to core Windows code to make that work. So we'd need at least another nonstandard ACPI extension and explicit Windows kernel support added by MS to make that happen on M1.
Of course the unfortunate reality is that there is just no motivation for them to do this. Goddamn vertical integration! But.. maybe Boot Camp could motivate this? Though so far it seems they're just pushing virtualization instead.
> However, Apple is a member of the Forum, so perhaps they should be the ones to do this :-)
Agreed.
My guess is that they do that to enable a larger amount of memory (very relevant on servers, which are RedHat's main target market). AFAIK, both the LPA (large physical address) and LVA (large virtual address) extensions, which allow using 52 bits instead of 48 bits for the memory address, require using the 64K page size.
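For scale, the difference between 48 and 52 address bits can be worked out directly. This is just arithmetic on the bit widths mentioned above; the helper name is made up for the example.

```python
def address_space(bits: int) -> str:
    """Return the size of a `bits`-wide address space in binary units."""
    size = 1 << bits
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"):
        if size < 1024:
            return f"{size} {unit}"
        size //= 1024
    return f"{size} ZiB"


for bits in (48, 52):
    print(f"{bits}-bit addresses: {address_space(bits)}")
```

So the extra four bits take the addressable range from 256 TiB to 4 PiB, a factor of 16.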
Imagine my face when I read this line: "m1n1 traces its past to mini, which is a minimal environment that I wrote for the Nintendo Wii’s security CPU".
As if there was any doubt: Hardware reverse engineers are a very special breed of dedicated people. There aren't many of them, but those that are around tend to stay around!
The other guy in the talk [1] is the late Ben 'bushing' Byer. I only knew him from his online work, but it's still very sad that he passed away in 2016 [2]. I'm sure if he was still alive, he'd also continue to be doing great reverse engineering work!
Here's a fun one though: how I found and documented the Apple-proprietary memory compression/uncompression instructions.
https://twitter.com/marcan42/status/1362450439845781505
A lot of the hardware research ends up looking like this; twiddle random bits and see what happens. For more complex drivers (e.g. the GPU), my plan is to run macOS under a thin hypervisor built on m1n1 that can log hardware accesses.
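As a rough illustration of the "twiddle bits and see what happens" approach, the sketch below flips each bit of a register in turn and records which ones stick. Everything here is hypothetical: `FakeRegister` is a simulated stand-in, not m1n1's API, and on real hardware writing unknown bits can have side effects or hang the machine.

```python
class FakeRegister:
    """Simulated register: only the bits set in `writable` can be changed."""

    def __init__(self, writable: int, value: int = 0):
        self.writable = writable
        self.value = value

    def read(self) -> int:
        return self.value

    def write(self, new: int) -> None:
        # Keep read-only bits as they are, take writable bits from `new`.
        self.value = (self.value & ~self.writable) | (new & self.writable)


def probe_writable_bits(reg, width: int = 64) -> int:
    """Flip one bit at a time and return a mask of the bits that stuck."""
    original = reg.read()
    sticky = 0
    for bit in range(width):
        mask = 1 << bit
        reg.write(original ^ mask)          # try flipping just this bit
        if (reg.read() ^ original) & mask:  # did the change take effect?
            sticky |= mask
        reg.write(original)                 # restore before the next probe
    return sticky


reg = FakeRegister(writable=0x8000_0000_0000_00F1, value=0x10)
print(hex(probe_writable_bits(reg)))
```

The resulting mask only says which bits are writable; working out what each one does is the part that takes the actual experimentation.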
The fact that people like you exist in this world really makes me happy.
I signed up as a GitHub sponsor based on this post. Thanks for the great work and write up marcan!
This thing actually goes back further than the name "mini" and BootMii; in the beginning it was a never-released thing called "ios_stub" and actually served the very same purpose on the Wii, to experiment with the hardware over a USB Gecko (which was a USB serial-like interface that plugged into the memory card slot) using the very same Python approach. That code hasn't changed much in *checks Git history* 13 years... though it obviously flipped endianness on its way from the Wii to other ARM32 platform experiments, before making it to ARM64 and Apple Silicon [1].
The Python side is still mostly the same too, other than getting ported from Python 2 to Python 3 and growing a bunch of utility functions. The Python-side malloc implementation actually got written when I was using all this to experiment with a Chinese MP4 player. That version ended up being called "minimp" [2]. Another thing that happened on the way to the M1 was deleting the "P_RENDER_BUNNY" command [3].
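For readers wondering what a "Python-side malloc" looks like, here is a minimal sketch of the general idea: the host keeps the bookkeeping for a region of target memory and only hands out addresses. This is a generic first-fit allocator written for illustration, not the actual minimp or m1n1 code; the class name, the 16-byte alignment and the example addresses are all assumptions.

```python
class RemoteHeap:
    """Host-side bookkeeping for a block of memory on the target device."""

    def __init__(self, base: int, size: int, align: int = 16):
        self.align = align                # assumed alignment; base must match
        self.free_list = [(base, size)]   # sorted (address, length) free blocks
        self.used = {}                    # address -> length of live blocks

    def malloc(self, size: int) -> int:
        """Return a target address for `size` bytes (first fit)."""
        size = (size + self.align - 1) & ~(self.align - 1)
        for i, (addr, length) in enumerate(self.free_list):
            if length >= size:
                if length == size:
                    del self.free_list[i]
                else:
                    self.free_list[i] = (addr + size, length - size)
                self.used[addr] = size
                return addr
        raise MemoryError(f"no free block of {size} bytes")

    def free(self, addr: int) -> None:
        """Release a block and merge it with adjacent free blocks."""
        size = self.used.pop(addr)
        self.free_list.append((addr, size))
        self.free_list.sort()
        merged = []
        for a, length in self.free_list:
            if merged and merged[-1][0] + merged[-1][1] == a:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((a, length))
        self.free_list = merged


heap = RemoteHeap(base=0x1000, size=0x100)  # hypothetical target region
buf = heap.malloc(10)
print(hex(buf))
heap.free(buf)
```

Keeping the allocator on the host means the code running on the target needs no heap of its own, so the target-side stub can stay tiny.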
So yeah, what I'm doing now on the M1 is literally, down to the code, shell.py, and proxy command names, the same damn thing I was doing 13 years ago on the Wii. AES engine [4] back then, IRQ controller [5] now... though evidently I put a bit more effort into the MediaWiki register documentation templates back then; the GitHub wiki is a bit more limited... :-)
[1] https://mrcn.st/t/ios_stub_vs_m1n1.png
[2] https://marcan.st/2009/06/sunplus-spmp305x-media-player-hack...
[3] https://www.youtube.com/watch?v=3tg7KSSUl8Q
However, even once you have that working, you still need to connect the other pins to a serial port adapter, and I don't know if any other laptops implement this in a compatible way, so you would probably still need some kind of cut-up cable that breaks out the serial wires.
> I recommend Will Deacon’s talks, such as this one and this one.
They're both the same link! Is the talk that amazing, or is there a second one? I've been looking into ARM lately, and I'll take anything that'll help me understand memory ordering/synchronization.
Here's the one for the blog: https://asahilinux.org/blog/index.xml
Might be nice if we have an actual link to the RSS somewhere in the blog though.
And do you think that all those HIDn registers are more PASemi legacy? The PowerPCs are known for HID SPRs being a grab bag of random one off features and chicken bits in a very similar way.
I do think the HID register naming scheme comes straight from PowerPC. So does their "DART" IOMMU (again just a name thing, unrelated to the old PowerPC DART IOMMU). A much more interesting question is how much the M1 design directly derives from those older PASemi PPC cores (beyond names and such which could just be a little nod); that's much harder to know, but I'm interested in any hints that might point in that direction :)
Anyways, is there anything that a systems engineer with RE experience who isn't looking for any of y'all's donations can do to help out? I'm between jobs and it seems like fun.
Linux is a monolithic kernel.
One motivation, of course, is to have a microkernel that allows you to update your operating system as needed, without needing to reboot.
Can this still be accomplished with a microkernel system, or is this now an obsolete method of operating system design?
> Our current Linux bring-up series is in its third version
There's a #fragment in the link, and it goes to a less interesting part of the page. I only circled back after looking at the first and second versions.
Unfortunately, we've yet to hear any feedback from Corellium (they've been CCed on my upstream submissions), nor have they interacted with the Linux kernel community in any other way, so I have no idea what their plans are.
So far, after working with upstream on solving the core problems I mentioned in this article, the M1 support series we are submitting supersedes Corellium's patches for FIQ, the nGnRnE issue, SMP, AIC, UART; we settled on different solutions for all of those from how Corellium did it. I²C is also another one that will be superseded most likely (Corellium wrote their own driver instead of improving the PASemi one; it doesn't seem like there are any show-stoppers that would warrant that approach). I don't know what they're going to do moving forward; perhaps they will re-base on top of mainline and drop their conflicting patches, or perhaps they will attempt to maintain their kernel as a Linux fork...
Nice work! let's see the progress...
~60,000+ words later...
> We could keep talking in depth for another 10000 words, but alas, this post is already too long.
Please no. A TL;DR is just enough for the busy. The Dolphin report even shows more screenshots and diagrams at least.
We have already seen how complex the reverse engineering, booting, discovery and bring-up process for Linux on this M1 chip is; even the first step is a complicated hellhole in itself, because it's Apple. To explain all of this, you need diagrams of the whole process, which would be much better than leaving us to decipher this technical soup of CPU internals and peripherals.
Just put a TL;DR at the top next time or some diagrams for those interested in helping out.
Other than that, great progress.
Device bringup is, as you say, complex. This complex.
What kind of diagrams are you looking for? There's lots of things that could be diagrammed here, but comprehensively explaining every concept involved would turn this into an embedded systems course, diagrams or not. What I tried to do was give a brief introduction to concepts that are relevant to the issues we ran into, and have links for those who want to go deeper. If you have specific suggestions of bits that are hard to grok without diagrams though, please do let me know. It's tricky knowing what is most confusing to other folks when you've been neck-deep in this stuff for weeks.
The alternative to this long-form post is to just have a laundry list of things that work today, but I don't really know how I would get across what the challenges were without going into at least some level of details like I did here. I figure that if I'm going to do that, I might as well make it a more educational endeavour. Of course, if all you want to know is what works and what doesn't, it may not be for you... I'm open to suggestions though!
Keep in mind that a lot of the early work ends up being "how to find the right solution to problems" (and the post goes into more detail about this); the current feature support status of Linux on M1 almost hasn't changed for the past 30 days, because instead I've been re-visiting and re-working the code into a form that is upstreamable, as well as building tools and chipping away at little details. It's a lot of yak shaving, but it's all things that need to happen sooner or later. Unfortunately, it doesn't really tick boxes in a TL;DR bullet list of working hardware.
This is exactly what needs to be done to make this a viable project, and leaving this stuff out is, for me at least, what categorizes Corellium's project as a mere publicity stunt rather than a serious porting effort.