(famously so, Intel used to ship arm chips with WMMX and Apple for example ships their CPU today with the AMX AI acceleration extension)
(+ there's the can of worms of target-specific MSRs being writable from user-space, Apple does this as part of APRR to flip the JIT region from RW- to R-X and vice-versa without going through a trip to the kernel. That also has the advantage that the state is modifiable per-thread)
Please, stop spreading nonsense. All of this is public knowledge.
The Neural Engine is a completely separate hardware block, and you have good reasons to have such an extension available on the CPU directly, to reduce latency for short-running tasks.
Is it possible AMX is implemented with the implementation-defined system registers and aliases of SYS/SYSL in the encoding space reserved for implementation-defined system instructions? Do you have the encodings for the AMX instructions?
Let me repeat this: part of the ARM architectural license says that you can't modify the ISA. You have to implement a whole subset (the manual says what's mandatory and what's optional), and only that. This is, as I've been saying, public knowledge. This is how it works. And there are very good reasons for this, like avoiding fragmentation and losing control of their own ISA.
And once again, stop spreading misinformation.