So why this over-the-top feature then?
I would assume it's provided because in Xiaomi's two primary markets (India and China), there are a ton of 32-bit apps that users depend on, and those users (who hold on to old phones a lot more often in those markets) won't upgrade if they can't keep them. Google Play is also less commonly used in those markets.
The paper https://dl.acm.org/doi/10.1145/3140587.3062371 is also related to the emulator that they are using.
However, I suspect that Xiaomi has to contend with the Chinese app ecosystem, which may not be as strictly controlled, so there are probably a decent number of legacy 32-bit apps floating about.
The industry has been deprecating 32-bit ARM, and we have finally reached the point where support is being dropped from the CPUs themselves. This means that if you still want to support 32-bit ARM, you will need to do it in software.
I'm happy to answer any questions you may have about the technology.
It seems to me like everyone from Qualcomm through Android to the phone manufacturers would want this. Why did you have to build it instead of them? And is Xiaomi the entity licensing it from you? Does this mean that there will likely be implementations of the Snapdragon 8 Gen 3 that don't support 32-bit apps?
Is this such a niche problem because most phones have been on 64-bit processors for so many years that vendors expect the compatibility breakage won't be too big?
How did you anticipate this issue, and how are you using the situation? I.e., do you already have experience or contacts in the industry?
Phew, that's a lot of questions and I get it if you don't want to answer all of them. Thanks for your time and stopping by!
The technology behind Tango started as a research project while I was doing my PhD. After finishing my dissertation in 2016, I looked into opportunities to commercialize it by contacting every company that was building or planning to build an AArch64-only CPU. There are sufficiently few that it is easily doable even for a small company.
Building a production-ready binary translator is technically challenging and requires a lot of work. The difficult parts are achieving high performance (Tango scores within 10% of native 32-bit execution on benchmarks), low latency (using AOT translation to accelerate startup times) and compatibility (Tango was tested against the top 1000 Android apps and works with all of them).
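To make the AOT point concrete, here is a minimal sketch of the general idea behind a binary translator's translation cache, assuming a simple cache keyed by guest block address; it is not Tango's actual design. Blocks translated ahead of time (e.g. at install time) are loaded into the cache up front, so only code the AOT pass missed pays the translation cost while the app is running. `HostBlock`, `Origin` and `TranslationCache` are hypothetical names, and the "host code" is just a placeholder record.

```rust
use std::collections::HashMap;

/// Which path produced a translated block.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Origin {
    Aot, // translated ahead of time, e.g. when the app is installed
    Jit, // translated lazily on first execution
}

/// Placeholder for translated host (AArch64) code; a real translator would
/// store executable machine code here instead of just metadata.
#[derive(Debug)]
struct HostBlock {
    guest_addr: u32,
    origin: Origin,
}

/// Hypothetical translation cache keyed by guest (AArch32) block address.
struct TranslationCache {
    blocks: HashMap<u32, HostBlock>,
    jit_translations: usize,
}

impl TranslationCache {
    /// Start with every block the AOT pass already translated.
    fn with_aot(aot_addrs: &[u32]) -> Self {
        let blocks = aot_addrs
            .iter()
            .map(|&a| (a, HostBlock { guest_addr: a, origin: Origin::Aot }))
            .collect();
        TranslationCache { blocks, jit_translations: 0 }
    }

    /// Look up a block, translating it at run time only on a cache miss.
    fn lookup(&mut self, guest_addr: u32) -> &HostBlock {
        if !self.blocks.contains_key(&guest_addr) {
            self.jit_translations += 1;
            let block = HostBlock { guest_addr, origin: Origin::Jit };
            self.blocks.insert(guest_addr, block);
        }
        &self.blocks[&guest_addr]
    }
}

fn main() {
    // Hypothetical guest addresses: 0x8000 and 0x8040 were translated AOT.
    let mut cache = TranslationCache::with_aot(&[0x8000, 0x8040]);
    for addr in [0x8000, 0x8040, 0x9000, 0x8000, 0x9000] {
        let block = cache.lookup(addr);
        println!("{:#x}: {:?}", block.guest_addr, block.origin);
    }
    // Only 0x9000 had to be translated while the app was running.
    println!("JIT translations: {}", cache.jit_translations);
}
```

The more of an app's hot startup path the AOT pass covers, the fewer run-time misses there are, which is where the startup-latency win would come from in a design like this.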
Already by 2017, Tango was capable of translating AArch32 Android applications. At that point it makes more sense for companies to license our technology rather than developing their own implementation from scratch.
What was the main technical challenge of this project and what was the solution?
One challenge that particularly comes to mind was dealing with anti-emulation/anti-debugging code in various Android applications. These apps would do all sorts of crazy things, like attaching to themselves with ptrace, installing bizarre seccomp filters that check for specific 32-bit syscalls, and using self-modifying code without proper cache flushing to check for the presence of an instruction cache.
The solution for each of those was to emulate the relevant functionality well enough to trick these apps into thinking they were running natively. In the case of self-modifying code, however, there was no good solution, and we ended up hard-coding some particular instruction sequences in the translator for special handling.
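One way to picture the seccomp problem, as an illustrative model only: the translated app really runs as a 64-bit process, so a filter handed straight to the host kernel would see `AUDIT_ARCH_AARCH64` and AArch64 syscall numbers and would misfire. An emulator can instead keep the guest's filter and evaluate it against the guest's own view (`AUDIT_ARCH_ARM`, AArch32 numbers) before mapping the call to the host. In this sketch the filter is a Rust closure rather than a classic BPF program, the syscall table covers only four calls, and `SeccompEmulator` and `anti_emulation_filter` are hypothetical names; the syscall numbers and `AUDIT_ARCH_*` constants are the real Linux values.

```rust
use std::collections::HashMap;

// Real values from <linux/audit.h>.
const AUDIT_ARCH_ARM: u32 = 0x4000_0028;
const AUDIT_ARCH_AARCH64: u32 = 0xC000_00B7;

/// Cut-down version of the kernel's `struct seccomp_data`.
#[derive(Debug, Clone, Copy)]
struct SeccompData {
    nr: i32,
    arch: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Allow,
    Errno(u16),
}

/// The guest's filter, modelled as a closure instead of a classic BPF program.
type GuestFilter = Box<dyn Fn(&SeccompData) -> Verdict>;

/// Hypothetical anti-emulation filter: refuse anything not coming from
/// 32-bit ARM, and make getpid (AArch32 nr 20) fail as a canary.
fn anti_emulation_filter(d: &SeccompData) -> Verdict {
    if d.arch != AUDIT_ARCH_ARM || d.nr == 20 {
        Verdict::Errno(1) // EPERM
    } else {
        Verdict::Allow
    }
}

struct SeccompEmulator {
    guest_filter: Option<GuestFilter>,
    arm_to_aarch64: HashMap<i32, i32>,
}

impl SeccompEmulator {
    fn new() -> Self {
        // A tiny slice of the real syscall tables: AArch32 EABI -> AArch64.
        let arm_to_aarch64 = HashMap::from([
            (4, 64),   // write
            (20, 172), // getpid
            (248, 94), // exit_group
            (322, 56), // openat
        ]);
        SeccompEmulator { guest_filter: None, arm_to_aarch64 }
    }

    /// The guest's filter is kept by the emulator, not installed in the host kernel.
    fn install(&mut self, filter: GuestFilter) {
        self.guest_filter = Some(filter);
    }

    /// Evaluate the guest filter on the guest's view of the call, then map the
    /// number to the host syscall that would actually be issued.
    fn dispatch(&self, guest_nr: i32) -> Result<i32, Verdict> {
        let view = SeccompData { nr: guest_nr, arch: AUDIT_ARCH_ARM };
        if let Some(filter) = &self.guest_filter {
            let verdict = filter(&view);
            if verdict != Verdict::Allow {
                return Err(verdict);
            }
        }
        // Calls missing from this toy table are reported as ENOSYS (38).
        self.arm_to_aarch64.get(&guest_nr).copied().ok_or(Verdict::Errno(38))
    }
}

fn main() {
    let mut emu = SeccompEmulator::new();
    emu.install(Box::new(anti_emulation_filter));
    println!("{:?}", emu.dispatch(4)); // Ok(64): write is allowed
    println!("{:?}", emu.dispatch(20)); // Err(Errno(1)): the canary
    // Passed to the host kernel as-is, the filter would see AArch64 and reject everything.
    let host_view = SeccompData { nr: 64, arch: AUDIT_ARCH_AARCH64 };
    println!("{:?}", anti_emulation_filter(&host_view)); // Errno(1)
}
```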
One thing that really made the above possible is that for Tango v2.0 we rewrote a large part (~half) of the codebase, which was previously written entirely in C, in Rust. In particular, the ptrace emulation code needs to maintain a lot of internal state about traced threads. This requires complex data structures, and the ability to easily use enums, Option, HashMap, etc., is a huge help for this.
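As a hypothetical illustration of the kind of state described here (not Tango's code), per-thread ptrace bookkeeping maps naturally onto a `HashMap` of thread records, an `Option` for "who is tracing this thread", and an enum for the stop state. The thread IDs, error strings and method names are invented for the sketch; the behaviour mirrors real ptrace in that attaching to an already-traced thread fails with EPERM, which is what self-attaching anti-debugging code relies on to lock out a real debugger.

```rust
use std::collections::HashMap;

type Tid = i32;

/// Stop state of an emulated tracee.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TraceState {
    Running,
    Stopped { signal: i32 },
    SyscallStop { guest_nr: i32 },
}

#[derive(Debug)]
struct TracedThread {
    tracer: Option<Tid>, // None = not traced
    state: TraceState,
}

#[derive(Default)]
struct PtraceEmulator {
    threads: HashMap<Tid, TracedThread>,
}

impl PtraceEmulator {
    /// Emulated PTRACE_ATTACH: like the real call, fails if already traced.
    fn attach(&mut self, tracer: Tid, tracee: Tid) -> Result<(), &'static str> {
        let t = self.threads.entry(tracee).or_insert(TracedThread {
            tracer: None,
            state: TraceState::Running,
        });
        if t.tracer.is_some() {
            return Err("EPERM: already traced");
        }
        t.tracer = Some(tracer);
        t.state = TraceState::Stopped { signal: 19 }; // SIGSTOP on Linux/ARM
        Ok(())
    }

    /// Emulated PTRACE_DETACH: only the current tracer may detach.
    fn detach(&mut self, tracer: Tid, tracee: Tid) -> Result<(), &'static str> {
        match self.threads.get_mut(&tracee) {
            Some(t) if t.tracer == Some(tracer) => {
                t.tracer = None;
                t.state = TraceState::Running;
                Ok(())
            }
            _ => Err("ESRCH: not traced by caller"),
        }
    }

    /// Record a syscall-entry stop, but only if someone is tracing the thread.
    fn on_syscall_entry(&mut self, tid: Tid, guest_nr: i32) -> bool {
        match self.threads.get_mut(&tid) {
            Some(t) if t.tracer.is_some() => {
                t.state = TraceState::SyscallStop { guest_nr };
                true
            }
            _ => false,
        }
    }

    fn tracer_of(&self, tracee: Tid) -> Option<Tid> {
        self.threads.get(&tracee).and_then(|t| t.tracer)
    }
}

fn main() {
    let mut emu = PtraceEmulator::default();
    // A helper (tid 200) attaches to the app's main thread (tid 100) ...
    emu.attach(200, 100).unwrap();
    // ... so a real debugger (tid 300) arriving later is refused.
    println!("{:?}", emu.attach(300, 100)); // Err("EPERM: already traced")
    emu.on_syscall_entry(100, 20);
    println!("{:?}", emu.threads[&100].state); // SyscallStop { guest_nr: 20 }
    emu.detach(200, 100).unwrap();
    println!("{:?}", emu.tracer_of(100)); // None
}
```

The equivalent C would need a hand-rolled hash table and tagged unions with manual tag checks; here the compiler enforces that every stop state is handled and that "no tracer" cannot be confused with a valid thread ID.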
It has a 64- to 32-bit translator!
Not the other way around!
AArch32-to-AArch64 translator, because it may imply that the apps are suddenly running as 64-bit applications.
Then again, they want to support small storage sizes, so maybe discouraging the OS from growing too much with redundant 32- and 64-bit versions of base libraries is the real reason?
I suspect the specific case of sdiv may already be handled in practice for Armv7-A cores that have the hardware divide extension and for Armv8-A AArch32 on operating systems with dynamic loading, either by the dynamic loader or by the compiler support library choosing to swap in a different implementation of __aeabi_idiv according to the available capabilities.
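A rough sketch of the capability-based swap being described, assuming a boolean stands in for the real capability check (on Linux/ARM that information is the HWCAP_IDIVA bit of AT_HWCAP, which this code does not read). The implementation is picked once and then called through a function pointer, similar in spirit to a dynamic loader resolving an IFUNC. `CpuCaps`, `select_idiv`, `hw_idiv` and `soft_idiv` are hypothetical names, and `soft_idiv` is a textbook shift-and-subtract division, not the actual libgcc or compiler-rt `__aeabi_idiv`.

```rust
/// Textbook shift-and-subtract signed division, standing in for a software
/// `__aeabi_idiv` on cores without a hardware divide instruction.
/// It truncates toward zero and wraps i32::MIN / -1 to i32::MIN.
fn soft_idiv(n: i32, d: i32) -> i32 {
    assert!(d != 0, "division by zero");
    let negative = (n < 0) != (d < 0);
    let (dividend, divisor) = (n.unsigned_abs(), d.unsigned_abs());
    let (mut quot, mut rem) = (0u32, 0u32);
    for bit in (0..32u32).rev() {
        // Bring down the next dividend bit and subtract if the divisor fits.
        rem = (rem << 1) | ((dividend >> bit) & 1);
        if rem >= divisor {
            rem -= divisor;
            quot |= 1u32 << bit;
        }
    }
    let q = quot as i32;
    if negative { q.wrapping_neg() } else { q }
}

/// Stand-in for the hardware SDIV path.
fn hw_idiv(n: i32, d: i32) -> i32 {
    n.wrapping_div(d)
}

/// Hypothetical capability flags; a real implementation would query the OS.
struct CpuCaps {
    has_hw_idiv: bool,
}

/// Choose the division routine once, as a loader or support library might.
fn select_idiv(caps: &CpuCaps) -> fn(i32, i32) -> i32 {
    if caps.has_hw_idiv { hw_idiv } else { soft_idiv }
}

fn main() {
    for caps in [CpuCaps { has_hw_idiv: true }, CpuCaps { has_hw_idiv: false }] {
        let idiv = select_idiv(&caps);
        // Both paths print: 3 -3 -2147483648
        println!("{} {} {}", idiv(7, 2), idiv(-7, 2), idiv(i32::MIN, -1));
    }
}
```

Because callers only ever see the function pointer, code compiled against the generic helper picks up the faster routine on capable cores without being rebuilt.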