The conclusion that it was not the developer's fault was correct, but assuming anything other than a problem somewhere in the software stack would have been unreasonable.
What existed before is the Apple Neural Engine (ANE), which is very different from the newer Neural Accelerator support within the GPU blocks. In fact, MLX does not even support the ANE yet, since at least in previous versions it was hardware-limited to computing FP16 and INT8 MADDs, and not even that fast.
Still, it's a sad state of affairs that Apple seems to be fixing bugs based on which blog posts get the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to setting priorities on their own.
I can almost guarantee there is no way they can read this blog post, escalate it internally, get the appropriate approval for the work item, actually work on the fix, get it through QA, and get it live in production in 3 days. That would only happen for really critical issues, and this is definitely not critical enough for that.
I've seen a blog post, authored a bug in Radar, assigned it to myself, and fixed it the same day. Whether it goes out in the next release is more a decision for the bug-review-board, but since the engineering manager (that would have been me) sits on that too, it's just a matter of timing and seeing if I can argue the case.
To be fair, the closer we are to a release, the less likely a change is to be accepted unless you can really sweet-talk the rest of the BRB, and there's usually a week of baking before the actual release goes out, but that has sometimes been shrunk for developer-preview releases...
If not, talk about a coincidence: someone reported an issue, everything you mentioned had already been done before that happened, and the only thing missing was merging the code to the repository, which was done after the issue was reported. Not unheard of, but it feels less likely than "an engineer decided to fix it."
I don't think that fix is specific to this, but it's absolutely true that MLX is trying to leverage every advantage it can find on specific hardware, so it's possible it made a bad choice on a particular device.
But the phenomenon is another matter. Apple's numerical APIs are producing inconsistent results on a minority of devices. That is something worth Apple's attention.
My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明(https://en.wiktionary.org/wiki/%E6%98%8E)
Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.
Edit: Spoiler -
It's 'Eclipse'
Eclipse, obviously.
https://neal.fun/infinite-craft/
For the record, Sun+Moon is indeed eclipse.
I just looked up the mass of the sun vs. the mass of the moon (they're on the order of 10^30 kg vs. 10^22 kg), and the elemental composition of the sun: the moon would entirely disappear into the insignificant digits of trace elements, which are in the range of 0.01% of the sun. I could be off by orders of magnitude all over the place and it would still disappear.
Still think it was a good response :)
I'll just add that if you think this advice applies to you, it's the Barnum effect: https://en.wikipedia.org/wiki/Barnum_effect
"Monsoon," says ChatGPT.
It’s a reasonable Tarot question.
But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
But, what got me about this is that:
* every other Apple device delivered the same results
* Apple's own LLM silently failed on this device
To me that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
It is commutative (except for NaN). It isn't associative though.
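Both halves of that are easy to demonstrate; a small Python sketch (note the NaN "exception" is really about comparison: both products are NaN, but NaN never compares equal to anything, including another NaN):

```python
import math

# Commutativity holds for ordinary values:
a, b = 0.1, 3.7
assert a * b == b * a

# With NaN, the operation still commutes, but the *comparison* fails,
# because NaN != NaN by definition:
print(math.nan * 2 == 2 * math.nan)  # False

# Associativity, however, genuinely fails for floats:
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # 0.6000000000000001 0.6
```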
There's a C++26 paper about compile time math optimizations with a good overview and discussion about some of these issues [P1383]. The paper explicitly states:
1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.
2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.
So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.
Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.
[P1383]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p13...
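The point generalizes beyond C++: floating point addition is not associative, so even within a single platform the mere order of a reduction changes the answer. A quick Python illustration (math.fsum is one way to get an order-independent, correctly rounded sum):

```python
import math

xs = [1.0, 1e100, 1.0, -1e100]

print(sum(xs))                  # 0.0: both 1.0s absorbed into 1e100, then cancelled
print(sum(list(reversed(xs))))  # 1.0: a different order loses only one of them
print(math.fsum(xs))            # 2.0: exact partial sums, order-independent
```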
The Intel compiler, for example, uses less-than-IEEE-754 precision for floating point ops by default.
I'm wondering if we couldn't re-think "bit" as the computer science usage instead of the thing that goes in a horse's mouth, and what it would mean for an AI agent to "champ at the bit"?
What new sayings will we want?
a * b = b * a for all "normal" floating point numbers.
An encrypted iTunes backup of a device was a perfect image. Take the backup, pull the SIM card, restore the backup to a new phone with the sim card installed, and it was like nothing had happened.
No reauthentication. No missing notifications. No lost data. Ever.
It was nice.
Isn’t this built in when transferring devices? Are backups different?
"Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective."
[1] as the author knows (“MLX uses Metal to compile tensor operations for this accelerator. Somewhere in that stack, the computations are going very wrong”) there’s lots of soft- and firmware in-between the code being run and the hardware of the neural engine. The issue might well be somewhere in those.
The best way to do math on my phone I know of is the HP Prime emulator.
https://pcalc.com/mac/thirty.html
My other favorite calculator is free42, or its larger display version plus42
https://thomasokken.com/plus42/
For a CAS tool on a pocket mobile device, I haven't found anything better than MathStudio (formerly SpaceTime):
You can run that in your web browser, but they maintain a mobile app version. It's like a self-hosted Wolfram Alpha.
They do have some new AI math app that's regularly updated
Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed. I don't want an exact replica of a 1990s 4-function calculator like the default is (ok, it has more digits and the ability to paste, but besides that, adds almost nothing).
Also it does some level of symbolic evaluation: sin^-1(cos^-1(tan^-1(tan(cos(sin(9))))))== 9, which is a better result than many standalone calculators.
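That expression is the classic "calculator forensics" test: in degree mode the chain cancels mathematically back to exactly 9, so a purely numeric calculator reports 9 plus accumulated rounding error, while symbolic evaluation can return 9 exactly. A Python sketch of the numeric version (the degree-mode wrappers here are my own, for illustration):

```python
import math

def deg(f):
    # Wrap a radian trig function so it takes degrees.
    return lambda x: f(math.radians(x))

def inv_deg(f):
    # Wrap an inverse trig function so it returns degrees.
    return lambda x: math.degrees(f(x))

sin_, cos_, tan_ = deg(math.sin), deg(math.cos), deg(math.tan)
asin_, acos_, atan_ = inv_deg(math.asin), inv_deg(math.acos), inv_deg(math.atan)

# Each inverse undoes the matching forward step, so the ideal answer is 9;
# any deviation is rounding error in the trig implementations.
x = asin_(acos_(atan_(tan_(cos_(sin_(9))))))
print(x)  # very close to 9, but not necessarily exactly 9.0
```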
Also it has a library of built-in unit conversions, including live-updating currency conversions. You won't see that on a TI-89!
And I just discovered it actually has a built-in 2D/3D graphing ability. Now the question is whether it allows parametric graphing like the macOS one…
All that said, obviously the TI-8X family holds a special place in my heart, as TI-BASIC was my first language. I just don't see a reason to use one day to day any more.
I'd like multitasking too, with multiple apps visible at once so I could copy figures easily from one app to another, like the Android phone I tried in 2020, but obviously that's asking too much of Apple.
I use the NumWorks emulator app whenever I need something more advanced. It's pretty good https://www.numworks.com/simulator/
What I want is something like a repl. I want to be able to return to an earlier expression, modify it, assign it to a variable, use that variable in another expression, modify the variable and rerun and so on.
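That workflow is small enough to sketch; a minimal Python version with an expression history and named variables (all names here are made up for illustration, and the safe-eval approach is just one way to do it):

```python
import ast
import math
import re

class CalcREPL:
    """Tiny calculator REPL: history, variables, and re-running edited lines."""

    def __init__(self):
        self.history = []                # every line ever entered, editable/re-runnable
        self.env = {"__builtins__": {}}  # variables live here; builtins blocked
        for name in ("sin", "cos", "tan", "sqrt", "log", "pi", "e"):
            self.env[name] = getattr(math, name)

    def run(self, line: str):
        # "name = expr" assigns; anything else just evaluates.
        m = re.match(r"^\s*([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$", line)
        target, expr = (m.group(1), m.group(2)) if m else (None, line)
        expr = expr.strip()
        tree = ast.parse(expr, mode="eval")  # rejects statements early
        value = eval(compile(tree, "<repl>", "eval"), self.env)
        self.history.append(line)
        if target:
            self.env[target] = value
        return value

repl = CalcREPL()
repl.run("r = 2.5")
repl.run("area = pi * r**2")
print(repl.run("area"))           # ~19.63
repl.run("r = 3.0")               # change the variable...
print(repl.run(repl.history[1]))  # ...and re-run the earlier expression: ~28.27
```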
It is astonishing how often ANE is smeared on here, largely by people who seem to have literally zero idea what they're talking about. It's often pushed by either/or people who bizarrely need to wave a flag.
MLX doesn't use the ANE for the single and only reason that Apple hid the ANE behind CoreML, exposing zero public APIs to utilize the ANE directly, and MLX, being basically an experimental grounds, wanted to hand-roll its implementation around the GPU / CPU. They literally, directly state this as the reason. People inventing technical reasons for why MLX doesn't use the ANE are basically just manufacturing fan fiction. This isn't to say that the ANE would be suitable for a lot of MLX tasks; it is highly optimized, power-efficient inference hardware that doesn't work for a lot of purposes. But its exclusion is not due to technical unsuitability.
Further, the ANE on both my Mac and my iPhone is constantly augmenting and improving my experience. Little stuff like extracting contents from images. Ever browse in Safari and notice that you can highlight text in an image almost instantly after loading a page? Every image, with context and features detected effortlessly. Zero fans cycling up. Power usage at a trickle. It just works. It's the same way that when I take a photo I can search "Maine Coon" and get pictures of my cats, the ANE being used for subject and feature extraction. Computational photography massively leverages the ANE.
At a trickle of power.
Scam? Yeah, I like my battery lasting for more than a couple of minutes.
Apple intended ANE to bring their own NN augmentations to the OS and thus the user experience, and even the availability in CoreML as a runtime engine is more limited than what Apple's own software can do. Apple basically limits the runtime usage to ensure that no third party apps inhibit or restrict Apple's own use of this hardware.
Typing on my iPhone in the last few months (~6 months?) has been absolutely atrocious. I've tried disabling/enabling every combination of keyboard settings I can think of, but the predictive text just randomly breaks, or it just gives up and stops correcting anything at all.
https://news.ycombinator.com/item?id=46232528 ("iPhone Typos? It's Not Just You - The iOS Keyboard is Broken")
At least the machine didn't say it was seven!
Did you file a radar? (silently laughing while writing this, but maybe there's someone left at Apple who reads those)
> - MiniMax can't fit on an iPhone.
They asked MiniMax on their computer to make an iPhone app that didn't work.
It didn't work using the Apple Intelligence API. So then:
* They asked Minimax to use MLX instead. It didn't work.
* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.
* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.
> Better to dig in a bit more.
The author already did 100% of the digging and then some.
Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MACs to converge across platforms, but the fact that Apple's own LLM doesn't work on their own hardware, and its output is an order of magnitude off, is a reasonable bug report in my book.
Fascinating that the claim is Apple Intelligence doesn't work altogether. Quite a scandal.
EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't minimax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.
No, the claim is their particular device has a hardware defect that causes MLX not to work (which includes Apple Intelligence).
> EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.
Your comment originally read:
> This is blinkered.
> - MiniMax can't fit on an iPhone.
> - There's no reason to expect models to share OOMs for output.
> - It is likely this is a graceful failure mode for the model being far too large.
> No fan of Apple's NIH syndrome, or it manifested as MLX.
> I'm also no fan of "I told the robot [vibecoded] to hammer a banana into an apple. [do something impossible]. The result is inedible. Let me post to HN with the title 'My thousand dollars of fruits can't be food' [the result I have has ~nothing to do with the fruits]"
> Better to dig in a bit more.
Rather than erase it, and invite exactly the kind of misreading you don't want, you can leave it, honestly and transparently, with your admission in the replies below. It won't be downvoted as much as when you try to manipulate, or make requests of others, to minimize your downvotes. Weird voting-manipulation stuff like that tends to be frowned upon on HN.
You have more HN karma than I do, even, so why care so much about downvotes...
If you really want to disown something you consider a terrible mistake, you can email the HN mods to ask for the comment to be dissociated from your account. Then future downvotes won't affect your karma. I did this once.
nothing to see here.