The qwq-2.5-0.5b is the tiniest useful model I've used, and it's pretty easy to fine-tune locally on a Mac. I haven't tried it on an iPhone, but given that it runs at about 150-200 tokens/second on a Mac, I'm kinda doubtful it could hit the same speed on an iPhone. But I guess you'd just have to try.