Setting aside I've tried both, we'll bore each other to death if we just assert one is better:
From first principles, Phi 2 is extremely unlikely to be better, it's a base model and doesn't know how to chat. (see README on HF repo and also "Responses by phi-2 are off, it's depressed and insults me for no reason whatsover?", https://huggingface.co/microsoft/phi-2/discussions/61)
re: Benchmarks, see https://huggingface.co/stabilityai/stablelm-zephyr-3b. Phi-2 wins on some, StableLM on others. For some reason the HF and Lmsys leaderboards don't show it, and I don't know why.
Phi-2's license just changed and you still need to finetune it yourself. $20/month is more than reasonable for commercial use IMHO, it's a game changer.
Until I can use a truly* chat finetuned Phi-2, StableLM remains a clear winner in my experience. It can do RAG, the only other small model I've seen do that is Mistral 7B, and Phi-2 acts like PaLM acted when I would play around with it internally at Google, when it was just a base model. Impossible to use but fun toy.
* there's a couple other there, but they don't seem to have enough fine-tuning...yet