It's certainly enough to run a decent Llama, but hardly the most cost-effective option. Apple's approach falls between the low-bandwidth Intel/AMD laptops and the high-bandwidth PCIe HPC components. In a way, it's trapped between two markets: ultra-cheap Android/Windows hardware with 4-8 GB of RAM that can still do AI inferencing, and ultra-expensive GPGPU setups that are designed to melt these workloads.
The generous thing to say is that it performs very favorably against other consumer inferencing hardware. The numbers get ugly fast once you start throwing money at the problem, though.