Running them locally is cool and has privacy/autonomy benefits, but you can't really make a value case for it. Guaranteed if you run the math you will never run enough inference to pay off your hardware vs buying tokens. Last time I ran the math on my MBP I'd have to run inference 24 hours a day for 5+ years to pay off the cost of my MBP, not accounting for electricity costs.