Well, "good" has a few dimensions to it:
1. Speed of output (not very fun to wait multiple seconds for each letter to be output)
2. Coherence of output (how far back does the model remember the context of the conversation?)
3. Variety of output (how's the diversity of the model's vocabulary? How about topics it can plausibly discuss?)
You can easily get comparable speed from a self-hosted model, so there's nothing of real interest to compare there.
I haven't done particularly strenuous coherence comparisons, but for my uses, at least, megacorp and self-hosted models are pretty comparable. That said, you do need the better models to get the best coherence, simply because they have larger context windows and so retain more tokens of the conversation.
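To make the "retain more tokens" point concrete, here's a rough sketch of what happens when a conversation exceeds a model's context window: the oldest messages simply fall off. `trim_history` is an illustrative helper, not any real model's API, and it approximates token counts with word counts:

```python
# Illustrative sketch (not a real inference API): a chat history is
# trimmed to a fixed token budget, oldest messages dropped first.
# Token counts are approximated here by whitespace-split word counts.

def trim_history(messages, max_tokens):
    """Keep only the most recent messages that fit within max_tokens."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = len(msg.split())
        if total + cost > max_tokens:
            break  # everything older than this point is "forgotten"
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = [
    "user: my name is Alice",
    "assistant: nice to meet you Alice",
    "user: tell me a long story about dragons",
    "assistant: once upon a time ...",
]

# With a 20-"token" budget, the message establishing the user's name
# is dropped, so the model can no longer recall it.
print(trim_history(history, max_tokens=20))
```

A model with a bigger budget keeps the earlier messages and stays coherent about them; a smaller one forgets your name three exchanges in, which matches what I see in practice.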
Variety is, in my opinion, where the megacorp models still rule. Most of my dabbling has been with models designed as writing assistants; they can certainly generate plausible strings of words and follow a general theme, but they barely "know" anything. When using them to write fiction, you generally provide a "factbook" of details they can work from. ChatGPT, by comparison, can generate plausible responses to a surprising breadth of technical questions, although it definitely feels like it was generated by scraping certain online sources: in my experience it's decent at answering devops questions but bad at obscure grammar and physics questions.
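For anyone unfamiliar with the "factbook" trick: it just means prepending a block of ground-truth details to every prompt so the small model stays consistent with your world. A minimal sketch, where `build_prompt` and the facts are made up for illustration:

```python
# Hypothetical sketch of the "factbook" technique: prepend known facts
# to each prompt so a writing-assistant model stays consistent with
# them. build_prompt is an illustrative helper, not a real library API.

def build_prompt(factbook, instruction):
    """Format a factbook plus a writing task into a single prompt string."""
    facts = "\n".join(f"- {fact}" for fact in factbook)
    return (
        "Facts you must stay consistent with:\n"
        f"{facts}\n\n"
        f"Task: {instruction}"
    )

factbook = [
    "The kingdom of Aldren is ruled by Queen Maren.",
    "Dragons in this world cannot fly; they only glide.",
]

print(build_prompt(factbook, "Write the opening scene of chapter two."))
```

The resulting string is what you'd actually feed the model; without the factbook block, these models will happily contradict details they generated a page earlier.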