
0 points · zamnos · 3y ago · 0 comments

Maybe. Certainly in the past, before the world was aware that LLMs on the level of ChatGPT were possible with today's technology. OpenAI has chosen not to release any real details about GPT-4, so we don't actually know what it would take to train a model of equivalent quality, especially considering training isn't a one-shot: multiple training runs quickly add up. So training a 12-figure-parameter model (175B) is assumed to be very expensive. But there has been great progress on optimized models that are far smaller (7B, roughly 25x fewer parameters) for a debatable drop in quality (7B Alpaca is in no way competitive with ChatGPT, but it's still very much not a Markov chain from the AI winter). So one possibility is that OpenAI chose not to release salient GPT-4 details because the model is much smaller than GPT-3's 175B and they're hiding the details because of how much that cuts down on training costs. (Which I should note is unsubstantiated conjecture, but not outside the realm of possibility.)
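To put rough numbers on the size gap, here is a minimal sketch, assuming the common rule of thumb that training compute is about 6 × parameters × training tokens (in FLOPs). The token budget below is an illustrative assumption held equal for both models, not a published figure for any model mentioned here, and the function name is made up for the example.

```python
import math


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


LARGE_PARAMS = 175e9     # GPT-3-sized model, from the comment above
SMALL_PARAMS = 7e9       # 7B-class model, from the comment above
ASSUMED_TOKENS = 300e9   # ASSUMPTION: same illustrative token budget for both

large = training_flops(LARGE_PARAMS, ASSUMED_TOKENS)
small = training_flops(SMALL_PARAMS, ASSUMED_TOKENS)

ratio = large / small
orders_of_magnitude = math.log10(ratio)

print(f"compute ratio: {ratio:.1f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

With the token budget held equal, the compute ratio is just the parameter ratio, so the saving per run lands between one and two orders of magnitude. Training the smaller model on more tokens would shrink that gap.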

The other aspect is that fine-tuning an existing model is way cheaper than creating a competing model from scratch, so a company could offer a CompetitorGPT/CompetitorCoPilot competitive with GPT-3.5, and offer fine-tuning of that model on the purchasing company's own source code repositories, possibly on-prem or at least inside their AWS VPC or the Azure/GCP equivalent.

The other thing to note is that OpenAI is hosting ChatGPT as a public resource available to anyone with an account, akin to Google being open to the public from day one (although that is without an account; maybe Gmail is a better comparison). I can't say for certain, only OpenAI would know for sure, but I'm willing to bet that inference for ChatGPT is the vast majority of their costs (which is anything but trivial). Any private internal-only instance of OpenChatGPT (using the unlicensed leaked LLaMA model, a legal copy, or someone else's) could be paying (relatively) minuscule training costs, and way lower inference costs if it's internal-use only. Whether that cost can be borne by a small SaaS company's existing AWS budget is up in the air. Which is to say, ultimately, that you're right: ChatGPT would be difficult without the support of Microsoft via a huge Azure grant. It's less obvious whether a self-hosted, internal-only OpenChatGPT, not from OpenAI, would be possible for hobbyist self-hosters with a prosumer GPU cluster (say, with older-generation K80s instead of business-priced A100s), or for a company that wants to give its developers a Copilot-like productivity multiplier as an internal tool without sending private source code to OpenAI absent a privacy agreement with them.


4 comments · 2 top-level

sfriedr · 3y ago · 1 in thread

> OpenAI's chosen not to release any real details about GPT-4

Actually, they have released some details about it, in this 99-page technical report https://arxiv.org/abs/2303.08774 (which is actually two papers stitched together, once you read it; oddly enough using different fonts).

But I'm not sure if this content qualifies as "real details".

kmeisthax · 3y ago

The intro to that paper specifically says:

> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar. We are committed to independent auditing of our technologies, and shared some initial steps and ideas in this area in the system card accompanying this release. We plan to make further technical details available to additional third parties who can advise us on how to weigh the competitive and safety considerations above against the scientific value of further transparency.

In other words, "Stable Diffusion wasn't supposed to happen, so we're making all our methodology trade secret[0], if you want to Do Science then agree to this massive NDA and have enough skin in the game for us to cut you."

[0] Presumably at some point OpenAI will have to 'relent' to independent discovery by patenting AI architectures and refusing to license them

generalizations · 3y ago · 1 in thread

I've been using the GPT-4 model in ChatGPT, and OpenAI has been putting up warnings about max queries per N hours. Given the degree to which they're limiting access (up until they crashed today, they'd dropped to 25 queries / 3 hours), I suspect GPT-4 is actually much, much larger, and they just don't have the computational resources to support its use at the same level as GPT-3.5 or GPT-3.5 Turbo.

zamnos (OP) · 3y ago

You could be right! I don't claim access to any private OpenAI information, so any theories of mine are based on what's known publicly, which isn't much for GPT-4. I do want to call attention to the difference between training runs and inference runs (post-training usage of the model). If each training run costs mid six figures, CompetitorGPT is going to have to be well funded and likely sponsored by AWS/GCP (e.g. DeepMind) just to train up the model, given that it's probably not a one-shot. If it's much lower due to optimizations in training, on top of only having to fine-tune the model on a company's codebase instead of training the whole model from scratch each time, then I could see a company selling the service of creating CompetitorGPT or CompetitorCoPilot, and it seems like it could be a very worthwhile investment for companies that are willing to pay for such services for their developers. (E.g. companies that are willing to pay Splunk's exorbitant costs vs. ones that would rather burn time self-hosting a Grafana setup. Not to impugn Grafana, but it's very much a home-grown, open-source, self-hosted deployment. Managing a Splunk cluster is also far from free; it's just that not all companies are willing to bear the yearly licensing cost for it and would prefer to self-host Grafana solely for cost reasons, even if TCO including the opportunity cost makes it more expensive in the long run.)
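The "not a one-shot" point is just multiplication, but it is easy to underestimate. A minimal sketch, where every number is a hypothetical placeholder: "mid six figures" is read as 500,000 per run, and both the run count and the fine-tuning fraction are made up purely for illustration.

```python
def total_training_cost(cost_per_run: float, runs: int) -> float:
    """Total spend when a model takes several attempts to get right."""
    return cost_per_run * runs


COST_PER_RUN = 500_000.0    # ASSUMPTION: one reading of "mid six figures"
FULL_RUNS = 5               # ASSUMPTION: hypothetical number of attempts
FINE_TUNE_FRACTION = 0.01   # ASSUMPTION: fine-tune costs 1% of a full run

# Training from scratch, repeated until the model is good enough.
from_scratch = total_training_cost(COST_PER_RUN, FULL_RUNS)

# Fine-tuning an existing model the same number of times.
fine_tune = total_training_cost(COST_PER_RUN * FINE_TUNE_FRACTION, FULL_RUNS)

print(f"from scratch: {from_scratch:,.0f}")
print(f"fine-tune only: {fine_tune:,.0f}")
```

Whatever the real figures are, the shape of the argument is the same: the per-run cost gets multiplied by the number of attempts, so a cheaper per-run cost (smaller model, or fine-tuning only) pays off several times over.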
