If you "have enough compute" available, which OpenAI definitely does, the best current technique is to quantise to mixed precision and then fine-tune afterwards to restore performance. That's most probably how all of the "turbo" models work. Take a model that used 16 or 32 bits per parameter during training, quantise it down to a mixture of 4, 8, and 16 bits, and then fix it up with an additional training pass that uses the original full-fat model's predictions as the training target (in other words, distillation with the original model as the teacher). With access to the raw model, this pass can match the full output probability distribution at every position, so every candidate token's probability is considered and corrected instead of just the top word. Third parties fine-tuning against GPT-4 chats can't do this, even with the collected samples, because they only see the individual tokens/words that were selected instead of the full probability distribution.
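To make the difference concrete, here's a rough sketch of the two training signals for a single token position. Everything in it is made up for illustration: the 4-token vocabulary, the logit values, and the function names are my own assumptions, not anything OpenAI has published. It's pure Python so it runs without any ML library.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits):
    """KL(teacher || student) over the whole vocabulary.

    Needs the teacher's full distribution, so it is only available
    to someone who can run the original model themselves.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hard_target_loss(token_id, student_logits):
    """Cross-entropy against one selected token.

    This is all a third party can compute from collected chat logs.
    """
    q = softmax(student_logits)
    return -math.log(q[token_id])

# Hypothetical logits for one position (illustrative numbers only).
teacher_logits = [2.0, 1.5, 0.1, -1.0]   # original full-precision model
student_logits = [1.0, 2.0, 0.0, -0.5]   # quantised model, slightly off

kl = soft_target_loss(teacher_logits, student_logits)
ce = hard_target_loss(0, student_logits)  # token 0 was the one emitted

print(f"soft-target (full distribution) loss: {kl:.4f}")
print(f"hard-target (single token) loss:      {ce:.4f}")
```

The soft-target loss gives a gradient for every entry in the vocabulary, including how much probability the teacher put on the runners-up. The hard-target loss only says "token 0 should have been more likely" and carries no information about the rest of the distribution.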