
0 points · javidlakha · 2y ago · 0 comments

How does that work?


5 comments · 1 top-level

JackHopkins · 2y ago · 4 in thread

Currently we distill the general GPT-4 model down to a function-specific GPT-3.5 Turbo model using pseudo-labelling. The input-output pairs from the aligned few-shot GPT-4 are saved, and this dataset is used to finetune a function-specific GPT-3.5 model. That finetuned GPT-3.5 then becomes the primary model used to carry out the function. This cuts costs severalfold, because the few-shot examples are no longer needed in the prompt, and it lowers latency as well. If the finetuned model's output does not follow the enforced constraints, we employ GPT-4 to "repair" the output and add that datapoint to the dataset used for future finetuning, resulting in continuous improvements.
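
A rough way to picture the loop described in this comment is the Python sketch below. It is only an illustration under stated assumptions, not the project's actual code: `DistilledFunction`, `min_examples`, and `finetune` are hypothetical names, the teacher and student are plain callables standing in for the GPT-4 and finetuned GPT-3.5 calls, and "repair" is simplified to re-running the teacher on the same input.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DistilledFunction:
    """Hypothetical wrapper illustrating teacher -> student distillation."""

    teacher: Callable[[str], str]    # stand-in for the few-shot GPT-4 call
    student: Callable[[str], str]    # stand-in for the finetuned GPT-3.5 call
    is_valid: Callable[[str], bool]  # the enforced output constraint
    min_examples: int = 3            # assumed switch-over threshold
    dataset: List[Tuple[str, str]] = field(default_factory=list)
    student_ready: bool = False

    def finetune(self) -> None:
        # Placeholder: a real system would submit self.dataset to a
        # finetuning job. Here we only flip the flag.
        self.student_ready = True

    def __call__(self, x: str) -> str:
        if self.student_ready:
            out = self.student(x)
            if self.is_valid(out):
                return out
            # Constraint violated: fall back to the teacher ("repair",
            # simplified) and keep the pair for future finetuning.
            out = self.teacher(x)
            self.dataset.append((x, out))
            return out
        # Before switch-over: the teacher answers and its output is
        # saved as a pseudo-label.
        out = self.teacher(x)
        self.dataset.append((x, out))
        if len(self.dataset) >= self.min_examples:
            self.finetune()
        return out
```

With toy callables in place of the models, the first `min_examples` calls go to the teacher and fill the dataset. Later calls go to the student, and only outputs that fail `is_valid` fall back to the teacher and add a new datapoint.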

javidlakha (OP) · 2y ago

How much control do I have over this process? I might not want this to be abstracted.

JackHopkins · 2y ago

Currently the distillation happens automatically in the background for all functions, but we're aiming to let users turn it off if they wish to keep using the teacher models. Good to know that this would be a wanted feature!

mnky9800n · 2y ago

Does it ever just use the code that works and no longer makes calls to any LLM?

JackHopkins · 2y ago

Great question! That is one of the ideas we have on the roadmap, and it seems quite exciting to us. The general feasibility of switching function execution over from an LLM to synthesised code depends on the specific use case and on whether a deterministic program can solve it well enough (or at least as well as the SOTA LLMs can). But for all the cases where this could be done, the cost and latency of executing the program would become essentially zero.
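
The comment describes this as a roadmap idea, so nothing below reflects an existing feature. As a minimal sketch of what "well enough" could mean in practice, one option is to check a candidate program against saved LLM input-output pairs and switch over only if agreement clears a threshold. The names `agreement`, `choose_executor`, and `threshold` are assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple


def agreement(program: Callable[[str], str],
              examples: Iterable[Tuple[str, str]]) -> float:
    """Fraction of saved (input, LLM output) pairs the program reproduces."""
    pairs = list(examples)
    if not pairs:
        return 0.0
    hits = 0
    for x, expected in pairs:
        try:
            if program(x) == expected:
                hits += 1
        except Exception:
            # A crash on any input counts as a miss.
            pass
    return hits / len(pairs)


def choose_executor(program: Callable[[str], str],
                    llm: Callable[[str], str],
                    examples: Iterable[Tuple[str, str]],
                    threshold: float = 1.0) -> Callable[[str], str]:
    """Return the deterministic program only if it matches the LLM
    on at least `threshold` of the saved examples; otherwise keep the LLM."""
    if agreement(program, examples) >= threshold:
        return program
    return llm
```

Exact-match comparison is the strictest possible check and only suits functions with a single correct output; free-text outputs would need a looser comparison.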
