Currently, we distill the general-purpose GPT-4 into a function-specific GPT-3.5 Turbo model using pseudo-labelling. The input-output pairs produced by the aligned few-shot GPT-4 are saved, and this dataset is used to finetune a function-specific GPT-3.5 model. The finetuned GPT-3.5 model then replaces GPT-4 as the primary model for that function. Because the finetuned model no longer needs few-shot examples in its prompt, this cuts the cost per call several-fold and lowers latency as well. If the finetuned model's output does not satisfy the enforced constraints, we use GPT-4 to "repair" the output and add that datapoint to the dataset used for future finetuning, so the model improves continuously.
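
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `DistilledFunction`, `teacher`, `student`, `is_valid`, and `min_examples` are hypothetical names, the two models are plain callables standing in for the GPT-4 and finetuned GPT-3.5 calls, and the "repair" step is assumed to be a fresh teacher answer for the same input.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DistilledFunction:
    """Teacher/student routing for one function (all names are hypothetical)."""

    teacher: Callable[[str], str]    # stand-in for the aligned few-shot GPT-4 call
    student: Callable[[str], str]    # stand-in for the finetuned GPT-3.5 call
    is_valid: Callable[[str], bool]  # stand-in for the enforced output constraints
    min_examples: int = 3            # assumed dataset size that triggers the switch
    dataset: List[Tuple[str, str]] = field(default_factory=list)
    use_student: bool = False

    def __call__(self, prompt: str) -> str:
        if self.use_student:
            output = self.student(prompt)
            if self.is_valid(output):
                return output
            # The student broke the constraints: fall back to the teacher
            # and keep the repaired pair for the next finetuning round.
            output = self.teacher(prompt)
            self.dataset.append((prompt, output))
            return output

        # Teacher phase: every call is saved as a pseudo-labelled datapoint.
        output = self.teacher(prompt)
        self.dataset.append((prompt, output))
        if len(self.dataset) >= self.min_examples:
            # A real system would launch a finetuning job here; this sketch
            # only flips the routing flag.
            self.use_student = True
        return output


# Toy stand-ins: the "student" fails on inputs of five or more characters.
shout = DistilledFunction(
    teacher=lambda p: p.upper(),
    student=lambda p: p.upper() if len(p) < 5 else "",
    is_valid=lambda out: out != "",
    min_examples=2,
)
for text in ["ab", "cd", "ef", "longer"]:
    shout(text)
```

In this toy run the first two calls go to the teacher and fill the dataset, the third is served by the student alone, and the fourth fails validation, so it is repaired by the teacher and added to the dataset for later finetuning.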