I completely disagree.
> I am pretty sure that GPT-3.5 should be thought of as GPT-4-lite, in the sense that it uses techniques and compute of the GPT-4 era rather than the GPT-3 era
Compute in the "GPT-3 era" and the "GPT-3.5 era" is identical, so this is not a distinguishing factor. The architecture is also roughly identical: both are dense transformers. The only significant differences between 3.5 and 3 are the size of the model and whether it uses RLHF.