My guess is ChatGPT will be obsolete in 2 years.
I’m not sure the next one will be a lot smarter than ChatGPT. In particular, the ‘accuracy/honesty’ problem is not going to be easy to address without some kind of structural change (say, pairing the neural network with an SMT solver the way AlphaGo pairs game tree search with a network.)
What will change, however, is that the next one will be more resource efficient to train and run. People still don’t really understand how deep networks work, but they are figuring it out, and there are many little changes to be made that will add up to big gains.
Another problem with those LLMs is that they all have a fixed window size: I think it is 4096 subword tokens for ChatGPT, and I have been playing around with RoBERTa, for which it is 512 subword tokens. These models are good at what they are trained to do, but there is no really great way to apply them to longer texts that doesn’t break the ‘magic’. I have plenty of documents I want to cluster and classify that are much longer than that.
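The usual workaround is to chop a long document into overlapping windows that each fit the model. A minimal sketch of that chunking step (the `window` and `stride` numbers here are illustrative, not any model’s actual limits; the overlap just preserves some shared context between adjacent chunks):

```python
def chunk_tokens(tokens, window=512, stride=384):
    """Split a long token sequence into overlapping windows.

    A stride smaller than the window means consecutive chunks share
    (window - stride) tokens of context, so sentences straddling a
    boundary are still seen whole by at least one chunk.
    """
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    for start in range(0, len(tokens) - window + stride, stride):
        chunks.append(tokens[start:start + window])
    return chunks

# A document far longer than the model's window gets broken up:
doc = ["tok"] * 1200
chunks = chunk_tokens(doc, window=512, stride=384)
# three chunks, each at most 512 tokens, overlapping by 128
```

This is exactly the part that breaks the ‘magic’: each chunk is classified in isolation, and nothing in the model ties the pieces back together.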
People will certainly be training models with larger windows, but it seems like what is needed is either something that scales (more text makes a larger vector) or something that consolidates multiple windows into a larger structure (think of how different it is to read a paragraph critically than to read a book critically.)
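The crudest form of that consolidation is to embed each window separately and pool the vectors into one fixed-size document vector for clustering. A sketch, assuming you already have per-window embeddings from some model (mean pooling is the simplest choice; a hierarchical model over the window vectors is the more interesting direction):

```python
def pool_window_vectors(window_vecs):
    """Mean-pool per-window embedding vectors into one document vector.

    window_vecs: list of equal-length lists of floats, one per window.
    Returns a single vector usable for clustering or classification.
    """
    dim = len(window_vecs[0])
    doc_vec = [0.0] * dim
    for vec in window_vecs:
        for i, x in enumerate(vec):
            doc_vec[i] += x
    return [x / len(window_vecs) for x in doc_vec]

# three per-window vectors -> one document vector
vecs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
doc_vec = pool_window_vectors(vecs)
# doc_vec == [3.0, 4.0]
```

Mean pooling throws away order and emphasis, which is precisely the paragraph-versus-book problem: reading a book critically is not averaging its paragraphs.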
There will be scalability problems going in that direction but I think that’s where the mountain is.