They train an end-to-end model to drive based on 8 camera streams and recorded input from human drivers, training on tens (if not hundreds by now) of millions of 30-second clips from their consumer fleet. That's why they've bought one of the largest GPU clusters and are making their own chips and transport protocols.
It's not widely known, but Tesla probably has one of the largest training clusters, because practically all the GPUs they buy go towards training, while most of the GPUs at, e.g., OpenAI go towards inference. Tesla does inference in the car.