The examples they give are for eye and hand tracking, which, not coincidentally, are the inputs used to navigate the Apple Vision Pro user interface.
As I understand the RLHF method of training LLMs, it involves training an internal "reward model": a secondary model that learns to predict how highly humans would rate an arbitrary generation. This feels very analogous to the "discriminator" half of a GAN: both critique the output produced by the other half of the system, and that score is fed back into training the primary network as a positive or negative reward.
I'm sure it's an oversimplification, but RLHF feels like GANs applied to the newest generation of LLMs, yet I rarely hear people talk about it in these terms.
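To make the analogy concrete, here is a minimal sketch comparing the two critics' training objectives. It assumes the pairwise (Bradley-Terry style) loss commonly used to fit RLHF reward models and the standard binary cross-entropy GAN discriminator loss; the function names are illustrative and not taken from any particular library.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss for a reward model (illustrative sketch).
    r_chosen / r_rejected are the scalar scores the reward model assigns to the
    human-preferred and the rejected response. The loss shrinks as
    r_chosen rises above r_rejected."""
    return -math.log(sigmoid(r_chosen - r_rejected))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """GAN discriminator loss (binary cross-entropy, illustrative sketch).
    d_real / d_fake are the discriminator's probabilities that a real sample
    and a generated sample, respectively, are real."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


# Both critics are trained to separate "better" inputs from "worse" ones.
print(reward_model_loss(2.0, 0.0))   # preferred response scored higher: low loss
print(reward_model_loss(0.0, 2.0))   # preferred response scored lower: high loss
print(discriminator_loss(0.9, 0.1))  # confident, correct discriminator: low loss
```

Both are log-sigmoid classification losses, which is where the resemblance comes from. The inputs differ, though: the reward model ranks two model responses against a human preference label, while the discriminator separates real data from generated samples. In common InstructGPT-style pipelines the reward model is also trained first and then held fixed while the policy is optimized, rather than being updated in alternation with the generator as in a GAN.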
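import torch

# Shuffle the combined batch so the model can't learn anything from sample order.
# Create the permutation on the same device as the data to avoid a host-to-device copy.
indices = torch.randperm(combined_images.size(0), device=combined_images.device)
combined_images = combined_images[indices]
combined_labels = combined_labels[indices]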
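You don’t need to do this: if the loss is averaged over the batch, the order of samples within that batch has no effect on the loss or its gradients.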
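A small pure-Python check of that point, assuming the loss is a mean over the batch (as with the default `reduction='mean'` in PyTorch's loss modules). The one-feature logistic model, the weights and the "combined" batch below are hypothetical stand-ins for a real discriminator and real data.

```python
import math
import random


def bce_loss_and_grad(w, b, batch):
    """Mean binary cross-entropy of a hypothetical 1-D logistic model
    p = sigmoid(w * x + b) over (x, label) pairs, plus d(loss)/dw and d(loss)/db."""
    loss = grad_w = grad_b = 0.0
    for x, y in batch:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        # For sigmoid + BCE, d(loss)/d(logit) = p - y.
        grad_w += (p - y) * x
        grad_b += p - y
    n = len(batch)
    return loss / n, grad_w / n, grad_b / n


# Hypothetical combined batch: first half labelled 1 (real), second half 0 (generated).
batch = [(0.5, 1), (1.2, 1), (0.9, 1), (-0.3, 0), (-1.1, 0), (0.2, 0)]
shuffled = batch[:]
random.Random(0).shuffle(shuffled)

original = bce_loss_and_grad(0.7, -0.1, batch)
permuted = bce_loss_and_grad(0.7, -0.1, shuffled)
# Equal up to floating-point rounding from the different summation order.
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(original, permuted)))
```

Order only matters across batches, which is why the dataset itself is usually shuffled between epochs.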