Implementing the Goodfellow GANs paper

(ym2132.github.io)

104 points · Two_hands · 2y ago · 18 comments

countvonbalzac · 2y ago

Are GANs useful for synthetic data generation for transformer-based models?

rgovostes · 2y ago

Probably. Apple published a paper back in 2017 about improving synthetic data for the purposes of training models (though not transformers).

The examples they give are for eye and hand tracking -- which not coincidentally are used for navigating the Apple Vision Pro user interface.

https://machinelearning.apple.com/research/gan

Two_hands (OP) · 2y ago

It'd be cool to run some tests where you train a model with data and then supplement the training data with generated stuff.

HanClinto · 2y ago

Yes, the concept is still powerful and in use today.

As I understand it, the RLHF method of training LLMs involves creating a separate "reward model", a secondary model trained to predict the score of an arbitrary generation. This feels very analogous to the "discriminator" half of a GAN: both critique the generation created by the other half of the system, and that score is fed back in to train the primary network through positive and negative rewards.

I'm sure it's an oversimplification, but RLHF feels like GANs applied to the newest generation of LLMs -- but I rarely hear people talk about it in these terms.
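To make the analogy a bit more concrete, here are the two objectives side by side. The first is the minimax game from the Goodfellow paper. The second is the commonly cited KL-regularized RLHF objective; which RLHF variant is meant here is an assumption, and the symbols $r_\phi$ (reward model), $\pi_\theta$ (policy being trained), $\pi_{\text{ref}}$ (reference policy), and $\beta$ (penalty weight) are not from this thread.

```latex
% GAN: the discriminator D and generator G are trained against each other
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% RLHF (KL-regularized form): the policy maximizes the reward model's score
\max_{\pi_\theta} \;
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big]
  - \beta \, \mathrm{KL}\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\text{ref}}(\cdot \mid x)\big)
```

One place the analogy seems to loosen: in this form the reward model is usually fit to human preference data beforehand and held fixed while the policy is optimized, whereas a GAN discriminator keeps being updated alongside the generator.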

Two_hands (OP) · 2y ago

I think diffusion models are useful too; I'm currently working on a project that uses them to generate medical data. Both seem useful, since they both target data generation, especially in areas where data is hard to come by. Writing this blog post made me wonder about applications in finance too.

HanClinto · 2y ago

I agree -- I would love to see diffusion models applied to more types of data. I would love to see more experiments done with text generation using a diffusion model, because it would have an easier time looking at the "whole text" rather than the myopia that can occur from simple next-token prediction.

GaggiX · 2y ago

Adversarial loss is used in many cases, such as when training a VAE, and a VAE can use a transformer architecture.

eru · 2y ago

Compare https://gwern.net/gan

toxik · 2y ago

        # shuffle the combined batch to prevent the model from learning order
        indices = torch.randperm(combined_images.size(0))
        combined_images = combined_images[indices]
        combined_labels = combined_labels[indices]

You don’t need to do this

Two_hands (OP) · 2y ago

Is it better to train without the shuffling, or does shuffling have a negligible effect?

Doxin · 2y ago

I'd assume there's no real state the network can "remember" between iterations, so shuffling will at best just waste time.
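A quick way to check this, as a minimal sketch rather than the blog's actual training code: if the loss is averaged over the batch and each sample is processed independently, then reordering the samples inside a batch leaves both the loss and the gradients unchanged. The example below uses plain Python and a made-up one-parameter logistic "discriminator"; the names `bce_loss_and_grad` and `shuffled_matches_unshuffled` and all the numbers are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss_and_grad(w, b, xs, ys):
    """Mean binary cross-entropy of a 1-D logistic model, plus its gradient in (w, b)."""
    n = len(xs)
    losses, grads_w, grads_b = [], [], []
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        losses.append(-(y * math.log(p) + (1.0 - y) * math.log(1.0 - p)))
        grads_w.append((p - y) * x)  # d(loss_i)/dw
        grads_b.append(p - y)        # d(loss_i)/db
    # Every quantity is a per-sample term averaged over the batch.
    return math.fsum(losses) / n, math.fsum(grads_w) / n, math.fsum(grads_b) / n


def shuffled_matches_unshuffled(seed=0, n_real=8, n_fake=8):
    """Compare loss and gradients for a combined real+fake batch before and after shuffling."""
    rng = random.Random(seed)
    real = [rng.gauss(2.0, 1.0) for _ in range(n_real)]   # stand-in for real samples
    fake = [rng.gauss(-2.0, 1.0) for _ in range(n_fake)]  # stand-in for generated samples
    xs = real + fake
    ys = [1.0] * n_real + [0.0] * n_fake
    w, b = 0.3, -0.1  # arbitrary fixed parameters

    base = bce_loss_and_grad(w, b, xs, ys)

    order = list(range(len(xs)))
    rng.shuffle(order)  # same permutation applied to inputs and labels
    shuffled = bce_loss_and_grad(w, b, [xs[i] for i in order], [ys[i] for i in order])

    return all(
        math.isclose(a, s, rel_tol=1e-12, abs_tol=1e-12)
        for a, s in zip(base, shuffled)
    )


if __name__ == "__main__":
    print(shuffled_matches_unshuffled())
```

This only covers shuffling within a single batch that is fed through in one step. Shuffling which samples end up in which batch across an epoch is a different question, and the argument would not carry over to a model with layers that depend on sample order.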


3abiton · 2y ago

This is a blast from the past. I still remember the StyleGAN demos and how cool they were for their time. https://www.youtube.com/watch?v=Ps7bmdxy0Xc

Two_hands (OP) · 2y ago

Right, even though the paper is almost 10 years old I still found it fascinating. I hope you enjoyed the post!

HanClinto · 2y ago

Great writeup, thank you! Nicely done!

Two_hands (OP) · 2y ago

Thank you, I appreciate the kind comments!

nothrowaways · 2y ago

Cool

Two_hands (OP) · 2y ago

Thank you
