Two_hands · 1y ago · 0 points · 0 comments

> because progress is made in small steps,

This seems easily forgotten by a large number of people. I try to remind myself to step back from the hype and explore the lesser travelled paths.

> I'll give some examples that are easier to read[0-2]

I need to read ResNet strikes back; ResNet was one of the first networks I implemented, and it is cool to see it still being worked on.

I'll check out [3]. I've wondered recently how you could get a GAN to generate things out of distribution but that still look like the training data, if that even makes sense.

> the StyleGAN code is not the easiest to read lol

Yup, even the official PGGAN code was quite hard to understand. I'll try out PyTorch compile; I've heard a lot about it recently. I had thought TensorRT was for LLMs, but I suppose it's applicable in other areas too?

> so recognize this as a hyper-parameter

Okay, that makes sense. I'll reread this once I've explored diffusion models too.

> carefully study Goodfellow's original paper

This is something I have not done; my current workflow is just to understand how best to implement what is written. I think deep exploration is the next step, no matter how many "I know nothing" moments I will experience. This side of GANs I had not considered (the theoretical, it looked interesting but very complex).

> I hope this can help provide direction

It certainly will, I imagine I'll come back to this comment many times. Thanks for taking the time to read my posts and provide so much material for further study.

> if unfortunately hard to gain

I agree it is rewarding, and I hope I can pass on some of this knowledge in my blog for others too! That was why I started it: so much knowledge is locked away and hard to access or understand without some guidance.


godelski · 1y ago

  > This seems easily forgotten

I think an undervalued exercise is learning the history of a field. It helps with this, but it also teaches you how to tackle problems, because you need to understand the motivations people had and the tools they had available at the time.

  > ResNet strikes back

Don't forget ConvNeXt!

  > how you could get a GAN to generate things out of distribution

OOD is a fuzzy term, used fast and loose. You're not really generating anything out of distribution. And remember that generative models are often improperly tested for generalization. Most models are, really. If you tune your parameters on a held-out set, then it isn't a held-out set, it is a validation set. You've provided additional information to the model: information leakage. There are also major limitations to all the metrics. You'll find the exercise with FID fairly enlightening. One major assumption is the belief that the normalization layer results in a normal distribution. Do you take this at face value? I also suggest looking into CleanFID. You'll find some surprising results if you dig deeper. Never fool yourself into thinking that metrics are objective; they are models. You can never directly measure the thing you intend to. Sometimes this proxying doesn't matter, sometimes it does. In either case, we shouldn't forget it. To make this clearer: when you measure with a ruler, you don't measure the length of an object in meters, you measure the length of the object in relation to your ruler. Go get a few rulers and see how well they agree with each other. Or go to your physics department, find the experimentalists, and trade them a beer for rants on metaphysics (Ian Hacking can be a good place to start).
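To make the held-out vs. validation point concrete, here is a toy sketch using only the Python standard library. Everything in it is made up for illustration: the "models" are coin flips, the labels are coin flips, and the function name `select_on_held_out` and the sizes are my own assumptions, not anything from a real training setup. Because every candidate is pure guessing, any accuracy above 0.5 on the set used for selection comes from the selection itself.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def select_on_held_out(n_candidates=200, n_val=50, n_test=5000, seed=0):
    """Pick the best of many coin-flip "models" using the held-out set.

    Labels and predictions are all independent coin flips, so every
    candidate's true accuracy is 0.5 by construction.
    """
    rng = random.Random(seed)
    val_labels = [rng.randint(0, 1) for _ in range(n_val)]
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]

    best_val_acc, best_test_acc = -1.0, 0.0
    for _ in range(n_candidates):
        # A hypothetical "hyper-parameter setting": just another set of guesses.
        val_preds = [rng.randint(0, 1) for _ in range(n_val)]
        test_preds = [rng.randint(0, 1) for _ in range(n_test)]
        val_acc = accuracy(val_preds, val_labels)
        if val_acc > best_val_acc:
            # Selecting on this set is what turns it into a validation set.
            best_val_acc = val_acc
            best_test_acc = accuracy(test_preds, test_labels)
    return best_val_acc, best_test_acc


val_acc, test_acc = select_on_held_out()
print(f"accuracy on the set used for selection: {val_acc:.2f}")
print(f"accuracy on an untouched test set:      {test_acc:.2f}")
```

The first number should come out noticeably above 0.5 and the second close to 0.5, even though nothing was learned. The gap is the information that leaked in through selection, which is why a score on a set you tuned against shouldn't be reported as generalization.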

  > the official PGGAN code

The two share a codebase lineage: PGGAN came first, and StyleGAN's code builds on it.

  > TensorRT

It is general. In fact, you don't even require machine learning code. Though that is what it is targeted at. And I want to point this out, because it is an easy trap to fall into. One I fell into when starting and one many never escape from. Stop thinking about models and architectures as applications. See the forest, but don't forget the trees, the shrubs, moss, mushrooms, and all the other things in the forest. Look closely at the LLM and don't just find the differences between other architectures, but also find the similarities. It's kinda like people: easier to see the differences between us, especially because we're so similar.

  > the theoretical, it looked interesting but very complex

I tell my students: you don't need math to make good models, but you do need math to know why your models are wrong. If you don't know what's referenced in the second part, seek out a mentor who will tell you. The barrier to entry is low but don't forget the fundamentals. It's like with any programming. Your success can cause you to stop progressing because "why do I need more when this works?" It's hard work, but highly fruitful.

Keep it up. It's easy to get discouraged, but don't let that stop you. You're not as far behind as you might think.
