I have always wondered where the data from data breaches (e.g. the Equifax leak, Facebook, etc.) can be found. Some of these datasets look like a great source for training ML models. It seems like people on HN or websites like haveibeenpwned have access to this data.
When developing ML/DL models, I often have to make decisions about architecture (layers, loss function, shape), preprocessing method (tiling vs. resizing, text embedding algorithm), optimization algorithm (SGD or Adam) and hyperparameter choices. In many libraries and research papers, these decisions seem to be made at random. Is there a collective source of things people have tried that work? Where do you discuss the best practices and techniques that sometimes make a big difference in developing a good model?
I'm debating which is the better use of time: reading blog posts on HN and other tech blogs/news, or reading books. Some tech blogs and research papers offer a lot of value and help me come up with new ideas and stay up to date. However, reading a book is also very helpful for understanding a particular topic in detail.