You could start collecting it today, but you wouldn't have much of a back-test, and it wouldn't reflect different regimes. Regardless, it would start giving you insights.
I'd guess the best ways to learn it are either doing it and learning bits over time, or joining a hedge fund where you get it all as common infra and learn there.
I'm also curious if anyone has ideas here. I haven't seen a non-internal historical dataset.