1. I would start in the Numerai tournament. I did this for 3 years, after my first two years on my own in the market. It's useful because they provide ML-ready data, and you can iterate very quickly. If you do not have ML experience, Numerai will teach you about many different types of overfitting and the many correct and incorrect ways to deal with them. For example, some ML people always apply dropout, but when you have a small signal to begin with, dropout can drop the signal out, and then there is only noise left for the model to fit, so of course it performs poorly. The other thing it helps with is the hopelessness you will encounter from hitting a wall (hitting a wall is common in ML and should be expected): the scoreboard shows individuals who have broken through that wall, so you know it is possible. I stopped participating after stabilizing in the top 20 because they change the format of the tournament every so often and I wanted my Saturdays back. You don't need to reach the top 20; I hit a wall around rank 100 back when they used actual bitcoin to pay people. You just need to do well in one of the rounds where everyone else fails, so you can go through the process of asking 'what did I do that I'm not aware of that made me succeed where everyone else failed?'
2. Read Advances in Financial Machine Learning by Marcos Lopez de Prado. It goes over the false assumptions that outsiders make, and then outlines rookie mistakes (I made many of the mistakes described in the book, then read it when it came out). It will also break you out of the thinking that leads to typical approaches and explain why it is unrealistic to expect them to work.
3. Become familiar with retail trader mistakes like overtrading, improper sizing, and emotional trading, as well as the fact that you cannot rely on regulatory bodies to prevent fraud; they only act after it has occurred. (This is for scenarios where your model says to short a stock, then you see that the company is fraudulent, but it continues to exist.) Learning blackjack probabilities and sizing helps with developing a strategy. Things like: do you want a trading system that has 60% accuracy and 10% profit each time, or one that has 45% accuracy but 200% profit each time? It's interesting because even if you have a 50% accurate strategy with 200% profit on wins and 50% loss on losses, you still need to calculate the probability of seeing a run of consecutive losses long enough to bankrupt you if your size is wrong. In college, this was covered for me in the Discrete Math class.
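The sizing question in step 3 can be sketched in a few lines of Python. This is only a rough illustration, not a trading tool: the function names are made up for this example, and it assumes independent trades and a fixed fraction of the current bankroll staked on every trade. It uses the 50% accurate, 200% profit / 50% loss strategy, since that is the only one above where both the win and loss sizes are given.

```python
def expected_return(p_win, gain, loss):
    """Expected return per unit staked on a single trade."""
    return p_win * gain - (1 - p_win) * loss


def prob_losing_streak(p_win, streak, n_trades):
    """Probability of at least one run of `streak` consecutive losses
    somewhere in `n_trades` independent trades."""
    p_loss = 1 - p_win
    # state[i] = probability that the current losing run has length i
    # and no run of length `streak` has happened yet.
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_trades):
        new_state = [0.0] * streak
        for run, prob in enumerate(state):
            new_state[0] += prob * p_win  # a win resets the run
            if run + 1 < streak:
                new_state[run + 1] += prob * p_loss  # a loss extends the run
            # otherwise the run reaches `streak`; that probability is dropped
        state = new_state
    return 1 - sum(state)


def bankroll_after_losses(fraction, loss, n_losses):
    """Share of the bankroll left after `n_losses` straight losses when
    `fraction` of the current bankroll is staked on every trade."""
    return (1 - fraction * loss) ** n_losses


# Assumption: the 50% accurate, +200% / -50% strategy from step 3.
print(expected_return(0.5, 2.0, 0.5))       # 0.75 per unit staked
print(prob_losing_streak(0.5, 10, 1000))    # chance of 10 losses in a row within 1000 trades
print(bankroll_after_losses(1.0, 0.5, 10))  # staking everything each time
print(bankroll_after_losses(0.2, 0.5, 10))  # staking 20% each time
```

The point of running something like this is that a strategy with a clearly positive expected return can still wipe out most of an account if the size per trade is too large for the losing streaks that are likely over a long run of trades.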
After steps 1+2+3, I think people who have some level of control over their emotions have the right foundation to code a system. There are people who should not trade because they don't have the right personality profile.
4. Find a way to fit the data you encounter into a DB. Early on I had to pay 100 a month to get daily CSVs for stock data, and I wrote code that answered questions for me from the CSVs. This was wasted time, because you can write SQL to answer so many questions. Keep this DB on a separate computer from the one you do ML dev on, because the computer that ML dev happens on inevitably gets wiped (it will happen to you).
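As a rough sketch of what step 4 looks like in practice, here is one way to load a daily CSV into a database and ask a question with SQL instead of custom code. Everything specific here is an assumption for illustration: the table name `daily_prices`, the columns `symbol`, `date`, and `close`, the helper names, and the sample rows, which are made up. It uses SQLite in memory so it runs anywhere; for a DB that lives on a separate machine, a server database would be the more natural choice.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a vendor's daily CSV.
# The column names are assumptions, not any provider's real format.
SAMPLE_CSV = """symbol,date,close
AAA,2020-01-02,10.0
AAA,2020-01-03,11.0
BBB,2020-01-02,50.0
BBB,2020-01-03,45.0
"""


def load_daily_csv(conn, csv_file):
    """Insert rows from a daily CSV into the daily_prices table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_prices ("
        "symbol TEXT NOT NULL, date TEXT NOT NULL, close REAL NOT NULL, "
        "PRIMARY KEY (symbol, date))"
    )
    rows = [
        (row["symbol"], row["date"], float(row["close"]))
        for row in csv.DictReader(csv_file)
    ]
    # INSERT OR REPLACE makes re-loading the same file harmless.
    conn.executemany("INSERT OR REPLACE INTO daily_prices VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def daily_returns(conn, date):
    """Return (symbol, return) pairs for `date`, measured against each
    symbol's previous stored close, best performer first."""
    query = """
        SELECT cur.symbol, cur.close / prev.close - 1.0 AS ret
        FROM daily_prices AS cur
        JOIN daily_prices AS prev
          ON prev.symbol = cur.symbol
         AND prev.date = (SELECT MAX(date) FROM daily_prices
                          WHERE symbol = cur.symbol AND date < cur.date)
        WHERE cur.date = ?
        ORDER BY ret DESC
    """
    return conn.execute(query, (date,)).fetchall()


conn = sqlite3.connect(":memory:")  # a file path or a server DB in real use
loaded = load_daily_csv(conn, io.StringIO(SAMPLE_CSV))
returns = daily_returns(conn, "2020-01-03")
for symbol, ret in returns:
    print(symbol, round(ret, 4))
```

Once the data is in a table, a new question is usually a new query rather than a new script, which is where the time savings come from.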
Then for you it's a matter of just leveraging Python + pandas etc. to code a solution that meets your criteria. There are three categories that you have to operate across: infrastructure, forecasting, and trading strategy. When you see one of your model's predictions come true, it really is a different feeling. But to ease my conscience I should warn you: if you are the curious type and you try this once, you will always be curious about it.
For time series data I'm currently using iextrading, even though it has downsides (they only have data for trades that route through their exchange). I used to use Kibot and Alpha Vantage, scrape Yahoo, download stock data CSVs from eBay, and save E*TRADE real-time quotes. For placing trades I am currently using Alpaca. (I've used IB, E*TRADE, and Robinhood's private API before they blocked it.)