It should be designed to pick and hold a position for >12 months, use alternative data sources, and be open/hackable.
When I graduated college, I spent 3 months as a programmer with my econ friend trying to build exactly this. I started off by creating a system to paper trade stocks retroactively. You pretend it's January 1st, 1982, have an algorithm look at the stocks as they were then, move forward one day at a time, and let it trade through the next 40 years to see how it does.
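A minimal sketch of that kind of day-by-day replay, assuming a single asset and a long-or-flat decision rule. The names `walk_forward` and `momentum` are hypothetical, not from the original project. The point it illustrates is that the decision function only ever sees prices up to the current day.

```python
from typing import Callable, Sequence


def walk_forward(prices: Sequence[float],
                 decide: Callable[[Sequence[float]], bool],
                 warmup: int = 1) -> float:
    """Replay prices one day at a time and return the final equity multiple.

    `decide` receives only the prices up to and including day t. If it
    returns True we hold the asset from day t to day t+1, otherwise we
    sit in cash for that day.
    """
    equity = 1.0
    for t in range(warmup, len(prices) - 1):
        history = prices[: t + 1]  # nothing after day t is visible
        if decide(history):
            equity *= prices[t + 1] / prices[t]
    return equity


def momentum(history: Sequence[float]) -> bool:
    # Hypothetical rule: hold if the latest close is above the previous one.
    return history[-1] > history[-2]


prices = [100, 101, 103, 102, 104, 108, 107]
result = walk_forward(prices, momentum)
buy_and_hold = walk_forward(prices, lambda history: True)
print(f"momentum: {result:.4f}, buy and hold: {buy_and_hold:.4f}")
```

Slicing `prices[: t + 1]` is what keeps future data out of the decision. A real backtest would also need fees, slippage, and point-in-time data.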
We tried linear models, SVMs, neural networks, RNNs, ensembles, genetic algorithms, anything with stock data, news sentiment data, classic quant structures, and everything in-between. Basically, 3 solid months of coding before I started working.
Anyway, I found out a lot of stuff the hard way, because I didn't have an econ degree.
First off, if you try enough methods, you end up p-hacking or hill-climbing on the past anyway, and that's no good.
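A small illustration of that effect, assuming pure-noise returns and purely random "strategies" (both are assumptions made for the demo, and `p_hack` is a hypothetical name). None of the strategies has any edge, yet the best one in the past sample still looks profitable, simply because it was picked after the fact.

```python
import random


def p_hack(n_strategies: int = 200, n_days: int = 250, seed: int = 0):
    """Return (past_pnl, future_pnl) for many random long/short strategies."""
    rng = random.Random(seed)
    # Daily returns are pure noise: there is nothing real to find here.
    past = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    future = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    results = []
    for _ in range(n_strategies):
        # Each "strategy" is just a random sequence of long (+1) / short (-1).
        signals_past = [rng.choice((-1, 1)) for _ in range(n_days)]
        signals_future = [rng.choice((-1, 1)) for _ in range(n_days)]
        pnl_past = sum(s * r for s, r in zip(signals_past, past))
        pnl_future = sum(s * r for s, r in zip(signals_future, future))
        results.append((pnl_past, pnl_future))
    return results


results = p_hack()
best_past, best_future = max(results)  # select the winner on past data only
print(f"best strategy, past sample:   {best_past:+.3f}")
print(f"same strategy, future sample: {best_future:+.3f}")
```

The selected strategy's past result is the maximum of 200 noisy draws, so it is biased upward. Its future result is just one more draw from the same noise.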
Second off, clean historical data is hard to get. It may or may not be adjusted for splits and other corporate actions, so you can inadvertently feed the model information from the future when playing back the past. It's hard to get this right.
Third off, many of the models we used were almost always competitive in the 80s (even a linear regression), but in the aughts or 2010s they stopped being competitive. Our guess was that computer-based trading at hedge funds had become more competitive.
Fourth, simple models tended to work better. For instance, we might train a model on data from the 70s and 80s, then do online (continuous) training from the 80s onward as we moved the model forward in time. There's just not enough data. You can train on all historical stocks, or on all stocks and related data streams in the industry up to that point, but I think we still didn't have enough data, and the market is competitive.
Fifth, I wish I had read A Random Walk Down Wall Street earlier, or all of Taleb's stuff. These are books with a deep mistrust of quants.
Sixth, I think to be competitive, you need to have money in the game, many heuristics, and industry experience. Big firms have this and equipment, but it's hard to get in as an individual.
Seventh, I put several hundred hours into this project and learned a bunch about machine learning and economics. In every way I loved the experience, and I'd encourage you to try it. Probably I'm a n00b here, but I hope some of my notes can help you.
Another thing, as you said, is that it's hard to get quality data. For example, most databases don't include price history for bankrupt companies (or miss quite a few), which makes some quantitative strategies, like focusing on low P/E and P/B, completely bogus. That's sad, because most books will actually tell you to do that without ever mentioning how many of those backtests lack the companies with a -100% return in their virtual portfolios. Those tend to be low P/E companies that the market considered risky, and it was right, but because they disappeared from the data, the strategies appear to outperform by ignoring so many losers.
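A toy example of how much this can distort a backtest. The tickers and returns below are made up for illustration, and `portfolio_return` is a hypothetical helper. A return of -1.0 stands for a company that went bankrupt and dropped out of the database.

```python
from statistics import mean

# Hypothetical low-P/E portfolio: total return per company over the period.
# -1.0 means the company went bankrupt and was delisted.
returns = {
    "AAA": 0.40,
    "BBB": 0.25,
    "CCC": -1.0,
    "DDD": 0.10,
    "EEE": -1.0,
    "FFF": 0.55,
}


def portfolio_return(returns: dict, include_delisted: bool) -> float:
    """Equal-weighted average return, with or without the bankrupt names."""
    kept = [r for r in returns.values() if include_delisted or r > -1.0]
    return mean(kept)


survivors_only = portfolio_return(returns, include_delisted=False)
full_universe = portfolio_return(returns, include_delisted=True)
print(f"survivors only: {survivors_only:+.1%}")
print(f"full universe:  {full_universe:+.1%}")
```

With these made-up numbers the survivors-only average is positive while the full-universe average is negative, which is the whole problem in miniature.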
Then, you're going to paper trade it. Then live trade it. Historical data can only give you a directional indicator: is your 'thesis' of market inefficiency directionally accurate?
I recall one particular overnight reversion strategy in my local market which in the 90s and early 2000s had like a 45 degree straight line equity curve and then overnight in 2011 it just went flat. Someone clearly turned on a model that day and started trading it. It's a small market here so I know who it was but I still find it fascinating how clearly it showed up in the data.
1. Filter all underdog stocks.
2. Have a catalyst detector - e.g. OpenAI announces new model. Link that to NVIDIA, MSFT, etc.
3. Among these stocks, when you see a marubozu break through the previous resistance point, buy the stock. Then sell it near peak. (Need a peak detection algorithm)
The trick is having all three work fine. But it's easier to debug and test when one doesn't work if you break it into parts. You also don't need one really good methodology, you just need a few decent ones.
Or you could find similar patterns that can be broken down into small parts and do those too.
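A rough sketch of the third step only, under some loud assumptions: a bullish marubozu is taken to be an up candle whose wicks are at most 5% of its range, and "resistance" is taken to be the highest high of the previous `lookback` candles. Both definitions, and all names here, are illustrative choices rather than anything the steps above specify. The underdog filter, catalyst detector, and peak detection are not covered.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float


def is_bullish_marubozu(c: Candle, max_wick_ratio: float = 0.05) -> bool:
    """True for an up candle with (almost) no upper or lower wick."""
    candle_range = c.high - c.low
    if candle_range <= 0 or c.close <= c.open:
        return False
    upper_wick = c.high - c.close
    lower_wick = c.open - c.low
    limit = max_wick_ratio * candle_range
    return upper_wick <= limit and lower_wick <= limit


def breakout_signal(candles: Sequence[Candle], lookback: int = 20) -> bool:
    """True if the latest candle is a marubozu closing above prior resistance."""
    if len(candles) <= lookback:
        return False
    last = candles[-1]
    # Assumed definition: resistance = highest high of the previous candles.
    resistance = max(c.high for c in candles[-lookback - 1:-1])
    return is_bullish_marubozu(last) and last.close > resistance


candles = [
    Candle(10.0, 10.5, 9.8, 10.2),
    Candle(10.2, 10.6, 10.0, 10.4),
    Candle(10.4, 10.7, 10.1, 10.3),
    Candle(10.3, 11.0, 10.3, 11.0),  # opens at the low, closes at the high
]
signal = breakout_signal(candles, lookback=3)
```

Keeping each stage as its own small function is what makes it possible to test them separately, as the comment suggests.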
I think that's all you need. ;)
In reality, trading is hard because there are 2 points where you have to get the timing right.
First you have to buy the position at a time when it is favorable. Then you have to exit the position when you are in profit. Sounds easy, but the hard part is that this profit also has to cover any past losses where you failed to time the entry into a position.
It's easy to say, when the stock goes up 800% and then falls by half to 400% of your original position, that you should have sold when it was at its "peak". But along the way to 800% you had many chances to sell at 500%, 600%, 700%, etc., and along the way the stock fluctuated through many peaks.
If you sell too early, you can't make enough profit to cover past and future losses. If you sell too late, same story. So you have to nail the exit as well, and that is where most models that rely on past data fail. People just walk through the parameters until the entry and exit positions on their test data line up to make a profit, but then can't replicate it when going "live".
Another way to look at investing is that every day you are in the market is logically almost the same as selling and choosing to reinvest that day. So if you hold AAPL for 5 years, that is roughly 1,250 trading days on which the algo is choosing to invest (i.e. stay invested) in AAPL. It's pretty tough to have that many decision points, because even 0.1% noise could cause you to sell.
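To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each daily decision independently has a fixed probability of a noise-triggered sell, which is a simplification made only for illustration. The function name is hypothetical.

```python
def probability_still_holding(p_false_sell: float, n_days: int) -> float:
    """Chance of never hitting a spurious sell across n independent days."""
    return (1.0 - p_false_sell) ** n_days


for days in (250, 1000, 1250):
    p = probability_still_holding(0.001, days)
    print(f"{days:>5} daily decisions: {p:.1%} chance of still holding")

p_1000 = probability_still_holding(0.001, 1000)
```

Under that assumption, a 0.1% daily chance of a false sell leaves only about a 37% chance of still holding after 1,000 decisions, so the per-day error rate has to be far smaller for a multi-year hold to survive.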
As others noted the datasets are not really standardized even with the SEC Edgar data so there is a lot of massaging you have to do.
A system that does that for you would not really be a trading bot per se, it would just be a general algorithm for "picking stocks". Automating the actual purchasing is probably unnecessary.
If you find one, let us know! Most investors in the world are searching for such a thing.
I bet we could reduce US capital market trading to four days a week and still arrive at a similar outcome wrt efficient price discovery and liquidity windows, tangentially [2].
[1] https://en.wikipedia.org/wiki/Parkinson%27s_law
[2] https://www.bloomberg.com/opinion/articles/2021-10-06/why-no... | https://archive.today/Ykapv
Actual long-term investors today are looking at an additional two billion people by 2050, increased demand for food and water, and regional destabilisation due to climate change.
Long-term investors today are buying land and resource access about the globe, or moving to secure such things via private contractor | mercenary armies.
China has purchased one in four US pigs (the farms, the feed, the processing), the Saudis have locked in access to large quantities of US aquifers, and Erik Prince wants the US to retake Africa: https://theintercept.com/2024/02/10/erik-prince-off-leash-im...
These are all examples of securing access to water and food resources to ensure supply into the long term.
The investment payoff is having those resources when others don't, being secure in what you need, and being able to profit from what you don't need in times of extreme demand.
Property is the easy one, but it's useful to look at things like farms along affected areas.
Sure, Siberia will be open prairie cattle country; Cossacks and reindeer herders will be cowboys mustering steers.
Russia knows this also, hence the hiring of US cowboys to train Russians and the recent prevalence of Russian rodeos in both the US and Russia.
This all goes to the question of long-term investment: projected affected areas are being abandoned by deep investors, who have turned to probable new prospects.
Farming real estate questions here in Western Australia have already turned to what land will be prime in 20 years, and what used to be wet, boggy, questionable land is already fetching a climbing price in anticipation of when it becomes "sweet".
https://www.spglobal.com/marketintelligence/en/campaigns/met...
Our clients way back when were mostly all billionaires, some personal, some corporate; between client discussions, GIS data linking, trends, etc we were able to get a line on a bunch of penny stocks that were selling then ( ~2003 ) for 2c (AU) and now trade at $20+ each.
None of that came from looking at day trading, it came from deeply looking at global energy and resource demands against known and suspected sources as yet unexploited.
You don't need trading bots for long-term investing or even infrequent trading. In LT investing, portfolio tracking and asset allocation/reallocation are the primary tasks. Robo-advisors were very popular almost a decade ago. Most brokerages have integrated such features now. Also, check out M1 Finance.
I started investing first with the help of a spreadsheet, then shell scripting, and now Jupyter Notebooks and Python. Beyond LT portfolio tracking, I spend the majority of my time on short- to mid-term strategy development, back-testing, and implementation; portfolio hedging and leverage; and options trading.
The only manual aspect is actual order placement, which takes a few minutes at most.
Case in point, my framework for mining companies is here: https://emergingtrajectories.com/a/pub/mining_company_risk_f... You can see the scores here: https://emergingtrajectories.com/c/copper_mining_companies
"Long term" -- we'll see, I expect to hold positions for 12-24 months.
For those interested, my work above is influenced by two important books: "You Can Be a Stock Market Genius Even if You're Not Too Smart" by Joel Greenblatt and "Superforecasting: The Art and Science of Prediction" by Philip Tetlock. The idea from Joel's writing is to look for less liquid or less popular asset classes (or ones that structurally can't be invested in by the pros who are smarter/better-resourced than you), and Tetlock really drills process and research for long-term forecasting.
This platform allows you to do automated trading based on your own strategy. US-only traders, for now. https://www.composer.trade
If you are just doing portfolio rebalancing, say twice a year, you could rebalance so that each stock contributes an equal share of risk (risk parity).
Risk parity is an approach to investment portfolio management which focuses on the allocation of risk, rather than the allocation of capital. The risk parity approach asserts that when asset allocations are adjusted to have the same level of risk, the portfolio can achieve a higher risk-adjusted return.
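A minimal sketch of the simplest version of this idea, inverse-volatility weighting. Note this is a naive stand-in: it ignores correlations between assets, which a full risk parity optimisation would account for. The asset names, return series, and function name are made up for illustration.

```python
from statistics import stdev


def inverse_volatility_weights(returns_by_asset: dict) -> dict:
    """Weight each asset in proportion to 1 / (std dev of its returns)."""
    inverse_vol = {name: 1.0 / stdev(series)
                   for name, series in returns_by_asset.items()}
    total = sum(inverse_vol.values())
    return {name: value / total for name, value in inverse_vol.items()}


# Hypothetical periodic returns; STOCKS is ten times as volatile as BONDS.
history = {
    "BONDS": [0.001, -0.001, 0.002, -0.002],
    "STOCKS": [0.01, -0.01, 0.02, -0.02],
}
weights = inverse_volatility_weights(history)
for name, weight in weights.items():
    print(f"{name}: {weight:.1%}")
```

At each rebalance (say twice a year) you would recompute the weights from recent returns and trade back to them. The calmer asset gets the larger capital weight, so that weight times volatility is the same for every holding.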
Some Quant Resources: https://quantpedia.com
They teach a class on quant. Pretty good. Python oriented. https://quantscience.io
I haven't looked at it in a while, but it was promoted heavily on some podcasts I listened to years ago when it came out.
Edit:
Ah, I just realised you might have meant the software part more than the financial part. Shyam does publish R code for various things on GitHub.
Why?
Long-term trading doesn't work like this. In fact, you want more people to identify the opportunity and buy, since that improves your position in the long term. But generally you should be investing on alternative signals, not technical analysis.
For example, I personally think TSLA is way, way, way overbought. The reasons for this are many and varied but still largely irrelevant. My opinion didn't stop an "analyst" from issuing a buy recommendation yesterday with a target price of $265.
Maybe this "analyst" has clear, logical reasoning with supporting computer analysis --- or maybe his grandma is just heavily bought into TSLA. In any case, the stock jumped 2.5% yesterday.
Bottom line --- everybody and his brother is using software. But opinion and psychology remain significant factors that defy computerization. The market can still stay irrational longer than you can stay solvent.
It's a lot easier to raise that money by selling the tool than it would be by trading the stocks.
It also doesn't hurt to share knowledge. Very often you can compare notes with one another.
In fact, if you had an algorithm that picked winning stocks, just add a little delay to everyone else's bots. Your bot buys it first, then some whale buys it and pumps your stock.
What would be the point? HFT works because you can beat the market by being faster. I don't see how long-term trading could beat the market unless you have insider information.
And if you can't beat the market, there is absolutely no point in the bot, as you can trivially just buy an index fund tracking the market. Which is also what I am doing. I would never use a bot over that, as it is just additional risk.
You are talking about 2 different things in your post though, I believe: (1) automating long-term investments (this is the Revolut thing), i.e. you set an amount to invest every month and it automatically buys whatever you want; (2) a research tool? (not a bot though)
Or, it just hit me while writing, are you talking about quantitative strategies? If yes, then yeah, half of Wall Street was working on that! There were some open source attempts; I think the best known was Quantopian. Look it up.