Machine learning for financial prediction

(robotwealth.com)

214 points · Matetricks · 10y ago · 82 comments

aj7 · 10y ago · 9 in thread

Really successful traders spend their time and resources obtaining insider information, not massaging public data. It stands to reason that an ensemble of technical trading methods would regress towards the mean.

SyneRyder · 10y ago

Some of it can be publicly available data that others just haven't thought to use. In another comment someone has mentioned the Walmart carpark satellite imagery example, where they could estimate trends in Walmart's sales by counting the number of cars in the carpark over time.

Nik Cubrilovic recently demonstrated how information leakage can be used to trade stocks, eg estimating the growth rate of the Adobe Creative Cloud customer base based on assigned customer ID numbers, before Adobe announced the figures themselves:

http://www.itnews.com.au/news/how-an-aussie-hacker-used-info...

cheez · 10y ago

This is exactly true. I had the pleasure of having a very short sit-down with one of the world's most successful traders and he said this, though not in so many words.

It was more subtle than inside information. He implied that he could actually influence the outcome and hurried me on from that point.

Mind blown.

kejaed · 10y ago

Sounds like Billions is not that far off...

Bromskloss · 10y ago

Please do share more if you have anything.

kasey_junk · 10y ago

You probably mean "information that others do not have". That is not the same thing as "insider information".

egwor · 10y ago

Using insider information is illegal. If you mean insider information then this is total nonsense. Sure, there are a few that do illegal things (and inevitably get caught since there is so much monitoring going on).

aswanson · 10y ago

If you read "Confessions of a Street Addict", Jim Cramer pretty much says directly that he used to call his contacts at Goldman to get info on trades. Reading that convinced me that day trading is for suckers without inside information.

CPLX · 10y ago

It's only illegal if you are an insider or got it from an insider who realized a gain on it. There are a lot of misconceptions here about what insider trading actually is.

falsestprophet · 10y ago

Exactly, when I heard drug use was a problem in some communities I too knew that was impossible because selling and using drugs is illegal.

> Sure, there are a few that do illegal things (and inevitably get caught since there is so much monitoring going on).

xivzgrev · 10y ago · 7 in thread

Anyone know where he got all the raw data to feed his algo? Clearly he used a lot of data, and the two main sources of free info I know of are Google Finance and Yahoo Finance. At least with Google Finance, I run into issues with their API if I execute too many calls simultaneously: a bunch end up not returning any data.

TDL · 10y ago

Not sure where he got his data, but you might want to try https://www.quandl.com/

They have a free, community, curated data set of ~3200 stocks.

xivzgrev · 10y ago

Wow, I have not heard of that site before - thanks!

jihoon796 · 10y ago

Agreed with Quandl being a good source for financial data. Their APIs are also quite well-maintained. Within Quandl, I've found Zack's to be a good resource.

I work with both Quandl and Zack's pretty frequently, let me know if you're interested in buying large amounts of data from Zack's, and I can perhaps get you a discount from the listed prices on the Quandl website.

miscellaneous · 10y ago

If I was interested in such a discount, how would I contact you? (Also, how much is a "large amount"?)

bladecatcher · 10y ago

In my experience, Yahoo Finance data is not reliable. In one case, I noticed that the stock price was incorrectly adjusted for dividends for all shares trading on a particular exchange. Free, correct data is hard to obtain.

chvid · 10y ago

You can get data from Interactive Brokers; I assume most of the other brokers that provide an API will give you data too.

SixSigma · 10y ago

You can get constant updates from IG via scraping. Market prices are fractal, i.e. self-similar at any scale.

mathgenius · 10y ago · 4 in thread

It's just so ridiculously easy to overfit these models, and there are so many ways to shoot yourself in the foot as a result.

For example, "I split the data set into 5 random segments and then trained a model on 4 of the 5 segments and then tested it on 5th." Such data is serially correlated (it's not good old iid) so already it looks like you have poisoned the test set with information from the training set.

The hard part is not "feature engineering" or "ensemble methods", the hard part is controlling the entropy that you feed these things because they are voracious monsters and will absolutely eat all of it.

lpage · 10y ago

> Such data is serially correlated (it's not good old iid) so already it looks like you have poisoned the test set with information from the training set.

Kind of. If it was that simple, making money off of an autoregressive model would be trivial -> everyone would do it -> serial correlation would disappear.

I agree with your observation that figuring out what to feed the beast is one of the bigger challenges though. Case in point: train a mean reversion model on the last seven years of S&P data to buy dips and train a momentum model to buy higher highs. That equity curve would look very encouraging. Do it on a fifteen year basis, and not so much. Now the question becomes: how long of a lookback do you use when training your models? Chopping up data at random will mux out useful correlations. Subsetting into periods leads to poorly generalized models. Not fun.

narrator · 10y ago

It's so easy to do machine learning and think you're a genius when you are in fact overfitting. It's almost like casino gambling. You tweak some hyperparameter, pull the slot machine lever, and wham, your model says you should be rich real soon...

tmpanon1234act · 10y ago

This is one of the better responses. Issues that arise: low # of data points at macro timescale, time series data (and local correlation between individual data points) making it hard to extract training/testing sets, and the overarching structural shifts in the market over time that invalidate older data (depending on context).

joeyo · 10y ago

Doesn't that cut both ways though? If there are serial correlations in the data then modeling and accounting for the variance explained by those correlations should help with future predictions, no?

joegreen · 10y ago · 4 in thread

If anyone else is getting errors when loading the page, here's the google cached version http://webcache.googleusercontent.com/search?q=cache:-ciyXfS...

meeper16 · 10y ago

More tools for ML financial prediction

http://52.11.211.67/recommend/app/hidden_connections?query=h...

http://52.11.211.67/recommend/historical-trends/index-contra...

t3rseCode · 10y ago

+1 thanks for the cached version.

codecamper · 10y ago

Thank you.

sonabinu · 10y ago

Thank You !!!

mcbrown · 10y ago · 4 in thread

Former professional investment manager here...

The biggest problem with things like this, which almost nobody talks about in the context of investing, is publication bias.

100 people try to develop a profitable trading algorithm. One comes up with an algorithm that looks great on back-tests at the 1% significance level (in other words, exactly what you'd expect from random chance alone over 100 trials).

That person writes an article/pitch/business plan based on their algorithm. You never see results from the 99 who failed.

Going forward, the successful algorithm is no more likely to work than the failed 99, but from the perspective of the general public it sure looks like a winner!
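To make the selection effect above concrete, here is a minimal simulation sketch. It assumes 100 hypothetical strategies whose daily returns are pure noise (zero true edge, 1% daily volatility, 252 trading days); the function names and all numbers are illustrative assumptions, not anything from the article.

```python
import random
import statistics


def backtest_sharpe(n_days, rng):
    """Annualised Sharpe-like ratio of a strategy with zero true edge."""
    # Daily returns are pure noise: mean 0, standard deviation 1%.
    daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    return statistics.mean(daily) / statistics.stdev(daily) * (252 ** 0.5)


def best_of(n_strategies, n_days=252, seed=0):
    """Back-test n_strategies no-edge strategies; return (best, average) score."""
    rng = random.Random(seed)
    scores = [backtest_sharpe(n_days, rng) for _ in range(n_strategies)]
    return max(scores), statistics.mean(scores)


best, average = best_of(100)
print(f"best back-test: {best:.2f}, average back-test: {average:.2f}")
```

The best of the 100 typically looks like a strong strategy even though none of them has any edge, and raising `n_strategies` (the "million people on your laptop" case) only makes the best back-test look better.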

JoeAltmaier · 10y ago

There's an old con game - you send 500 letters to gamblers, predicting the next Dodgers game. 250 predict they'll win; 250 say lose. Game happens, 250 people think Hey lucky guess. To those you send 250 letters, 125 predict they'll win the next game; 125 lose. After 6 games you have 8 people who have seen you guess right 6 times in a row. Get them to pay you for another (worthless) letter.
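The arithmetic of the con above is just repeated halving. A small sketch (the function name is made up for illustration):

```python
def surviving_recipients(initial, games):
    """Recipients who have seen only correct predictions after each game."""
    remaining = [float(initial)]
    for _ in range(games):
        # Half the letters predicted the wrong outcome, so half the audience drops out.
        remaining.append(remaining[-1] / 2)
    return remaining


print(surviving_recipients(500, 6))
# [500.0, 250.0, 125.0, 62.5, 31.25, 15.625, 7.8125]
```

So 500 letters leave roughly 8 people (500 / 2^6 = 7.8125) who have watched six correct calls in a row.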

branchless · 10y ago

So much this. Also even if you have won it means very little going forward. If you put 100 guys in a room and asked them to try to flip N consecutive tails one guy will come out thinking he is the king of flipping, with a rock-solid "system". He's just someone who doesn't understand probability. And as you say you don't hear from the other 99 including the math guy who flipped N/2 heads and is muttering about it.

mathgenius · 10y ago

> 100 people try to develop a profitable trading algorithm....

It's much worse than this with machine learning approaches. Imagine a million people trying to find a profitable algo, all on your laptop, and you are choosing the best one out of all of those.

If you are used to pen-and-paper trading strategies, or even excel spreadsheets, machine learning is just a completely different level to this. And probably how it works will be unintelligible to anyone. I don't even see how someone can write a business plan based on this.

selectron · 10y ago

The type of approach used has limited effect on survivorship bias, what matters is the number of people employing different approaches and the size of the effect. So if machine learning approaches can produce real results, the data will show this. Survivorship bias is real, but it is not the full story.

nxzero · 10y ago · 4 in thread

Never understood why anyone would spend time creating a trading method, given that even if it did work (possible, but unlikely) the SEC would audit you and then leak how you were making the outperforming returns.

Welcome any thoughts, in part because legally beating the market is possible; I just don't get the SEC & OPSEC aspect.

lordnacho · 10y ago

Why would the SEC audit you? Just by random chance there are many people outperforming the market. They can't audit you just for outperforming.

If they do audit you, how will they discover how you are generating your trading decisions? Their remit is to make sure you aren't doing something illegal. There's no reason they would understand what you were doing in anything other than a superficial way.

Also, something can be profitable, and obviously so, without being easily reproducible. For instance there are firms that do simple footrace arbitrage on the same security between different exchanges. Not hard to understand, but you still can't do it. There's a whole spectrum of strategies that are on a frontier on the map of easy-to-understand vs easy-to-implement.

Besides all that, I think even if you were to learn about a way to beat the market, the way you found out might lead you to be very skeptical of whatever was proposed. If a guy is selling it on a website, you will probably not believe him, right? And if he showed you backtests that worked, you would suspect they were generated from a random generator of some sort. And if he then shows you the math, you would almost certainly find fault with it. Why did he do this or that transformation on the data? Must be random...

nxzero · 10y ago

A few years back, the SEC started being very aggressive about finding entities making above-average returns; my understanding is that if, over a set number of transactions, you're making over 30%, you will get "knocked", and the auditors have zero reason not to leak the information. Best example I know is the Walmart parking lot satellite imagery analysis; happy to dig up a link.

ozi · 10y ago

The problem isn't coming up with an algorithm that works (i.e. more wins than losses).

The difficulty is gaining confidence in your algo and determining when to move from paper trading to actual trading.

You run into counter-intuitive things while training a neural net, for example. You'd think more training data would be good, but when training neural nets, you actually want to use as little data as possible while still creating an ideal ROC curve.

0xmohit · 10y ago

> The problem isn't coming up with an algorithm that works (i.e. more wins than losses).

An algorithm that works would also include the ability to limit losses. An algorithm might be correct 9 out of 10 times, but may lose more in a single transaction than what it earned in those 9 winning transactions.
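A quick way to see this is to compute the expected profit per trade rather than the win rate. The sketch below uses made-up numbers (a 90% win rate, 1 unit gained per win, 12 units lost per loss); the function name is an assumption for illustration.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade; avg_loss is given as a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# Hypothetical strategy: right 9 times out of 10, but one loss wipes out 12 wins.
per_trade = expectancy(0.9, 1.0, 12.0)
print(f"expected profit per trade: {per_trade:.2f} units")
```

With these numbers the expectancy is negative (about -0.3 units per trade), so "more wins than losses" is not enough on its own.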

dpweb · 10y ago · 3 in thread

There are a few problems with turning your laptop into a money machine using data analysis.

Remember the maxim, past performance is not a guarantee of future results. You can develop strategies based on past data that will beat the market, but, the nature of markets is to adapt to kill your edge. Markets adapt constantly and your edge stops working at an unknown point in time. It's unknowable when that WILL happen because past data can't show that.

The other reason is transaction costs, called the vig in gambling. Let's say I'm betting NFL games. NFL home teams win 51% of games. Even a flipped coin, I've read, comes up heads 50.1% of the time. These are profitable systems. But you're paying the bookie 10% on each loss. You could find someone to bet you on coin tosses and bet heads each time. You have a positive expected return, although you need a huge number of flips to make money!

In trading, of course, the cost is commissions. Why do you think there was a rise in HFT? The strategies are consistently profitable (besides the flashing/manipulation tactics). It is ONLY profitable because of extremely low commission costs that are not available to the retail (or even semi-professional) trader.

Systems that can pull $0.0001 out of every share traded overall on high volume can be (pretty easily) created, but you can't trade them profitably. In fact, you will find commissions (semi-pros who pay about $3 per 1000 shares) priced right at the point of an edge you could be expected to develop.
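The numbers in this comment can be checked with a little arithmetic. The sketch below assumes an even-money bet where a loss costs the stake plus the vig, and a flat commission quoted per 1000 shares; both function names are hypothetical, and real commission schedules (per side, minimums, exchange fees) are more complicated.

```python
def break_even_win_rate(vig):
    """Win probability at which an even-money bet with a surcharge on losses breaks even."""
    # A win pays +1 unit, a loss costs (1 + vig) units.
    # Solve p * 1 - (1 - p) * (1 + vig) = 0 for p.
    return (1.0 + vig) / (2.0 + vig)


def net_edge_per_share(gross_edge, commission_per_1000_shares):
    """Edge per share left over after a flat per-share commission."""
    return gross_edge - commission_per_1000_shares / 1000.0


print(f"break-even win rate at 10% vig: {break_even_win_rate(0.10):.4f}")
print(f"net edge per share: {net_edge_per_share(0.0001, 3.0):.4f}")
```

Under these assumptions a 10% vig needs roughly a 52.4% win rate to break even, so a 51% system loses money, and a $0.0001 per-share edge is swamped by a $0.003 per-share commission.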

melling · 10y ago

"nature of markets is to adapt to kill your edge"

If you are a low volume, small time trader, the market isn't going to move as quickly to adapt to you. If you have $100,000, for example, and return 30% a year, you aren't on anyone's radar.

auntienomen · 10y ago

Provided you're the only one trading your strategy, which is unlikely.

jzwinck · 10y ago

30% on 100k is 30k. You'd be better off getting a regular job unless you can sustain that for more than 10 years. Which you can't predict.

hendzen · 10y ago · 3 in thread

If you can actually reliably generate alpha from a model like this, there is no point in running the strategy yourself. There are any number of hedge funds that will sign you on, let you keep all of the IP you develop, and give you 10-12% of any returns you generate. That sounds small, but it's mitigated by the fact that you will have access to potentially billions of dollars in capital to trade if your strategy has the capacity for it. So you get 10% of a much bigger pie, with way less downside risk. Plus you get access to all their internal trading systems, execution services, data feeds, etc., which are usually orders of magnitude better than what an individual has access to.

malux85 · 10y ago

Who do I contact? I have a deep learning startup that is trading forex right now. I would like to make some contacts and see if I can integrate.

brobinson · 10y ago

Create a public, real account on myfxbook.com and let them find you. If you contact them directly, you can link them to your myfxbook account to give them an instant view into how your algorithm has performed.

wocram · 10y ago

Why FX? More direct access to exchanges?

lordnacho · 10y ago · 2 in thread

Interesting article. I do something related, and here's my take:

Data mining is useful because it gives you things that are predictive that you might not have considered at first, but make sense after. This is mainly due to combinatorial explosion in the potential number of formulas.

You generally have a vague idea of what might be predictive, eg cheapness vs earnings and cash flow, but there's a huge number of ways that might show up in the data, and there's a huge number of ways it might hide in the data.

So for instance an old school analyst might do a ranking of price/earnings as well as cash flow, or whatever bespoke formula desired.

A data mining approach could take all the fundamentals and generate formulas mixing the variables, yielding a number that seem to be effective. Out of those, you'd look at them and decide that they capture some thesis (low P/E, upward trend in earnings). Then you'd look at whether the formula is sensitive to small tweaks. For instance, if you regressed the last 6 earnings and it had phenomenal performance, but with 5 or 7 it wasn't, you probably conclude it's some sort of random result.
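The "sensitive to small tweaks" check can be sketched as a comparison between the best parameter value and its immediate neighbours. Everything here is a hypothetical heuristic for illustration (the function name, the 0.5 tolerance, and the assumption that scores are positive and keyed by an integer parameter such as the number of earnings periods regressed); it is not a description of what any fund actually does.

```python
def is_fragile(score_by_param, tolerance=0.5):
    """Return True if the best parameter's neighbours score far worse than it does.

    score_by_param maps an integer parameter to a back-test score; the best
    score is assumed to be positive.
    """
    best = max(score_by_param, key=score_by_param.get)
    neighbours = [score_by_param[p] for p in (best - 1, best + 1) if p in score_by_param]
    if not neighbours:
        return True  # nothing to compare against, so treat the result as unverified
    return min(neighbours) < tolerance * score_by_param[best]


# Phenomenal with 6 periods, poor with 5 or 7: probably a random result.
print(is_fragile({5: 0.1, 6: 2.0, 7: 0.2}))  # True
# Performance degrades gently around the optimum: more believable.
print(is_fragile({5: 1.6, 6: 2.0, 7: 1.7}))  # False
```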

There are funds that take the mass approach to an extreme. They have huge databases, with a genetic algorithm that generates expression trees, and a battery of stats (incl. backtests) to decide what works. They end up with many thousands of strategies that are a great deal more effective than your standard one-trick pony fund.

Wonnk13 · 10y ago

Very interesting. Can you recommend any resources for someone with a fairly strong stats / programming background but no real substantive finance experience?

lordnacho · 10y ago

Igor Tulchinsky has a fund that does this. He also writes books and papers about how he does it, with everything you need to do it yourself.

dreamdu5t · 10y ago · 1 in thread

There's a hedge fund built by anonymous data scientists - https://numer.ai

You can use ML to make money on encrypted stock data for free. Think Kaggle but the winning models are used to trade.

chillacy · 10y ago

It looks like the feature set is fixed on numer.ai? If so, everyone's probably developing mega ensembles (this is what Netflix's competition ended up as, with teams merging because their models did better together). Compare that to Quantopian, where you're responsible for feature engineering too (though numer.ai is probably easier to get started with, since model selection is imo the fun part).

ChuckMcM · 10y ago

I think financial prediction via machine learning will be a useful crucible for distinguishing AI from non-AI. So far, so many companies that have applied machine learning to prediction have ended up on the wrong side of the order book at the wrong time. I don't know if this is because other algorithms figure out what they are doing and rapidly develop a counter algorithm to fleece them, or if it's just savvy traders intuiting what the algorithm is keying on and manipulating it. Sort of like good RTS game players who figure out how the opponent AI is playing and start playing against its programming rather than following some strategy from first principles.

sovande · 10y ago

I'll invoke the Black Swan (https://en.wikipedia.org/wiki/The_Black_Swan_%28Taleb_book%2...) since it hasn't been done yet in this thread.

aj7 · 10y ago

...spend their time and resources...

robotwealth · 10y ago

Hello

I'm Kris, the guy who wrote the article that started this thread. Thanks to all who have read my article and taken the time to comment. In the context of my motivation for starting my blog, it means a lot. I'm an engineer who became interested in quantitative finance and machine learning a few years ago. I learned how to code and apply my maths and stats knowledge to finance independently - no formal training whatsoever. This meant that for a long time I was conducting research and developing trading systems in a vacuum; I had no one to bounce ideas off or learn from. So I started writing about what I was doing in the hopes of getting some feedback. So thank you all for providing some. The insights were immensely valuable and I learned a lot.

I thought it would be useful to respond to some of the comments.

mathgenius brought up the extremely valid point that regular k-fold cross validation in a time series context doesn't make sense since the data is autocorrelated, not iid. I no longer use this approach for time series data, instead favoring Rob Hyndman's time series cross validation approach, also known as forward chaining. I believe this approach is the best representation of a real trading environment. The issue becomes deciding how large the rolling window of training data should be - older data may be obsolete, but excluding too much history can lead to not enough training instances.
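For readers who have not seen forward chaining, here is a minimal sketch of a rolling-window variant that generates train/test index splits in which the test block always comes after the training block. The function name and window sizes are assumptions for illustration; this is not the code used in the article. scikit-learn's `TimeSeriesSplit` offers a related, expanding-window scheme.

```python
def forward_chaining_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) for a rolling-window walk-forward scheme."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        # Slide the window forward by one test block so test sets never overlap.
        start += test_size


for train, test in forward_chaining_splits(n_samples=10, train_size=4, test_size=2):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```

The `train_size` argument is exactly the "how large should the rolling window be" trade-off: a larger window gives more training instances but reaches further back into possibly obsolete data.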

dpweb raises a good point too, namely that just because your model performed well on past data, even if that data was out of sample, there is no guarantee that the future will be sufficiently like the past, meaning that your model may well become useless at some point in time (possibly very quickly). This is a valid point, but no reason to abandon the markets. It does however require that any algorithm's live performance be objectively monitored such that the level of deviation from expected performance can be statistically quantified. Once a pre-determined confidence level in the model's obsolescence is reached, it should be removed from the portfolio.
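One simple way to quantify "deviation from expected performance" is a one-sided test of whether the live mean return has fallen below the mean the back-test led you to expect. The sketch below uses a normal approximation and assumes roughly independent returns, which real return series often violate; the function name, threshold, and sample data are all hypothetical.

```python
import math
import statistics


def underperformance_p_value(live_returns, expected_mean):
    """One-sided p-value for 'the live mean return is below expected_mean'.

    Uses a normal approximation; a small value suggests the strategy is
    underperforming what the back-test implied.
    """
    n = len(live_returns)
    mean = statistics.mean(live_returns)
    stderr = statistics.stdev(live_returns) / math.sqrt(n)
    z = (mean - expected_mean) / stderr
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z <= z)


# Hypothetical live record: 50 daily returns, expected mean of 0.2% per day.
live = [-0.01, 0.0, -0.02, 0.01, -0.015] * 10
p = underperformance_p_value(live, expected_mean=0.002)
if p < 0.05:  # pre-determined threshold, chosen before going live
    print("remove strategy from the portfolio")
```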

mcbrown's comment about publication bias is a good one too. Even worse, I've personally developed hundreds of trading systems that I haven't published. Other bloggers and publishers have most likely also done the same. This form of selection bias is very likely rampant, and is especially applicable to models 'discovered' using machine learning techniques that may not be rooted in traditional economic or financial principles. The moral: absent some form of robust accounting for selection bias, view all of these types of systems with a healthy dose of skepticism, and the published performance as a theoretical upper limit to what could be achieved in practice.

hendzen's point about partnering with a fund or proprietary trading company rather than running your reliable, alpha generating strategy yourself is also a valid one. I have happily found this out for myself recently.

Also, lordnacho is spot on regarding his take on the utility of data mining in finance.

Thanks again for all the comments!
