The anatomy of an ML-powered stock picking engine

(principiamundi.com)

298 points · muggermuch · 3y ago · 103 comments

chollida1 · 3y ago · 9 in thread

Someone asked about how difficult it is to get outside investment....

It's usually very difficult and it takes a lot of money to run a proper fund.

Let's say you raise $50M. You can maybe charge 1 and 20, meaning you get 1% of assets each year for running the fund and 20% of profits.

1% of $50M (and keep in mind this is a large raise for someone without a track record on the sell side or inside another fund) gives you $500,000 a year to pay:

- salaries (let's say you pay yourself $100,000 all in, plus the same for a single analyst)

- a Bloomberg terminal $30,000 including data feeds

- market data feeds: you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse (you can't store data you get from the Bloomberg terminal).

- rent $50,000/year for office space

- outside lawyer fees and outside accounting fees $100,000/year

- similar fees for someone to run your back office, roughly $100,000/year.

On the other side of the expenses you have the money-making side of things, which, as the OP pointed out, isn't great. If you return 10% on the $50M, that is $5M in profits, and you keep 20% of it: $1M.
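As a rough check on the arithmetic above, here is a minimal Python sketch of one year under 1-and-20 terms. The cost figures are the ones listed in this comment; the names `ANNUAL_COSTS` and `fund_economics` are purely illustrative, and high-water marks are ignored.

```python
# Back-of-the-envelope fund economics using the figures quoted in the comment.
# ANNUAL_COSTS and fund_economics are illustrative names, not a real model.

ANNUAL_COSTS = {
    "salaries (2 x $100k)": 200_000,
    "Bloomberg terminal": 30_000,
    "market data feeds": 25_000,
    "office rent": 50_000,
    "legal and accounting": 100_000,
    "back office": 100_000,
}


def fund_economics(aum, gross_return, mgmt_rate=0.01, perf_rate=0.20):
    """Return fee income and fixed costs for one year, ignoring high-water marks."""
    management_fee = aum * mgmt_rate
    profit = aum * gross_return
    performance_fee = max(profit, 0.0) * perf_rate  # no fee on a losing year
    costs = sum(ANNUAL_COSTS.values())
    return {
        "management_fee": management_fee,
        "performance_fee": performance_fee,
        "costs": costs,
        "management_fee_minus_costs": management_fee - costs,
    }


if __name__ == "__main__":
    for name, value in fund_economics(50_000_000, 0.10).items():
        print(f"{name}: ${value:,.0f}")
```

With these inputs the listed costs add up to $505,000, so the 1% management fee only about breaks even and the performance fee is what funds bonuses.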

That allows you to bonus out yourself and analysts on good years. If you lose money one year then you get no bonus and have to bonus out the employees out of the retained earnings you kept from previous bonuses.

It usually gets worse, as most funds have what's called a high-water mark. This means you don't collect the performance fee until your fund gets back to its previous peak. So if you are down 10% one year, you need to make that back before you start to earn any performance fee, which is why most funds shut down if they go down more than 20%.
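The high-water mark mechanics can be sketched in a few lines of Python. This is a simplified illustration, not any fund's actual fee schedule: it assumes no management fee, no inflows or outflows, and a fee crystallized once a year; `performance_fees` is a hypothetical helper name.

```python
def performance_fees(start_nav, yearly_returns, perf_rate=0.20):
    """Yearly performance fees when fees are charged only above the high-water mark.

    Simplifying assumptions: no management fee, no inflows or outflows,
    and the fee is taken out of NAV at the end of each year.
    """
    nav = start_nav
    high_water_mark = start_nav
    fees = []
    for r in yearly_returns:
        nav *= 1.0 + r
        # Only the part of NAV above the previous peak earns a fee.
        gain_above_mark = max(nav - high_water_mark, 0.0)
        fee = gain_above_mark * perf_rate
        nav -= fee
        high_water_mark = max(high_water_mark, nav)
        fees.append(fee)
    return fees


if __name__ == "__main__":
    # Down 10%, then up 5%, then up 10%, starting from a NAV of 100.
    print(performance_fees(100.0, [-0.10, 0.05, 0.10]))
```

In that example the manager earns nothing in the first two years, and in the third year is paid only on the small slice of NAV above the old peak of 100.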

As to raising money... Anyone can show a model that makes money. That doesn't mean it's easy to create a model; it's just that there are a lot of people capable of building such a model.

It's the risk management that people with money are really looking for, and sadly that's really hard to show out of a model, as part of the risk management is things like position sizing and showing your model doesn't pile into one asset class or trade correlated products.

It bodes well for the OP that they talk about market regimes as, IMHO, this is one of the biggest risk management tools that aspiring traders ignore.

And this risk management is why people ask for a track record of more than a year.

rmah · 3y ago

Working in the industry, I can confirm that the above numbers are approximately correct except for the employee costs -- those are roughly double and up. You also need to hire a fund administrator, auditors and compliance firms (maybe $50k to $100k per year each) which add on even more costs. And you can't skip the lawyers, outside administrator, outside compliance, etc. as they are required by regulations/law.

HFguy · 3y ago

This is actually way too optimistic.

Your first 1-2 seed investors will:

- Only pay 1 and 10 (1% fixed fee and 10% of PNL)

- They will also get ownership of the actual fund management firm and will get that in the form of 20% of REVENUE (not equity, revenue, think about that)

This is one reason new fund formation is way down. The economics are bad for years. Know a bunch of HF people that started vc-backed tech firms instead.

The other reason is 10+ year run where stocks, bonds, private firms and real estate just went up. No need for diversifying return streams.

HFguy · 3y ago

BTW, data costs also too low.

Just a BB terminal around 30k and a lot of extra data from BB costs extra (can be 200-300k per additional product).

For quant strategy probably looking at 500k up to 2M for data initially. And you will likely be at a disadvantage to existing firms that have been collecting data for years.

And that is at the low end. Spent many millions per year for 1 strategy at last large firm. And that was small fraction of total firm spend.

ackbar03 · 3y ago

I guess if you have a reasonably profitable strategy it probably just makes more sense to run it with your own money? I guess the only reason why you would want to trade as a hedge fund is if you want to scale up, but most strategies aren't really that scale-able anyways from what I understand, since when you start trading in any real significant size you start moving the markets.

chollida1 · 3y ago

Ha, you need a better network:)

I don't know too many people who have started funds in the past 5 years, but of the 3 who did, none gave up any ownership in the management firm.

That's a sucker's game, and the only people who would need to give up any ownership are people who are very green:)

pid-1 · 3y ago

> a Bloomberg terminal $30,000 including data feeds

> market data feeds you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse(you can't store data you get from the Bloomberg terminal).

Total nitpick: you can get those using soft dollars.

But your numbers are spot on. In my job we estimate running a hedge fund with AUM < 250MM is just not worth it.

chollida1 · 3y ago

Sure, if you generate enough commissions to pay for them, which is not a given, assuming the size of fund we are talking about.

muggermuch (OP) · 3y ago

Thank you for this comprehensive response!

I have often found myself struggling to explain the difference between building a strategy or trading system (which reduces to a technical/intellectual challenge) and running a hedge fund (essentially running a complex information-driven business).

Your cost breakdown really puts matters into perspective.

> it bodes well for the OP that they talk about market regimes

I concur. Market regimes (modeling, detecting, reasoning about them) are too delicious of an intellectual puzzle to resist.

TrackerFF · 3y ago

Hardest part is raising those $50M - even more so if you have zero professional experience in high-finance. Getting a foot inside investment banking, hedge funds, private equity, etc. is extremely competitive to say the least.

I think the best shot for any outsider programmer would be to seek (and team up with) those finance professionals that are already thinking about exiting to start their own funds, and in the need of some technical partner...but even then, you're also competing against experienced devs. already in the field.

In the end, it is just really, really difficult for outsiders to just enter this sector, if they have any hopes of working with any substantial amount of capital.

I guess the better option would be to make some product you can sell as SaaS to the masses, or figure out how to manage thousands and thousands of low-$ investors.

muggermuch (OP) · 3y ago · 7 in thread

Hi, fellow HN'ers! Author here, please let me know if you have any questions or thoughts!

krschultz · 3y ago

I'm not at all interested in finance / stock picking but found this to be one of the best walkthroughs of an ML system end-to-end that I've ever read. I'm not in the field of ML but I'm interested in learning more and this was fantastic, thank you.

muggermuch (OP) · 3y ago

Thank you so much for your kind words! Your comment made my day! :)

dennisy · 3y ago

This is great! Thanks for writing this!

I have wanted to do something like this for a while, purely for learning. The thing which puts me off is that there is a huge amount of knowledge needed in understanding the features vs the ML.

Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?

Also would this approach work with crypto?

muggermuch (OP) · 3y ago

> Also would this approach work with crypto?

Some of it works on crypto. TBH I've stayed away from the asset class, but only because I find it difficult to build mental models and think about features (in my mind, it's a mix of commodity factors and currency factors, but I'd have to test it out).

I seem to remember coming across papers that have tested momentum factors at larger time-frames (e.g. weeklies).

> Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?

The references I put in at the end of the post will really help with this! I might actually write out a separate blog post about starting out in this space from an ML perspective. Thanks for the idea!

defrost · 3y ago

EDGAR filings (structured text) is an area unto itself, I see you've limited yourself to quarterlies.

Across any market area (eg: mineral resources) there are thousands of documents released daily across multiple exchanges (via EDGAR, SEDAR, etc) ranging from two line advisories, to 4,000 page technical reports on projects | acquisitions, alongside the usual quarterly | yearly annual reports, etc.

There's plenty to do parsing common forms for generic changes (board members, board member share changes, etc) and market regime specifics (exploration property acquisition) and trends (series of related acquisitions) for those that like the weeds.

Some might argue that 'understanding' these patterns lead the changes in stock price movements, and give insight wrt weathering short term changes for longer term returns.

muggermuch (OP) · 3y ago

This is a very insightful remark, thank you.

I focused on 10-Qs for the EDGAR filings module as you rightly pointed out - it seemed to be a good balance between implicit information and usefulness of the data. TBH I didn't actually investigate the other (many) patterns.

Having said that, I have really enjoyed Kai Wu's research from Sparkline Capital (https://www.sparklinecapital.com/), especially his extraction of the innovation factor from EDGAR filing texts. He's appeared in numerous podcasts, and they have all been super useful to listen to. Maybe someday when I re-investigate EDGAR filings and go further, I might target these signals you talk about here.

awb · 3y ago

Thanks for sharing! Really curious about ML stock market models as they seem extremely difficult to outperform the market consistently over time.

A few questions:

1) Were these stock picks for major stocks / ETFs? Or small market cap stocks?

2) How many people were subscribed to your newsletter?

3) What do you estimate the impact was of creating a “self-fulfilling prophecy” of entering a position and then recommending your subscribers take the same position?

4) Do you think your asset mix outperformed the market by picking high risk / high reward stocks in a bull market? Or picking safe stocks in a bear market? In other words, how do you know the engine wasn’t biased towards the market trend that happened to play out? For example, if I have a basket of tech stocks that would typically outperform in a bull market and flip a coin to buy or short them and guess right, I could outperform the market by chance. How did you account for this?

5) Have you backtested the engine to see what it would have returned in previous years? (Obviously on unseen data, rather than data it used for training)

gbasin · 3y ago · 6 in thread

If your predictions are good, I'd be happy to get you $100 million in assets to manage. It's very unlikely that your predictions are good...

Straw · 3y ago

I've been thinking about trying to build something from scratch with a similar spirit, but very different methods, but I also doubt that my predictions will be any good without far more time investment than I have available.

As you say you have expertise in the area, any chance I could ask you for advice or how to decide if it's worth trying?

gbasin · 3y ago

Sure ping me on twitter

notacop31337 · 3y ago

It's very unlikely that you're able to get OP $100 million in assets...

sbierwagen · 3y ago

"Inadequate Equilibria".

If you picked a HN comment at random, then the person who made that comment is overwhelmingly unlikely to be able to raise $100MM at the drop of a hat. Picking a HN user at random won't do it either.

But there's not a lot of finance-related submissions on HN. The people in the comments may be unusual. And off the top of my head there certainly is an existence proof for at least a few investors on the news site for Y Combinator, a VC fund, such as pg or sama. There are obviously other VC people on here as well.

gbasin · 3y ago

I founded and ran a quant trading firm for about a decade :)

collegeburner · 3y ago

no one said it would be fast but plenty of people on here know how to talk to allocators and have helped raise before.

alpineidyll3 · 3y ago · 4 in thread

My heart goes out to this author, but you can tell even by his first table that he doesn't quite understand the mathematics of financial markets, the purpose of a hedge fund, how they grow etc.

1) It's plain by quickly looking at the allocation of capital in investment firms, that AUM is not made by performance; it's marketing. At best people invest when they believe a person is connected to inside information. Saying you have an ML advisor is really just a pre-req to these people.

2) Is that allocation stupid? No, it's not, because the powers of mathematics, and by extension ML, are intrinsically limited for investment returns, because those returns are fat-tailed </Taleb>. For example, this author quotes a realistic Sharpe (0.8), but didn't calculate the standard deviation in his Sharpe, which I would bet a large sum was _at least_ 0.8. I.e., he doesn't really know what his Sharpe is. This is because equity assets behave like a Student-t distribution with a degree-of-freedom parameter ~2 or less </Mandelbrot, /Bergomi, /Gatheral etc.>. I.e., higher moments, such as the uncertainty in Sharpe, literally do not exist or converge and are unknowable. The only exception is if your strategy explicitly cuts off tails.
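For a sense of scale on the "standard deviation in his Sharpe" point, here is a hedged Python sketch using the standard large-sample approximation for i.i.d. returns with finite variance, Var(SR_hat) ≈ (1 + SR²/2) / T. That assumption is exactly what the comment disputes for fat-tailed returns, so treat the result as an optimistic floor. The function name and the weekly sampling are assumptions for illustration, not the author's method.

```python
import math


def sharpe_standard_error(annual_sharpe, n_periods, periods_per_year=52):
    """Approximate standard error of an annualized Sharpe ratio estimate.

    Uses the large-sample i.i.d. approximation
        Var(SR_hat) ~= (1 + SR^2 / 2) / T
    on per-period returns, then annualizes by sqrt(periods_per_year).
    Fat tails or autocorrelation would make the true uncertainty larger.
    """
    per_period_sharpe = annual_sharpe / math.sqrt(periods_per_year)
    per_period_se = math.sqrt((1.0 + 0.5 * per_period_sharpe ** 2) / n_periods)
    return per_period_se * math.sqrt(periods_per_year)


if __name__ == "__main__":
    for years in (1, 3, 5):
        se = sharpe_standard_error(0.8, n_periods=52 * years)
        print(f"{years} year(s) of weekly data: standard error ~ {se:.2f}")
```

Even under this friendly assumption, one year of weekly data puts a standard error of about 1.0 on an estimated Sharpe of 0.8, which is in line with the "at least 0.8" guess.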

Once you understand 2) you begin to understand that there's no such thing as a real quant fund (ie a fund which truly makes money predictably using models) which doesn't trade a liquidity limited book that has quite advanced hedging. Wealthy people are aware of this, which is why the author can't market this product.

If you're doing something silly like holding equities without tail risk control, you literally cannot be quantitatively investing. You are just slowly rediscovering what Kelly, Bergomi, Mandelbrot, Bernay's etc. realized with a little deep thought over pen and paper (while clumsily writing boilerplate software.) That markets are entropy machines rougher than a normal distribution, and any gains come directly from information. (See Kelly: "A New Interpretation of Information Rate".)

For a high latency (ms) market data feed, the returns on information are very very small. Markets are efficient.

mrhands556 · 3y ago

> That markets are entropy machines rougher than a normal distribution, and any gains come directly from information.

Isn’t this partially what this model is accomplishing with sentiment analysis?

Also has there been a lot of investment into sentiment analysis for algo trading? I’m sure there have but references including books would be interesting.

alpineidyll3 · 3y ago

No. Because to train the sentiment model you need estimable distributions.

maCDzP · 3y ago

I found your comments about rediscovering Kelly et al interesting. Could you recommend some textbooks that describes what you are referring to? If there are good overviews of the subject?

alpineidyll3 · 3y ago

If you can read and understand Bergomi's book, you basically understand financial math.

idoh · 3y ago · 4 in thread

If you have a tool that can generate great returns, then why fall back to a newsletter?

muggermuch (OP) · 3y ago

Great question.

If I beat the market by 20% (say SPY generated 0% for the year, very optimistic at this point), and I have allocated $100k to this, I make $20k before taxes.

That's less than minimum wage.

Meanwhile, allocators expect a track record of at least 3-5 years.

Ideally, if I have an asset, I'd like to extract as much revenue as I can.

Hope this makes sense.

xapata · 3y ago

If you're sitting on a gold mine, you can wait 5 years. This does not make sense.

Straw · 3y ago

How difficult is it to get investors when you can show your model beats the market consistently?

Of course, they have to check you're not trading a strategy with extreme tail risk, but here it sounds like that's not the case?

YetAnotherNick · 3y ago

Learn about hedging. Basically, for $100k, if your prediction could consistently beat some index, you don't just buy a stock, but you sell some other(short) stock/index at the same time. So you own 0 worth of stock but you get the difference in the increase as your profit. Obviously in real world, you would need some sort of deposit, but you could bet millions for $100k.
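The long/short idea in this comment reduces to simple arithmetic, sketched below in Python. It is a toy dollar-neutral example: `long_short_pnl` is a hypothetical helper, and borrow costs, margin requirements, financing and fees are all ignored.

```python
def long_short_pnl(notional, pick_return, index_return):
    """P&L of a dollar-neutral pair: long `notional` of the pick and short the
    same notional of the index. Ignores borrow costs, margin and fees."""
    long_leg = notional * pick_return
    short_leg = -notional * index_return  # a short gains when the index falls
    return long_leg + short_leg


if __name__ == "__main__":
    # The pick falls 5% while the index falls 15%: the spread still pays.
    print(long_short_pnl(1_000_000, -0.05, -0.15))
    # The pick rises 5% while the index rises 15%: the spread loses.
    print(long_short_pnl(1_000_000, 0.05, 0.15))
```

The position makes or loses money on the difference between the two returns, not on the market's direction, which is why the size of the deposit rather than the notional is the binding constraint.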

jesuslop · 3y ago · 3 in thread

Nice report. How did you do risk management? Have you been leveraged? Have you paid for data? Kudos for a view from the trenches.

muggermuch (OP) · 3y ago

Thank you!

> How did you do risk management?

I put in a basic position management layer (1% fixed stop). Also, the market regime module would modulate participation, i.e. in really risky environments it would dial down the number of stock picks. I can definitely do much more on this front, but I wanted to nail down the stock picking first! :)

> Have you been leveraged?

No leverage.

> Have you paid for data?

Yes, my monthly running costs for data are ~$1.2k.

pneumatic1 · 3y ago

Have you looked into Kelly criterion?

JSGdev · 3y ago

What data sources did you use? I'm interested in working on something similar.

moeymo1 · 3y ago · 3 in thread

"steadily beating the S&P 500 for over a year on a weekly basis"

Can be achieved by chance alone.

If not chance, I can give you a strategy that would be highly likely to achieve such a result: it would take a lot of risk though!

I love it when people post this stuff to HN. Naive people try it, lose a bundle to market makers, then go back to their day job.

darawk · 3y ago

> "steadily beating the S&P 500 for over a year on a weekly basis"

If you're going to make a claim like that, you should actually follow up with the calculations. When you do that, you'll realize that the issue is quite a bit more complex than this shallow dismissal.

He has very low correlation to the index, which means he's not just levering beta and getting lucky on a trending market. His standard deviation is smaller than the index, which means he didn't just make one large and lucky bet. The evidence that he has real alpha is certainly not incontrovertible, but the numbers look quite good.
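The beta and correlation checks mentioned here are easy to reproduce from two aligned return series. The Python sketch below is a generic sample-statistics calculation, not the author's code; the function name and the sample numbers are made up for illustration.

```python
import math


def beta_and_correlation(strategy_returns, index_returns):
    """Sample beta and correlation of a strategy against an index.

    Both inputs are equal-length lists of per-period (e.g. weekly) returns.
    """
    n = len(strategy_returns)
    if n != len(index_returns) or n < 2:
        raise ValueError("need two aligned series with at least 2 observations")
    mean_s = sum(strategy_returns) / n
    mean_i = sum(index_returns) / n
    cov = sum(
        (s - mean_s) * (i - mean_i)
        for s, i in zip(strategy_returns, index_returns)
    ) / (n - 1)
    var_s = sum((s - mean_s) ** 2 for s in strategy_returns) / (n - 1)
    var_i = sum((i - mean_i) ** 2 for i in index_returns) / (n - 1)
    beta = cov / var_i
    correlation = cov / math.sqrt(var_s * var_i)
    return beta, correlation


if __name__ == "__main__":
    index = [0.01, -0.02, 0.03, 0.00]      # hypothetical weekly index returns
    levered = [2 * r for r in index]       # a "strategy" that is just 2x the index
    print(beta_and_correlation(levered, index))
```

A strategy that merely levers the index shows a correlation near 1 and a beta equal to its leverage; low correlation together with lower volatility than the index is what rules that explanation out.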

It also doesn't appear that he cherry picked his reporting/aggregation cadence, because he sent out a weekly newsletter ex ante, and all his stats are reported weekly. He could still just be lucky, but his numbers are much better than this sort of dismissal would imply.

One real risk is that, in some implicit way, he's pursuing a negatively-skewed strategy. That is, one that has a latent large downside risk. Strategies like this can produce very good looking numbers for longish periods, but still have ultimately negative alpha. Judging whether or not that is the case here is hard without more detail, but nothing he says in the writeup indicates to me that that is the case here.

moeymo1 · 3y ago

If he can do what he claims (which as I said above is less impressive than it sounds) he can take it to a Chicago prop shop. They'll give him a budget and a share of the PnL. Very straightforward, it happens all the time.

However, his write up is completely devoid of talk of risk (beta is not risk), bankroll, Kelly sizing, etc. This is integral to understanding the trade.

For example, he could have a successful strategy that works in small lots. However, absent from nearly every ML model is the impact on sizing up. As soon as you post a sizable bid, the market will lean against you, and the edge evaporates. Same if you cross bid-ask, plus you're now giving up edge. ML cannot take this into account, at least not very easily and with the usual models.

Most programmers with models like this fall into this last category.

JSGdev · 3y ago

usually the answer for a negative strategy is set aside a long period of data for backtesting, right?

darawk · 3y ago · 2 in thread

This was a very enjoyable read. I built a nearly (architecturally) identical system a few years back that also had to be scrapped for different reasons. This brought back a lot of memories. The sanity checks, the index reconstitution issues, dealing with the insanity of security identification and tracking through time.

The fun cases are the ones where it's not even clear what the right answer truly is, e.g. company A spins out company B, and then 5 years later they re-merge. Whose time series and associated data is "the" canonical one? The data vendors often try to give their answers to this question, but maybe their answers don't make sense for your analysis.

Then there's the fact that a lot of vendors don't really do point in time correctly. They like to go back and helpfully revise data points for you that they or the company initially misreported. This is all well and good except that if you were trading for real, you wouldn't have known the correct information at the time, and so any backtest based on the updated information will be invalid. Vendors are a bit better now about providing true point in time data sets, or at the very least accurately describing when they are/aren't doing this. But we had a few cases where they said they were, but they definitely weren't.
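The point-in-time problem can be made concrete with a small as-of lookup. This Python sketch assumes each data point is stored with its publication date; the record layout, the sample values and `value_as_of` are hypothetical and not any vendor's schema.

```python
from datetime import date

# Each record: (fiscal_period, published_on, value). Hypothetical sample data.
REVISIONS = [
    ("2021Q4", date(2022, 2, 1), 1.10),   # initial report
    ("2021Q4", date(2022, 5, 15), 0.95),  # later restatement
    ("2022Q1", date(2022, 5, 1), 1.20),
]


def value_as_of(revisions, fiscal_period, as_of):
    """Latest value for `fiscal_period` that had been published on or before `as_of`."""
    known = [
        (published_on, value)
        for period, published_on, value in revisions
        if period == fiscal_period and published_on <= as_of
    ]
    if not known:
        return None  # nothing had been published yet
    return max(known)[1]  # the most recent publication wins


if __name__ == "__main__":
    # A backtest running on 1 March 2022 must see the original figure,
    # not the restated one that only appeared in May.
    print(value_as_of(REVISIONS, "2021Q4", date(2022, 3, 1)))
    print(value_as_of(REVISIONS, "2021Q4", date(2022, 6, 1)))
```

A vendor that overwrites the February figure with the May restatement makes the first lookup impossible, which is how look-ahead bias leaks into a backtest.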

muggermuch (OP) · 3y ago

Thank you, and ha! An emphatic yes to all the points you raised! It's especially daunting when there are multiple vendors with incompatible point-in-time hygiene setups, a situation I faced at the beginning of setting up Didact.

Also, this was really my first time with equities - my professional trading career was derivatives-focused - both listed (CME) and OTC (FX forwards/swaps). I think I lost the first few months simply trying to reorient my style of thinking.

Water under the bridge I guess.

webinvest · 3y ago

Yea, mergers and acquisitions is a hard table to incorporate.

igorkraw · 3y ago · 2 in thread

Nice writeup, thank you for sharing so openly!

The three things I always want to know from stock picking ML people:

1. Did you put your own money in it ?

2. How'd it go?

3. How well does your engine do vs a fixed stock allocation based on trend-statistics computed on the whole time window (i.e., compared to a fixed optimal portfolio computed with mean/std values you don't have access to, but which isn't allowed to change its choice. what's the regret if you are familiar with online learning)
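The regret in question 3 can be sketched as follows. This Python example uses a deliberate simplification: the hindsight benchmark is the best single asset held for the whole window, not an optimized fixed portfolio, and regret is measured in log-wealth. The function name and inputs are hypothetical.

```python
import math


def hindsight_regret(strategy_returns, asset_returns):
    """Log-wealth regret of a strategy versus the best single asset in hindsight.

    strategy_returns: list of per-period simple returns of the strategy.
    asset_returns: dict mapping asset name -> list of per-period simple returns.
    Returns (best_asset_name, regret); positive regret means the fixed
    hindsight choice ended with more wealth than the strategy.
    """
    def log_wealth(returns):
        return sum(math.log1p(r) for r in returns)

    best_asset, best_log_wealth = max(
        ((name, log_wealth(rets)) for name, rets in asset_returns.items()),
        key=lambda item: item[1],
    )
    return best_asset, best_log_wealth - log_wealth(strategy_returns)


if __name__ == "__main__":
    assets = {"A": [0.10, 0.10], "B": [0.00, 0.00]}
    strategy = [0.10, 0.00]  # held A in period 1, then B in period 2
    print(hindsight_regret(strategy, assets))
```

Swapping the single-asset benchmark for a fixed-weight portfolio fitted on the whole window would bring this closer to what the question describes.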

muggermuch (OP) · 3y ago

Thank you for appreciating the article; I tried to disclose all that I could!

1. Yes, I did put my own money in it (low 6 figures).

2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.

3. If I understand your question correctly, this would be the equivalent of the payoff on an optimal lookback option (https://en.wikipedia.org/wiki/Lookback_option). I haven't actually done that analysis, but it sounds like a nice idea.

adamsmith143 · 3y ago

>2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.

This seems extremely hard to believe. You should be running a multi-billion $ Quant fund if this is the case. The idea that you would try to push this as a newsletter rather than just taking investor money and becoming a billionaire literally makes the story seem farcical.

Joel_Mckay · 3y ago · 2 in thread

Every gambler thinks they have a system, but often fails to recognize a game is unfair long before they arrived. lol =)

darepublic · 3y ago

You can think outside the box to beat the unfair game but then you end up in jail.

Joel_Mckay · 3y ago

Some simply build a portfolio by copying those who can't be charged for violating market rules. Not sure why some folks find this strategy so controversial. =)

Congress member holdings report:

http://clerk.house.gov/public_disc/financial-search.aspx

Senate member holdings report:

https://efdsearch.senate.gov/search/

ajoseps · 3y ago · 2 in thread

this is very cool! where did you get your data from and how's the transition to airflow?

muggermuch (OP) · 3y ago

There are commercial feeds available via Nasdaq DataLink (FKA Quandl). I also bought bulk historical data to feed through my backtester (I haven't talked about this in the post; it was getting to be a bit too long).

timeserious · 3y ago

Let's get a write up of your backtesting framework too please! Terrific post @muggermuch - thank you!

unpwn · 3y ago · 2 in thread

Lmao this engine is down 6.9% for the year, when literally it's as simple as just buying some puts.

ramesh31 · 3y ago

> this engine is down 6.9% for the year

That’s pretty damn good, and still beating the market by nearly 20%. Of course you can always make more with riskier strategies.

Jabbles · 3y ago

You realise that puts have a cost that is determined by the market?

hendzen · 3y ago · 1 in thread

I know a bit about this industry and I have worked on some profitable systems. Honestly not a bad effort for someone working on their own with low-cost data. Don’t let the haters get you down. I would recommend you to pick up a more recent textbook on portfolio construction like Isichenko’s recent book.

muggermuch (OP) · 3y ago

Thank you for the note! Just picked up Isichenko from the online bookstore we all love-hate.

I'd love to get in touch (as per your HN profile) - my email is am(at)principiamundi.com

antognini · 3y ago · 1 in thread

Have you considered submitting your predictions to the Numerai Signals? It's market neutral so as long as your models can generate some alpha you can still get good returns.

muggermuch (OP) · 3y ago

That's a good idea. I'll try it out, thanks!

sanp · 3y ago · 1 in thread

OP, what are you using to draw the diagrams? They look nice and are very readable.

muggermuch (OP) · 3y ago

Thank you!

I used Excalidraw (https://excalidraw.com), and I highly recommend it! It gives me 'xkcd' vibes.

mbarras_ing · 3y ago · 1 in thread

Brilliantly written. As someone considering a move into the Quant field it is very informative.

muggermuch (OP) · 3y ago

Thank you!

prabdude · 3y ago · 1 in thread

Excellent article

muggermuch (OP) · 3y ago

Thanks!

conformist · 3y ago

What's the market beta? What's the average turnover/holding period? How are transaction costs modelled? What features explain most of the variance? How are they related to known factors? What's the beta hedged performance?

These are all things I'd want to know before deploying something like this. (Perhaps some mentioned in the post, might have missed them.)

To first order, I'd forget about fat tails and similar popular concerns. They matter, but not as much as structurally understanding what this model is up to. Perhaps one feature is explicitly selling tails? That might answer it already.

asavinov · 3y ago

> I have always kept in mind is that feature engineering is almost always the key difference between success and failure

I also developed an ML-powered service heavily relying on feature engineering

https://github.com/asavinov/intelligent-trading-bot Intelligent Trading Bot

Its difference from Didact is that this intelligent trading bot is focused on trade signal generation with higher frequency of evaluation. It is more suitable for cryptocurrencies but also works for traditional stocks with daily frequencies so it could be adapted for stock picking. What I find interesting in your work is the general design of such kind of ML systems relying on feature engineering.

artirdx · 3y ago

This was not only a very informative read but felt like an amazing achievement if everything described here was developed by one person (the author - @muggermuch).

The breadth of knowledge demonstrated by the author, from technology (bringing performance down to 14 minutes) to ML to a deep understanding of financial markets, is super impressive.

Granted, the author has an educational background in computer science and has been a trader, which probably explains many of his abilities, but to my small brain it feels like a next-level achievement.

Maybe I live in an average circle of finance, but I have never met nor heard of a person who could single-handedly conjure up and implement such a system. To my knowledge, a typical hedge fund has several highly-paid people in different teams to build and maintain such a system.

I never thought one person could do it. I genuinely wonder how he managed to wrap his head around this much knowledge. He seems to fall in the 10x category. Kudos!

drdrek · 3y ago

If you have a guaranteed compounding money machine that outperforms the market by 20%, just let it run; sooner or later you will be able to buy out those who did not invest in you. If it's just a useful recommendation engine, then there are indeed a lot of questions relating to personal finance or investment strategies, which have nothing to do with machine learning, that need to be addressed for PMF. You don't need better models; you need to understand the needs of your customers.

mywaifuismeta · 3y ago

I'm curious what happens if you look at your returns and other metrics at different time scales, i.e. monthly and weekly, in addition to yearly. You can't make any argument based on a sample size of 2.

As someone who used to work in the industry, I am 99.99% confident that you cannot have any alpha with a system like this, you are basically flipping coins, as some other commenters have pointed out.

wlamond · 3y ago

Thanks for the article! What tool do you use to create the figures? I like the sketch style.

For anyone else, what tools do you recommend for generating pretty diagrams for system architectures, workflows, etc.?

jshaqaw · 3y ago

"Predicting" markets isn't the challenge. Implementing real world strategies with associated frictions is. Show me a cash p&l or it's just a student project.

Gatsky · 3y ago

Do these frequent trading strategies ever account for taxation? Where I live, the returns compared to a buy and hold passive strategy would be cut by 50%.

ktiwari31 · 3y ago

Great post! Very informative! Thoroughly enjoyed it.


> it bodes well for the OP that they talk about market regimes

I concur. Market regimes (modeling, detecting, reasoning about them) are too delicious of an intellectual puzzle to resist.

TrackerFF3y ago

In the end, it is just really, really difficult for outsiders to just enter this sector, if they have any hopes of working with any substantial amount of capital.

I guess the better option would be to make some product you can sell as SaaS to the masses, or figure out how to manage thousands and thousands of low-$ investors.

muggermuchOP3y ago· 7 in thread

Hi, fellow HN'ers! Author here, please let me know if you have any questions or thoughts!

krschultz3y ago

muggermuchOP3y ago

Thank you so much for your kind words! Your comment made my day! :)

dennisy3y ago

This is great! Thanks for writing this!

I have wanted to do something like this for a while, purely for learning. The thing which puts me off is that there is a huge amount of knowledge needed in understanding the features vs the ML.

Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?

Also would this approach work with crypto?

muggermuchOP3y ago

> Also would this approach work with crypto?

I seem to remember coming across papers that have tested momentum factors at larger time-frames (e.g. weeklies).

> Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?

defrost3y ago

EDGAR filings (structured text) is an area unto itself, I see you've limited yourself to quarterlies.

Some might argue that 'understanding' these patterns lead the changes in stock price movements, and give insight wrt weathering short term changes for longer term returns.

muggermuchOP3y ago

This is a very insightful remark, thank you.

1 more reply

awb3y ago

Thanks for sharing! Really curious about ML stock market models as they seem extremely difficult to outperform the market consistently over time.

A few questions:

1) Were these stock picks for major stocks / ETFs? Or small market cap stocks?

2) How many people were subscribed to your newsletter?

3) What do you estimate the impact was of creating a “self-fulfilling prophecy” of entering a position and then recommending your subscribers take the same position?

5) Have you backtested the engine to see what it would have returned in previous years? (Obviously on unseen data, rather than data it used for training)

gbasin3y ago· 6 in thread

If your predictions are good, I'd be happy to get you $100 million in assets to manage. It's very unlikely that your predictions are good...

Straw3y ago

As you say you have expertise in the area, any chance I could ask you for advice or how to decide if its worth trying?

gbasin3y ago

Sure ping me on twitter

notacop313373y ago

It's very unlikely that you're able to get OP $100 million in assets...

sbierwagen3y ago

"Inadequate Equilibria".

If you picked a HN comment at random, then the person who made that comment are overwhelmingly unlikely to raise $100MM at the drop of a hat. Picking a HN user at random won't do it either.

gbasin3y ago

I founded and ran a quant trading firm for about a decade :)

collegeburner3y ago

no one said it would be fast but plenty of people on here know how to talk to allocators and have helped raise before.

alpineidyll33y ago· 4 in thread

My heart goes out to this author, but you can tell even by his first table that he doesn't quite understand the mathematics of financial markets, the purpose of a hedge fund, how they grow etc.

For a high latency (ms) market data feed, the returns on information are very very small. Markets are efficient.

mrhands5563y ago

> That markets are entropy machines rougher than a normal distribution, and any gains come directly from information.

Isn’t this partially what this model is accomplishing with sentiment analysis?

Also has there been a lot of investment into sentiment analysis for algo trading? I’m sure there have but references including books would be interesting.

alpineidyll33y ago

No. Because to train the sentiment model you need estimable distributions.

maCDzP3y ago

I found your comments about rediscovering Kelly et al interesting. Could you recommend some textbooks that describes what you are referring to? If there are good overviews of the subject?

alpineidyll33y ago

If you can read and understand bergomi's book you basically understand financial math

idoh3y ago· 4 in thread

If you have a tool that can generate great returns, then why fall back to a newsletter?

muggermuchOP3y ago

Great question.

If I beat the market by 20% (say SPY generated 0% for the year, very optimistic at this point), and I have allocated $100k to this, I make $20k before taxes.

That's less than minimum wage.

Meanwhile, allocators expect a track record of at least 3-5 years.

Ideally, if I have an asset, I'd like to extract as much revenue as I can.

Hope this makes sense.

xapata3y ago

If you're sitting on a gold mine, you can wait 5 years. This does not make sense.

4 more replies

Straw3y ago

How difficult is it to get investors when you can show your model beats the market consistently?

Of course, they have to check your not trading a strategy with extreme tail risk, but here it sounds like that's not the case?

3 more replies

YetAnotherNick3y ago

1 more reply

jesuslop3y ago· 3 in thread

Nice report. How did you did risk management? Have you been leveraged? Have you paid for data? Kudos for a view from the trenches.

muggermuchOP3y ago

Thank you!

>Have you been leveraged? No leverage.

>Have you paid for data? Yes, my monthly running costs for data are ~$1.2k.

pneumatic13y ago

Have you looked into Kelly criterion?

1 more reply

JSGdev3y ago

what data sources did you use? im interested in working on something similar.

moeymo13y ago· 3 in thread

"steadily beating the S&P 500 for over a year on a weekly basis"

Can be achieved by chance alone.

If not chance, I can give you a strategy that would be highly likely to achieve such a result: it would take a lot of risk though!

I love it when people post this stuff to HN. Naive people try it, loose a bundle to market makers, then go back to their day job.

darawk3y ago

> "steadily beating the S&P 500 for over a year on a weekly basis"

moeymo13y ago

However, his write up is completely devoid of talk of risk (beta is not risk), bankroll, Kelly sizing, etc. This is integral to understanding the trade.

Most programmers with models like this fall into this last category.

JSGdev3y ago

usually the answer for a negative strategy is set aside a long period of data for backtesting, right?

darawk3y ago· 2 in thread

muggermuchOP3y ago

Water under the bridge I guess.

webinvest3y ago

Yea, mergers and acquisitions is a hard table to incorporate.

igorkraw3y ago· 2 in thread

Nice writeup, thank you for sharing so openly!

The three things I always want to know from stock picking ML people:

1. Did you put your own money in it ?

2. How'd it go?

muggermuchOP3y ago

Thank you for appreciating the article; I tried to disclose all that I could!

1. Yes, I did put my own money in it (low 6 figures).

2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.

adamsmith1433y ago

>2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.

6 more replies

Joel_Mckay3y ago· 2 in thread

Every gambler thinks they have a system, but often fails to recognize a game is unfair long before they arrived. lol =)

darepublic3y ago

You can think outside the box to beat the unfair game but then you end up in jail.

Joel_Mckay3y ago

Some simply build a portfolio by copying those who can't be charged for violating market rules. Not sure why some folks find this strategy so controversial. =)

Congress member holdings report:

http://clerk.house.gov/public_disc/financial-search.aspx

Senate member holdings report:

https://efdsearch.senate.gov/search/

2 more replies

ajoseps3y ago· 2 in thread

this is very cool! where did you get your data from and how's the transition to airflow?

muggermuchOP3y ago

timeserious3y ago

Let's get a write up of your backtesting framework too please! Terrific post @muggermuch - thank you!

1 more reply

unpwn3y ago· 2 in thread

Lmao this engine is down 6.9% for the year, when literally it's as simple as just buying some puts.

ramesh313y ago

> this engine is down 6.9% for the year

That’s pretty damn good, and still beating the market by nearly 20%. Of course you can always make more with riskier strategies.

Jabbles3y ago

You realise that puts have a cost that is determined by the market?

hendzen3y ago· 1 in thread

muggermuchOP3y ago

Thank you for the note! Just picked up Isichenko from the online bookstore we all love-hate.

I'd love to get in touch (as per your HN profile) - my email is am(at)principiamundi.com

antognini3y ago· 1 in thread

Have you considered submitting your predictions to the Numerai Signals? It's market neutral so as long as your models can generate some alpha you can still get good returns.

muggermuchOP3y ago

That's a good idea. I'll try it out, thanks!

sanp3y ago· 1 in thread

OP, what are you using to draw the diagrams? They look nice and are very readable.

muggermuchOP3y ago

Thank you!

I used Excalidraw (https://excalidraw.com), and I highly recommend it! It gives me 'xkcd' vibes.

mbarras_ing3y ago· 1 in thread

Brilliantly written. As someone considering a move into the Quant field it is very informative.

muggermuchOP3y ago

Thank you!

prabdude3y ago· 1 in thread

Excellent article

muggermuchOP3y ago

Thanks!

conformist3y ago

These are all things I'd want to know before deploying something like this. (Perhaps some mentioned in the post, might have missed them.)

asavinov3y ago

> I have always kept in mind is that feature engineering is almost always the key difference between success and failure

I also developed an ML-powered service heavily relying on feature engineering

https://github.com/asavinov/intelligent-trading-bot Intelligent Trading Bot

artirdx3y ago

This was not only a very informative read but felt like an amazing achievement if everything described here was developed by one person (the author - @muggermuch).

The breadth of knowledge demonstrated by author from technology (bringing performance down to 14 minutes) to ML to deep understanding of financial markets is super-impressive.

Granted the author has an educational background in computer science and has been a trader which probably explains many of his abilities but to my small brain it feels next-level achievement.

I never thought one-person could do it. I genuinely wonder how he managed to wrap his head across this much knowledge. He seem to fall in 10x category. Kudos!

drdrek3y ago

mywaifuismeta3y ago

As someone who used to work in the industry, I am 99.99% confident that you cannot have any alpha with a system like this, you are basically flipping coins, as some other commenters have pointed out.

wlamond3y ago

Thanks for the article! What tool do you use to create the figures? I like the sketch style.

For anyone else, what tools do you recommend for generating pretty for system architectures, workflows, etc.?

jshaqaw3y ago

"Predicting" markets isn't the challenge. Implementing real world strategies with associated frictions is. Show me a cash p&l or it's just a student project.

Gatsky3y ago

Do these frequent trading strategies ever account for taxation? Where I live, the returns compared to a buy and hold passive strategy would be cut by 50%.

ktiwari313y ago

Great post! Very informative! Thoroughly enjoyed it.

j / k navigate · click thread line to collapse