It's usually very difficult and it takes a lot of money to run a proper fund.
Let's say you raise $50M. You can maybe charge 1 and 20, meaning you get 1% of assets each year for running the fund and 20% of profits.
1% of $50M (and keep in mind this is a large raise for someone without a track record on the sell side or inside another fund) gives you $500,000 a year to pay:
- salaries (let's say you pay yourself $100,000 all in, plus the same for a single analyst)
- a Bloomberg terminal $30,000 including data feeds
- market data feeds you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse (you can't store data you get from the Bloomberg terminal).
- rent $50,000/year for office space
- outside lawyer fees and outside accounting fees $100,000/year
- similar fees for someone to run your back office, roughly $100,000/year.
On the other side of the expenses you have the money-making side of things, which, as the OP pointed out, isn't great. If you return 10% on the $50M, that gives $5M in profits, and you keep 20% of that: $1M.
That allows you to bonus out yourself and analysts in good years. If you lose money one year, you get no bonus and have to bonus out the employees from the retained earnings you held back from previous bonuses.
It usually gets worse, as most funds have what's called a high-water mark. This means you don't collect the performance fee until your fund gets back to its previous peak. So if you are down 10% one year, you need to make that back before you start to earn any performance fee, which is why most funds shut down if they go down more than 20%.
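The fee arithmetic above can be sketched in a few lines. This is a rough illustration only, using the cost figures quoted in this comment; the function name `simulate_fees` is made up for the example, and it assumes no investor inflows or outflows and does not deduct fees or costs from the fund's assets.

```python
# Annual running costs as listed in the comment above (USD).
ANNUAL_COSTS = {
    "salaries": 200_000,            # $100k for you plus $100k for one analyst
    "bloomberg_terminal": 30_000,
    "market_data": 25_000,
    "rent": 50_000,
    "legal_and_accounting": 100_000,
    "back_office": 100_000,
}


def simulate_fees(aum, yearly_returns, mgmt_rate=0.01, perf_rate=0.20):
    """Return one (return, management_fee, performance_fee) tuple per year.

    Simplifying assumptions (not from the comment): the management fee is
    charged on start-of-year assets, and fees are not deducted from assets.
    """
    nav = aum
    high_water_mark = aum
    rows = []
    for r in yearly_returns:
        mgmt_fee = mgmt_rate * nav
        nav = nav * (1.0 + r)
        # The performance fee applies only to gains above the high-water mark.
        perf_fee = perf_rate * max(0.0, nav - high_water_mark)
        high_water_mark = max(high_water_mark, nav)
        rows.append((r, mgmt_fee, perf_fee))
    return rows


total_costs = sum(ANNUAL_COSTS.values())
for r, mgmt_fee, perf_fee in simulate_fees(50_000_000, [0.10, -0.10, 0.10]):
    print(f"return {r:+.0%}: mgmt fee {mgmt_fee:,.0f}, "
          f"perf fee {perf_fee:,.0f}, costs {total_costs:,.0f}")
```

The listed costs sum to $505,000, slightly more than the $500,000 management fee. In the third year of this example the fund is up 10% but still below its earlier peak, so no performance fee is earned.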
As to raising money... Anyone can show a model that makes money. That doesn't mean it's easy to create a model; it's just that there are a lot of people capable of building such a model.
It's the risk management that people with money are really looking for, and sadly that's just really hard to show from a model, as part of the risk management is things like position sizing and showing your model doesn't pile into one asset class or trade correlated products.
It bodes well for the OP that they talk about market regimes as, IMHO, this is one of the biggest risk management tools that aspiring traders ignore.
And this risk management is why people ask for a track record of more than a year.
Your first 1-2 seed investors will:
- Only pay 1 and 10 (1% fixed fee and 10% of PNL)
- Also get ownership of the actual fund management firm, in the form of 20% of REVENUE (not equity, revenue, think about that)
This is one reason new fund formation is way down. The economics are bad for years. Know a bunch of HF people that started vc-backed tech firms instead.
The other reason is a 10+ year run where stocks, bonds, private firms and real estate just went up. No need for diversifying return streams.
Just a BB terminal is around 30k, and a lot of the additional data from BB costs extra (can be 200-300k per additional product).
For quant strategy probably looking at 500k up to 2M for data initially. And you will likely be at a disadvantage to existing firms that have been collecting data for years.
And that is at the low end. Spent many millions per year for 1 strategy at last large firm. And that was small fraction of total firm spend.
I don't know too many people who have started funds in the past 5 years, but of the 3 who did, none gave up any ownership in the management firm.
That's a sucker's game, and the only people who would need to give up any ownership are people who are very green :)
> market data feeds you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse (you can't store data you get from the Bloomberg terminal).
Total nitpick: you can get those using soft dollars.
But your numbers are spot on. In my job we estimate running a hedge fund with AUM < 250MM is just not worth it.
I have often found myself struggling to explain the difference between building a strategy or trading system (which reduces to a technical/intellectual challenge) and running a hedge fund (essentially running a complex information-driven business).
Your cost breakdown really puts matters into perspective.
> it bodes well for the OP that they talk about market regimes
I concur. Market regimes (modeling, detecting, reasoning about them) are too delicious of an intellectual puzzle to resist.
I think the best shot for any outsider programmer would be to seek out (and team up with) finance professionals who are already thinking about leaving to start their own funds and are in need of a technical partner. But even then, you're competing against experienced devs already in the field.
In the end, it is just really, really difficult for outsiders to just enter this sector, if they have any hopes of working with any substantial amount of capital.
I guess the better option would be to make some product you can sell as SaaS to the masses, or figure out how to manage thousands and thousands of low-$ investors.
I have wanted to do something like this for a while, purely for learning. The thing which puts me off is that there is a huge amount of knowledge needed in understanding the features vs the ML.
Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?
Also would this approach work with crypto?
Some of it works on crypto. TBH I've stayed away from the asset class, but only because I find it difficult to build mental models and think about features (in my mind, it's a mix of commodity factors and currency factors, but I'd have to test it out).
I seem to remember coming across papers that have tested momentum factors at larger time-frames (e.g. weeklies).
> Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?
The references I put in at the end of the post will really help with this! I might actually write out a separate blog post about starting out in this space from an ML perspective. Thanks for the idea!
Across any market area (e.g. mineral resources) there are thousands of documents released daily across multiple exchanges (via EDGAR, SEDAR, etc.), ranging from two-line advisories to 4,000-page technical reports on projects | acquisitions, alongside the usual quarterly | annual reports, etc.
There's plenty to do parsing common forms for generic changes (board members, board member share changes, etc.), market regime specifics (exploration property acquisition) and trends (series of related acquisitions), for those that like the weeds.
Some might argue that these patterns lead changes in stock prices, and that 'understanding' them gives insight into weathering short-term changes for longer-term returns.
I focused on 10-Qs for the EDGAR filings module as you rightly pointed out - it seemed to be a good balance between implicit information and usefulness of the data. TBH I didn't actually investigate the other (many) patterns.
Having said that, I have really enjoyed Kai Wu's research from Sparkline Capital (https://www.sparklinecapital.com/), especially his extraction of the innovation factor from EDGAR filing texts. He's appeared in numerous podcasts, and they have all been super useful to listen to. Maybe someday when I re-investigate EDGAR filings and go further, I might target these signals you talk about here.
A few questions:
1) Were these stock picks for major stocks / ETFs? Or small market cap stocks?
2) How many people were subscribed to your newsletter?
3) What do you estimate the impact was of creating a “self-fulfilling prophecy” of entering a position and then recommending your subscribers take the same position?
4) Do you think your asset mix outperformed the market by picking high risk / high reward stocks in a bull market? Or picking safe stocks in a bear market? In other words, how do you know the engine wasn’t biased towards the market trend that happened to play out? For example, if I have a basket of tech stocks that would typically outperform in a bull market and flip a coin to buy or short them and guess right, I could outperform the market by chance. How did you account for this?
5) Have you backtested the engine to see what it would have returned in previous years? (Obviously on unseen data, rather than data it used for training)
As you say you have expertise in the area, any chance I could ask you for advice on how to decide if it's worth trying?
If you picked an HN comment at random, the person who made that comment is overwhelmingly unlikely to be able to raise $100MM at the drop of a hat. Picking an HN user at random won't do it either.
But there's not a lot of finance-related submissions on HN. The people in the comments may be unusual. And off the top of my head there certainly is an existence proof for at least a few investors on the news site for Y Combinator, a VC fund, such as pg or sama. There are obviously other VC people on here as well.
1) It's plain by quickly looking at the allocation of capital in investment firms, that AUM is not made by performance; it's marketing. At best people invest when they believe a person is connected to inside information. Saying you have an ML advisor is really just a pre-req to these people.
2) Is that allocation stupid? No, it's not, because the powers of mathematics, and by extension ML, are intrinsically limited for investment returns because returns are fat-tailed </Taleb>. For example, this author quotes a realistic Sharpe (0.8), but didn't calculate the standard deviation of his Sharpe, which I would bet a large sum was _at least_ 0.8. I.e., he doesn't really know what his Sharpe is. This is because equity assets behave like a Student-t distribution with a degrees-of-freedom parameter of ~2 or less </Mandelbrot, /Bergomi, /Gatheral etc.>. I.e., higher moments, such as the uncertainty in the Sharpe, literally do not exist or converge, and are unknowable. The only exception is if your strategy explicitly cuts off tails.
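To put a rough number on the "standard deviation of the Sharpe" point: a common large-sample approximation (Lo, 2002) for the standard error of an estimated Sharpe ratio, assuming i.i.d. normally distributed returns, is sqrt((1 + SR^2/2) / T) per period. The sketch below applies it to an annualized Sharpe of 0.8. The weekly sampling and the track-record lengths are assumptions for illustration, not figures from the post, and with fat tails the real uncertainty would be larger than this.

```python
import math


def sharpe_standard_error(annual_sharpe, n_obs, periods_per_year=52):
    """Approximate standard error of an annualized Sharpe ratio.

    Assumes i.i.d. normal returns, which is exactly the assumption the
    comment above disputes, so treat the result as optimistic.
    """
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    se_period = math.sqrt((1.0 + 0.5 * sr_period ** 2) / n_obs)
    return se_period * math.sqrt(periods_per_year)


# Hypothetical track-record lengths, measured in weekly observations.
for years in (1, 3, 5):
    se = sharpe_standard_error(0.8, n_obs=52 * years)
    print(f"{years} year(s) of weekly data: Sharpe 0.8 +/- {se:.2f}")
```

Even under the friendly normal assumption, one year of weekly data gives a standard error of about 1.0 on a Sharpe of 0.8, which is consistent with the bet made above.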
Once you understand 2) you begin to understand that there's no such thing as a real quant fund (ie a fund which truly makes money predictably using models) which doesn't trade a liquidity limited book that has quite advanced hedging. Wealthy people are aware of this, which is why the author can't market this product.
If you're doing something silly like holding equities without tail risk control, you literally cannot be quantitatively investing. You are just slowly rediscovering what Kelly, Bergomi, Mandelbrot, Bernay's etc. realized with a little deep thought over pen and paper (while clumsily writing boilerplate software): that markets are entropy machines rougher than a normal distribution, and any gains come directly from information. (See Kelly: "A New Interpretation of Information Rate".)
For a high latency (ms) market data feed, the returns on information are very very small. Markets are efficient.
Isn’t this partially what this model is accomplishing with sentiment analysis?
Also, has there been a lot of investment into sentiment analysis for algo trading? I’m sure there has, but references, including books, would be interesting.
If I beat the market by 20% (say SPY generated 0% for the year, very optimistic at this point), and I have allocated $100k to this, I make $20k before taxes.
That's less than minimum wage.
Meanwhile, allocators expect a track record of at least 3-5 years.
Ideally, if I have an asset, I'd like to extract as much revenue as I can.
Hope this makes sense.
Of course, they have to check you're not trading a strategy with extreme tail risk, but here it sounds like that's not the case?
> How did you do risk management?
I put in a basic position management layer (1% fixed stop). Also, the market regime module would modulate participation, i.e. in really risky environments it would dial down the number of stock picks. I can definitely do much more on this front, but I wanted to nail down the stock picking first! :)
> Have you been leveraged?
No leverage.
> Have you paid for data?
Yes, my monthly running costs for data are ~$1.2k.
Can be achieved by chance alone.
If not chance, I can give you a strategy that would be highly likely to achieve such a result: it would take a lot of risk though!
I love it when people post this stuff to HN. Naive people try it, lose a bundle to market makers, then go back to their day job.
If you're going to make a claim like that, you should actually follow up with the calculations. When you do that, you'll realize that the issue is quite a bit more complex than this shallow dismissal.
He has very low correlation to the index, which means he's not just levering beta and getting lucky on a trending market. His standard deviation is smaller than the index, which means he didn't just make one large and lucky bet. The evidence that he has real alpha is certainly not incontrovertible, but the numbers look quite good.
It also doesn't appear that he cherry picked his reporting/aggregation cadence, because he sent out a weekly newsletter ex ante, and all his stats are reported weekly. He could still just be lucky, but his numbers are much better than this sort of dismissal would imply.
One real risk is that, in some implicit way, he's pursuing a negatively skewed strategy, that is, one that has a latent large downside risk. Strategies like this can produce very good-looking numbers for longish periods, but still have ultimately negative alpha. Judging whether or not that applies here is hard without more detail, but nothing he says in the writeup indicates to me that it does.
However, his write up is completely devoid of talk of risk (beta is not risk), bankroll, Kelly sizing, etc. This is integral to understanding the trade.
For example, he could have a successful strategy that works in small lots. However, absent from nearly every ML model is the impact on sizing up. As soon as you post a sizable bid, the market will lean against you, and the edge evaporates. Same if you cross bid-ask, plus you're now giving up edge. ML cannot take this into account, at least not very easily and with the usual models.
Most programmers with models like this fall into this last category.
The fun cases are the ones where it's not even clear what the right answer truly is, e.g. company A spins out company B, and then 5 years later they re-merge. Whose time series and associated data is "the" canonical one? The data vendors often try to give their answers to this question, but maybe their answers don't make sense for your analysis.
Then there's the fact that a lot of vendors don't really do point in time correctly. They like to go back and helpfully revise data points for you that they or the company initially misreported. This is all well and good except that if you were trading for real, you wouldn't have known the correct information at the time, and so any backtest based on the updated information will be invalid. Vendors are a bit better now about providing true point in time data sets, or at the very least accurately describing when they are/aren't doing this. But we had a few cases where they said they were, but they definitely weren't.
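A minimal sketch of what "point in time" means in practice: keep every revision together with the date it became known, and have the backtest ask only for what was known on the simulated date. The `Revision` type, the `as_of` helper and the EPS figures below are all hypothetical, invented for this illustration; real vendor schemas differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    period: str      # the reporting period the value describes
    known_at: date   # the date this value became available
    value: float


def as_of(revisions: List[Revision], period: str, when: date) -> Optional[float]:
    """Return the latest value for `period` known on or before `when`."""
    known = [r for r in revisions if r.period == period and r.known_at <= when]
    if not known:
        return None  # nothing had been published yet
    return max(known, key=lambda r: r.known_at).value


# Made-up example: an EPS figure that was later restated.
history = [
    Revision("2019Q4", date(2020, 2, 1), 1.10),   # as first reported
    Revision("2019Q4", date(2020, 5, 15), 0.95),  # later revision
]

print(as_of(history, "2019Q4", date(2020, 1, 1)))  # before the first report
print(as_of(history, "2019Q4", date(2020, 3, 1)))  # backtest sees the original
print(as_of(history, "2019Q4", date(2020, 6, 1)))  # only now sees the revision
```

A data set that silently overwrites the first figure with the revised one would hand a March 2020 backtest information it could not have had.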
Also, this was really my first time with equities - my professional trading career was derivatives-focused - both listed (CME) and OTC (FX forwards/swaps). I think I lost the first few months simply trying to reorient my style of thinking.
Water under the bridge I guess.
The three things I always want to know from stock picking ML people:
1. Did you put your own money in it ?
2. How'd it go?
3. How well does your engine do vs. a fixed stock allocation based on trend statistics computed on the whole time window? That is, compared to a fixed optimal portfolio computed with mean/std values you don't have access to in real time, but which isn't allowed to change its choice. (What's the regret, if you are familiar with online learning?)
1. Yes, I did put my own money in it (low 6 figures).
2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.
3. If I understand your question correctly, this would be the equivalent of the payoff on an optimal lookback option (https://en.wikipedia.org/wiki/Lookback_option). I haven't actually done that analysis, but it sounds like a nice idea.
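For anyone wanting to try the comparison in question 3, here is a rough sketch of regret against the best fixed allocation in hindsight. It is not the analysis from the article (which, as stated above, has not been done). It simplifies the question by ranking fixed allocations on terminal log-wealth rather than on a mean/std criterion, and it searches only a small, hypothetical grid of candidate weights. The returns are made-up numbers.

```python
import math


def log_wealth(weights_per_period, returns_per_period):
    """Sum of log growth factors for a sequence of portfolio weights."""
    total = 0.0
    for weights, rets in zip(weights_per_period, returns_per_period):
        growth = 1.0 + sum(w * r for w, r in zip(weights, rets))
        total += math.log(growth)
    return total


def regret_vs_best_fixed(strategy_weights, returns, candidate_weights):
    """Log-wealth of the best fixed candidate minus that of the strategy.

    Each candidate is held (rebalanced back to the same weights) every
    period. A negative result means the strategy beat every candidate.
    """
    strategy = log_wealth(strategy_weights, returns)
    best_fixed = max(
        log_wealth([c] * len(returns), returns) for c in candidate_weights
    )
    return best_fixed - strategy


# Two assets, three periods; all figures are invented for illustration.
returns = [(0.10, -0.05), (-0.05, 0.10), (0.10, 0.00)]
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
strategy = [(0.0, 1.0), (1.0, 0.0), (1.0, 0.0)]  # picks the wrong asset twice

print(f"regret: {regret_vs_best_fixed(strategy, returns, candidates):.4f}")
```

Because the fixed benchmark is chosen with full hindsight, a small or negative regret over a long window is a much stronger result than simply beating an index.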
This seems extremely hard to believe. You should be running a multi-billion $ Quant fund if this is the case. The idea that you would try to push this as a newsletter rather than just taking investor money and becoming a billionaire literally makes the story seem farcical.
Congress member holdings report:
http://clerk.house.gov/public_disc/financial-search.aspx
Senate member holdings report:
I'd love to get in touch (as per your HN profile) - my email is am(at)principiamundi.com
I used Excalidraw (https://excalidraw.com), and I highly recommend it! It gives me 'xkcd' vibes.
These are all things I'd want to know before deploying something like this. (Perhaps some mentioned in the post, might have missed them.)
To first order, I'd forget about fat tails and similar popular concerns. They matter, but not as much as structurally understanding what this model is up to. Perhaps one feature is explicitly selling tails? That might answer it already.
I also developed an ML-powered service heavily relying on feature engineering
https://github.com/asavinov/intelligent-trading-bot Intelligent Trading Bot
Its difference from Didact is that this intelligent trading bot is focused on trade signal generation with a higher frequency of evaluation. It is more suitable for cryptocurrencies but also works for traditional stocks at daily frequencies, so it could be adapted for stock picking. What I find interesting in your work is the general design of this kind of ML system relying on feature engineering.
The breadth of knowledge demonstrated by the author, from technology (bringing performance down to 14 minutes) to ML to a deep understanding of financial markets, is super impressive.
Granted the author has an educational background in computer science and has been a trader which probably explains many of his abilities but to my small brain it feels next-level achievement.
Maybe I live in an average circle of finance, but I have never met nor heard of a person who could single-handedly conceive and implement such a system. To my knowledge, a typical hedge fund has several highly paid people in different teams to build and maintain such a system.
I never thought one person could do it. I genuinely wonder how he managed to wrap his head around this much knowledge. He seems to fall into the 10x category. Kudos!
As someone who used to work in the industry, I am 99.99% confident that you cannot have any alpha with a system like this, you are basically flipping coins, as some other commenters have pointed out.
For anyone else, what tools do you recommend for generating pretty diagrams for system architectures, workflows, etc.?