Many will cast stones - but they have been there too. If they haven't, well, maybe their day will also come. You may feel bad at the moment - but the best way forward professionally is "We try our best tomorrow."
The prioritization problems may not be due to ignorance or malice though, and may be justifiable if there are other fires that are burning brighter. It's still pointing to problems though, and I think it's completely legitimate for engineers to question the stability of the company when this sort of thing happens.
At the very least as an engineer I would be asking some pointed questions of my leadership. Maybe not dusting off the resume yet, but still I'd want to get reassurance from internally that the leadership problems that caused this are being addressed.
Or I'm talking about a 200-node Hadoop cluster that's doing the electrical metering and billing for 8 million people, and is NOT allowed to stop.
Or the trading platform that's running sub-millisecond trades, where downtime means $300,000 USD per minute.
These are systems I have engineered over the last 10 years, and I can say: These things are complex and have failures in 1000 different ways, and while you're monitoring 999 of them that one thing you're not looking at is festering under the surface (your monitoring system is tracking IRQ hardware interrupt response times, right???)
Part of being in a team is everyone pulling together, and yes it's stressful at the time, but even very good management can't see all ends, just like very good engineering can't predict everything. I don't think it's useful to start pointing the finger at management and "asking some pointed questions at leadership" because sometimes everyone is doing their best. Yes we should analyse our failures so we can do better, but your tone is very accusatory, and I believe that a better approach is an all-inclusive chat about how we can do better, and management saying "great job engineering" for fixing it, and giving them a break after the stressful event.
And FWIW, they have down time every day and weekend, at least in a virtual sense; the load does drop off in a very real sense too. You are spiritually correct, they should pull together and sort it out, and they owe nobody money here (don’t use a discount broker if you want some sort of guarantee about trades) but as a general rule you should never feel too sorry for a banker under just about any circumstances. The harshest lesson here, for everybody, was that the only thing they would do for you was give you some commission-free trades, but that won’t work with this one, so a non-apology is what you get.
I think the issue here isn’t so much that the system went down but the blog post.
It’s very light on details and doesn’t go far enough in terms of re-establishing trust with the customers that were affected. Which by the looks of it is everyone attempting any trade most of the day on Monday.
Sure, don't burn people at the stake, but "hey, it's hard, don't blame them, they are doing their best" doesn't cut it for me. I'm sure they're expecting to be paid and not for someone to "do their best" to pay them.
I mean, I'll bite. Assuming you only traded 6 hours a day (i.e. US hours), that'd be a 27bn dollar a year strategy, and the only way for returns to be linear and trading to be sub-millisecond is market making/arbitrage.
That is a lot of half spreads...
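For anyone checking the 27bn figure, here is a rough sketch of the arithmetic. The $300,000 per minute and 6 hours a day come from the comments above; the 250 trading days per year is my own assumption, since the thread doesn't state one.

```python
# Back-of-the-envelope check of the "27bn dollar a year" figure.
USD_PER_MINUTE = 300_000       # downtime cost quoted upthread
TRADING_HOURS_PER_DAY = 6      # "only traded 6 hours a day"
TRADING_DAYS_PER_YEAR = 250    # assumption: not stated in the thread

usd_per_day = USD_PER_MINUTE * 60 * TRADING_HOURS_PER_DAY
usd_per_year = usd_per_day * TRADING_DAYS_PER_YEAR

print(f"per day:  ${usd_per_day:,}")
print(f"per year: ${usd_per_year:,}")
```

Under those assumptions that works out to $108,000,000 a day and $27,000,000,000 a year.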
I understand GP's tone wasn't exactly nice here. But here's the rub with RH's outage. RH is unfortunately in an industry (Finance, Healthcare, Aviation, Food, etc.) where people _need_ to trust them to be successful. The consequences of failure in these industries are catastrophic, not only for them but for their clients. Sure, failures happen, but the scale at which RH has failed and the lukewarm response they've put out have pissed people off. I don't recall any brokerage, old or new, that has failed so catastrophically and has responded to it so poorly. If you think you have a worse example, I am all ears.
I’m taking my account off their platform.
Your smaller point about prioritization is spot on though. I don't believe I've seen any similar incidents lead to business-ending outcomes. I personally point to Sony or, more recently, Equifax as examples of the disparity between actual business impact and technical abhorrence. In light of that, why is it worth trying to preemptively solve technical challenges instead of business needs? Every calorie spent on “what if” subtracts from “what's needed.”
In case anyone is interested: https://www.amazon.com/Show-Stopper-Breakneck-Generation-Mic...
I think I speak for everyone here if I say that, if that report is public and interesting, everyone on this thread will be happy to get you a drink.
Their success helped to pressure companies such as TD and Schwab to mostly get rid of commissions as well, which is great for the average trader
I think Robinhood has a lot of problems, but to say they're not pushing any boundaries ignores the huge changes they've brought to the industry.
The fact that Robinhood is telling people anything about the outage is only because they are the company they are, operating in the startup world/mentality.
To the people thinking they should be compensated in some way...If you are doing >$1m daily volume, maybe you can contact them to see what they can do but even then, I doubt it. The way this should be handled is to have multiple executing brokers. You can implement offsetting positions if needed and transfer positions when your main account becomes available, if you are using a broker that can clear. Right now it seems Robinhood is working to implement clearing but you could still go to neutral or put on your positions.
Yep. Intercontinental Exchange and Eurex, two huge capital markets exchanges, routinely have multi-hour outages and don't even acknowledge that they've happened, let alone explain them.
Anyone who has used RH regularly should be well aware of how inept it is. Any spikes in volume or volatility, even on a single stock, bring it to its knees pretty often. Like not just the last week, but even during calm periods. I've personally lost 20-30% on positions solely because RH was bugging out; thankfully I use RH just for "fun trades", usually <$100.
I cannot fathom having the balls to trade any real amount of money on the platform while being aware of these long term issues.
On the flipside I feel for new users and perhaps even generally inactive users who weren't aware of RH's incredible flakiness. I'd imagine (or hope) the losses of most of those users were small, assuming they were new or casual and just testing the waters.
Even if one of my small plays hit it big on RH, the money would just go to my main account on TD (which has been smooth all week shy of a few hiccups Fri morning during record volume). It's been obvious for a long time that RH should not and cannot be trusted. If you're trading options with a $60K account on RH, well, I don't even have words for that level of ignorance.
Problems with my data I can tolerate up to a point. Problems with my money I absolutely can not tolerate. As you said, it's unfathomable how people can trade money on a platform that's flaky.
Complete outages are rare, and well-publicised, but things go wrong a lot more[1] than you might think without any communications to customers that anything is wrong, sometimes outright denying[2] that there's a problem.
I think your point is that it's a very different mindset to the native internet world, and that is certainly true!
There are no public details about the root cause.
I think RH is bad for people in general, but this pile-on is outrageous.
RH has constantly had issues at least since I started using it over a year ago. I didn't notice it really at first, but I also didn't know much about anything trading related back then. It didn't take long though for me to have my first "incident" where my market orders were seemingly vanishing into the abyss as the underlying moved. I'm not talking seconds, I'm talking minutes. For a market order on high liquidity options. Never mind trying to get filled at anything besides the ask (buying) or bid (selling).
RH has had serious underlying issues for a long time now. This incident didn't happen in vacuum. The writing has been in huge block letters on the wall for a long time.
1. Dealing with other people's money 2. Monitoring/managing other people's health
It is confirmed they are worse than virtually any reputable brokerage. It might not be their fault directly, but it's 2020, not 1998.
Personally it doesn't pass the smell test for me. The load was much higher the previous week and load problems go away once the load disappears. They probably had a lot less load the rest of the day, so the fact they were down the entire day suggests it was something else. I would need a fully transparent post mortem before I believed anything they said.
We wrote a bit about this here: https://landing.google.com/sre/sre-book/chapters/addressing-...
I would strongly caution anyone who thinks this subject is trivial, just add a bit of load shedding and you're done. I wrote a bit about my team's work (including a simplified view of some of the considerations that go into how we do retries) here: https://landing.google.com/sre/sre-book/chapters/handling-ov...
Monday morning puts were down - it was obvious the market was recovering in a big way. Instead of cutting losses at ~20% in the morning they lost ~99% of their position. Some lost 100% since the options expired EOD.
Robinhood makes more money than any known firm on Wall Street by getting paid specifically to leak users' trades to other traders.
SEC requires a periodic report on that which shows compensation.
Can't believe people are still buying Robinhood's pitch of misdirection.
https://cdn.robinhood.com/assets/robinhood/legal/RHS%20SEC%2...
https://www.google.com/url?sa=t&source=web&rct=j&url=http://...
Former market maker here.
Retail flow is low risk. If I buy $100mm of institutional flow, I could get a bunch of corporate hedging orders. Or I could make a single bet against George Soros. With retail, one tends to find lots of small orders. Even if there are some with high information, i.e. they're smart money and I'm going to lose money trading against them, they're small enough to be manageable.
Retail is also low information. At an old job, we bought a prominent retail broker's options flow. The number of in-the-money unexercised options that would come through that pipe was mind-blowing. (Today, whoever was buying Robinhood's flow likely got the same.)
Even for Cloudflare, I thought the company would get sued out of existence after the proxy data leak, but the finance industry/SEC etc. is a completely different ballgame.
Just look at the top questions in their email:
* Are the funds in my account safe? Yes, your funds are safe.
* Was my personal information affected? No, your personal information was not affected.
* Can I use my Robinhood debit card? Yes. If you have a debit card, you should have been—and should still be able to—use your card, but you may have had issues receiving notifications, viewing your balance, and seeing transactions in your app.
------------
The real question is: How is Robinhood compensating for the missed trades?
Stop asking yourself the easy questions, RH.
Even if the trades were well-defined at the time the outage occurred, there would still be an asymmetry between people demanding compensation on their profitable trades while eschewing losses on their bad trades. It's doubtful any brokerage would be willing to eat that.
Execution risk is a risk.
> During periods of heavy trading and/or wide price fluctuations ("Fast Markets"), there may be delays in executing your order or providing trade status reports to you. […] Schwab is not liable to you for any losses, lost opportunities or increased commissions that may result from you being unable to place orders for these stocks through the Electronic Services.
Nobody will be compensated here, for two reasons:
(1) There is no way to determine what a fair execution would have been, since clients couldn't submit orders in the first place.
(2) Clients will adversely select their losing trades for corrections and this would bankrupt Robinhood in about five minutes.
Source: work at a wholesaler.
https://www.schwab.com/public/schwab/nn/agreements/schwab_br...
Maybe in some cases they go above and beyond their account agreement if they like you as a customer, but according to the agreement you sign with them it's not their problem if things go bad in this way.
On the flip side, clients have no guarantee that there would have been a counterparty for their order.
E.g. Buying TD in Canada, and wanting to sell on NYSE for US$.
It's no different than you breaking your phone or losing your network connection. Nothing is guaranteed to work all the time. RH might face fines for the extended nature of the outage though, especially since they've managed to avoid them for plenty of past mistakes so far.
It follows that Robinhood must never reimburse for outages.
This blog post doesn't appear to say anything. It's not an apology, it's not an explanation, it doesn't say what they're going to do in response.
This is after the incident in which there were no status updates or support availability for multiple hours. Why can't they commit to updates every hour or every 30 minutes?
Unless I have an SLA with a provider outlining penalties, they don't owe me anything if they go down. How is this any different?
They may not have a legal/contractual obligation here, but that doesn't mean that treating their customers poorly is without consequence.
While RH's ToS does theoretically absolve them of technical issues, they are obligated to comply with 'best execution' securities mandates, no? Separately, it'd be extremely bad for business if they refused compensation.
The point is moot anyway, since they're offering "case-by-case" compensation.
Arbitration is forced, but Robinhood is on the hook for the fees for everyone who decides to arbitrate. Robinhood users might not get anything, but they can still cause pain.
On the advice of any good lawyer.
Of course, no one complains when RH makes a mistake in the client's favor.
So people who were going to continue to sell off got lucky that they couldn't make that trade, and people who were going to buy got unlucky?
Does anyone seriously expect compensation, or think that it's deserved, or is it group wishful thinking? How would it even work? Would they just take people's word for their supposed intent? Or are people wanting some sort of "here's a gift card" type deal?
This is not to defend RobinHood - I've personally kept my money with well-established companies cause conservative, old, proven systems seem like a good thing for a product in this space - but shit happens, no? There will be more good days, and more bad days, in the market, it's a long-run game anyway, and it's pretty easy to vote with your wallet in this space.
Of course the problem with the "compensate me" arguments is that a lot of people were going to make decisions that would have turned out poorly yesterday (indeed, the market is balanced and every transaction has a counterparty), though of course with the amazing clarity of hindsight few would recognize or admit that. So if they need to compensate for illusory lost trades, do some people have to pay them for losses they would have incurred?
[I get that there are some complex options that can legitimately be all downside when trading isn't available, but that's a less common option]
Actually, if anyone knows of another broker who _doesn't_ charge these, please let me know. If you're first for the broker I'll give you $20 for the tip.
If it's chump change you're trading, sure, use RH.
If it's serious money, the $0.65/contract or whatever pays for itself many times over. Even if it's just the ability to regularly get filled between the spread it pays for itself.
Why? Well that tiny DNS server has certain capacity constraints and if you don’t cache DNS lookups by using a http/https agent for example (in NodeJS) you wind up looking up the same dns info over and over and churning sockets like it’s going out of style. If you run really really hot the poor thing falls over (rightly so).
The limits are high and DNS is fast so you usually don’t notice but when you are under load bugs like this come out of the woodwork. When it falls down you look up the AWS docs, lean back in your chair upon finding this isn’t an “elastic” part of AWS and say “FUUUUUUUUCK” so loud it can be heard from outer space.
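To make the caching point concrete, here is a minimal sketch of an in-process DNS cache. It's in Python rather than NodeJS, and it caches lookups directly instead of relying on a keep-alive http/https agent, so treat it as an illustration of the idea and not the setup described above. `CachingResolver` and the fixed 30 second TTL are my own hypothetical choices; a real cache would honour record TTLs.

```python
import socket
import time


class CachingResolver:
    """Tiny in-process DNS cache; a sketch, not a replacement for dnsmasq or similar."""

    def __init__(self, ttl_seconds=30.0, lookup=socket.getaddrinfo, clock=time.monotonic):
        self._ttl = ttl_seconds      # hypothetical fixed TTL; real record TTLs are ignored
        self._lookup = lookup        # injectable so the sketch can be exercised offline
        self._clock = clock
        self._cache = {}             # (host, port) -> (expiry, result)

    def resolve(self, host, port):
        key = (host, port)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # served from cache, no packet sent upstream
        result = self._lookup(host, port)
        self._cache[key] = (now + self._ttl, result)
        return result
```

With something like this, repeated `resolve("example.com", 443)` calls inside the TTL window cost one upstream lookup instead of one per request, which is the difference that matters when the upstream resolver has a hard packet limit.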
If you are Robinhood though don’t you have some former Netflix SRE/DevOps beast on staff that knows this and so you run your own DNS and monitor it?
Apparently not on Linux! https://stackoverflow.com/questions/11020027/dns-caching-in-...
Every Unix system having a local caching DNS proxy was and is as much a norm as every Unix system having a local MTS. A quarter of a century ago, this would have been BIND and Sendmail. Things are more variable, now.
To illustrate that this was considered the norm, here is a random book from the 1990s. Smoot Carl-Mitchell's _Practical Internetworking with TCP/IP and UNIX_ says, quite unequivocally:
> You must run a DNS server if you have Internet connectivity. The most common UNIX DNS server is the Berkeley Internet Name Daemon (BIND), which is part of most UNIX systems.
People sometimes think that this is not the case nowadays, and the fact that a computer is a personal computer magically means that a Unix or Linux-based operating system should offload this task and not perform it locally. They are wrong, and that is DOS Think. Ironically, they don't even get to play the resource allocation card nowadays. The amount of memory and network bandwidth that needs to be devoted to caching proxy DNS service on a personal computer is dwarfed by the amounts nowadays consumed by WWW browsers and HTTP(S).
There's no similar argument for a node in a datacentre.
Ideally, not only should every machine have a (forwarding/resolving) caching proxy DNS server, every organization (or LAN, or even machine) should have a local root content DNS server. A lot of (quite valid) DNS lookups stop at the root with fixed or negative answers. Stopping that from leaving the site/LAN/machine is beneficial.
Ironically, putting a forwarding caching proxy DNS service on the local end of any congested, slow, expensive, or otherwise limited link is advice that I and others have been handing out for over 20 years. It's exactly what one should be doing with things like Amazon's non-local proxy DNS server limited to 1024 packets/second/interface.
* http://jdebp.uk./FGA/dns-server-roles.html#ChoosingProxy
So the question is not whether a local DNS cache mechanism exists. It's whether it's set up by the company dishing out the VMs, and if not why not. Amazon provides instructions on how to add dnsmasq, and clearly labels this as how to reduce DNS outages. So it's not even the case that Amazon is wrongly discouraging having local caching proxy DNS servers.
* https://aws.amazon.com/premiumsupport/knowledge-center/dns-r...
Every DNS request for external domains turns into 10 if you don't explicitly configure FQDNs (dot at the end). This is because in the default configuration the resolver runs with ndots 5 to search all the possible internal Kubernetes and cloud-provider names. Then you have lookups for IPv4 and IPv6 in parallel. So for every external name you look up, you storm the upstream DNS with 10 requests, nearly all of them for non-existent domains.
Furthermore, the current default DNS service in Kubernetes doesn't have any kind of caching for these kinds of lookups (especially not NXDOMAIN) enabled.
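Here is a simplified sketch of where the factor of 10 comes from. The search list below is an assumption (what a pod in the `default` namespace on EC2 might plausibly see), and the model only counts queries up to the point where the real name answers; actual resolver behaviour has more corner cases.

```python
# Assumed search list; the real one depends on namespace and cloud provider.
SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "ec2.internal",
]


def candidate_names(name, search_domains, ndots=5):
    """Names a stub resolver tries, in order, for one lookup (simplified)."""
    if name.endswith("."):
        return [name]                       # FQDN: search list is skipped
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search_domains]
    if name.count(".") >= ndots:
        return [absolute] + expanded        # enough dots: try as-is first
    return expanded + [absolute]            # too few dots: search list first


def queries_until_answer(name, search_domains, ndots=5, record_types=("A", "AAAA")):
    """Queries sent, assuming only the absolute name exists upstream."""
    absolute = name.rstrip(".") + "."
    sent = []
    for candidate in candidate_names(name, search_domains, ndots):
        sent.extend((candidate, rtype) for rtype in record_types)
        if candidate == absolute:
            break                           # real name reached; lookup stops here
    return sent


print(len(queries_until_answer("api.example.com", SEARCH)))
print(len(queries_until_answer("api.example.com.", SEARCH)))
```

Under these assumptions `api.example.com` costs 10 queries (5 candidate names, A and AAAA each), of which 8 are for names that don't exist, while `api.example.com.` with the trailing dot costs 2.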
But like I said, this is one of the first issues you hit running Kubernetes on Amazon. It is widely known and can easily be fixed by scaling up some more instances, changing ndots settings, using FQDNs or configuring caching. There is no way that this was the issue, it is plastered all over the internet, the logs are clear and the fixes can be implemented in minutes.
It also doesn't go down completely, the rate-limiter is packets/s on the interface.
And now with the fed rate cut the interest on cash is only 1.3%, with more cuts expected later in the year, which was the last big differentiator. I don’t see how they don’t see massive net withdrawals going forward.
This isn't really an issue because the fed rate cut impacts everyone. Other institutions will cut their interest rates as well. I know of a few banks (Canadian) that have already lowered their GIC rates.
If anything, this is actually good for RH. Now instead of comparing 1.8% at RH and 1% at another Financial Institution, you're comparing 1.3% and 0.5% -- a much bigger multiple.
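A quick sketch of that comparison, using the rates quoted above (the 1% and 0.5% for the other institution are the example figures from this comment, not real quotes). It prints the multiple and, for contrast, the absolute spread in percentage points.

```python
# Rates in percent, taken from the comment above.
rates_before = {"rh": 1.8, "other": 1.0}
rates_after = {"rh": 1.3, "other": 0.5}


def compare(rates):
    """Return (spread in percentage points, RH rate as a multiple of the other rate)."""
    spread = round(rates["rh"] - rates["other"], 2)
    multiple = round(rates["rh"] / rates["other"], 2)
    return spread, multiple


print("before:", compare(rates_before))
print("after: ", compare(rates_after))
```

The multiple goes from 1.8x to 2.6x, while the absolute spread stays at 0.8 percentage points in both cases.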
Founders should be fired. CTO/CIO should be replaced.
Based on the information from Robinhood's careers site, their platform is largely based on the following technology stack:
- Python, Django, Django Rest Framework
- Go
- PostgreSQL
- Container and container orchestration technologies (Docker, Kubernetes)
- Microservice-oriented architectures and related OSS technologies (Kafka, Celery/RabbitMQ, nginx, Redis, Memcached, Airflow, Consul)
- Cloud-native infrastructure (AWS, GCP)
- Infrastructure as Code and configuration management (Terraform, SaltStack, Ansible, Chef, Puppet)
- CI/CD and test automation frameworks (Cypress.io, Jenkins, Appium, UIAutomation, Bazel)
Why would you use RH instead of a normal, mainstream brokerage like Vanguard, Fidelity, etc. that already has (1) an app and (2) commission-free trades?
As a secondary answer, normal, mainstream brokerages have pretty bad tech, tbh. I don't expect it to be worse than Robinhood in terms of things like security, and I expect UX to be worse. (Side note: I just discovered that Vanguard actually has a secret security key option hidden under Account maintenance, so I can finally switch from sms 2fa. +1 to Vanguard.)
It looks like you still need security codes setup:
"You'll need to register for both security codes and security keys, however. That's because keys and codes go hand in hand—if you lose your key or don't have it, we'll need to send you a code in order for you to log on. In addition, you'll always need a code to access your accounts from a mobile device."
If an attacker can skip the security key you might as well not use one.
The best do not go down like that.
What a sad press release, I am sure people at their corporate office were sweating over this. The long and short of it is that users trusted the service would work and had possibly a great deal invested only to get a comment when everything breaks down deflecting blame "OMG we weren't prepared for what our users did!"
We live in a sad state of software. I expect things like this and the Equifax scandal to continue if things like software security, reliability, and performance aren't taken into account.
Does the name still stand?
I'm just a spectator but I can not imagine that this was somehow caused by a DNS failure.
Another giveaway that this is a lie is that support emails were getting a stock postfix error message, which means that MX records at least were resolving.
I would think Vanguard did that already. Most people should be trading ETFs, not individual stocks.
How does "free" = "democratizing"? Stocks have been easily accessible for years to retail investors.
> their presence pushed a lot of big players to adopt the same offering
Misleading, big brokers were already going down this path.
Here's some more inside info ...
If your "financial app" provider doesn't have a banking charter, run. None of the recent trendy fintech companies have a charter, and are thus clown cars.