- Ctrl-r for reverse-search through history
- typing 'ps' to find the process status utility (of course)
- pressing Enter... and realizing that Ctrl-r actually found 'stopserver.sh' in history instead. (There's a ps inside stoPServer.sh)
Didn't notice I was still SSH'ed into "the" server which was at the time a single point of failure for my entire project, and as a lowly not-an-IT-person-just-a-developer in our corporate environment, I didn't have access to the machine to go power it back on. And the IT people I knew who could help had gone home for the day.
Felt super dumb writing that up in the downtime log the next day.
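For anyone wondering why 'ps' matched: reverse search looks for the typed text anywhere in a history line, most recent entry first, not just at the start of a command name. A rough simulation in Python (the history entries and the `reverse_search` helper are made up for illustration, this is not how the shell implements it):

```python
def reverse_search(history, query):
    """Return the most recent history entry containing query as a substring.

    Rough simulation of incremental reverse search: the match can be
    anywhere in the line, not only at the start of a command name.
    """
    for entry in reversed(history):
        if query in entry:
            return entry
    return None


# Hypothetical shell history, oldest first.
history = [
    "ps aux | grep java",
    "tail -f server.log",
    "./stopserver.sh",
    "cd /var/log",
]

print(reverse_search(history, "ps"))    # ./stopserver.sh
print(reverse_search(history, "ps a"))  # ps aux | grep java
```

Typing a couple more characters (or reading the matched line before hitting Enter) would have been enough to land on the intended command.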
Having read this article, it makes me super glad I'm working on very niche slow-paced stuff which, when it goes down for ~12 hours, is a minor annoyance to our users rather than "you're costing us millions of $currency per minute" :-)
> The post is quite poor and suffers a lot from hindsight bias.
> Following article is so much better
> https://www.kitchensoap.com/2013/10/29/counterfactuals-knight-capital/
That is to say I have repeated what I was told. And it’s funny, I was all set to go to battle stations over this: 1) The close is not the same as how bad it was intraday. 2) Yes, the SIPC and CFTC have controls and he was able to access his account eventually after the profit opportunity was gone. 3) He was a sophisticated investor, and if Knight had retail accounts he might have been with them.
But in retrospect it’s too clever - it’s much more likely in my estimation that the dude in question with whom I worked was simply full of shit. He tells a bad beat story and it’s not like any of us asked to see statements. It never even occurred to me to doubt it until now.
Source: used to work in algo trading for GS, so this was my job, also had friends at knight during this debacle.
The article states “The NYSE was planning to launch a new Retail Liquidity Program (a program meant to provide improved pricing to retail investors through retail brokers, like Knight)”
This pretty strongly implies Knight was a retail broker.
I assume I’m missing something. Can you clarify?
Automate deployment? Fine but boring. That's the prevailing dogma today. I don't remember where the devops hype train was in 2012. Package management had already been a solved problem for years, even though it was (and continues to be) regarded as involving too much "icky reading" and a repository system using plain directories on vanilla webservers: all way too unoptimized for resume padding.
Learn how to identify and manage risk like an engineer? Understand how business process and software can implement risk controls and mitigations?
I kid, so I don't cry.
The lesson from this article is kind of funny
>It is not enough to build great software and test it; you also have to ensure it is delivered to market correctly so that your customers get the value you are delivering
While true, I don't see any indication this was great software or that it was properly tested.
>Had Knight implemented an automated deployment system – complete with configuration, deployment and test automation – the error that caused the Knightmare would have been avoided.
Or to put it another way - had Knight implemented a higher quality deployment system than the quality of any of their other systems, they might have avoided this issue.
These stories are never about a single thing gone wrong. The whole point about critical systems is that you should need dozens of things to go wrong for them to fail, and then you should fail safe.
Deployment is a component. Monitoring is a component. They are also OpEx and therefore "inferior".
Anyway, the guy told me that they had multiple big red physical kill switches so that they could immediately turn things off if shit ever hit the fan with their systems.
If you have ever spent time in Michigan you'll notice that the manufacturer test vehicles have a big ass red button on the dashboard to kill the vehicle in case something goes wrong.
I cannot imagine doing anything remotely close to this sort of thing without a big ass red kill switch on my desk.
This may have something to do with the fact that killing an HFT bot without some kind of orderly wind-down might leave you with some very expensive open positions.
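To make that trade-off concrete, here's a toy sketch in Python. Everything in it is hypothetical (the `Bot` class, the symbols, the quantities), and it assumes the flattening orders fill instantly, which a real market won't promise. It only shows the difference between pulling the plug and winding down:

```python
class Bot:
    """Toy trading bot that only tracks net positions per symbol."""

    def __init__(self):
        self.running = True
        self.positions = {}  # symbol -> signed share count

    def fill(self, symbol, qty):
        # Record a fill; refused once any kill switch has been hit.
        if not self.running:
            raise RuntimeError("bot is stopped")
        self.positions[symbol] = self.positions.get(symbol, 0) + qty

    def hard_kill(self):
        # Big red button: trading stops, but the exposure stays on the books.
        self.running = False
        return dict(self.positions)

    def wind_down(self):
        # Orderly version: stop trading, then offset every open position.
        # Assumption: the offsetting orders fill immediately and in full.
        self.running = False
        flatten = {s: -q for s, q in self.positions.items() if q != 0}
        for symbol, qty in flatten.items():
            self.positions[symbol] += qty
        return flatten


a = Bot()
a.fill("AAA", 100)
a.fill("BBB", -50)
print(a.hard_kill())   # {'AAA': 100, 'BBB': -50} still open

b = Bot()
b.fill("AAA", 100)
b.fill("BBB", -50)
print(b.wind_down())   # {'AAA': -100, 'BBB': 50} orders needed to get flat
print(b.positions)     # {'AAA': 0, 'BBB': 0}
```

The hard kill is the safer button to press when the bot itself is the problem, but someone still has to deal with what it leaves behind.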
It's good though; poor decisions must have a cost. The only way to enforce good engineering practices that are human-time intensive is for there to be a cost to skipping them.