Classic arrogance and naivety from a Casey follower. Name call all you want, you can't hand wave the reality that the business determines the requirements, and in my industry they don't care about performance until it's a noticeable problem. Oh and the requirements they gave you are solving problem X when they really want to solve problem Y, so your optimal solution to problem X needs to be deleted.
I write correct, readable code as performant as it can be in the time I'm allotted. Call me incompetent, but it's what I'm hired to do.
All this bickering and harsh feelings are stemming from the author's inability to understand that different industries have different priorities.
Yeah, I've done several dozen 10x or more performance improvements on our codebase. It's not always trivial, but most of the time it's not super-hard either.
In fact just today I did a 10x speedup of a query. After a couple of hours analyzing the issue, the fix was relatively simple: populate some temp tables before running the main query. A bit more complex than just running the query, but not terribly so.
Why hadn't we done that before? Because a customer suddenly got 1000x the volume of the previously largest user of that module, and so performance was suddenly not acceptable. It's 5 years since we introduced the module...
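For anyone wondering what "populate some temp tables before running the main query" looks like in practice, here is a minimal sketch using Python's built-in sqlite3. The schema (`customers`, `orders`), the `order_totals` temp table and both function names are made up for illustration and are not the actual system described above. Whether this pattern helps at all depends on the database engine and its query planner.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
        INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
        """
    )
    return conn


def totals_per_customer(conn):
    """Pre-aggregate into a temp table, then let the main query join against it."""
    cur = conn.cursor()
    # Step 1: materialize the expensive intermediate result once.
    cur.execute("DROP TABLE IF EXISTS temp.order_totals")
    cur.execute(
        """
        CREATE TEMP TABLE order_totals AS
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
        """
    )
    # Step 2: the main query now reads the small, pre-computed table.
    cur.execute(
        """
        SELECT c.name, t.total
        FROM customers AS c
        JOIN order_totals AS t ON t.customer_id = c.id
        ORDER BY c.name
        """
    )
    return cur.fetchall()
```

Calling `totals_per_customer(make_demo_db())` returns one `(name, total)` row per customer. The extra step is the "a bit more complex than just running the query" part.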
To a first approximation, the reason modern software is slow isn't due to failure to optimize this algorithm or that code path, but rather the entire pancake stack of slow defaults — frameworks, runtimes, architectures, design patterns, etc. — where, before you even sit down and write a line of "business logic" code, or code that does something, you're already living inside a slow framework written in a slow language on top of a slow runtime inside a container with a server-client architecture where every RPC call is a JSON blob POSTed over HTTP or something. This is considered industry standard.
The "business requirements" guy is basically saying, I have to ship this thing by friday, I'm just going to pick the industry standard tools that let me write a few lines of code to do the thing I need to do. Ok, but that's the tradeoff he's making. He's deciding to pick up extremely slow tools for the sake of meeting his immediate deadline. That decision is producing unacceptable results.
It's not enough just to say people have different priorities. Selecting an appropriate point on a multivariate system of tradeoffs is part of the skill of being a programmer. And if there's no point on the curve that delivers acceptable results in all categories (if, given a certain set of tools, it's not possible to ship quickly and deliver acceptable performance), then it should be an impetus for the programmer, the craftsman, to find better tools, improve his skills, push the "production curve" outward, until he can meet all the requirements.
For instance, a large percentage of modern programmers don't really know how to program from first principles, and tell the computer to do precisely and only the thing it needs to do. Essentially they only know how to glue tools together. Then in their head they're like, well gee, given that skillset, I could either (1) spend a bunch of time optimizing "hot spots," writing crazy algorithms, heroically trying to fight through all that slowness... or I could just (2) deliver the business logic and call it a day. Then they call this "prioritizing business requirements." No, there's a third, alternative, better option, which is to use better tools, which might initially be harder and more time consuming and less ergonomic to use, and then learning to get good with those tools, putting in the practice, recognizing patterns, thinking faster over time, coding faster... all of this is part of what mastering the trade of programming is about.
At the end of the day, there is just an ethic of self improvement and craftsmanship that is totally missing from programming today, and it surfaces whenever this debate comes up.
The problem with this line of reasoning, eg "if you write slow code you're incompetent", is that it applies to everything that programmers do - if you write slow code you're incompetent, if you write buggy code you're incompetent, if you write undocumented code you're incompetent, if you write untested code you're incompetent, if you write code slowly you're incompetent, if you write code that doesn't fulfil all the requirements perfectly you're incompetent, and so on for each and every measure someone dreams up to measure how good code is.
You'll very quickly find there are no competent developers.
The other things are just proxies for the real measures, that people made up, and in fact are often harmful to the main goal. Like "documenting code" and "writing tests" a lot of the time are just cargo culting to make people feel like they're being responsible and following "best practices" without actually improving the measures that matter. I think that the other unlisted metrics in your "every measure someone dreams up..." are likely to fall under this category.
There isn't an infinite number of possible measures like you're suggesting, there's a finite number and a rather small number at that. You can definitely be really good or really bad at quickly shipping performant bug-free code that does its job. The problem in this debate is that one side is completely ignoring one of these measures, and trying to claim that it's because they have to prioritize the other ones, and that this is just an inevitable tradeoff, rather than that we lack the skill as an industry to do all of these things at an acceptable level. Being a good programmer may involve more axes than being a good chess player, but I think the claim that there are so many axes that it negates the existence of programming competence reductios to absurdum pretty quickly.
Working with your type sucks. Confidently incorrect all the fucking time.
Sure, and what I'm saying is that there are many scales, and all of us are at the incompetent end of some of them.
Yeah, people like that exist... That has a strong inverse correlation with competence.
When FB dove into the numbers, they found they could save 50% in hardware costs. That _does_ matter.
It may be the case that your company has a 20k server rack to support 20M of sales and so it doesn't matter. But unless you've actually looked up the cost you shouldn't be making the claim, because it's unbacked; it's just a bad faith argument.
The cost to rewrite your system to be twice as fast is about the same whether you have 20k of hardware costs or 20M, but one saves 10k and the other 10M. Only one offsets the cost of the programmers.
You don't need to do a deep analysis to tell if optimization is needed or not, it is enough to assume best-case improvements and compare it to your FTE + overhead cost.
Do you have many of your servers run CPU-bound C++ apps you wrote? If not, don't bother eliminating class hierarchies, optimize your core logic instead.
Do your webapps spend most time waiting on database and microservices? See if you can eliminate or cache those, the wins are going to be much bigger than rewriting it in Rust.
Is your GraphQL compiler too slow? Unless you are FAANG with hundreds of thousands of developers and hundreds of people to spare, you will get much more bang-for-the-buck with some smart caching or just getting a bigger machine for CI jobs.
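As a rough illustration of the "eliminate or cache those" advice, here is a minimal sketch using `functools.lru_cache` from the Python standard library. `slow_lookup` is a hypothetical stand-in for a database or microservice round trip, and `cached_tier` is an invented name. Real code would also need an invalidation story, which `lru_cache` does not provide on its own.

```python
import functools
import time


def slow_lookup(customer_id):
    """Hypothetical stand-in for a database or microservice round trip."""
    time.sleep(0.01)  # simulated network/database latency
    return {"id": customer_id, "tier": "gold" if customer_id % 2 == 0 else "basic"}


@functools.lru_cache(maxsize=1024)
def cached_tier(customer_id):
    """Return the customer's tier; repeated calls with the same id skip the round trip."""
    return slow_lookup(customer_id)["tier"]
```

The second call to `cached_tier(2)` is answered from memory, which `cached_tier.cache_info()` will report as a hit.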
According to levels.fyi, the average senior SW engineer in San Francisco makes $312k/year. With overhead, the actual cost is likely $500k/year or so. There are _a lot_ of servers you can buy for that money before you can justify maintaining your own custom version of the existing software solution.
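The back-of-the-envelope comparison can be sketched like this, using figures quoted in this thread (20k vs 20M of hardware, a 2x speedup, roughly $500k/year per engineer). The function names are invented, and the model assumes the best case where a system that is N times faster needs 1/N of the hardware, which real systems rarely achieve.

```python
def best_case_savings(hardware_cost, speedup):
    """Hardware spend avoided if an N-times-faster system needs 1/N of the hardware."""
    return hardware_cost * (1 - 1 / speedup)


def worth_optimizing(hardware_cost, speedup, engineering_cost):
    """True only if the best-case savings exceed what the engineering work costs."""
    return best_case_savings(hardware_cost, speedup) > engineering_cost


# Assumed inputs: one engineer-year at about $500k, a 2x speedup.
small_shop = worth_optimizing(20_000, 2, 500_000)      # saves 10k at best
large_shop = worth_optimizing(20_000_000, 2, 500_000)  # saves 10M at best
```

With these inputs only the 20M case clears the bar, which is the point both sides of the thread are making from opposite directions.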
Is it terrible that the app takes minutes to add a few thousand numbers? Of course it is. But it does not matter, the customer is used to software being terrible so he won't waste the time and money to switch to another software that is probably also terrible.
On a related note: https://xkcd.com/359/
Does Atlassian look like they care about the performance of their products? Is it a significant factor in their sales?
Does the average ERP or CRM system compete on performance?
[1] https://www.atlassian.com/blog/platform/cloud-performance-up...
startups
The CRUD app at your startup probably doesn’t need to worry about cutting every last millisecond off of page load or scaling across some k8s cluster or whatever. But I’ve stopped using (or never seriously tried) otherwise good services because of flow-stopping performance problems, e.g. multi-second page loads, stuttering pages, battery drain, etc.
You could say those are the targets for "not caring about performance"... and they ARE. *A LOT*.
One of my main competitive advantages? Speed. I work in niches where my app has 2-3 reports and my competitors have 300+, and the users use my app instead. For example, a query that, instead of taking 20+ minutes, runs in 6 sec (1).
Mind you: that was before I did Rust and started paying more attention to performance. Long ago!
(1) Today I tried to hit <1 sec. Still could be better. But like this post says, many other things also need to be better, so keeping the balance across the board is challenging.
This video was clearly about debunking your excuses, which he did succinctly.
Yes, it would be great if all developers could write performant code, but let's face it - there's only so many hours in a day and days in a week. Developers already struggle to keep up with all the required knowledge. It's not that people don't want to be competent. We're building more and more complex things while expanding the number of people employed building software which means average skill level is probably slightly decreasing.
So if most people unlearned the _more complex_ way of programming, they might end up with simpler _and_ faster code.
There are plenty of devs that are just there for the paycheck and have minimal passion or interest in learning more.
The entire startup ecosystem is based on extracting from the economy people who are motivated, do their work efficiently and learn high skills, and giving them millionaire-or-bankrupt stakes to ensure they’re committed.
They are paid to do a job. Just have to specify what that job is in a correct way.
Dismissing people as just being there for the paycheck - that's you and every other developer.
The number of people who produce useful OSS projects because they just like the project and aren't filling out resume points to hope bigtech senpai notices them is astronomically small, and usually the sort of political project nobody wants to touch.
If you care about performance from the beginning, you would never even get to the point where you’re saying “oh crap, our messenger app can barely run, let’s rewrite it in C”.
The point is that performance is so far down the list of things to prioritize that [company of choice] made it to [revenue] without having to care at all about it. It wasn't worth them focusing on until they had already acquired a very large marketshare. Only when already large and successful did the scope and complexity of their system impact performance enough to bother focusing on it, at which point they begrudgingly did.
----
Performance is simply a requirement for the product, some products require very good performance, some don't.
Where you fall is a discussion to be had, because like ALL product requirements, it comes with a cost to develop and maintain.
I don't drive an F1 car to the grocery store. I don't take my minivan to the track.
The article above is bad - it treats a conversation about product requirements as an antagonistic space, where voices that may prioritize a feature other than performance aren't making judgements about how to allocate limited resources, but "excuses"... Worse - he cherry picks the most extreme take of those value judgements for his examples as easy targets to attack.
No. The author’s point is that you don’t have to drive an F1 car to the grocery store. All you have to do is stop slashing your tires, dumping sugar in your gas tank and driving a Kia.
The amount of mental gymnastics you people go through to keep your delicate little worldview from crumbling is truly a sight to behold. Thank god we have people like Casey in this world to balance out the bell curve.
Need to loop over some order lines and find distinct article numbers? Use a hash-based set with O(1) access, not just a list which will have O(n). If not you'll end up writing an O(n^2) routine for no good reason, which will work swimmingly on your 10-line test order and cause grief in production.
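To make the distinct-article-numbers example concrete, here is a small sketch of both versions. The order-line shape (a dict with an `article_no` key) and the function names are assumptions for illustration.

```python
def distinct_articles_quadratic(order_lines):
    """List-based version: each membership test scans the list, so O(n^2) overall."""
    seen = []
    for line in order_lines:
        if line["article_no"] not in seen:  # O(n) scan on every iteration
            seen.append(line["article_no"])
    return seen


def distinct_articles_linear(order_lines):
    """Set-based version: membership tests are O(1) on average, so O(n) overall."""
    seen = set()
    result = []  # kept separately so first-seen order is preserved
    for line in order_lines:
        article = line["article_no"]
        if article not in seen:
            seen.add(article)
            result.append(article)
    return result
```

Both return the same answer. On a 10-line test order you will not notice the difference, and that is exactly why the quadratic one survives until production.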
I don't think a lot about performance most of the time, just enough to try to avoid silly stuff.
IMHO the key to not writing silly code is to have a good mental model of how your code works and how it fits into a bigger picture which includes OS, hardware, network and other services (like a DB). This model should be updated from time to time based on benchmarks and metrics from production services but once you have this knowledge it takes almost no additional time to apply this in practice and even if you'll be taking shortcuts to save development time you'll be better aware of this tradeoff.
However n^2 algorithms will bite you. They may be outside of your everyday working set, and even dogfooding you may not have larger datasets than all of your users. At some point one user will hit a dimension that you "expected to always be small" and get bad performance.
If one of these O(1)*N algorithms ends up in the hot path you can always optimize it when profiling points it out. But O(N)*N algorithms will leap off the cold path and become hot when some user starts using the "wrong" data patterns and the developers may never be aware, or at least not aware until it is too late.
On one hand, Google created the JS-Engine V8 which finally made the web fast. But on the other hand, did nothing to fix the RAM issues that came with that until recent years.
So I see it as "Let's do performance only when it's an advantage for us". Which is where all of the OP's examples fall. OP wants to conclude from that that performance=advantage so everyone needs to work on it all the time (because who would ignore advantages?), but all we see is if(performance=advantage){refactor}. Which does not support his conclusion.
Example cases are startups, monopolies, high cost (like refactoring bank code) etc. which have other silos of advantage.
But even if that turns out to be true, it still means programmers have to care about performance! It just means they need to learn two modes of programming: “throw-away”, and “performant”. There would still be no excuse for dismissing performance as a critical skill, because you always know the throw-away version has to be replaced with a more performant version in short order.
That kind of argument is great. We should have it. What we should not have are excuses — claims there is no argument to be had, and that performance somehow won’t matter anywhere in a product lifecycle, so developers simply don’t have to learn about it.”
Now in the public cloud my gimmick is to switch VMs over to current-generation AMD EPYC models, and then collect my pay-check for the "performance tuning".
The fruit is hanging so low that I'm trampling over it. There's no actual fruit picking going on.
Any tips?
My general principle: don’t give uni-directional advice when optimizing a u-shaped loss function.
I usually find that those failing to advocate the nuanced position “you can spend too much AND too little time on perf, here is how to prioritize” are not adding useful information to the conversation.
The truth is, most startups don’t need to worry much about perf. It’s a feature that your customers don’t usually ask for at first. At the other end of the scale, giant companies invest huge sums in taming performance. And your own situation will have more parameters than just that one simplified spectrum.
Measure ROI honestly, and prioritize accordingly!
It makes sense that people want to use a programming language that allows for rapid prototyping if they want to get V1 out of the door. That's fine. But that doesn't explain why languages like Python and Ruby are incredibly slow in the first place. These are 20 year old languages that people have invested a ton of effort into. And despite a full decade of trying to make the languages faster they're still slow to the point of near unusability.
These languages, along with PHP and Javascript, were designed by people who paid zero attention to the performance consequences of the design decisions they made. Today, after the industry has spent untold billions (yes, with a B) on trying to get the performance to an acceptable level, getting anything written in these interpreted languages to run at an acceptable speed is painful.
We are collectively wasting untold hours because our programming environments are bad, the languages are bad, the libraries we use are bad, and the standards are bad. This is not to say it's all hopeless, but we could be doing a lot better if we didn't focus so much on "moving fast and breaking things" and pursued quality instead.
That's just ONE out of the five excuses he mentioned. And just because you haven't met people who claim performance never matters doesn't mean they don't exist.
> Yadda yadda
You're basically agreeing with Casey but make it sound as if he's saying the opposite from what you're saying.
If you want to understand what he means with this watch the refterm series. https://www.youtube.com/watch?v=pgoetgxecw8
If you say that maintainability, extensibility, legibility etc. etc. is more important than performance then you are, in fact, saying that performance is NOT important.
But those are mostly just the excuses that Casey is ranting about.
There's nothing inherent in writing fast code that makes it not maintainable, not extensible, not legible, etc.
In fact, the things that make code less maintainable and less legible are often the things that make it slower.
Things like replacing new Foo() with FooFactory and replacing FooFactory with FooFactoryProvider and then encoding logic in XML files.
Those are things that people do way more often than writing SIMD intrinsics.
And there are those who way overemphasize supposed "legibility" over performance.
In JavaScript land you get legions claiming that the world will end if they rewrite unnecessarily slow map(), filter(), forEach() etc. with a regular for loop.
I read the article and didn't find it convincing, I would argue that it's clearly not always the most important thing. This isn't an "excuse" it's a calculation that teams make, the author feels that people make the tradeoff at the wrong point but instead of making that argument he frames decisions not to prioritise performance as "excuses" which is bullshit. There's always more performance optimisations one can make and there always comes a point where it just doesn't make sense to do so for all manner of reasons.
A trivial example: I have a script which downloads a few thousand GIS Shape files and converts them into geoJSON. It runs automatically once a month, usually whilst I sleep. A run takes about 5 minutes at the moment but there are a couple of things I could do to make it run in a fraction of that time but then the script would be two or three times longer and more complex, and I'd have to spend a couple of hours writing and testing code, (there'd also be some edge cases that I'd need to account for which the current setup allows me to ignore). I judge that to be a waste of time which would make anyone who has to take ownership of this script in the future's life more difficult. So that's my "excuse" and I'm sticking to it.
If anything, the article points out the accuracy and value of the five metrics for evaluating performance needs over other business needs.
> For example, one argument would be that the evidence I’ve presented here is consistent with a strategy of quickly shipping “version one” with poor performance, then starting work on a high-performance “version two” to replace it. That would be completely consistent with the evidence we see.
> But even if that turns out to be true, it still means programmers have to care about performance! It just means they need to learn two modes of programming: “throw-away”, and “performant”. There would still be no excuse for dismissing performance as a critical skill, because you always know the throw-away version has to be replaced with a more performant version in short order.
> That kind of argument is great. We should have it. What we should not have are excuses — claims there is no argument to be had, and that performance somehow won’t matter anywhere in a product lifecycle, so developers simply don’t have to learn about it.
Who is he even talking to at this point? He's arguing that performance has more than 0% relevance to software, which no one disagrees with.
>claims there is no argument to be had, and that performance somehow won’t matter anywhere in a product lifecycle
No-one says performance never matters. People disagree on what performance merits optimization. A 5s to 1s page load improvement is massive, a .5s to .1s improvement starts hitting diminishing returns in user experience.
And the research Facebook (and others) have done clearly shows that a half-second loading time on a webpage loses them a lot of money. So much money that they were willing to have dozens to hundreds of engineers work for years to fix it across all of their apps and their web servers and APIs. Presumably it costs a small-time business a similar percentage, but if that’s not enough money to justify paying a developer to fix the problem then by all means spend the money elsewhere.
Generally you will be arguing with a person who is considering the opportunity cost, but has considered it so many times and performance always lost, so they start saying (but not meaning literally) never.
To this person you need to make the argument, "This time is different because..." and avoid the strawman arguments from the article.
I recently “inherited” a couple of back-end services when a developer left our company. It turned out that the code was terrible and that they hadn’t used any of our helper tools. Since we use Typescript everywhere, ignoring our quite opinionated and a little fascist linter rules is almost impossible, but the developer in question had the authority to turn them off, which they did, and in doing so shot themselves completely in the foot. The back-end services were developed in JavaScript more than Typescript, and since both our linters and usual test pipelines were disabled, and since it’s software that has been developed over almost a year’s worth of changes, it was just horrible. We’re talking loops comparing values that were probably there once but are now just sorting things as undefined === undefined kinds of terrible.
The performance was also atrocious. Basically what the service did was gather info on a couple of thousand projects and link them with tens of thousands of documents in Sharepoint, but because it was built wrong, it wasn’t pulling the correct documents and it was taking 5-10 minutes per run. It’s now running at around 10 seconds for its complete run time. Which is a massive performance improvement, and it’ll be even better once I finish building the caching. So you might think I’m inclined to agree with the article, but I didn’t rewrite it because of its poor performance, I rewrote it because it didn’t work correctly and the performance gains were simply a happy “coincidence”.
This is because the performance didn’t really matter. Yes, it was costing us at most $77 a over our 3 year Azure contract, but the time I spent rebuilding it cost the company almost exactly $1500. Those $1500 were well spent because it wasn’t working, but would they have been well spent in terms of performance? Not really. That being said, it wouldn’t take a lot of those services to become expensive, so it’s not like the author is really wrong either. It’s just that I’m confused with whom he is arguing.
In my career I've met people who genuinely don't care about performance. For them code either works or doesn't. Back in the days when everyone used bare hardware, if a developer pushed a change that suddenly required 10x more hardware (or made 10x more requests to another team's service), it would break the system and force someone to do something about it - either order more hardware (takes time) or fix the software (usually faster). Nowadays the cloud auto-scales and such a change goes unnoticed; the only difference will be the next month's cloud bill.
> [bad developer burns a year worth of salary building something that doesn't work]
The cost you cite for the rewrite implies a week or two worth of work. Why did this developer spend an entire year on it?
Really bad developers are really bad, yes. Double work is very expensive.
> I didn’t rewrite it because of its poor performance
> ... massive performance improvement, and it’ll be even better once I finish building the caching.
So even with the massive performance improvement, you still need to improve the performance even more? It sounds like you would have had to rewrite parts of the system even if it was decently written to begin with.
The usual application where GC is most relevant is connected to 10s of services over some network protocol, and often has to load each business object/entity it operates on into memory to check permissions, sometimes even issue another network call, etc. Loading that entity from a list of pointers is absolutely no bottleneck in such a setting, at all - the way you might speed up the app is doing the work on the database itself, which is already written in some low-level language, efficiently.
Your anecdote is just a badly written application, I have plenty examples of very performant ones written in managed languages.
> The usual application where GC is most relevant is connected to 10s of services over some network protocol, and often has to load each business object/entity it operates on into memory to check permissions, sometimes even issue another network call, etc. Loading that entity from a list of pointers is absolutely no bottleneck in such a setting, at all - the way you might speed up the app is doing the work on the database itself, which is already written in some low-level language, efficiently.
No, my example isn't a complaint about it using a GC language. It is that it is loading and unloading business entities multiple times and doesn't take into account that operations could be done linearly much faster instead of hunting for a field in an object graph to accumulate in various ways. That is how e.g. numpy works if you make sure that the array you do your operation on is of the same type.
> Your anecdote is just a badly written application, I have plenty examples of very performant ones written in managed languages.
Yes, following the ideas of what is considered SOLID principles. The objection isn't managed vs. unmanaged; it is going full OO and not giving a shit about how computers work.
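A rough sketch of the contrast being described: accumulating a field by walking an object graph versus summing one flat, homogeneously typed column. The `Customer`/`Invoice`/`Money` classes are hypothetical, and the stdlib `array` module stands in for numpy here only to keep the example dependency-free. This shows the shape of the two approaches, not a benchmark.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Money:
    amount: float


@dataclass
class Invoice:
    total: Money


@dataclass
class Customer:
    invoices: list


def total_via_object_graph(customers):
    """Chase pointers: customer -> invoice -> money -> amount, for every invoice."""
    return sum(inv.total.amount for c in customers for inv in c.invoices)


def total_via_flat_column(amounts):
    """Sum one contiguous array of doubles; no per-entity loading or traversal."""
    return sum(amounts)


# Hypothetical data: the same numbers stored both ways.
customers = [
    Customer(invoices=[Invoice(Money(10.0)), Invoice(Money(5.0))]),
    Customer(invoices=[Invoice(Money(7.5))]),
]
amounts = array("d", [10.0, 5.0, 7.5])
```

Both functions return the same total. The difference is in how much memory has to be touched and how predictable the access pattern is.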
If the app were to scale a lot I would still need to go to all that trouble, but at this early stage the benefits of really simple infrastructure are immense.
It can also scale your power usage up from 1 kW to 1000 kW.
If you have a datacenter that is 99% idle, you may be able to maximize utilization by switching to Python.
But Ockham's razor tells us they did spend significant effort on performance because they had a reason to do it.
I believe the average developer should care somewhat about performance, and depending on their industry they might need to care a lot, but I'm not so convinced for the average case.
The average developer is not working on FAANG-sized codebases. Also, I'd imagine any large systems built up over the years that are refactored would likely see great performance gains. That's just the nature of long-term software.
Here are the author's five points, and how at least one of the examples he gives proves the reasons.
No need. These companies operate on the leading edge of hardware performance, on purpose. They can't just go out and buy faster hardware, it doesn't exist. Google even builds their own, just to optimize for their uses.
Too small. Again, at the scale of Facebook or Netflix, a 5% performance gain translates to an enormous advantage, which leads directly to the next point.
Not worth it. Here again, we're talking about saving millions of dollars but only because the systems are so enormous.
Niche. Facebook, Twitter, Netflix, and Uber's performance needs are a niche of their own.
Hotspot. Here we can get to a specific example the author quotes. "Cutting back on cookies required a few engineering tricks but was pretty straightforward; over six months we reduced the average cookie bytes per request by 42% (before gzip). To reduce HTML and CSS, our engineers developed a new library of reusable components (built on top of XHP) that would form the building blocks of all our pages."
So Facebook does have a hotspot, it just happens that it's a very large spot on a colossal-size system.
Finally, the author says, "If you look at readily-available, easy to interpret evidence, you can see that they are completely invalid excuses, and cannot possibly be good reasons to shut down an argument about performance."
I'm still looking for the evidence.
Wow. You know you have made bad choices when you need a desperation play like that.
I feel like pointing this out has become table stakes in any conversation anywhere. Thinking of printing it out on a card and carrying it around.
Two examples:
1. I've seen people mentioning that following good programming practices makes the code slower, and by removing them you can have improvements around 40%. That sounds like a great number, until you realize the real bottlenecks are other things (e.g. database queries, network latency, etc). When you calculate the overall improvement for the request, the gains are negligible.
2. There are some frameworks that market themselves as crazy fast: "If you use us your app will boot almost instantaneously!". Looks cool, until you realize that a good pipeline will gradually roll out a new version and this will take time. Usually it comes with monitoring the new version for a while and then after it's deemed healthy we switch versions completely. Now instead of waiting a few minutes + 10 seconds, you will wait only a few minutes, which doesn't make much difference.
Performance gains will come with tradeoffs and, before committing to that, it's a good idea to evaluate what are the real benefits of doing the changes we're planning to do.
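The first example is basically Amdahl's law, and the arithmetic is easy to sketch. The numbers below (5 ms of application code, 95 ms of database and network time per request) are made-up assumptions, and the function names are invented for illustration.

```python
def request_time_after(cpu_ms, io_ms, cpu_reduction):
    """Request time once the application-code portion is cut by `cpu_reduction` (0..1)."""
    return cpu_ms * (1 - cpu_reduction) + io_ms


def overall_improvement(cpu_ms, io_ms, cpu_reduction):
    """Fraction of total request time saved by speeding up only the code portion."""
    before = cpu_ms + io_ms
    after = request_time_after(cpu_ms, io_ms, cpu_reduction)
    return (before - after) / before


# Assumed split: 5 ms in our code, 95 ms waiting on the database and network.
saved = overall_improvement(cpu_ms=5, io_ms=95, cpu_reduction=0.40)
```

With that split, a 40% improvement in the code works out to roughly a 2% improvement for the request. Flip the split to 95 ms of code and 5 ms of I/O and the same 40% becomes 38%, which is why measuring where the time goes comes first.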
In reality their project was death by a thousand... nay, millions or billions... of cuts. Poor technology choices. Poor algorithm choices. Incompetent usage (e.g. terrible LINQ usage everywhere, constantly). This was the sort of project where profiling was almost impossible because any profiling tool barfed and gave up at every tier.
Profiling the database was an exercise in futility. Profiling the middle tier was a flame graph that was endless peaks. Profiling the front-end literally crashed the browser. I ended up having to modify Chromium source to be able to accurately get a bead on how disastrously the Angular app was built.
This is common. If performance doesn't matter to a team, it will never be something that can be easily fixed. Maybe you can throw a huge amount of money at the problem and scale up and out to a ridiculous degree for a tiny user base, but making an inefficient platform efficient is seldom easy.
Yes at Facebook with end user facing software it is crucial to get performance right. If you're running payroll at a 3 person company it doesn't matter if the software is inefficient. Most of the time it's Excel, and that is not the most efficient way to do those calculations. But it's not worth investing in a better solution until processing payroll is a problem.