It’s as much an example of how far world-class talent can go as it is about doing more with less.
I feel like it’s similar to how people point to Craigslist as evidence that you can still build sites in Perl - ignoring the fact that Craigslist has Larry Wall on a retainer.
Running highly scalable monoliths is easy! As long as you’re willing to hire some of the five to ten people in the world who are capable of advancing the state of the art of development on that technology stack…
Stack Overflow didn't need these optimizations. They could have just deployed 20 servers instead and still been profitable. People optimized just because they like to.
I truly believe that being able to design and run a modular monolith application effectively (not talking about the 'hyperscale' scenario here) should be a prerequisite for designing and running a set of interconnected microservices. The challenge is similar, but dealing with modular monoliths has the advantage of not having to deal with the uncertainty of network programming (i.e. remote calls, network error handling, distributed transactions).
Looks like it's expanded a little since then.
Which, to be clear, is not intended to be a negative statement about that "other stuff". It really depends. Some is. But I've also seen things just done poorly by applying tools wrong, e.g. ORM misuse leading to thousands of queries that should have been one OUTER JOIN.
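To make the ORM point concrete, here's a minimal sketch of the N+1 pattern versus the single JOIN, using Python's sqlite3 directly. The schema and data are made up for illustration; lazy-loaded ORM relations generate the first shape behind your back.

```python
import sqlite3

# In-memory demo schema: users and their posts.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'third');
""")

# N+1 style: one query for the users, then one more query per user --
# the shape a misused ORM often produces via lazy loading.
def titles_n_plus_one():
    result = {}
    for uid, name in db.execute("SELECT id, name FROM users ORDER BY id"):
        result[name] = [t for (t,) in db.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (uid,))]
    return result

# The same data fetched in a single LEFT OUTER JOIN.
def titles_one_join():
    result = {}
    rows = db.execute("""
        SELECT u.name, p.title
        FROM users u LEFT OUTER JOIN posts p ON p.user_id = u.id
        ORDER BY u.id, p.id
    """)
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result
```

With thousands of users, the first version issues thousands of round trips to the database; the second issues one.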
But I don't think you need engineers of their unique calibre to get most of what they got. It's probably an exponential thing, if you have some merely good engineers you could maybe achieve 80% of their performance. The last 20% are just much more costly.
I know that in many cases simple != easy but I can't help feeling sad while reading this.
When I started my career, cloud wasn't yet mainstream, but as a beginner I was able to deploy and configure an nginx proxy and load-balance between 2-3 backend servers without too much effort. It wasn't some kind of rocket science.
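The whole exercise really does fit in a few lines of nginx config. A minimal sketch, with made-up upstream names, addresses, and ports:

```nginx
# Illustrative only: three hypothetical backend servers behind one proxy.
upstream backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080 backup;  # only used if the others are down
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```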
I guess the current issue is that cloud has been marketed so much that nobody who's just starting out in the industry even has a second thought about using it by default. What can I say, great job from the cloud providers in capturing their customers as soon as they get in front of the store.
> EF Core 6.0 performance is now 70% faster on the industry-standard TechEmpower Fortunes benchmark, compared to 5.0.
> This is the full-stack perf improvement, including improvements in the benchmark code, the .NET runtime, etc. EF Core 6.0 itself is 31% faster executing queries.
> Heap allocations have been reduced by 43%.
> At the end of this iteration, the gap between Dapper and EF Core in the TechEmpower Fortunes benchmark narrowed from 55% to around a little under 5%.
https://devblogs.microsoft.com/dotnet/announcing-entity-fram...
Again, this isn't to take anything away from Dapper. It's a wonderful query library that lets you just write SQL and map your objects in such a simple manner. It's going to be something that a lot of people want. Historically, Entity Framework performance wasn't great and that may have motivated StackOverflow in the past. At this point, I don't think EF's performance is really an issue.
If you look at the TechEmpower Framework Benchmarks, you can see that Dapper and EF performance is basically identical now: https://www.techempower.com/benchmarks/#section=data-r21&l=z.... One fortunes test is 0.8% faster for Dapper and the other is 6.6% faster. For multiple queries, one is 5.6% faster and the other is 3.8% faster. For single queries, one is 12.2% faster and the other 12.9% faster. So yes, Dapper is faster, but there isn't a huge advantage anymore - not to the point that one would say StackOverflow has tuned their code to such an amazing point that they need substantially less hardware. If they swapped EF in, they probably wouldn't notice much of a difference in performance. In fact, in real-world apps, the gap between them will probably end up being even smaller.
If we look at some other benchmarks in the community, they tell a similar story: https://github.com/FransBouma/RawDataAccessBencher/blob/mast...
In some tests, EF actually edges past Dapper since it can compile queries in advance (which just means calling `EF.CompileQuery(myQuery)` and assigning the result to a static variable that gets reused).
Again, none of this is to take away from Dapper. Dapper is a wonderful, simple library. In a world where there's so many painful database libraries, Dapper is great. It shows wonderful care in its design. Entity Framework is great too and performance isn't really an interesting distinction. I love being able to use both EF and Dapper and having such amazing database access options.
No doubt EF has probably gotten to that level since MS has done a stellar job with .NET core of relentlessly slimming things down and improving performance.
Thread says SO allocates 1.5TB RAM to SQL Server. Sounds wise.
If the data is sitting in memory, and you've tuned extracting the data from memory as fast as possible, job done.
It's almost always relatively normal sized services split by functional area e.g. Auth, Cache etc.
At that point there is no 'cloud' design that can help. It's either one database (or maybe just shard everything onto thousands of distributed nodes).
But the point I am trying to make is that Kubernetes, microservices, etc. are based on the idea of winners - power laws. One tweet everyone wants to read. One search term, one viral video.
Then again, this is just a question of taste - the taste of the dev lead, what (s)he feels is the best approach. Take another company doing the same thing and a different approach might emerge.
This question does not appear to be about programming, Closed.
hyped-up technologies
subjective, Closed
problems caused by over-engineering
Opinion-based, Closed.
If you need that level of performance you need to go bare-metal, and this is where you'll hit a lot of roadblocks (yet they will be happy to spend 10-100x more money trying to make do with the cloud).
My current hobby is to try to run monolithic apps like these on serverless services like Cloud Run. There's still some pain related to attaching persistent storage to a container, but otherwise it feels like a great option.
So if you were to implement this same architecture using Kubernetes or Serverless, it would be just as simple as a bunch of Ansible or Puppet scripts.
If you want to run it on Kubernetes I hope you know how to install/maintain K8S on-prem, because there's no way you're going to get this level of performance from any cloud provider (not at a sane price anyway).
From my limited experience, many engineers fall into the trap of adding accidental complexity to an otherwise simple architecture just by trying to use the latest/coolest cloud architecture trend.
Monolith in the cloud on kubernetes? Speak no such abomination. Of course we have to do microservices, the more the better. How can we scale otherwise?
SQL DB? What is this, 2010? Of course we're going to use Cosmos DB, how else could we get "single-digit millisecond response times, automatic and instant scalability, along with guarantee speed at any scale".
Of course I'm exaggerating for dramatic effect but I rarely see teams disciplined enough to keep cloud architectures simple and clean.
It isn't clear to me this is a model that would work elsewhere, or should be held up as something to be replicated.
Did they save time? Did they save money? Did this help make SO a wildly successful company? Did it allow them to deliver features to customers faster?
If you're still growing and more interested in delivering tons of features quickly, and/or don't have the ability to attract world leading talent, then a more complicated architecture with clear boundaries is often a better call than delivering relatively few features with obsessive rigor in a monolithic codebase.
Servers:
SQL Servers (Stack Overflow Cluster)
2 Dell R720xd Servers
SQL Servers (Stack Exchange “…and everything else” Cluster)
2 Dell R730xd Servers
Web Servers
11 Dell R630 Servers
Service Servers (Workers)
2 Dell R630 Servers
1 Dell R620 Server
Elasticsearch Servers (Search)
3 Dell R620 Servers
HAProxy Servers (Load Balancers)
2 Dell R620 Servers
Redis Servers (Cache)
2 Dell R630 Servers
VM Servers (VMWare, Currently)
2 Dell FX2s Blade Chassis, each with 2 of 4 blades populated
4 Dell FC630 Blade Servers (2 per chassis)
2 Equalogic SAN PS6000-series
Machine Learning Servers (Providence)
2 Dell R620 Servers
Machine Learning Redis Servers (Still Providence)
3 Dell R720xd Servers
LogStash Servers
6 Dell R720xd Servers
HTTP Logging SQL Server
1 Dell R730xd
Development SQL Server
1 Dell R620
Network:
2x Cisco Nexus 5596UP core switches (96 SFP+ ports each)
10x Cisco Nexus 2232TM Fabric Extenders (2 per rack)
2x Fortinet 800C Firewalls
2x Cisco ASR-1001 Routers
2x Cisco ASR-1001-x Routers
6x Cisco 2960S-48TS-L Management network switches (1 Per Rack)
https://nickcraver.com/blog/2016/03/29/stack-overflow-the-ha...

Is there a particular reason to suggest a change to the architecture?
[1] https://twitter.com/sahnlam/status/1629713954225405952/photo...
It's easy to interpret that as "stackoverflow should change to be like this", but I think it was meant to be more like "If I had to guess how stackoverflow works, this is what I think it would look like".
It's amazing how much performance and scalability you can get out of computers, if you don't burden them with 100x overhead caused by shoveling data between microservices all the time :-)
> It's easy to interpret that as "stackoverflow should change to be like this", but I think it was meant to be more like "If I had to guess how stackoverflow works, this is what I think it would look like".
That's not a better interpretation. It says something (something not good) about the mindset of modern software engineers that the first thing they think of when they look at a website like StackOverflow is an n-layer microservice architecture with more moving components than a Swiss chronometer.

I have a subjective feeling that Stack Overflow is down a lot more than other websites. I don't see that ever mentioned in the discussion of cloud vs on-prem, which makes the discussion seem lacking.
Packets randomly time out on the internet. I would take this random dashboard with a grain of salt; we cannot be sure SO had an outage just because one request happened to fail.
I have personally seen Stack Overflow be "under maintenance" or straight up down a lot more than I have seen entire us-east-1 down.
A hidden takeaway is that NVMe-storage databases are so fast that they are comparable to in-memory (Redis) databases these days.
Yes, but maybe not as much as you’d think.
I've always heard (and it made sense to me) that to reduce latency of requests from across the globe, you might want to have read replicas or caches spread on global infrastructure. Then how is it that stack overflow is fast here when the db is on-prem, 7 seas across from me? Any amount of RAM should not account for the distance, right?
This is one advantage of server-rendered HTML (though that's not the only option you have).
It also helps that StackOverflow is light on interactivity. You load a page, read for a minute, then maybe click a vote button or open a textarea to discuss. As long as the text and styles load quickly, you won't notice if progressive enhancement scripts take a little more time to load.
One of the only well known sites to do so, I think?
At KotlinConf in April I'll be giving a talk on two-tier architecture, which is the StackOverflow simplicity concept pushed even further. Although not quite there yet for social "web scale" apps like StackOverflow, it can be useful for many other kinds of database backed services where the users are a bit more committed and you're less dependent on virality. For example apps where users sign a contract, internal apps, etc.
The gist is that you scrap the web stack entirely and have only two tiers: an app that acts as your frontend (desktop, mobile) and an RDBMS. The frontend connects directly to the DB using its native protocols and drivers, the user authentication system is that of the database. There is no REST, no JSON, no GraphQL, no OAuth, no CORS, none of that. If you want to do a query, you do it and connect the resulting result stream directly to your GUI toolkit's widgets or table view controls. If what you want can't be expressed as SQL you use a stored procedure to invoke a DB plugin e.g. implemented with PL/Java or PL/v8. This approach was once common - the thread on Delphi the other day had a few people commenting who still maintain this type of app - but it fell out of favor because Microsoft completely failed to provide good distribution systems, so people went to the web to get that. These days distributing apps outside the browser is a lot easier so it makes sense to start looking at this design again.
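As a toy sketch of the two tiers in Python - sqlite3 stands in here for the networked RDBMS (Postgres, SQL Server) the design actually calls for, and `render_table` is a hypothetical placeholder for binding rows to a GUI toolkit's table-view control:

```python
import sqlite3

# Tier 2: the RDBMS. An in-memory SQLite DB stands in for a networked
# database the desktop/mobile app would dial directly over its native protocol.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'acme', 120.0), (2, 'globex', 80.5);
""")

# Tier 1: the frontend. No REST, no JSON -- the query's result stream
# feeds the UI directly. `render_table` is a made-up stand-in for a
# GUI table view; here it just formats rows as text.
def render_table(rows):
    return [f"{customer:10} {total:8.2f}" for customer, total in rows]

lines = render_table(db.execute("SELECT customer, total FROM orders ORDER BY id"))
```

The point of the sketch is what's absent: no serialization layer, no ad-hoc API between frontend and storage, just a parameterized query wired to a widget.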
The disadvantages are that it requires a couple more clicks up front for end users, and if they have very restrictive IT departments it may be harder for them to get access to your app. In some contexts that doesn't matter much, in others it's fatal. The tech for blocking DoS attacks isn't as good, and you may require a better RDBMS (Postgres is great but just not as scalable as SQL Server/Oracle). There are some others I'll cover in my talk along with proposed solutions.
The big advantage is simplicity with consequent productivity. A lot of stuff devs spend time designing, arguing about, fighting holy wars over etc just disappears. E.g. one of the benefits of GraphQL over plain REST is that it supports batching, but SQL naturally supports even better forms of batching. Results streaming happens for free, there's no need to introduce new data formats and ad-hoc APIs between frontend and DB, stored procedures provide a typed RPC protocol that can integrate properly with the transaction manager. It can also be more secure as SQL injection is impossible by design, and if you don't use HTML as your UI then XSS and XSRF bugs also become impossible. Also because your UI is fully installed locally, it can provide very low latency and other productivity features for end users. In some cases it may even make sense to expose the ability to do direct SQL queries to the end user, e.g. if you have a UI for browsing records then you can allow business analysts to supply their own SQL query rather than flooding the dev's backlog with requests for different ways to slice the data.
Our main production "infra" was a load-balanced pair of medium CPU front-end servers and a high-memory back-end for the SQL server. Theirs was approximately 20x the size, and a more "traditional" cloud microservices, etc. infrastructure. Optimization makes all the difference. So many of the "extras" just add unnecessary complexity, just like avoiding those "extras" probably does when they actually are required.
But ultimately, a db like SQL Server or Oracle will just let you use lots of connections without breaking a sweat. They're both threaded and fully async, it's a much more efficient model.
That's a little bit arrogant, no?