GraphQL is super easy to understand, easy to deploy, easy to scale and easy to grow.
It’s not perfect - the lack of namespaces can be a pain, a few more standard types would be good, and mutations feel a bit underbaked - but there’s much to love, and very little to dislike.
I don't know. They seem to be satisfied customers, and were simply optimizing an already working pipeline in anticipation of saving money.
As a side note, I haven't seen a single "We tried GraphQL and it failed us" story on HN. Not that they don't exist, of course. It's just that there doesn't seem to be much debate about its promise.
Why would it "fail"? It is just one of many possible protocols for querying data. It is like arguing about using this computer language vs. that computer language. Barring differences in performance, they would all work.
I love me some GraphQL, but that seems to be a very low figure. I’m curious how complex the queries are and what else these servers are doing.
Engineers are expensive, and growing more so every year. It's hard to justify time spent to optimise rather than throwing more instances at it. The cloud has made this worse in a way, since provisioning more hosts can be done so easily.
Not many engineers even have the skill to identify and resolve performance problems, so again, people just keep adding more machines. Long term, the problem slowly builds all over the system and the bill becomes mind-boggling.
I do think that we (the software folks) don't help ourselves here. We build frameworks and tools that are still far too hard to inspect. What to watch (and how to optimise) in production is rarely considered deeply when building or documenting the hot new thing.
It's perfectly reasonable: once the bill starts to catch up to your budget, you spend time on optimizations :)
EDIT: the only solid argument against throwing machines at the problem is that scaling something across multiple servers is hard. If you spend energy on performance, maybe you don't have to.
I also agree that most of the devs I have ever worked with, in the UK, have little to no idea about how to actually test performance effectively.
Even though I am personally really interested in performance, even using a cool tool like Resharper profiler takes some time to get your head round.
Looking at these numbers makes me think that a single instance of a properly written server running on a single dedicated piece of hardware could handle this without breaking a sweat. My servers, for example, handle thousands of requests per second. It looks to me like one giant waste of human and hardware resources. Not a very "green" approach, I would say.
The fact that you can get X 100K requests per second best-case is not really the point. The point is that if I don't want to write hand-cranked code for every possible kind of query, I take a performance hit as a result.
Not sure how easy it would be for them to identify poorly performing queries and split them out into their own optimised code.
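For what it's worth, a rough sketch of how identifying the slow ones could look: record a duration per named operation and list the ones whose average is above some threshold. The names `record` and `slow_operations` are made up for illustration, and this assumes every query carries an operation name; it is not how the article's team does it.

```python
from collections import defaultdict
from statistics import mean
from typing import DefaultDict, List

# Hypothetical in-memory store: operation name -> observed durations in seconds.
timings: DefaultDict[str, List[float]] = defaultdict(list)


def record(operation: str, seconds: float) -> None:
    """Store one observed duration for a named operation."""
    timings[operation].append(seconds)


def slow_operations(threshold_seconds: float) -> List[str]:
    """Return operations whose mean duration exceeds the threshold, slowest first."""
    averages = {op: mean(values) for op, values in timings.items()}
    slow = [op for op, avg in averages.items() if avg > threshold_seconds]
    return sorted(slow, key=lambda op: averages[op], reverse=True)
```

Anything that keeps showing up in that list is a candidate for being split out into its own hand-optimised code path.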
This is how we end up with architectures consuming orders of magnitude more computing resources and carrying giant management overhead. Just because someone wants to be spared a bit of thinking.
I can see how GraphQL would work for orgs with massive scale like FB/Google/insert your favorite. For most of the rest of the world it is nothing but unneeded overhead on resources, both human and computing.
And of course cloudy people like Amazon would love you to use all this tech. The more you slow down your application, the more resources you will be leasing from them, so they get more money.
But it's an issue that can be solved.
I do not need to try it. I know what it takes to parse/validate these kinds of queries and then manage to get and assemble the results from numerous sources.
>"But it's an issue that can be solved."
No. This issue will not be solved, as in general it is a problem of mapping one storage/functionality format to the end client's format. It can be easily solved for particular situations by writing custom servers (this is, for example, one of the things I do), but doing it generically introduces overhead/costs that are very unhealthy for normal businesses.
And it is of course bad as it wastes energy.
I think it's regrettable that all the big money got behind GraphQL instead of aiming for solutions which provide resource granularity and shift decision-making to the client side. Who is better placed to know what resources they want than the client? A big advantage of HTTP/REST is that it serves either individual resources or a limited number of different collections of resources, and it lets clients do the heavy lifting of figuring out which resources they need and how they want to combine them. Caching REST endpoints is straightforward and resilient to DDoS attacks because the variation in responses is strictly limited.
Also, it makes sense to move processing to clients when those processing costs are imperceptible to users.
Did you read the article? Most of the issues weren't related to GraphQL, they were just Node issues/optimizations.
> I think it's regrettable that all the big money got behind GraphQL instead of aiming for solutions which provide resource granularity and shift decision-making to the client side.
This is the stated intent of GraphQL. Literally the reason it exists.
Common ways to solve that are to whitelist the allowed queries or to cache at the resolver level instead of the query level.
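A minimal sketch of the whitelist idea, assuming queries are identified by a SHA-256 hash of their text. The names `ALLOWED_QUERIES`, `register` and `is_allowed` are hypothetical and not from any GraphQL library, and the whitespace normalisation is deliberately naive (it would also collapse whitespace inside string literals).

```python
import hashlib
from typing import Dict

# Hypothetical registry: query hash -> query text, filled at build/deploy time.
ALLOWED_QUERIES: Dict[str, str] = {}


def query_id(query: str) -> str:
    """Return the SHA-256 hex digest of the query with whitespace collapsed."""
    normalised = " ".join(query.split())  # naive: formatting differences hash the same
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def register(query: str) -> str:
    """Add a query to the allowlist and return its identifier."""
    qid = query_id(query)
    ALLOWED_QUERIES[qid] = query
    return qid


def is_allowed(query: str) -> bool:
    """True only for queries that were registered ahead of time."""
    return query_id(query) in ALLOWED_QUERIES
```

Since the set of accepted queries is then finite, responses can be cached per query identifier much like REST endpoints.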
With GraphQL the client specifies exactly what it needs. It's as granular as you can imagine, unlike REST.
What might be possible is double-level caching, so you cache underlying data and then query from that, the results of which are also cached.
I know, premature optimization and all that. But I think at the point where you're going for microservices because of scaling you really should also look into the lower level issues like that from the start. You should notice that the shiny library you're using is 100x slower than just writing plain code. And you should be aware of excessive allocations in hot paths.
If you combine this with an API gateway, you’ve got caching (and potentially token auth) for free.
Over the years we had packed the server with almost all the GraphQL optimizations we could find on the internet. The blog outlines some of the key optimizations we put in to improve the performance of our application code (which doesn't have a lot to do with GraphQL, as most people have already commented). I still want to give a bit of an "insider's perspective" as much as I can, so here goes:
1. The GraphQL team that did the optimizations had two engineers actively working on it. It seemed like a futile project at first. The goal was to find low-hanging fruit (if any) and prepare for our peak season (IPL 2021), but eventually find other long-term alternatives. Killing GraphQL altogether and moving that logic to the clients was still on the table. Fortunately, the team did such a fantastic job of optimizing it that we are now committed to supporting it long-term.
2. We try to keep our microservices as discrete, pointed, and unopinionated as possible. We also indulge the clients by letting them query huge amounts of data at once. All this makes our GraphQL layer seriously complex. There is a huge amount of computation that happens on this layer. For some perspective, our /health call to the server is 10x faster than the most requested GraphQL query. Needless to say, it's not a fair comparison because, unlike the query, /health doesn't make any network calls or carry any practical CPU load.
3. We have caching implemented on our GraphQL clients. However, the reason we get such a high request rate is that our concurrency is also very high. A typical user barely makes 10 requests in a minute, but overall we see millions of requests in a second.
4. As part of the long-term strategy, we did consider using Rust as our stack of choice. We had heard a lot of noise about how Rust was beating all the benchmarks. So we did some POCs internally and implemented a part of our GraphQL service in Rust. What we learned was that the Rust implementation was ~2.5x faster than our Node.js implementation and also consumed relatively less memory. This was fine, but it wasn't good enough for us to migrate our large Node.js codebase and learn a completely new stack. Building a team with domain expertise in Rust in India is particularly hard.
5. It might seem like we are not pushing the production servers hard enough, and you'd be surprised to know that it's true! Because our traffic is very unpredictable, we like to maintain a comfortable CPU utilization for every possible extreme scenario that our Data Science team can predict. The risk of our edge layer going down is seriously revenue-hitting. So even when our benchmarks say we can push the systems 5x more, the final call remains with the Site Reliability teams and the risk appetite we have for that particular game.
6. The blog also briefly talks about using multiple ELBs, to which we distribute traffic using DNS. The problem with DNS is that it doesn't guarantee a truly uniform distribution of the traffic. Even with a very low TTL, we sometimes observe a difference of more than 20% in requests/sec between two ELBs at a given instant. This and other infrastructure-specific nuances have to be considered by the SRE teams when estimating capacity in production.
7. Lastly, the servers we use in production are small machines: 8 cores for the majority of our stack. This lies in the Goldilocks zone where we get the best cost-to-performance ratio. Scaling the machine type down or up has a significant impact on the cost.
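To make points 5 and 6 a bit more concrete, here is a back-of-the-envelope sketch of how DNS skew and a utilisation target feed into a capacity estimate. This is only an illustration, not our actual capacity model: the function name, the parameters and the definition of skew (traffic on the busiest balancer relative to an even share) are all assumptions.

```python
import math


def instances_needed(total_rps: float, per_instance_rps: float,
                     n_balancers: int, skew: float,
                     target_utilisation: float) -> int:
    """Estimate total instances when every balancer is sized for the busiest one.

    skew: extra traffic on the busiest balancer relative to an even share
          (0.25 means 25% above the even share). Assumed definition.
    target_utilisation: fraction of benchmarked per-instance capacity we are
          willing to use (0.5 keeps half the capacity as headroom).
    """
    even_share = total_rps / n_balancers
    busiest = even_share * (1.0 + skew)
    usable_per_instance = per_instance_rps * target_utilisation
    per_balancer = math.ceil(busiest / usable_per_instance)
    return per_balancer * n_balancers
```

Calling `instances_needed(10_000, 500, 2, 0.0, 0.5)` and then raising `skew` shows how uneven DNS distribution alone pushes the fleet size up, before any other safety margin is applied.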
It's been a journey of love and hate with GraphQL, and we continue to invest in making our edge robust and even faster. Feel free to connect with us at https://twitter.com/D11Engg
This is the new link. Somehow the old link doesn't work if you have the app installed.