1) GraphQL almost always means everything is POST, removing CDN and browser caching of GET-like requests is gone (and ServiceWorker caching just got much more complicated, nigh-impossible if CORS is involved). Everyone says "oh, clients can do better caching", as if that's not true without GraphQL. Still, the caching I mention might be trivial and mostly worthless. I'd just like to see some actual inspection of the issue.
2) The models I've seen work well if your frontend is largely a thin skin over the services with minimal business logic of their own. (this isn't GraphQL directly, but the client libs that use it, but those exist because talking GraphQL without them is more effort). Which is, of course, what we really want. Business logic in the front end is always a painful idea. But it also definitely happens, for real business reasons - are we making those cases harder? How much so? With REST we have a lot more flexibility, it seems, even if we choose to avoid using most of it.
Its easy to argue that the main advantage of GraphQL is to reduce overfetching fields on the data you need for your views. That's a great advantage. But how much of the performance advantage gained from this is offset by the substantially reduced backend cacheability of these requests? I would guess a ton, especially with highly complex views that require lots of database pulls.
That isn't to say simple caching strategies aren't still possible (you can encode GraphQL requests into GETs and just cache that url at the CDN layer, this is part of the spec AFAIK). But when you have an open API serving many users, where you can't predict what fields they're going to ask for (or even the ORDERING of those fields in their request, which would change the request body despite the response being the same!), this has to be a problem that crops up pretty quick.
There's no HTTP-level solution to this. I doubt there's any solution that would work well enough to be worth implementing. Which leads me to believe that its an intrinsic problem in GraphQL; the more freedom you give clients to request whatever they want, the harder it becomes to guarantee performance for the requests they're making. And GraphQL gives clients all the freedom in the world.
Oh, and don't even get me started about the fact that because GraphQL stitches together essentially depth-unlimited data from your data graph in one request, there's no way to express different TTLs on each item returned, on the backend or the frontend. If you've got data that could TTL for 24 hours, but another piece of data that TTLs for 60 seconds, you essentially have to specify the cache-control to account for the smallest TTL.
Overall, I think GraphQL is fine. But I also believe what we'll slowly discover is that "Company X" simply fucked up their REST API, then will look to GraphQL to solve all their problems. And it might solve some of them, but then they'll have an even more complex system in place with even harder problems. Facebook can solve those problems; us small shops can't. Better hope Facebook shares their solutions with the world.
> Any API change had to be deployed simultaneously to all services using that API to avoid downtime, which often went wrong and resulted in long release cycles.
There’s an easy solution to that problem, which is a versioned API.
This! So very much this!
1. Browser caching is replaced by the graphql library (at least in the case of Apollo which is what I've used most). Which, as far as I've seen, is a fine one to one replacement in a lot of cases because it's still an in-memory cache that doesn't hit the network. Except it's even smarter than caching based on url and Cache-Control headers and such, because it can deeply introspect the payload and is able to return a cached version sometimes even if it hasn't made the exact same request before.
You're right that you don't get caching between sessions and won't automatically get CDN caching. It hasn't been a problem for me, and to be honest I don't know that I've ever worked on an app that had a whole lot of static data payloads where caching them at this level is critical for performance. For any kind of dynamic data that's unique per user you don't really need it. Things like cdn caching of images and js and css bundles and such still work just fine obviously. But I'd be surprised if there's no way to handle the CDN case in graphql. You can probably configure the client to do queries over GET and mutations over POST, though not sure offhand that that's gonna be the best way.
One similar issue, is things like the browser network tab become a little harder to deal with. But so far I've found the GraphQL developer tools are good enough to more than offset these limitations.
2. If anything, I think it's actually easier to err in the opposite direction, i.e. have your backend be a a thin wrapper, because you can let frameworks basically auto-map your ORM to GraphQL and then write all your business logic on the frontend. But ultimately it's really just as flexible as REST, and up to you to develop your own patterns for how much logic to put in the frontend vs backend.
Basically, the downsides that I've run into so far have been surprisingly few, and in return you get strong typing of your API layer, plus depending on your client you get a lot of really cool normalizing and de-duping in your frontend store for free. That latter part is probably the biggest game changer IMO.
In regards to point 2: GraphQL definitely doesn't make it harder to implement front-end logic. I'm not sure how to back up this claim because it's not apparent to me why one would think that GraphQL complicates this.
1) This is more of a client implementation issue than a GraphQL issue; there's nothing about the GraphQL spec that mentions what HTTP method to use. Most people use POST because it makes the most sense in the general case but there's no reason you can't move the query from a POST body to a GET query string.
2) This question seems odd to me. The real win from GraphQL is a reduction in overfetching - if you're tailoring a REST endpoint specifically to your needs the response should be functionally identical to a GraphQL response (i.e. only the data you need). These are just paradigms for passing data - the one you choose shouldn't have much impact on how said data is used.
There’s disagreement on whether it is within the http spec (https://stackoverflow.com/questions/978061/http-get-with-req...), but you can also send a body with a GET request, and you wouldn’t be alone in that; Elastic uses it, too (https://www.elastic.co/guide/en/elasticsearch/reference/curr...)
This means, you can cache and you can do whatever you want with caching rules (source: We are running it in production for a long time)
2) This is largely up to you. You can have it as a thin skin or you can rely more on your server side logic.
For us, we chose to have the frontend only handle presentation logic, the rest is handled on the server side.
For example, a lot of our stuff relied heavily on ACL, the frontend needs to expect to NOT get any data on specific fields, that's the only responsible for it. The rest is on the server side.
So I've heard, which is why I chose my wording carefully. Does anyone actually do that?
and Relay Modern has a PR open for it.
Doing something similar without one of these graphql client libraries shouldn't be too tough.
The second point I apparently failed to convey is that these are CONCERNS of mine, not conclusions. I'm looking to find reviews that explore these so I can reach (preliminary) conclusions, but so far I largely see one-line reassurances. Which are great, and I appreciate it, but they don't leave me feeling like I've really done my due diligence on my concern.
GraphQL does this, yes, but it's not particularly _smart_ about how caching works or how to avoid the Select N+1 problem. Their solution* is the blunt hammer that is Facebook's dataloader project which is basically: aggressively cache data model, pretend databases and joins and SQL doesn't exist, throw away any hope for ACID/consistency. Dataloader for example exposes all sorts of new and exciting types of inconsistency. This is hand-waved away because, I guess, consistency is boring and user expectations are low or irrelevant. (A comment with a missing edge to a post is invisible, a post with a missing edge to a comment has 0 comments. It'll all work itself out in the end.)
Curiously, Facebook went a long way down the road to fixing _this exact problem_ on the backend with a library called Haxl, written in Haskell. Haxl allows expressing relations between multiple data stores in a way that _looks_ like using an ORM, but under the hood creates a query and obviates the Select N+1 problem: a function which appears to select a post and for each comment retrieve an edge to the person who posted it will perform a single SELECT against the database, maintaining consistency with that store. There's no fundamental reason that couldn't be written in most dynamically typed languages or ORMs (though Haskell provides some really nice type level guarantees).
What's bizarre to me is that the former took off, and the latter is largely unknown outside the Haskell community.
* - Other ORMs have recognized this, and there are efforts underway for GraphQL backends in Python (Graphene) and Ruby, at least, to solve this.
1. If the dataloader isn't the sole service with database connections, the cache will be invalid when other services interact with the database.
2. Even if the dataloader is the sole mechanism for accessing the database, you have to figure out how to scale that to multiple nodes and maintain cache coherency on each.
3. Even if you run just a single dataloader instance or figure out how to ensure cache coherency, that layer is still oblivious to triggers on the database and so you had better not use any advanced functionality there.
4. Even if you strip away all of the low level SQL features and treat the database as a dumb searchable KVS with a single dataloader instance (or cache coherent layer in front of it), then you are still performing two queries and mutations which occur in parallel with queries can result in non-repeatable reads or phantom reads because the default in many GraphQL packages is that each query processed with sub-queries runs with no transaction wrapped around it.
5. Even if you ensure that every GraphQL gets a unique transaction, that doesn't mean the DataLoader cache is _coherent_ with the database transactions, and I haven't seen any papers or effort to verify that, so there's no guarantee parallelization can't result in dirty reads.
6. Okay, so you have a single threaded, single instance dataloader instance with a mutex around a database connection that runs every GraphQL query's subqueries in a transaction...
This is all fine if you're dealing with, well, comments and posts or other trivium in which consistency isn't an issue. Which actually happens to be the type of problems many large successful companies have to deal with.
But if you are dealing with financial data, medical data, scheduling of resources, anything where the equivalent paradigm of "my friend posted but I don't see it yet, therefore I can't comment on her post" or "my post loaded but I don't see my friend's comment on it yet" is an issue makes it a minefield for consistency.
"GraphQL is a query language for your API, and a server-side runtime for executing queries by using a type system you define for your data. GraphQL isn't tied to any specific database or storage engine and is instead backed by your existing code and data."
And as far as I'm concerned, this is exactly what I think graphql is and should be. It doesn't say anything about caching or solving the N+1 problem or ACID/consistency.
You say "GraphQL does this, yes, but it's not particularly _smart_ about how caching works or how to avoid the Select N+1 problem.". But the whole point of graphql is to just be a typed query layer and use whatever strategy makes the most sense for your application. I feel like it's like saying "I'm surprised that JSON took off even if it's not smart enough to do X"... where actually JSON took off _because_ it doesn't try to do all of those features.
EDIT: /s/package/engine
It's probably handwaved away because Facebook finds eventual consistency plus pubsub for updates to be good enough most of the time, and wants to shift the memory and CPU costs of calculating JOINs into the easily scalable GraphQL layer instead of the data store.
Their rule of "AVOID NESTING OBJECTS, JUST RETURN IDS OF RELATED OBJECTS" definitely is missing out one of the core benefits of GraphQL: being able to fetch related resources in a single request. Feels to me like they decided to switch to GraphQL cause it's supposedly better but are still utilizing it exactly like you would a REST API.
https://blog.apollographql.com/optimizing-your-graphql-reque...
In real-world UIs, I've found that queries rarely end up being more than a few levels deep and are relatively easily optimised as long as your internal APIs can handle batches (easy for entities, harder for pagination). Additionally, even though the only-return-IDs-for-relations pattern means you can't utilise joins effectively, the upside is that you end up with much simpler database queries that area easier to optimise at scale. My rule of thumb was that as long as the query representing an entire screen could typically return in sub 100ms in production, it was acceptable (this was without any caching at the GraphQL level, which I had planned but left the company before I could implement it).
Exactly my thought. From article, about REST API:
> Any API change had to be deployed simultaneously to all services using that API to avoid downtime, which often went wrong and resulted in long release cycles. Using GraphQL in a single API gateway, we would simplify the service landscape drastically.
It smells like a query language from a distance, but when you get close it smells much more of SOAP.
For instance say I'm receiving a list of widgets, but I only need the red ones. Unless the API developers explicitly foresaw the need to include a color filter, I can't filter that on their side. I have to get all colors of Widgets and filter it myself. The amount of unessessary data then can really multiply when you're getting the children and children's children of those widgets.
Having ported a number of APIs connectors from REST to GraphQL, I can say it has certainly greatly reduced the number of requests I've needed to make, but has often also greatly increased the amount of actual bytes I've received, particularly the bytes I don't need.
This smell is something I just can't move past when it comes to graphql.
> VERVE EVENTS is the global market leader in word-of-mouth sales in the live entertainment industry. We use networks of advocates to sell products and experiences to their friends in exchange for rewards such as free tickets and backstage passes.
...
> POLLEN is a community of influential young people who are passionate about sharing the best events. We handpick Members and, through our tools and support, make it easy for them to share their passion.
...
This sounds like a hip, trendy veneer atop the concepts of affiliate sales, and offshore farms of fake review writers. I suspect that their landing page is vague by design.
Perhaps that's the intent here ¯\_(ツ)_/¯
What's less common is using the GraphQL server for service-to-service communication, though i've been aware of people using it this way for my entire 3 years with GraphQL. I'm not yet convinced it's superior to alternative solutions to this problem (like gRPC, Thrift, other API Gateway patterns like JSON-API), but it could well be. I'm still happy using GraphQL in its sweet spot (and not-coincidentally what it was designed for), which is building an API for 1st-party websites and apps.
For example, in one case we had around a dozen services with fairly typical REST APIs, a few of which we wanted to pull large inter-related sets of information from. Using GraphQL allowed us to:
- Have a single large-but-human-readable query to retrieve all report data.
- Analyze deep nesting up front to determine opportunities for caching and eager loading.
- Abstract details regarding what data was coming from what service, plus handle any quirks (e.g. inconsistent auth strategies) at the GraphQL layer.
Without having to make many modifications to the underlying services. We also saw a two-order-of-magnitude performance improvement that would otherwise have required building out a lot of service-specific awareness into our reporting service.
Granted reporting is a unique case, but the simplified gateway layer and potential for query analysis are also major advantages as the number of API consumers grows.
- Slides: http://pyparis.org/static/slides/Patrick%20Arminio-1cba4f64....
- Vidéo: https://www.youtube.com/watch?v=IA1TuKfVTlg&feature=youtu.be