Better HN

codenesium · 0 points · 5y ago

When your backend is SQL, how can you even do this efficiently? Databases require indexes. If you can query anything, then there are performance bombs all over the place. It's different if you're querying Elasticsearch, I guess.
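A quick way to see the indexing concern in practice, sketched with Python's built-in `sqlite3` module (the `users` table, `email` column and `idx_users_email` index are made up for illustration): the same filter is a full table scan until an index exists on the filtered column.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return "; ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, office_id INTEGER)")
conn.executemany(
    "INSERT INTO users (email, office_id) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 10) for i in range(1000)],
)

query = "SELECT * FROM users WHERE email = ?"
before = plan(conn, query, ("user42@example.com",))  # no index on email: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query, ("user42@example.com",))   # index on email: direct lookup
print("before:", before)
print("after: ", after)
```

Every column a client is allowed to filter on needs this treatment, which is why "query anything" is hard to make fast.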


5 comments · 3 top-level

jkrems · 5y ago

> If you can query anything [...]

The answer is simple: You can't. GraphQL _in general_ doesn't allow arbitrary queries. It allows arbitrary output field selection. But the filters are very explicit. It's more "pre-aggregation of request waterfalls and masking of outputs" than "querying a database".

Doesn't stop people from exposing their SQL databases directly from GraphQL by generating a "free-for-all" schema. And when they do - yep, that's definitely a performance bomb and not a good use of GraphQL.
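As a rough sketch of the "explicit filters" idea (Python + `sqlite3`; the `resolve_users` resolver and `ALLOWED_FILTERS` whitelist are hypothetical, not part of any GraphQL library), a field resolver can accept only the filter arguments it has declared, each mapped to an indexed column, and reject everything else instead of translating arbitrary predicates into SQL.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical whitelist: GraphQL argument name -> indexed column it filters on.
ALLOWED_FILTERS: Dict[str, str] = {"id": "id", "email": "email"}


def resolve_users(conn: sqlite3.Connection, filters: Dict[str, Any]) -> List[Tuple]:
    """Resolve a `users(...)` field using only explicitly declared filter arguments."""
    unknown = set(filters) - set(ALLOWED_FILTERS)
    if unknown:
        # Anything outside the declared arguments is rejected, not translated to SQL.
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    # Column names come from the whitelist, never from the request; values are bound parameters.
    clauses = [f"{ALLOWED_FILTERS[name]} = ?" for name in filters]
    sql = "SELECT id, email FROM users"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql + " ORDER BY id", tuple(filters.values())).fetchall()


conn = sqlite3.connect(":memory:")
# UNIQUE gives email an index; bio is deliberately not filterable.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, bio TEXT)")
conn.executemany(
    "INSERT INTO users (id, email, bio) VALUES (?, ?, ?)",
    [(1, "a@example.com", "likes SQL"), (2, "b@example.com", "likes GraphQL")],
)
print(resolve_users(conn, {"email": "b@example.com"}))  # [(2, 'b@example.com')]
```

A "free-for-all" generated schema is roughly what you get when `ALLOWED_FILTERS` becomes "every column of every table".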

dmitriid · 5y ago

> The answer is simple: You can't. GraphQL _in general_ doesn't allow arbitrary queries.

It really does. Sure, defining a schema somewhat limits the data you can get from it. But the moment you allow any nesting/connections between data in that schema, hello n+1 problem.
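To make the n+1 problem concrete, here is a minimal sketch (Python + `sqlite3`; the `users`/`offices` tables and the `run` counting helper are hypothetical) of a naive resolver for something like `{ users { name office { name } } }`: one query for the list, then one more query per user for its office.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offices (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, office_id INTEGER);
""")
conn.executemany("INSERT INTO offices VALUES (?, ?)", [(i, f"office{i}") for i in range(5)])
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(i, f"user{i}", i % 5) for i in range(20)])

query_count = 0


def run(sql, params=()):
    """Execute one statement against the database and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def resolve_users_naive():
    """Resolve { users { name office { name } } } one parent at a time."""
    users = run("SELECT id, name, office_id FROM users ORDER BY id")  # 1 query
    result = []
    for _, name, office_id in users:
        # The office resolver runs once per user: N more queries.
        (office_name,) = run("SELECT name FROM offices WHERE id = ?", (office_id,))[0]
        result.append({"name": name, "office": {"name": office_name}})
    return result


users = resolve_users_naive()
print(query_count)  # 21: one for the list plus one per user (20)
```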

And then every discussion of this problem on HN or elsewhere exposes the ugly truth: almost everyone uses GraphQL as a REST endpoint in production by limiting the actual queries you can run and curbing nesting.
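Curbing nesting usually comes down to a depth (or cost) check before execution. A minimal sketch, assuming the selection set has already been parsed into nested Python dicts (a stand-in for a real GraphQL AST; `selection_depth` and `check_depth` are hypothetical helpers):

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for a parsed selection set: field -> sub-selection, or None for a scalar.
Selection = Dict[str, Any]


def selection_depth(selection: Optional[Selection]) -> int:
    """Number of fields on the longest path through the selection (scalar leaves included)."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def check_depth(selection: Selection, max_depth: int) -> None:
    """Reject the query before any resolver runs if it nests deeper than allowed."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")


# { users { name office { address { city } } } }
query = {"users": {"name": None, "office": {"address": {"city": None}}}}
print(selection_depth(query))  # 4
check_depth(query, max_depth=4)  # passes; max_depth=3 would raise ValueError
```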

andrewingram · 5y ago

The n+1 problem has solutions though. The most well-known solutions may not suit your architecture, but please can we stop pretending they don't exist?

GraphQL has been public since June 2015, and there's been at least one solution to the n+1 problem (Dataloader) since September 2015. If you were using pure REST endpoints (just resources, no nesting/traversal) this is the exact problem you'd be punting over to the client to solve -- all that GraphQL is doing here is moving it back onto the server. The actual amount of work is the same, you just get faster response times.

Most implementations of GraphQL I've seen in different languages provide some variation on the Dataloader pattern. I'll fully concede it can be a hassle to set it up correctly, but it works.
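The Dataloader pattern boils down to: collect every key requested before the batch is dispatched, fetch them all with one batched call, and memoize per request. Below is a minimal asyncio sketch of that idea in Python; `MiniLoader` and `batch_get_offices` are made-up names, and this is not the API of graphql/dataloader or of any Python port of it.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Hashable, List

BatchFn = Callable[[List[Hashable]], Awaitable[List[object]]]


class MiniLoader:
    """DataLoader-style batching sketch (hypothetical class, not a real library API).

    Keys passed to load() before the scheduled dispatch task runs are fetched
    with a single batch_fn call. Results are memoized, so use one loader per request.
    """

    def __init__(self, batch_fn: BatchFn) -> None:
        self.batch_fn = batch_fn
        self.cache: Dict[Hashable, asyncio.Future] = {}
        self.queue: List[Hashable] = []
        self.tasks: List[asyncio.Task] = []  # hold references to dispatch tasks

    def load(self, key: Hashable) -> asyncio.Future:
        if key in self.cache:  # repeated key: share the same future
            return self.cache[key]
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.cache[key] = fut
        self.queue.append(key)
        if len(self.queue) == 1:  # first key of a new batch: schedule one dispatch
            self.tasks.append(loop.create_task(self._dispatch()))
        return fut

    async def _dispatch(self) -> None:
        keys, self.queue = self.queue, []
        try:
            values = await self.batch_fn(keys)
            if len(values) != len(keys):
                raise ValueError("batch_fn must return exactly one value per key, in order")
        except Exception as exc:
            for key in keys:
                self.cache.pop(key).set_exception(exc)  # uncache so a later load can retry
            return
        for key, value in zip(keys, values):
            self.cache[key].set_result(value)


batches: List[List[Hashable]] = []


async def batch_get_offices(ids: List[Hashable]) -> List[object]:
    """Hypothetical batch fetch, e.g. SELECT ... WHERE id IN (...)."""
    batches.append(list(ids))
    return [{"id": i, "name": f"office{i}"} for i in ids]


async def main() -> List[object]:
    loader = MiniLoader(batch_get_offices)
    # Five users' office resolvers ask for offices 3, 1, 3, 2, 1.
    return await asyncio.gather(*(loader.load(i) for i in [3, 1, 3, 2, 1]))


offices = asyncio.run(main())
print(batches)  # [[3, 1, 2]]: one batch, duplicate keys deduplicated
```

Five `load()` calls turn into one batch call with three distinct keys. Creating the loader per request keeps the memoized results from leaking between users.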


erikpukinskis · 5y ago

I never fully solved this problem, so don’t trust me, but I can tell you what I learned...

I think for one thing you can’t really rely on joins for query efficiency, because as you say there are too many combinations so it’s impossible to optimize everything.

Instead you have to try to query each data type separately. So you get a query for users. You do an SQL call and gather up a bunch of requests for offices, and then you do a single request to your office backend.

I think the best case is something like n SQL queries per request, where n is the depth of the tree you are querying (users->office->address is depth 3).
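A minimal sketch of the one-query-per-level approach (Python + `sqlite3`; the `users`/`offices`/`addresses` tables and the `run`/`fetch_by_ids` helpers are hypothetical): each level collects the foreign keys from the level above and fetches them with a single `WHERE id IN (...)` query, so the depth-3 tree costs three queries however many users come back.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE offices (id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, office_id INTEGER);
    INSERT INTO addresses VALUES (1, 'Berlin'), (2, 'Oslo');
    INSERT INTO offices VALUES (10, 'HQ', 1), (11, 'North', 2);
    INSERT INTO users VALUES (100, 'ann', 10), (101, 'bob', 11), (102, 'cy', 10);
""")

query_count = 0


def run(sql: str, params: Iterable = ()) -> List[Tuple]:
    """Execute one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, tuple(params)).fetchall()


def fetch_by_ids(table: str, columns: List[str], ids: Iterable[int]) -> Dict[int, Tuple]:
    """Fetch all rows of `table` whose primary key is in `ids` with a single query.

    `table` and `columns` come from our own code, never from the client.
    """
    unique_ids = sorted(set(ids))
    if not unique_ids:
        return {}
    placeholders = ", ".join("?" for _ in unique_ids)
    sql = f"SELECT id, {', '.join(columns)} FROM {table} WHERE id IN ({placeholders})"
    return {row[0]: row[1:] for row in run(sql, unique_ids)}


def resolve_users_by_level() -> List[dict]:
    """users -> office -> address: one query per level, so 3 queries in total."""
    users = run("SELECT id, name, office_id FROM users ORDER BY id")
    offices = fetch_by_ids("offices", ["name", "address_id"], (u[2] for u in users))
    addresses = fetch_by_ids("addresses", ["city"], (o[1] for o in offices.values()))
    return [
        {
            "name": name,
            "office": {
                "name": offices[office_id][0],
                "address": {"city": addresses[offices[office_id][1]][0]},
            },
        }
        for _, name, office_id in users
    ]


result = resolve_users_by_level()
print(query_count)  # 3: one query per level
```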

That means you’re doing all your queries after the first one by ID (not by arbitrary columns), so you need some way to “pre-join” your tables. You can do this either by optimistically joining your data to everything around it (query the node plus all of its edges) or by storing your edges in your data model (which I have to assume is what FB does).
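For the "store your edges" option, one possible shape (sketched with `sqlite3`; this is an assumed design for illustration, not a description of what Facebook actually runs, and `neighbors` is a hypothetical helper) is a generic edge table keyed on `(src_id, edge_type, dst_id)`, so following one edge type from a whole batch of nodes is a single indexed query. The node rows themselves would then be loaded by ID.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical generic edge table: one row per typed edge src -> dst.
    -- The composite primary key doubles as the index used by neighbors().
    CREATE TABLE edges (
        src_id INTEGER NOT NULL,
        edge_type TEXT NOT NULL,
        dst_id INTEGER NOT NULL,
        PRIMARY KEY (src_id, edge_type, dst_id)
    );
    INSERT INTO edges VALUES
        (100, 'office', 10), (101, 'office', 11), (102, 'office', 10),
        (10, 'address', 1), (11, 'address', 2);
""")


def neighbors(src_ids: Iterable[int], edge_type: str) -> Dict[int, List[int]]:
    """Follow one edge type from a whole batch of nodes with a single query."""
    ids = sorted(set(src_ids))
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT src_id, dst_id FROM edges WHERE src_id IN ({placeholders}) AND edge_type = ?"
        " ORDER BY src_id, dst_id",
        [*ids, edge_type],
    ).fetchall()
    out: Dict[int, List[int]] = defaultdict(list)
    for src, dst in rows:
        out[src].append(dst)
    return dict(out)


offices = neighbors([100, 101, 102], "office")  # {100: [10], 101: [11], 102: [10]}
addresses = neighbors({d for ds in offices.values() for d in ds}, "address")  # {10: [1], 11: [2]}
```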

In the end, your resolvers need to use some standardized way of grabbing objects by ID (or by edge), something like https://github.com/graphql/dataloader

Whether it’s possible to do this efficiently I don’t know. At my last job we messed it up, and then we started applying a strategy like I described above, but then I switched jobs.

Would love to hear from others who have dealt with the same challenges.

flashgordon · 5y ago

So SQL is not a database :). It is a data access DSL that is implemented by databases. I don't think it's true that SQL is untyped: the table schemas are types (albeit basic product/record types), and inferring the type of a result is quite reasonable if you start with the schemas. SQL does suffer from a UX problem, for sure.
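To illustrate the point about schemas as types, a toy sketch in Python: given table schemas as column-to-type maps (the `SCHEMAS` dict and `infer_row_type` helper are hypothetical), the row type of a simple column projection follows directly, and a misspelled column is caught before anything runs. It deliberately ignores expressions, aggregates, NULLability from outer joins and SQLite's loose type affinity.

```python
from typing import Dict, List, Tuple

# Hypothetical schemas: each table is a record type mapping column name -> SQL type.
SCHEMAS: Dict[str, Dict[str, str]] = {
    "users": {"id": "INTEGER", "name": "TEXT", "office_id": "INTEGER"},
    "offices": {"id": "INTEGER", "name": "TEXT"},
}


def infer_row_type(from_tables: List[str], select: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Infer the row type of `SELECT t.col, ... FROM <from_tables>` from the schemas.

    Only plain column references are handled in this sketch.
    """
    for table in from_tables:
        if table not in SCHEMAS:
            raise TypeError(f"unknown table {table!r}")
    row_type = []
    for table, column in select:
        if table not in from_tables:
            raise TypeError(f"table {table!r} is not in the FROM clause")
        if column not in SCHEMAS[table]:
            raise TypeError(f"table {table!r} has no column {column!r}")
        row_type.append((f"{table}.{column}", SCHEMAS[table][column]))
    return row_type


# SELECT users.name, offices.name FROM users JOIN offices ON users.office_id = offices.id
print(infer_row_type(["users", "offices"], [("users", "name"), ("offices", "name")]))
# [('users.name', 'TEXT'), ('offices.name', 'TEXT')]
```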
