The answer is simple: You can't. GraphQL _in general_ doesn't allow arbitrary queries. It allows arbitrary output field selection. But the filters are very explicit. It's more "pre-aggregation of request waterfalls and masking of outputs" than "querying a database".
Doesn't stop people from exposing their SQL databases directly from GraphQL by generating a "free-for-all" schema. And when they do - yep, that's definitely a performance bomb and not a good use of GraphQL.
It really does. Sure, defining a schema somewhat limits the data you can get out of it. But the moment you allow any nesting/connections between data in that schema, hello n+1 problem.
And then every discussion of this problem on HN or elsewhere exposes the ugly truth: almost everyone uses GraphQL as a REST endpoint in production by limiting the actual queries you can run and curbing nesting.
GraphQL has been public since June 2015, and there's been at least one solution to the n+1 problem (Dataloader) since September 2015. If you were using pure REST endpoints (just resources, no nesting/traversal) this is the exact problem you'd be punting over to the client to solve -- all that GraphQL is doing here is moving it back onto the server. The actual amount of work is the same, you just get faster response times.
Most implementations of GraphQL I've seen in different languages provide some variation on the Dataloader pattern. I'll fully concede it can be a hassle to set it up correctly, but it works.
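For anyone who hasn't seen the pattern, here is a rough Python sketch of the batching idea. It is not the real Dataloader API: `BatchLoader`, `fetch_offices`, and the user/office data are all made up for illustration, and it skips the per-request caching that real implementations usually add. Every `load()` call made in the same event-loop tick is collected and resolved with one batch call.

```python
import asyncio


class BatchLoader:
    """Hypothetical minimal loader: gathers keys requested in the same
    event-loop tick and resolves them with a single batch call."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async fn: list of keys -> list of values, same order
        self._pending = {}         # key -> Future waiting for its value
        self._task = None          # dispatch task for the current batch, if any

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if self._task is None:
            # The task only starts running once the current code yields,
            # so every load() made before that point joins this batch.
            self._task = loop.create_task(self._dispatch())
        return self._pending[key]

    async def _dispatch(self):
        pending, self._pending = self._pending, {}
        self._task = None
        keys = list(pending)
        values = await self._batch_fn(keys)
        for key, value in zip(keys, values):
            pending[key].set_result(value)


calls = []  # records each batch so we can see how many backend calls happened


async def fetch_offices(ids):
    # Stand-in for one "WHERE id IN (...)" query or one backend request.
    calls.append(list(ids))
    return [{"id": i, "name": f"office-{i}"} for i in ids]


async def main():
    loader = BatchLoader(fetch_offices)
    users = [
        {"name": "a", "office_id": 1},
        {"name": "b", "office_id": 2},
        {"name": "c", "office_id": 1},
    ]
    # Each "resolver" asks for one office; the loader merges the requests.
    return await asyncio.gather(*(loader.load(u["office_id"]) for u in users))


offices = asyncio.run(main())
print(calls)  # a single batch containing the deduplicated office IDs
```

Three users, two distinct offices, one backend call. Without the loader, a naive resolver would issue one office lookup per user.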
I think for one thing you can’t really rely on joins for query efficiency, because as you say there are too many combinations so it’s impossible to optimize everything.
Instead you have to try to query each data type separately. Say you get a query for users: you make one SQL call for the users, collect the office IDs those users reference, and then make a single batched request to your office backend.
I think the best case is something like n SQL queries per request, where n is the depth of the tree you are querying (users->office->address is depth 3).
That means you’re doing all your queries after the first one by ID (not by arbitrary columns). So you have to have some way to “pre-join” your tables. You can do this either by optimistically joining your data to everything around it (query the node plus all of its edges) or you need to store your edges in your data model (which I have to assume is what FB does).
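To make the "one query per level" idea concrete, here is a small sketch using an in-memory SQLite database. The tables, columns, and the `fetch_by_ids` helper are invented for the example, and it assumes edges are stored as plain foreign-key columns. It only shows the shape of the approach: the first query is arbitrary, every later level is fetched by ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE offices (id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, office_id INTEGER);
INSERT INTO addresses VALUES (1, 'Berlin'), (2, 'Lisbon');
INSERT INTO offices VALUES (10, 'HQ', 1), (11, 'Annex', 2);
INSERT INTO users VALUES (100, 'ann', 10), (101, 'bob', 10), (102, 'cy', 11);
""")

query_count = 0


def fetch_by_ids(table, columns, ids):
    """Fetch one whole level with a single SELECT ... WHERE id IN (...).

    `table` and `columns` are trusted constants here, never user input.
    Returns a dict mapping id -> row.
    """
    global query_count
    ids = sorted(set(ids))
    if not ids:
        return {}
    query_count += 1
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, {columns} FROM {table} WHERE id IN ({placeholders})"
    return {row[0]: row for row in conn.execute(sql, ids)}


# Level 1: the root query (the only one not done by ID).
users = conn.execute("SELECT id, name, office_id FROM users ORDER BY id").fetchall()
query_count += 1

# Level 2: every office referenced by any user, in one query.
offices = fetch_by_ids("offices", "name, address_id", [u[2] for u in users])

# Level 3: every address referenced by any office, in one query.
addresses = fetch_by_ids("addresses", "city", [o[2] for o in offices.values()])

# Stitch the levels back together in memory.
result = [
    {"user": name, "office": offices[oid][1], "city": addresses[offices[oid][2]][1]}
    for _, name, oid in users
]
print(query_count, result)
```

For users->office->address this issues three queries regardless of how many users come back, instead of one query per user per level.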
In the end your resolvers need to be using some standardized way of grabbing objects by ID (or edge), something like https://github.com/graphql/dataloader
Whether it’s possible to do this efficiently I don’t know. At my last job we messed it up, and then we started applying a strategy like I described above, but then I switched jobs.
Would love to hear from others who have dealt with the same challenges.