Understanding the N + 1 queries problem

(ananthakumaran.in)

111 points · ananthakumaran · 3y ago · 92 comments

59 comments · 14 top-level

samwillis · 3y ago · 12 in thread

Something interesting to consider with N+1 queries: these warnings only really apply to remote database servers, i.e. ones not on the same machine.

If you are using SQLite, or another in-process database, N+1 isn't an issue at all. So with the increased use of SQLite as an "edge" database, it's something to consider.

"Many Small Queries Are Efficient In SQLite": https://www.sqlite.org/np1queryprob.html
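
To make the pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The post/comment schema and the function names are illustrative assumptions, not taken from the article. `Connection.set_trace_callback` logs every statement SQLite executes, which shows the 1 + N statements of the naive loop next to the 2 statements of a batched `IN (...)` query. In-process SQLite makes each statement cheap, but the count still grows with N.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical post/comment schema in an in-memory database
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
        INSERT INTO post (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO comment (post_id, content) VALUES (1, 'x'), (1, 'y'), (2, 'z'), (3, 'w');
    """)
    return conn


def comments_n_plus_one(conn: sqlite3.Connection) -> dict:
    # 1 query for the posts, then 1 more query per post: 1 + N statements
    posts = conn.execute("SELECT id FROM post ORDER BY id").fetchall()
    return {
        pid: [content for (content,) in conn.execute(
            "SELECT content FROM comment WHERE post_id = ? ORDER BY id", (pid,))]
        for (pid,) in posts
    }


def comments_batched(conn: sqlite3.Connection) -> dict:
    # 1 query for the posts, then 1 query for all of their comments: 2 statements
    post_ids = [pid for (pid,) in conn.execute("SELECT id FROM post ORDER BY id")]
    result = {pid: [] for pid in post_ids}
    placeholders = ",".join("?" * len(post_ids))
    rows = conn.execute(
        f"SELECT post_id, content FROM comment WHERE post_id IN ({placeholders}) ORDER BY id",
        post_ids,
    )
    for pid, content in rows:
        result[pid].append(content)
    return result


def demo():
    conn = make_db()
    statements = []
    conn.set_trace_callback(statements.append)  # log every statement SQLite runs
    slow = comments_n_plus_one(conn)
    n_slow = len(statements)
    statements.clear()
    fast = comments_batched(conn)
    return n_slow, len(statements), slow == fast
```

With the three sample posts, `demo()` returns `(4, 2, True)`. Both versions produce the same data, but the naive loop needs 1 + 3 statements and the batched version needs 2.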

lowercased · 3y ago

It's not a problem until it is.

I was brought in to help with a migration gone south. The main dashboard page was taking 'too long'.

A system had been acquired, then 'moved' to the 'new cloud architecture'. The main dashboard code assumed that all db queries went to 'localhost', but after the move the db engine was across a network (a Google MySQL database and app code on a GCE instance, IIRC).

The dashboard was making around 8000 queries. Yes, that's sort of overwhelmingly bad. When they were all 'local', most took a fraction of a millisecond, and dashboard load time was typically ~1 second or so. Going across a network, this ballooned to 20-30 seconds (again IIRC), and some clients had load times of more than a minute. Each db connection was 2-3ms, and the code was so bad that each section of code was making its own connections: a single screen might make a hundred or more database connections.
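
As a rough back-of-the-envelope check, assume every one of the ~8000 queries pays about one 2-3 ms network round trip, which is the per-connection figure quoted above. The network latency alone then accounts for most of the slowdown:

$$
8000 \times 2\,\mathrm{ms} = 16\,\mathrm{s}, \qquad 8000 \times 3\,\mathrm{ms} = 24\,\mathrm{s}
$$

That lines up with the 20-30 second load times recalled above.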

After a few weeks of code tracing, rewriting and lots of data caching, I got the dashboard down to 5-8 seconds for most clients (who were still angry about it).

To be clear about the entire process of acquiring/purchasing this system: very little due diligence was done on company A's part. Company A bought company B, and company B's staff were all let go.

In any event... I'm a fan of db/app talking on 'localhost', but you can't assume that will always be the architecture, and you should still be aware of 'too many queries' and guard against it when you can.

n_e · 3y ago

Although the problem will be a lot less severe than with remote servers, this is still sub-optimal:

- the data passed from one query to the next still needs to move from the database process to the service process and back

- the queries will always be executed in the order they are in the code, denying the optimizer the opportunity to execute the full query in the best order

Arch-TK · 3y ago

With SQLite there is no "database process" unless you explicitly route all queries through a dedicated process, which isn't necessary with SQLite anyway.

That being said, the problem here is not whether N+1 is a problem or not, but rather if, given the immense amount of unnecessary complexity that using an ORM brings, it is appropriate to use an ORM.

jerf · 3y ago

There is also still non-zero overhead associated with making queries in general, in both the querying and the query-answering process.

The ceiling of the range where you can get away with this without user-visible performance impact will be much higher, and the relative performance difference may be smaller, but fewer queries for the same data will still be better in general.

Even with an in-process DB, you're still essentially making a sort of context switch.

davnicwil · 3y ago

Sub optimal in one regard, but if segmenting queries makes for simpler, easier to read and easier to debug code, then you're optimising dev time. Often this is the right tradeoff to make.

drowsspa · 3y ago

You're still trading O(1) for O(N)... SQLite still has to read from the disk, which is still multiple orders of magnitude more expensive than a memory lookup.

hinkley · 3y ago

On the one hand the individual reads will result in less blocking of write operations. On the other the read consistency behavior of the system changes by interlacing writes into the middle of your query.

hgsgm · 3y ago

You can cache the database or part of it in memory.

kawsper · 3y ago

It's out of fashion today, but N+1 queries are also useful (and preferred) if you're doing Russian doll caching, which was popular at Basecamp.

It lets you only query and cache the resources that have changed, not the whole dataset: https://blog.appsignal.com/2018/04/03/russian-doll-caching-i...

hinkley · 3y ago

I've never set out to implement russian doll caching, so I could be talking out of my butt, but I've worked on several projects that arrived at RDC by just throwing more and more caches at everything, and the fatal flaw there is that nobody ever goes back and determines if Cache C is undermining the effectiveness of Cache A, or if Cache C obsoletes Cache A, and should have been presented as a replacement instead of a supplement. Each cache is another failure point, and if you go down the caching rabbit hole then the appropriateness of each cache is a question that is never definitively answered. The answer changes with new features, every time your cluster grows, or shrinks, or upgrades to new instance types. It becomes a tax on your project that is either paid as you go or all at once when your complacency results in a production issue, or magnifies one.

JamesSwift · 3y ago

People still use Russian doll caching, but it's a bit of a pain honestly, because you have to make sure to call `touch` in all the appropriate places, which is surprisingly a lot of places. It's very easy to forget to do so and end up with stale caches.
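
Below is a minimal Python sketch of the key-based ("Russian doll") caching idea. It is illustrative only and is not Rails' cache API; the in-memory dict cache, the dataclasses and the integer `updated_at` values are all assumptions. Each fragment's cache key includes its record's `updated_at`. The outer post fragment is re-rendered only when the post itself is "touched", and unchanged inner comment fragments are reused. The demo also shows the failure mode described above: if a comment changes but its parent isn't touched, the stale outer fragment is served.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Comment:
    id: int
    body: str
    updated_at: int  # stand-in for a timestamp


@dataclass
class Post:
    id: int
    title: str
    updated_at: int
    comments: List[Comment] = field(default_factory=list)


cache: Dict[Tuple, str] = {}  # hypothetical fragment cache
renders: List[str] = []       # records which fragments were actually rendered


def fetch(key: Tuple, render: Callable[[], str]) -> str:
    # Return the cached fragment, rendering (and storing) it only on a miss
    if key not in cache:
        cache[key] = render()
    return cache[key]


def render_comment(c: Comment) -> str:
    def render() -> str:
        renders.append(f"comment:{c.id}")
        return f"<li>{c.body}</li>"
    return fetch(("comment", c.id, c.updated_at), render)


def render_post(p: Post) -> str:
    def render() -> str:
        renders.append(f"post:{p.id}")
        inner = "".join(render_comment(c) for c in p.comments)  # the inner dolls
        return f"<h1>{p.title}</h1><ul>{inner}</ul>"
    return fetch(("post", p.id, p.updated_at), render)


def demo():
    cache.clear()
    renders.clear()
    post = Post(1, "Hello", 1, [Comment(1, "first", 1), Comment(2, "second", 1)])
    render_post(post)
    first_pass = list(renders)

    renders.clear()
    post.comments[1].body, post.comments[1].updated_at = "edited", 2
    stale = render_post(post)   # parent not touched: outer fragment is served stale
    post.updated_at = 2         # the "touch" on the parent record
    fresh = render_post(post)   # re-renders the post, reuses comment 1's fragment
    return first_pass, list(renders), "edited" in stale, "edited" in fresh
```

In the demo, only the post and the edited comment are re-rendered after the touch. Without the touch, the edit is invisible.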

ilyt · 3y ago

The only reason to ignore that is if you hate the future you that will have to deal with it

acjohnson55 · 3y ago · 12 in thread

I can't help but feel this problem is indicative of incidental complexity in how we develop web applications. Not saying the PHP glory days were better, but there's something to be said for removing the layers of abstraction between the data and the presentation. Make the database query using SQL directly, and then inject the results into the HTML template to be delivered to the browser. Obviously, there were many issues here, like how easy it was to leave applications open to SQL injection attacks.

But it has been interesting to see the tide turn back towards server-side rendering, relying on partial DOM replacement for client-side updates. For web apps that don't have massive numbers of UI states (like a document editor), it seems like people are rethinking the wisdom of thick client-side JavaScript applications, which seem to be one of the main motivators for REST API layers and the need to efficiently fulfill N+1 queries.

Although, I do remember dealing with the N+1 problem when doing Django server-side apps more than a decade ago, before the dominance of client-side apps. I guess it was more the rise of MVC architecture and the active record pattern (https://en.wikipedia.org/wiki/Active_record_pattern) that brought the N+1 problem, more so than client-side apps.

Arch-TK · 3y ago

This is specifically just a problem with all ORMs. They attempt to solve the object-relational mismatch and fail because these two concepts (sets of tuples and graphs) are completely orthogonal.

fabian2k · 3y ago

This is not a problem with all ORMs. This is caused by lazy loading the related data when it is accessed, which is not how all ORMs work. If your ORM requires you to define the included related data explicitly, you won't have this particular problem.

jbverschoor · 3y ago

No, it's not. You can and will easily have the same problem with SQL if you're not fetching beforehand.

I bet WordPress (or many of its plugins) is so slow exactly because of this.

ORMs even make it easier to write more efficient queries in many cases.

avereveard · 3y ago

Proper ORMs allow you to specify how to process relationships, ideally on a per-query basis rather than a per-model basis. They then aggregate the table resulting from the join, presenting a deduplicated view of the root object and all the related ones fetched.

hgsgm · 3y ago

Tuples are Objects, Graphs are Relations.

ORM (or GTM if you like) helps you navigate your Graph of Tuples.

acdha · 3y ago

I saw this tons back then, too. Even back in the paleo-web you’d see people writing nested for loops, sometimes obscured by functions or a class structure but sometimes just the adjacent HTML.

As you noted, this is one of the reasons why client-side JavaScript often falls far short of the envisioned benefits. In both cases, however, I would suggest that while the trend is real and should inform your architecture, the most important thing is to routinely use your monitoring. Different apps have different performance challenges, but they all need to be observable, and I’ve seen so many times where people wasted tons of time and resources shooting in the dark because they didn’t have granular monitoring or thought it was too hard.

One of the best examples I had was a while back when I inherited a Django codebase which was too slow. One of the developers had spent a couple of days rewriting everything in Jinja2 because “Django is slow”, introducing the bugs you’d expect and leading to lots of new custom code to maintain (such as 3 versions of the human size formatting function). Performance didn’t budge. Six months later, I saw the MySQL query counter spiraling up on every page view, installed Django Debug Toolbar, looked at the queries, and spent an afternoon reverting all of the templates and fixing the N+1 queries, which had always been the problem. That was tens of thousands of lines of code churn, all wasted, for what was eventually maybe 50 lines changed against the original codebase.

One thing I introduced which worked well was a test hook which failed based on the query counts for a view. That caught most N+1 issues and is conveniently easy to implement with almost any model.
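
Below is a minimal sketch of such a hook. It uses Python's sqlite3 module rather than Django; Django's test framework ships its own `assertNumQueries` for this purpose. The `assert_max_queries` context manager and the `comment_counts_view` function are hypothetical names for illustration. The hook records every statement run inside the block and fails if there are more than the allowed number.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def assert_max_queries(conn: sqlite3.Connection, limit: int) -> Iterator[List[str]]:
    """Fail if the wrapped block runs more than `limit` SQL statements."""
    executed: List[str] = []
    conn.set_trace_callback(executed.append)
    try:
        yield executed
    finally:
        conn.set_trace_callback(None)  # always stop tracing, even on error
    if len(executed) > limit:
        raise AssertionError(
            f"expected at most {limit} queries, got {len(executed)}:\n" + "\n".join(executed)
        )


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
        INSERT INTO post (title) VALUES ('a'), ('b'), ('c');
        INSERT INTO comment (post_id, content) VALUES (1, 'x'), (2, 'y'), (3, 'z');
    """)
    return conn


def comment_counts_view(conn: sqlite3.Connection) -> List[int]:
    # Hypothetical "view" with an N+1: one COUNT query per post
    post_ids = [pid for (pid,) in conn.execute("SELECT id FROM post ORDER BY id")]
    return [
        conn.execute("SELECT COUNT(*) FROM comment WHERE post_id = ?", (pid,)).fetchone()[0]
        for pid in post_ids
    ]
```

Wrapping the view in `with assert_max_queries(conn, 2):` fails here, because the view runs 1 + 3 statements. A limit of 4 passes. Pinning the limit in a test makes a later regression to 1 + N fail loudly instead of silently.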

mireq2 · 3y ago

I spent a couple of days rewriting everything from Django templates to Jinja2, and it made a really big difference: a 10x speedup (from 500ms to 50ms for the invoice list). It was a big e-commerce system with every ORM query already optimized: selecting only required fields, prefetching related relations or using select_related. Django templates are really, really slow and hard to debug. The rewrite was a really great decision in this case.

serverholic · 3y ago

At my first job out of school we didn't use an ORM and just queried with raw SQL. I had never heard of the N+1 query problem because it wasn't something we really had to worry about.

After that I joined a startup that used rails and I had quite the education in ORMs and N+1 queries. The ORM felt quite restricting and I felt a lot less confident in my code.

ramchip · 3y ago

I really like the compromise taken by Ecto (in Elixir): higher level than writing SQL directly, but without lazy loading, callbacks, and other features that make it difficult to see what a piece of code is really doing.

brightball · 3y ago

There's a Ruby gem called Bullet that identifies and warns developers about N+1 problems. You can also have it fail tests if detected.

I don't know if the approach is possible with every ORM or if it's just leveraging some Ruby perks, but I can't think of a good reason why you wouldn't use the equivalent everywhere.

https://github.com/flyerhzm/bullet

pmontra · 3y ago

> Make the database query using SQL directly, and then inject the results into the HTML template to be delivered to the browser.

This is what I do for the most complex queries, where translating them to the ORM would be a time consuming pain. What I do normally is think in SQL and write in the ORM language. After all the API of the database is SQL, not the ORM, and I already know SQL.

In this way it's obvious that you have to write includes(:comments), because it's a join, and it's obvious that the original

  object.votes.count

would generate one query each time it is called. I write the separate queries and compose the results instead. I'm not using BatchLoader or other gems; using the ORM directly is enough.
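
Here is the same "separate queries, composed in application code" approach, translated into Python's sqlite3 for a self-contained illustration. pmontra's actual code is ActiveRecord and is not shown here. The table and column names follow the article's post/comment/vote example but are assumptions. The sketch runs one query per table, each batched over the IDs from the previous step. Vote counts come from a `GROUP BY` in the database instead of a per-comment `.votes.count`.

```python
import sqlite3
from typing import Dict, Iterable, List

SCHEMA = """
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, created_at INTEGER);
CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
CREATE TABLE vote (id INTEGER PRIMARY KEY, comment_id INTEGER);
INSERT INTO post VALUES (1, 'old', 1), (2, 'mid', 2), (3, 'new', 3), (4, 'newest', 4);
INSERT INTO comment VALUES (10, 4, 'c10'), (11, 4, 'c11'), (12, 3, 'c12'), (13, 1, 'c13');
INSERT INTO vote (comment_id) VALUES (10), (10), (12);
"""


def placeholders(ids: Iterable) -> str:
    # "?,?,?" for a parameterised IN (...) list
    return ",".join("?" for _ in ids)


def latest_posts_with_comments(conn: sqlite3.Connection, limit: int = 3) -> List[Dict]:
    # Query 1: the posts
    posts = [
        {"id": pid, "title": title, "comments": []}
        for pid, title in conn.execute(
            "SELECT id, title FROM post ORDER BY created_at DESC LIMIT ?", (limit,))
    ]
    by_post = {p["id"]: p for p in posts}
    if not by_post:
        return posts

    # Query 2: every comment for those posts in one round trip
    by_comment: Dict[int, Dict] = {}
    rows = conn.execute(
        "SELECT id, post_id, content FROM comment "
        f"WHERE post_id IN ({placeholders(by_post)}) ORDER BY id",
        list(by_post),
    )
    for cid, pid, content in rows:
        comment = {"id": cid, "content": content, "votes": 0}
        by_comment[cid] = comment
        by_post[pid]["comments"].append(comment)

    # Query 3: vote counts aggregated by the database, not one count per comment
    if by_comment:
        rows = conn.execute(
            "SELECT comment_id, COUNT(*) FROM vote "
            f"WHERE comment_id IN ({placeholders(by_comment)}) GROUP BY comment_id",
            list(by_comment),
        )
        for cid, n in rows:
            by_comment[cid]["votes"] = n
    return posts
```

This is always three statements, however many posts or comments there are. Comments with no votes keep the default count of 0.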

If this query turns out to be important for the performance of the application, I'll think about a way to write it as a single SQL query to make the database do all the work, then translate it into ActiveRecord, or leave it in SQL if that takes too long. Sometimes it's not obvious how to do it, a problem common to all ORMs in all languages.

Edit: somebody called saila gave an example of such a query in the comments one hour before I wrote my reply.

imtringued · 3y ago

A lot of complex computed properties can easily be made efficient by having "WITH" statements.

The WITH query starts with the ID of the object to which the computed property is attached. So in this case you would do the same thing as the author and write

    WITH comment_vote_count as select comment_id, COUNT(vote_id) from comment left join vote ...

The ORM would just treat this like a normal table and do a join based on the comment ID, if only the ORM provided a convenient API to do this, which it currently does not...
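
Below is a minimal sketch of the `WITH` approach, run through Python's sqlite3 so it is self-contained. The schema is an assumption modelled on the article's comment/vote tables. Unlike the fragment above, this CTE aggregates from `vote` alone and joins back to `comment` afterwards. The CTE computes each comment's vote count once, and the outer query joins it like an ordinary table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
CREATE TABLE vote (id INTEGER PRIMARY KEY, comment_id INTEGER);
INSERT INTO comment VALUES (1, 1, 'first'), (2, 1, 'second'), (3, 2, 'other post');
INSERT INTO vote (comment_id) VALUES (1), (1), (3);
"""

# The CTE computes each comment's vote count once; the outer query joins it
# like an ordinary table. LEFT JOIN + COALESCE keep comments with no votes.
QUERY = """
WITH comment_vote_count AS (
    SELECT comment_id, COUNT(id) AS vote_count
    FROM vote
    GROUP BY comment_id
)
SELECT c.id, c.content, COALESCE(cvc.vote_count, 0) AS votes
FROM comment AS c
LEFT JOIN comment_vote_count AS cvc ON cvc.comment_id = c.id
WHERE c.post_id = ?
ORDER BY c.id
"""


def comments_with_votes(conn: sqlite3.Connection, post_id: int) -> list:
    return conn.execute(QUERY, (post_id,)).fetchall()
```

For post 1 this returns `(1, 'first', 2)` and `(2, 'second', 0)` in a single statement.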

saila · 3y ago · 6 in thread

You could probably get this down to two queries, one for posts and one for comments, if you aggregate the vote count when retrieving the comments. I think this is pretty easy to do with most ORMs.

You could also get it down to 1 query using SQL. This is one way to do it based on the schema in the article [postgres, not well tested]:

    with
      latest_posts as (
        select * from post order by created_at desc limit 3
      ),
      latest_comments as (
        select
          c.*, count(v.id) as votes
        from
          comment c
        left join
          vote v on v.comment_id = c.id
        where
          c.post_id in (select id from latest_posts)
        group by
          c.id, c.content
      )
    select
      p.*, json_agg(c)
    from
      latest_posts p
    left join
      latest_comments c on c.post_id = p.id
    group by
      p.id, p.title, p.content

    -- NOTE: fixed SQL bug noted by @rurabe

Off the top of my head, I'm not sure how you would (or if you could) do this with ActiveRecord, SQLAlchemy, or the Django ORM, but it's probably more complicated than just writing the SQL.

To be clear, I'm not anti-ORM and use them all the time, but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.

rurabe · 3y ago

One neat trick that I think is relatively little known is that you can select arbitrary SQL expressions in ActiveRecord, and those values are made available on the instances.

(Also I think the above sql needs to be tweaked since you need the votes count grouped by comment not by post)

A one-to-many relationship in pure SQL is an awkward fit with a Rails app, as it requires serializing (at least) the many as JSON. Then there's this weird conceptual gotcha where one resource is an AR instance and another is a pure hash.

I'd probably make a scope and association to help out here:

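    class Comment < ApplicationRecord
      belongs_to :post
      has_many :votes

      # LEFT JOIN keeps comments with zero votes; GROUP BY makes the count per comment
      scope :with_vote_count, -> {
        left_joins(:votes)
          .select('comments.*, COUNT(votes.id) AS vote_count')
          .group('comments.id')
      }
    end

    class Post < ApplicationRecord
      has_many :comments
      has_many :comments_with_vote_counts, -> { with_vote_count }, class_name: 'Comment'
    end

    # in controller (preload always uses a separate query, so the scope's SELECT/GROUP BY is kept)
    @posts = Post.preload(:comments_with_vote_counts).order(created_at: :desc).limit(3)

    # in view/serializer, posts and comments are both AR instances
    @posts.each do |post|
      post.comments_with_vote_counts.each do |comment|
        comment.vote_count # => Integer
      end
    end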

This should give you 2 queries, one to load the posts, then one to load the comments and vote counts for the relevant posts. Controller stays nice and slim and the complexity is delegated to sql via the join scope, without any other dependencies.

* edited for HN code block syntax

VWWHFSfQ · 3y ago

Django will do something similar (possibly a little more elegantly) if one is familiar with how to use the Prefetch APIs [1]:

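    from django.db.models import Count, Prefetch
    from myapp.models import Comment, Post  # hypothetical app/module name

    Post.objects.order_by("-created_at").prefetch_related(
        Prefetch(
            "comments",
            queryset=Comment.objects.annotate(
                vote_count=Count("votes")
            ),
        )
    )[:3]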

This will generate the following two queries:

    SELECT
        "post"."id",
        "post"."created_at",
        "post"."title",
        "post"."content"
    FROM "post"
    ORDER BY "post"."created_at" DESC
    LIMIT 3;

    SELECT
        "comment"."id",
        "comment"."post_id",
        "comment"."content",
        COUNT("vote"."id") AS "vote_count"
    FROM "comment"
    LEFT OUTER JOIN "vote"
        ON ("comment"."id" = "vote"."comment_id")
    WHERE "comment"."post_id" IN (3, 2, 1)
    GROUP BY
        "comment"."id",
        "comment"."post_id"

[1] https://docs.djangoproject.com/en/4.1/ref/models/querysets/#...

gnuvince · 3y ago

> To be clear, I'm not anti-ORM and use them all the time, but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.

When I did web development, I saw it as a "hack" and a "failure to write clean code" whenever I reached for raw SQL. This is of course not true at all, but it was a powerful psychological blocker and I'd spend too much time trying to figure how to get the ORM to do what I wanted instead of writing the SQL myself and moving on to the next problem.

joshuahedlund · 3y ago

> but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.

I agree. I often feel like I benefited by starting my web career pre-ORM and only learning to use them a few years in, so I can appreciate and use both. I sometimes wonder if it’s harder for new devs to acquire the same kind of experience.

pphysch · 3y ago

It's pretty straightforward in Django. The key is being comfortable with writing custom Manager/QuerySet methods.

You could do something like `Post.objects.latest().annotate_comments()` which would resolve almost exactly to the query you wrote above.

saila · 3y ago

Using a typical set of Post & Comment models, where Comment has a foreign key to Post, I couldn't figure out how to do this with just a single query in Django. Using prefetch_related, the 2-query version is pretty straightforward:

    from django.db.models import Count, Prefetch
    from myproject.models import Post, Comment

    # This will fetch the 3 posts first and then the comments for those posts
    query = Post.objects.prefetch_related(
        Prefetch("comments", queryset=Comment.objects.annotate(Count("votes")))
    )
    posts = query[:3]

How would you reduce this to one query using the Django ORM?

kstrauser · 3y ago · 3 in thread

I really wish this had originally been called the “1+N problem”, not “N+1”. That naming makes it much clearer to me.

Izkata · 3y ago

I swear that's what it was called when I was first introduced to it a decade ago. When I first saw one of the posts focused on it on here I didn't initially recognize it as referring to the same thing.

kstrauser · 3y ago

I don't remember how I heard it originally, but I wouldn't have recognized it as the same, either. To me, "N+1" implies you're already doing N queries and now you're running 1 more. That's a different class of problem than "you were running 1 query, and now you're running more than 1."

pharmakom · 3y ago

Or even the “1 then N problem” since we determine the N from the 1.

keltex · 3y ago · 3 in thread

I don't know Rails' Active Record. But if the ORM is anything like others I am more familiar with (Django / Python or LINQ / C#), can't you do a join and just have a single query? Or use raw SQL if performance is an issue?

acjohnson55 · 3y ago

The problem tends to come up when you pass models around and subsequent logic traverses relationships. It's fundamentally an issue of treating Active Record-style models as though they were in-memory objects.

jbverschoor · 3y ago

No, it's not. If you don't know beforehand which "object graph" you need, you'll run into this problem no matter what the underlying tech is.

At least with some ORMs you can specify afterwards (when passing the ActiveRecord objects around) what you want to have prefetched (inner or outer joins, caching). Sometimes it's done automatically, because you can simply detect when you're in an N+1 query loop.

n_e · 3y ago

I don't know ActiveRecord either, but it appears so https://guides.rubyonrails.org/active_record_querying.html#j...

adamzapasnik · 3y ago · 3 in thread

This is what I struggle a lot with in Rails.

There's no good, community-backed serialisation gem. AMS is a mess, and the other ones are not maintained. And I'm not a fan of the JSON:API spec's serialisation either.

But AR also doesn't have any easy tools to construct complex queries/multi-queries. It works for basic and medium stuff, but even this very common count problem is a disaster to deal with. Sure, you can use Arel and some other gems, but these aren't good solutions for someone who wants to get things done. Makes me wonder how others deal with these problems, tbh.

bhaak · 3y ago

IME if ActiveRecord is not sufficient, you go directly to SQL.

find_by_sql gives you enough freedom to get everything out of the db into an ActiveModel object.

jbverschoor · 3y ago

bs, there are a few very good serialization gems. ActiveRecord and Hibernate/JPA work a lot better than concatenating your own SQL strings.

If there's something that really doesn't benefit from your model (reports), then you'd fallback to either SQL, or still use ActiveModel + the aggregates

adamzapasnik · 3y ago

Please list the good serialisation gems, as I can't find any ;)

What if you have a complex/dynamic query, how do you build it? You said yourself that AR works better than concatenating SQL strings, but AR doesn't even support CTEs atm, and building complex queries is not trivial and sometimes not even possible without raw SQL strings...

simonw · 3y ago · 2 in thread

There's another option with many databases these days: you can often use aggregation functions to return the related data as part of a single query, even across many-to-many tables.

I wrote up how to do that using JSON aggregation functions in both SQLite and PostgreSQL for example: https://til.simonwillison.net/sqlite/related-rows-single-que...

panzerboiler · 3y ago

How would you add also the count of the votes of each comment in the aggregation, as per the example in the article?

simonw · 3y ago

Lots of ways to do that, one way would be using a CTE like this one: https://lite.datasette.io/?install=datasette-pretty-json&sql...

    with comment_vote_counts as (
      select
        comment_id,
        count(*) as vote_count
      from
        votes
      group by
        comment_id
    ),
    comments_with_vote_counts as (
      select
        id,
        post_id,
        content,
        coalesce(vote_count, 0) as votes
      from
        comments
        left join comment_vote_counts on comments.id = comment_vote_counts.comment_id
    )
    select
      posts.id,
      posts.title,
      posts.content,
      json_group_array(
        json_object(
          'id',
          comments_with_vote_counts.id,
          'content',
          comments_with_vote_counts.content,
          'votes',
          comments_with_vote_counts.votes
        )
      ) as comments
    from
      posts
      join comments_with_vote_counts on comments_with_vote_counts.post_id = posts.id
    group by posts.id

ydnaclementine · 3y ago · 2 in thread

Would this not be solved by adding `votes` to the `includes`? Something like:

    Post.includes(comments: :votes)

Similar stackoverflow: https://stackoverflow.com/a/24397716

rurabe · 3y ago

The problem here is that you are loading all the votes as AR instances which is fine at small scale, but as your app gets larger, loading and instantiating thousands of Vote instances just to then break them down into an integer will start to drag on your controller.

If you can count in the database itself it's a big win. Although no doubt your solution is cleaner code.

ramchip · 3y ago

Exactly this. Combine with Bullet[1] to detect problems early.

[1] https://bhserna.com/tools-to-help-you-detect-n-1-queries.htm...

pharmakom · 3y ago · 1 in thread

This comes up in GraphQL, not just ORMs. A beautiful solution is Facebook’s Haxl. Less beautiful is data-loader.

eezing · 3y ago

Correct. While data-loader facilitates data loading, type resolvers in GraphQL are where the solution starts.
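
The batching idea behind data-loader can be sketched in a few lines of asyncio. This is a hypothetical Python illustration and not the API of the JavaScript `dataloader` package or of Haxl; `DataLoader`, `count_comments` and the fake table are made-up names. Every `load(key)` issued in the same event-loop tick is queued, and a single batch-function call then resolves all of them. N per-post resolver calls therefore become one query.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable, List, Set, Tuple

BatchFn = Callable[[List[Hashable]], Awaitable[Dict[Hashable, Any]]]


class DataLoader:
    """Hypothetical minimal batcher: keys requested in the same tick share one fetch."""

    def __init__(self, batch_fn: BatchFn) -> None:
        self.batch_fn = batch_fn
        self.queue: List[Tuple[Hashable, asyncio.Future]] = []
        self.batch_calls = 0
        self._tasks: Set[asyncio.Task] = set()

    def load(self, key: Hashable) -> asyncio.Future:
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        if not self.queue:
            # First key of this tick: dispatch once the other callers have queued theirs
            loop.call_soon(self._start_dispatch, loop)
        self.queue.append((key, fut))
        return fut

    def _start_dispatch(self, loop: asyncio.AbstractEventLoop) -> None:
        task = loop.create_task(self._dispatch())
        self._tasks.add(task)  # keep a reference until the task finishes
        task.add_done_callback(self._tasks.discard)

    async def _dispatch(self) -> None:
        batch, self.queue = self.queue, []
        self.batch_calls += 1
        keys = list(dict.fromkeys(key for key, _ in batch))  # de-duplicate, keep order
        try:
            results = await self.batch_fn(keys)
        except Exception as exc:
            for _, fut in batch:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for key, fut in batch:
            if not fut.done():
                fut.set_result(results.get(key))


async def count_comments(post_ids: List[Hashable]) -> Dict[Hashable, Any]:
    # Stand-in for ONE "SELECT post_id, COUNT(*) ... WHERE post_id IN (...)" query
    fake_table = {1: 2, 2: 0, 3: 5}
    return {pid: fake_table.get(pid, 0) for pid in post_ids}


async def main():
    loader = DataLoader(count_comments)

    async def resolve_comment_count(post_id: int) -> int:
        # What a per-post GraphQL field resolver might look like
        return await loader.load(post_id)

    counts = await asyncio.gather(*(resolve_comment_count(pid) for pid in [1, 2, 3, 1]))
    return counts, loader.batch_calls
```

`asyncio.run(main())` resolves four resolver calls (including a repeated key) with a single call to `count_comments`. The design choice is to defer dispatch by one loop tick, which is what lets independent resolvers share one batch without knowing about each other.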

bfung · 3y ago · 1 in thread

> breadth-first loading. The ideal solution requires us to load the data in a breadth-first approach, but unfortunately, this is harder to write because it does not compose well.

The author finds the simplest and most efficient solution, but continues to over-engineer for blog content :P

“Composing” is overrated in this case.

ananthakumaran (OP) · 3y ago

This is a contrived example, probably not a real-world use case. The kinds of issues I deal with at work are much more complicated: they usually involve more than 3 or 4 tables, serializers referenced by multiple other serializers, etc. There is usually business logic involved in the query construction as well. I understand things can be improved, but it's not as simple as writing a few queries by hand. ActiveRecord/ActiveModelSerializer provides good composability, but fails to handle N + 1 queries optimally, which is what I am trying to explain.

eloisius · 3y ago

I wish I'd had an opportunity to use Phoenix in production before I got out of web dev, because the way the Ecto ORM obviated this entire class of error was beautiful. Instead of lazy loading, there's a neat grammar for preloading the entire graph of related records that you want.

TexanFeller · 3y ago

Understand N+1 before you try GraphQL.

funnyfoobar · 3y ago

Somewhat off-topic, but relevant: if we use a counter cache, i.e. keep a vote_count column on the comments table, the includes(:comments) solution would work fine.
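
Rails' `counter_cache: true` option on `belongs_to` maintains such a column in application code when records are created or destroyed. The sketch below is a swapped-in, database-level alternative and is not what Rails does: SQLite triggers, driven from Python's sqlite3. The table and column names are assumptions. Once the counter is stored on the comment row, loading comments returns the count directly, with no join or per-comment `COUNT(*)`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comment (
    id INTEGER PRIMARY KEY,
    post_id INTEGER,
    content TEXT,
    vote_count INTEGER NOT NULL DEFAULT 0  -- the cached counter
);
CREATE TABLE vote (id INTEGER PRIMARY KEY, comment_id INTEGER NOT NULL);

-- Keep comment.vote_count in sync on every insert/delete of a vote
CREATE TRIGGER vote_insert AFTER INSERT ON vote BEGIN
    UPDATE comment SET vote_count = vote_count + 1 WHERE id = NEW.comment_id;
END;
CREATE TRIGGER vote_delete AFTER DELETE ON vote BEGIN
    UPDATE comment SET vote_count = vote_count - 1 WHERE id = OLD.comment_id;
END;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO comment (id, post_id, content) VALUES (?, ?, ?)",
                     [(1, 1, "first"), (2, 1, "second")])
    conn.executemany("INSERT INTO vote (comment_id) VALUES (?)", [(1,), (1,), (2,)])
    conn.execute("DELETE FROM vote WHERE comment_id = 2")
    conn.commit()
    # Loading comments now returns the count directly: no per-comment COUNT(*) query
    return conn.execute(
        "SELECT id, vote_count FROM comment WHERE post_id = ? ORDER BY id", (1,)
    ).fetchall()
```

`demo()` returns `[(1, 2), (2, 0)]`. The trade-off is extra work on every write. An `UPDATE` that moves a vote to a different comment would also need a third trigger, which this sketch does not include.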

https://scoutapm.com/blog/how-to-start-using-counter-caches-...

pmg102 · 3y ago

We solved the N+1 queries problem where I work by raising the level of abstraction from "queries plus serialisation" to "what shape data is required". We open sourced the solution at https://www.django-readers.org/.

j / k navigate · click thread line to collapse

92 comments

59 comments · 14 top-level

samwillis3y ago· 12 in thread

Something interesting to consider with N+1 queries, these warnings only really apply to remote database servers, as in not on the same machine.

If you are using SQLite, or another in process database, N+1 isn't an issue at all. So with the increased use of SQLite as an "edge" database it's something to consider.

"Many Small Queries Are Efficient In SQLite": https://www.sqlite.org/np1queryprob.html

lowercased3y ago

It's not a problem until it is.

I was brought in to a problem to help with a migration gone south. Main dashboard page was taking 'too long'.

A few weeks of code tracing, rewriting and lots of data caching, I got dashboard down to 5-8 seconds for most clients (who were still angry about it).

To be clear, the entire process of acquiring/purchasing this system. Very little due diligence was done on company A's part. Company A bought company B, and company B staff was all let go.

n_e3y ago

Although the problem will be a lot less severe than with remote servers, this is still sub-optimal:

- the data passed from one query to the next still needs to move from the database process to the service process and back

- the queries will always be executed in the order they are in the code, denying the optimizer the opportunity to execute the full query in the best order

Arch-TK3y ago

With SQLite there is no "database process" unless you are explicitly only using a dedicated process to make the queries through which isn't necessary with SQLite anyway.

That being said, the problem here is not whether N+1 is a problem or not, but rather if, given the immense amount of unnecessary complexity that using an ORM brings, it is appropriate to use an ORM.

jerf3y ago

There is still also non-zero overhead associated with making queries in general, in both the querying and query-answering process.

Even with an in-process DB, you're still essentially making a sort of context switch.

1 more reply

davnicwil3y ago

Sub optimal in one regard, but if segmenting queries makes for simpler, easier to read and easier to debug code, then you're optimising dev time. Often this is the right tradeoff to make.

drowsspa3y ago

You're still trading O(1) by O(N)... SQLite still has to read from the disk, it's still multiple orders of magnitude more expensive than a memory lookup.

hinkley3y ago

hgsgm3y ago

You can cache the database or part of it in memory.

1 more reply

kawsper3y ago

It's out of fashion today, but N+1 queries are also useful (and preferred) if you're doing russian doll caching which were popular at Basecamp.

It lets you only query and cache the resources that have changed, not the whole dataset: https://blog.appsignal.com/2018/04/03/russian-doll-caching-i...

hinkley3y ago

1 more reply

JamesSwift3y ago

ilyt3y ago

The only reason to ignore that is if you hate the future you that will have to deal with it

acjohnson553y ago· 12 in thread

Arch-TK3y ago

This is specifically just a problem with all ORMs. They attempt to solve the object-relational mismatch and fail because these two concepts (sets of tuples and graphs) are completely orthogonal.

fabian2k3y ago

1 more reply

jbverschoor3y ago

No it's not.. you can and will easily have the same problem with SQL if you're not fetching beforehand..

I bet wordpress (or many plugins) is/are so slow exactly because of this.

ORMs will even make it easier to write more efficient queries is many cases.

3 more replies

avereveard3y ago

1 more reply

hgsgm3y ago

Tuples are Objects Graphs are Relations.

ORM (or GTM if you like) helps you navigate your Graph of Tuples.

2 more replies

acdha3y ago

I saw this tons back then, too. Even back in the paleo-web you’d see people writing nested for loops, sometimes obscured by functions or a class structure but sometimes just the adjacent HTML.

One thing I introduced which worked well was a test hook which failed based on the query counts for a view. That caught most N+1 issues and is conveniently easy to implement with almost any model.

mireq23y ago

1 more reply

serverholic3y ago

At my first job out of school we didn't use an ORM and just queried with raw SQL. I had never heard of the N+1 query problem because it wasn't something we really had to worry about.

After that I joined a startup that used rails and I had quite the education in ORMs and N+1 queries. The ORM felt quite restricting and I felt a lot less confident in my code.

ramchip3y ago

brightball3y ago

There's a Ruby gem called Bullet that identifies and warns developers about N+1 problems. You can also have it fail tests if detected.

I don't know if the approach is possible with every ORM or if it's just leveraging some Ruby perks, but I can't think of a good reason why you wouldn't use the equivalent everywhere.

https://github.com/flyerhzm/bullet

pmontra3y ago

> Make the database query using SQL directly, and then inject the results into the HTML template to be delivered to the browser.

In this way it's obvious that you have to write includes(:comments), because it's a join, and it's obvious that the original

  object.votes.count

would generate one query each time it is called. I write the separate queries and compose the results instead. I'm not using BatchLoader or other gems. Using directly the ORM is enough.

Edit: somebody called saila gave an example of such a query in the comments one hour before I wrote my reply.

imtringued3y ago

A lot of complex computed properties can easily be made efficient with WITH clauses (common table expressions).

The WITH query starts from the ID of the object the computed property is attached to. So in this case you would do the same thing as the author and write

    WITH comment_vote_count as (select comment_id, COUNT(vote_id) from comment left join vote ...)

The ORM could then treat this like a normal table and join on the comment ID, if only it provided a convenient API for doing so, which it currently does not...
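A runnable version of the idea in SQLite via Python's sqlite3. The join condition and `GROUP BY` inside the CTE are my own completion, not the commenter's, and the `comment`/`vote` columns are assumed. Once defined, `comment_vote_count` is joined on the comment ID exactly like an ordinary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, comment_id INTEGER);
    INSERT INTO comment VALUES (10, 1, 'hi'), (11, 1, 'yo'), (12, 2, 'hey');
    INSERT INTO vote (comment_id) VALUES (10), (10), (12);
""")

rows = conn.execute(
    """
    WITH comment_vote_count AS (
        -- the computed property, keyed by comment ID
        SELECT c.id AS comment_id, COUNT(v.id) AS vote_count
        FROM comment c
        LEFT JOIN vote v ON v.comment_id = c.id
        GROUP BY c.id
    )
    -- joined exactly like an ordinary table
    SELECT comment.id, comment.content, cvc.vote_count
    FROM comment
    JOIN comment_vote_count AS cvc ON cvc.comment_id = comment.id
    WHERE comment.post_id = ?
    ORDER BY comment.id
    """,
    (1,),
).fetchall()
```

The `LEFT JOIN` plus `COUNT(v.id)` is what gives comment 11 a count of 0 instead of dropping it.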

saila3y ago· 6 in thread

You could probably get this down to two queries, one for posts and one for comments, if you aggregate the vote count when retrieving the comments. I think this is pretty easy to do with most ORMs.

You could also get it down to 1 query using SQL. This is one way to do it based on the schema in the article [postgres, not well tested]:

    with
      latest_posts as (
        select * from post limit 3
      ),
      latest_comments as (
        select
          c.*, count(v.id) as votes
        from
          comment c
        left join
          vote v on v.comment_id = c.id
        where
          c.post_id in (select id from latest_posts)
        group by
          c.id, c.content
      )
    select
      p.*, json_agg(c)
    from
      latest_posts p
    left join
      latest_comments c on c.post_id = p.id
    group by
      p.id, p.title, p.content

    -- NOTE: fixed SQL bug noted by @rurabe

Off the top of my head, I'm not sure how you would (or if you could) do this with ActiveRecord, SQLAlchemy, or the Django ORM, but it's probably more complicated than just writing the SQL.

To be clear, I'm not anti-ORM and use them all the time, but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.
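The two-query variant from the first paragraph above can be sketched like this, again with Python's sqlite3 and an assumed `post`/`comment`/`vote` schema. The second query does the vote counting itself with `LEFT JOIN` + `COUNT` + `GROUP BY`, so comments with no votes still come back with 0.

```python
import sqlite3


def load_posts_two_queries(conn, limit=3):
    """Query 1 fetches posts; query 2 fetches their comments with vote counts."""
    posts = conn.execute(
        "SELECT id, title FROM post ORDER BY created_at DESC LIMIT ?", (limit,)
    ).fetchall()
    post_ids = [post_id for post_id, _ in posts]
    marks = ",".join("?" * len(post_ids))
    rows = conn.execute(
        f"""
        SELECT c.id, c.post_id, c.content, COUNT(v.id) AS votes
        FROM comment c
        LEFT JOIN vote v ON v.comment_id = c.id   -- keeps zero-vote comments
        WHERE c.post_id IN ({marks})
        GROUP BY c.id                             -- one count per comment, not per post
        ORDER BY c.id
        """,
        post_ids,
    ).fetchall()
    comments = {post_id: [] for post_id in post_ids}
    for comment_id, post_id, content, votes in rows:
        comments[post_id].append((comment_id, content, votes))
    return [(post_id, title, comments[post_id]) for post_id, title in posts]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, comment_id INTEGER);
    INSERT INTO post VALUES (1, 'first', '2023-01-01'), (2, 'second', '2023-01-02');
    INSERT INTO comment VALUES (10, 1, 'hi'), (11, 2, 'yo'), (12, 2, 'hey');
    INSERT INTO vote (comment_id) VALUES (10), (10), (12);
""")
result = load_posts_two_queries(conn)
```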

rurabe3y ago

One neat, relatively lesser-known trick is that you can select arbitrary SQL expressions in ActiveRecord, and those values are made available on the instances.

(Also, I think the above SQL needs to be tweaked, since you need the vote count grouped by comment, not by post.)

I'd probably make a scope and association to help out here:

    class Comment < ApplicationRecord
      has_many :votes

      # LEFT JOIN keeps comments with zero votes; GROUP BY makes COUNT per comment
      scope :with_vote_count, -> {
        left_joins(:votes)
          .select('comments.*, COUNT(votes.id) AS vote_count')
          .group('comments.id')
      }
    end

    class Post < ApplicationRecord
      has_many :comments
      # same rows as :comments, but each instance also responds to #vote_count
      has_many :comments_with_vote_counts, -> { with_vote_count }, class_name: 'Comment'
    end

    # in controller
    @posts = Post.includes(:comments_with_vote_counts).order(created_at: :desc).limit(3)

    # in view/serializer, posts and comments are both AR instances
    @posts.each do |post|
      post.comments_with_vote_counts.each do |comment|
        comment.vote_count # => Integer
      end
    end

* edited for HN code block syntax

VWWHFSfQ3y ago

Django will do something similar (possibly a little more elegantly) if one is familiar with how to use the Prefetch APIs [1]:

    from django.db.models import Count, Prefetch

    Post.objects.order_by("-created_at").prefetch_related(
        Prefetch(
            "comments",
            queryset=Comment.objects.annotate(
                vote_count=Count("votes")
            ),
        )
    )[:3]

This will generate the following two queries:

    SELECT
        "post"."id",
        "post"."created_at",
        "post"."title",
        "post"."content"
    FROM "post"
    ORDER BY "post"."created_at" DESC
    LIMIT 3;

    SELECT
        "comment"."id",
        "comment"."post_id",
        "comment"."content",
        COUNT("vote"."id") AS "vote_count"
    FROM "comment"
    LEFT OUTER JOIN "vote"
        ON ("comment"."id" = "vote"."comment_id")
    WHERE "comment"."post_id" IN (3, 2, 1)
    GROUP BY
        "comment"."id",
        "comment"."post_id"

[1] https://docs.djangoproject.com/en/4.1/ref/models/querysets/#...


gnuvince3y ago

> To be clear, I'm not anti-ORM and use them all the time, but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.

joshuahedlund3y ago

> but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.

pphysch3y ago

It's pretty straightforward in Django. The key is being comfortable with writing custom Manager/QuerySet methods.

You could do something like `Post.objects.latest().annotate_comments()` which would resolve almost exactly to the query you wrote above.

saila3y ago

    from django.db.models import Count, Prefetch
    from myproject.models import Post, Comment

    # This will fetch the 3 posts first and then the comments for those posts
    query = Post.objects.prefetch_related(
        Prefetch("comments", queryset=Comment.objects.annotate(Count("votes")))
    )
    posts = query[:3]

How would you reduce this to one query using the Django ORM?


kstrauser3y ago· 3 in thread

I really wish this had originally been called the “1+N problem”, not “N+1”. That naming makes it much clearer to me.

Izkata3y ago

kstrauser3y ago

pharmakom3y ago

Or even the “1 then N problem” since we determine the N from the 1.

keltex3y ago· 3 in thread

acjohnson553y ago

jbverschoor3y ago

No, it's not. If you don't know beforehand which "object graph" you need, you'll run into this problem no matter what the underlying tech is.

n_e3y ago

I don't know ActiveRecord either, but it appears so https://guides.rubyonrails.org/active_record_querying.html#j...

adamzapasnik3y ago· 3 in thread

This is something I struggle with a lot in Rails.

There's no good, community-backed serialisation gem. AMS is a mess, and the other ones are not maintained. And I'm not a fan of the JSON:API spec's serialisation either.

bhaak3y ago

IME if ActiveRecord is not sufficient, you go directly to SQL.

find_by_sql gives you enough freedom to get everything out of the db into an ActiveModel object.

jbverschoor3y ago

bs, there are a few very good serialization gems. ActiveRecord and Hibernate/JPA work a lot better than concatenating your own SQL strings.

If there's something that really doesn't benefit from your model (reports), then you'd fall back to either SQL, or still use ActiveModel plus the aggregates.

adamzapasnik3y ago

Please list the good serialisation gems, as I can't find any ;)


simonw3y ago· 2 in thread

There's another option with many databases these days: you can often use aggregation functions to return the related data as part of a single query, even across many-to-many tables.

I wrote up how to do that using JSON aggregation functions in both SQLite and PostgreSQL for example: https://til.simonwillison.net/sqlite/related-rows-single-que...
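A minimal example of the technique with SQLite's `json_group_array` and `json_object`, run from Python's sqlite3 (PostgreSQL's `json_agg` plays the same role). The tables and data are made up for illustration; each post row comes back with its comments packed into a single JSON array.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
    INSERT INTO post VALUES (1, 'first'), (2, 'second');
    INSERT INTO comment VALUES (10, 1, 'hi'), (11, 2, 'yo'), (12, 2, 'hey');
""")

# One query: each post row carries its comments as a JSON array
rows = conn.execute("""
    SELECT post.id, post.title,
           json_group_array(json_object('id', comment.id, 'content', comment.content))
    FROM post
    JOIN comment ON comment.post_id = post.id
    GROUP BY post.id
    ORDER BY post.id
""").fetchall()

posts = [
    {"id": post_id, "title": title, "comments": json.loads(comments_json)}
    for post_id, title, comments_json in rows
]
```

A post with no comments is dropped by the inner `JOIN`; with a `LEFT JOIN` it would instead get one object whose fields are all null, which the caller would need to filter out. Element order inside the aggregated array isn't guaranteed unless you order the input.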

panzerboiler3y ago

How would you also add the vote count for each comment to the aggregation, as in the article's example?

simonw3y ago

Lots of ways to do that, one way would be using a CTE like this one: https://lite.datasette.io/?install=datasette-pretty-json&sql...

    with comment_vote_counts as (
      select
        comment_id,
        count(*) as vote_count
      from
        votes
      group by
        comment_id
    ),
    comments_with_vote_counts as (
      select
        id,
        post_id,
        content,
        coalesce(vote_count, 0) as votes
      from
        comments
        left join comment_vote_counts on comments.id = comment_vote_counts.comment_id
    )
    select
      posts.id,
      posts.title,
      posts.content,
      json_group_array(
        json_object(
          'id',
          comments_with_vote_counts.id,
          'content',
          comments_with_vote_counts.content,
          'votes',
          comments_with_vote_counts.votes
        )
      ) as comments
    from
      posts
      join comments_with_vote_counts on comments_with_vote_counts.post_id = posts.id
    group by posts.id


ydnaclementine3y ago· 2 in thread

Would this not be solved by adding `votes` to the `includes`? Something like:

    Post.includes(comments: :votes)

Similar Stack Overflow answer: https://stackoverflow.com/a/24397716

rurabe3y ago

If you can count in the database itself, it's a big win, although your solution is no doubt cleaner code.

ramchip3y ago

Exactly this. Combine with Bullet[1] to detect problems early.

[1] https://bhserna.com/tools-to-help-you-detect-n-1-queries.htm...

pharmakom3y ago· 1 in thread

This comes up in GraphQL, not just ORMs. A beautiful solution is Facebook’s Haxl. Less beautiful is data-loader.
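The core trick behind data-loader is batching: collect every key requested during one turn of the event loop, then resolve them all with a single fetch. Below is a minimal asyncio sketch of that idea; `BatchLoader` and `fetch_comments_for_posts` are hypothetical, and this is not data-loader's actual API.

```python
import asyncio


class BatchLoader:
    """Coalesces load(key) calls made in the same event-loop turn into one batch fetch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # async: list of keys -> list of values, same order
        self.pending = {}         # key -> Future for the batch being collected
        self.cache = {}           # key -> Future, so a repeated key is fetched once
        self.tasks = set()        # strong refs so dispatch tasks aren't garbage-collected
        self.batch_calls = 0

    def load(self, key):
        if key in self.cache:
            return self.cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self.cache[key] = future
        if not self.pending:
            # first key of a new batch: dispatch once already-queued callbacks have run
            loop.call_soon(self._schedule_dispatch, loop)
        self.pending[key] = future
        return future

    def _schedule_dispatch(self, loop):
        task = loop.create_task(self._dispatch())
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)

    async def _dispatch(self):
        batch, self.pending = self.pending, {}
        self.batch_calls += 1
        try:
            values = await self.batch_fn(list(batch))
        except Exception as exc:
            for future in batch.values():
                future.set_exception(exc)
            return
        for future, value in zip(batch.values(), values):
            future.set_result(value)


async def fetch_comments_for_posts(post_ids):
    # stand-in for a single "WHERE post_id IN (...)" query; data is made up
    fake_table = {1: ["a"], 2: ["b", "c"], 3: []}
    return [fake_table[pid] for pid in post_ids]


async def main():
    loader = BatchLoader(fetch_comments_for_posts)
    # three independent "resolvers" each ask for one post's comments
    results = await asyncio.gather(*(loader.load(pid) for pid in (1, 2, 3)))
    return results, loader.batch_calls


results, batch_calls = asyncio.run(main())
```

Each resolver still asks for its own post's comments, which keeps the code composable, but the three requests collapse into one call to `fetch_comments_for_posts`.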

eezing3y ago

Correct. While data-loader facilitates data loading, type resolvers in GraphQL are where the solution starts.

bfung3y ago· 1 in thread

> breadth-first loading. The ideal solution requires us to load the data in a breadth-first approach, but unfortunately, this is harder to write because it does not compose well.

The author finds the simplest and most efficient solution, but then keeps over-engineering it for the sake of blog content :P

“Composing” is overrated in this case.

ananthakumaranOP3y ago

eloisius3y ago

TexanFeller3y ago

Understand N+1 before you try GraphQL.

funnyfoobar3y ago

Somewhat tangential, but relevant: if we use a counter cache, i.e. keep a vote_count column on the comments table, the includes(:comments) solution would work fine.

https://scoutapm.com/blog/how-to-start-using-counter-caches-...
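A sketch of a counter cache at the database level. Rails' `counter_cache` maintains the column from application code; here SQLite triggers stand in for that, run through Python's sqlite3 with assumed `comment`/`vote` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER,
        content TEXT,
        vote_count INTEGER NOT NULL DEFAULT 0   -- the cached counter
    );
    CREATE TABLE vote (id INTEGER PRIMARY KEY, comment_id INTEGER NOT NULL);

    -- keep comment.vote_count in step with inserts and deletes on vote
    CREATE TRIGGER vote_added AFTER INSERT ON vote BEGIN
        UPDATE comment SET vote_count = vote_count + 1 WHERE id = NEW.comment_id;
    END;
    CREATE TRIGGER vote_removed AFTER DELETE ON vote BEGIN
        UPDATE comment SET vote_count = vote_count - 1 WHERE id = OLD.comment_id;
    END;

    INSERT INTO comment (id, post_id, content) VALUES (10, 1, 'hi'), (11, 1, 'yo');
    INSERT INTO vote (comment_id) VALUES (10), (10), (11);
    DELETE FROM vote WHERE comment_id = 11;
""")

# Loading a post's comments now yields vote counts with no per-comment query
rows = conn.execute(
    "SELECT id, vote_count FROM comment WHERE post_id = ? ORDER BY id", (1,)
).fetchall()
```

This sketch doesn't handle a vote being moved to a different comment (an UPDATE of `comment_id`); that would need a third trigger or equivalent application logic.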

pmg1023y ago
