If you are using SQLite, or another in process database, N+1 isn't an issue at all. So with the increased use of SQLite as an "edge" database it's something to consider.
"Many Small Queries Are Efficient In SQLite": https://www.sqlite.org/np1queryprob.html
I was brought in to a problem to help with a migration gone south. Main dashboard page was taking 'too long'.
A system had been acquired, then 'moved' to the 'new cloud architecture'. The main dashboard code made an assumption that all db queries were 'localhost' but after it was moved, the db engine was across a network (a google MySQL database and app code on GCE instance, IIRC).
The dashboard was making around 8000 queries. Yes, that's sort of overwhelmingly bad. When they were all 'local', most were a fraction of a millisecond. Dashboard load time was typically ~1 second or so. Going across a network, this ballooned to 20-30 seconds (again IIRC) - some clients had load times of more than a minute. Each db connection was 2-3ms, and the code was so bad that each section of code was making its own connections - single screen might make a hundred or more database connections.
A few weeks of code tracing, rewriting and lots of data caching, I got dashboard down to 5-8 seconds for most clients (who were still angry about it).
To be clear, the entire process of acquiring/purchasing this system. Very little due diligence was done on company A's part. Company A bought company B, and company B staff was all let go.
In any event... I'm a fan of db/app talking on 'localhost', but you can't always assume that will always be the architecture, and you should still aim to be aware of 'too many queries' and guard against when you can.
- the data passed from one query to the next still needs to move from the database process to the service process and back
- the queries will always be executed in the order they are in the code, denying the optimizer the opportunity to execute the full query in the best order
That being said, the problem here is not whether N+1 is a problem or not, but rather if, given the immense amount of unnecessary complexity that using an ORM brings, it is appropriate to use an ORM.
The ceiling of the range where you can get away with this without user-visible performance impact will be much higher, and the relative performance difference may be smaller, but in general fewer queries for the same data will still be better in general.
Even with an in-process DB, you're still essentially making a sort of context switch.
It lets you only query and cache the resources that have changed, not the whole dataset: https://blog.appsignal.com/2018/04/03/russian-doll-caching-i...
But it has been interesting to see the tide turn back towards server-side rendering, relying on partial DOM replacement for client-side updates. For web apps that don't have massive numbers of UI states (like a document editor), it seems like people are rethinking the wisdom thick client-side JavaScript applications, which seem to be one of the main motivators for REST API layers, and the need to efficiently fulfill N+1 queries.
Although, I do remember dealing with the N+1 problem when doing Django server-side apps more than a decade ago, before the dominance of client-side apps. I guess it was more the rise of MVC architecture and the active record pattern (https://en.wikipedia.org/wiki/Active_record_pattern) that brought the N+1 problem, more so than client-side apps.
I bet wordpress (or many plugins) is/are so slow exactly because of this.
ORMs will even make it easier to write more efficient queries is many cases.
ORM (or GTM if you like) helps you navigate your Graph of Tuples.
As you noted, this is one of the reasons why client-side JavaScript often falls far short of the envisioned benefits. In both cases, however, I would suggest that while the trend is real and should inform your architecture the most important thing is to routinely use your monitoring. Different apps have different performance challenges but they all need to be observable and I’ve seen so many times where people wasted tons of time and resources shooting in the dark because they didn’t have granular monitoring or thought it was too hard.
One of the best example I had was a while back when I inherited a Django codebase which was too slow. One of the developers had spent a couple days rewriting everything in Jinja2 because “Django is slow”, introducing the bugs you’d expect and leading to lots of new custom code to maintain (such as 3 versions of the human size formatting function). Performance didn’t budge. Six months later, I saw the MySQL query counter spiraling up on every page view, installed Django Debug Toolbar, looked at the queries, and spent an afternoon reverting all of the templates and fixing the N+1 queries which had always been the problem. That was tens of thousands of lines of code churn all wasted for what was eventually maybe 50 lines changed against the original codebase.
One thing I introduced which worked well was a test hook which failed based on the query counts for a view. That caught most N+1 issues and is conveniently easy to implement with almost any model.
After that I joined a startup that used rails and I had quite the education in ORMs and N+1 queries. The ORM felt quite restricting and I felt a lot less confident in my code.
I don't know if the approach is possible with every ORM or if it's just leveraging some Ruby perks, but I can't think of a good reason why you wouldn't use the equivalent everywhere.
This is what I do for the most complex queries, where translating them to the ORM would be a time consuming pain. What I do normally is think in SQL and write in the ORM language. After all the API of the database is SQL, not the ORM, and I already know SQL.
In this way it's obvious that you have to write includes(:comments), because it's a join, and it's obvious that the original
object.votes.count
would generate one query each time it is called. I write the separate queries and compose the results instead. I'm not using BatchLoader or other gems. Using directly the ORM is enough.If this query turns out to be important for the performances of the application, I'll think about a way to write it with a single query in SQL to make the database do all the work. Then translate it in ActiveRecord or leave it in SQL if it takes too long. Sometimes it's not obvious how to do it, a problem common to all ORMs of all languages.
Edit: somebody called saila gave an example of such a query in the comments one hour before I wrote my reply.
The with query starts with the ID of the object to which the computed property is attached to. So in this case you would do the same thing as the author and write
WITH comment_vote_count as select comment_id, COUNT(vote_id) from comment left join vote ...
The ORM would just treat this like a normal table and do a join based on the comment ID except the ORM provides a convenient API to do this, which it currently does not...
You could also get it down to 1 query using SQL. This is one way to do it based on the schema in the article [postgres, not well tested]:
with
latest_posts as (
select * from post limit 3
),
latest_comments as (
select
c.*, count(v.id) as votes
from
comment c
left join
vote v on v.comment_id = c.id
where
c.post_id in (select id from latest_posts)
group by
c.id, c.content
)
select
p.*, json_agg(c)
from
latest_posts p
left join
latest_comments c on c.post_id = p.id
group by
p.id, p.title, p.content
# NOTE: fixed SQL bug noted by @rurabe
Off the top of my head, I'm not sure how you would (or if you could) do this with ActiveRecord, SQLAlchemy, or the Django ORM, but it's probably more complicated than just writing the SQL.To be clear, I'm not anti-ORM and use them all the time, but it really helps to understand SQL well when using them and to know when it's appropriate to switch to SQL.
(Also I think the above sql needs to be tweaked since you need the votes count grouped by comment not by post)
A one to many relationship in pure SQL is an awkward fit with a Rails app as it requires serializing (at least) the many as json. Then there's this weird conceptual gotcha where one resource is an AR instance and another is a pure hash.
I'd probably make a scope and association to help out here:
class Comment
scope :with_vote_count, ->{ joins(:votes).select('comments.*').select('count(votes.*) as vote_count') }
end
class Post
has_many :comments
has_many :comments_with_vote_counts, ->{ with_vote_counts }, class_name: 'Comment'
end
# in controller
@posts = Post.includes(:comments_with_vote_counts).limit(3).order(:created_at: :desc)
# in view/serializer, posts and comments are both AR instances
@posts.each do |post|
post.comments.each do |comment|
comment.vote_count # => Integer
end
end
This should give you 2 queries, one to load the posts, then one to load the comments and vote counts for the relevant posts. Controller stays nice and slim and the complexity is delegated to sql via the join scope, without any other dependencies.* edited for HN code block syntax
Post.objects.order_by("-created_at").prefetch_related(
Prefetch(
"comments",
queryset=Comment.objects.annotate(
vote_count=Count("votes")
),
)
)[:3]
This will generate the following two queries: SELECT
"post"."id",
"post"."created_at",
"post"."title",
"post"."content"
FROM "post"
ORDER BY "post"."created_at" DESC
LIMIT 3;
SELECT
"comment"."id",
"comment"."post_id",
"comment"."content",
COUNT("vote"."id") AS "vote_count"
FROM "comment"
LEFT OUTER JOIN "vote"
ON ("comment"."id" = "vote"."comment_id")
WHERE "comment"."post_id" IN (3, 2, 1)
GROUP BY
"comment"."id",
"comment"."post_id"
[1] https://docs.djangoproject.com/en/4.1/ref/models/querysets/#...When I did web development, I saw it as a "hack" and a "failure to write clean code" whenever I reached for raw SQL. This is of course not true at all, but it was a powerful psychological blocker and I'd spend too much time trying to figure how to get the ORM to do what I wanted instead of writing the SQL myself and moving on to the next problem.
I agree. I often feel like I benefited by starting my web career pre-ORM and only learning to use them a few years in, so I can appreciate and use both. I sometimes wonder if it’s harder for new devs to acquire the same kind of experience.
You could do something like `Post.objects.latest().annotate_comments()` which would resolve almost exactly to the query you wrote above.
from django.db.models import Count, Prefetch
from myproject.models import Post, Comment
# This will fetch the 3 posts first and then the comments for those posts
query = Post.objects.prefetch_related(
Prefetch("comments", queryset=Comment.objects.annotate(Count("votes")))
)
posts = query[:3]
How would you reduce this to one query using the Django ORM?At least some with ORMs you can specify afterwards (when passing the activerecords) that you want to have prefetched (inner or outer joins, caching). Sometimes it's done automatically, because you can simply detect when you're in an N+1 query loop.
No good, community backed serialisation gem. AMS is a mess, other ones are not maintained. And I'm not a fan of JSON api spec's serialisation either.
But also AR doesn't have any easy tools to construct complex queries/multi queries. It works for basic and medium stuff, but even this very common count problem is a disaster to deal with. Sure. you can use Arel and some other gems, but these aren't good solutions for someone that wants to get things done. Makes me wonder how others deal with these problems tbh.
find_by_sql gives you enough freedom to get everything out of the db into a ActiveModel object.
If there's something that really doesn't benefit from your model (reports), then you'd fallback to either SQL, or still use ActiveModel + the aggregates
What if you have a complex/dynamic query, how do you build it? You said yourself that AR works better than concatenating SQL strings, but AR doesn't even support CTEs atm and building complex queries is not trivial and sometimes even possible without just SQL strings...
I wrote up how to do that using JSON aggregation functions in both SQLite and PostgreSQL for example: https://til.simonwillison.net/sqlite/related-rows-single-que...
with comment_vote_counts as (
select
comment_id,
count(*) as vote_count
from
votes
group by
comment_id
),
comments_with_vote_counts as (
select
id,
post_id,
content,
coalesce(vote_count, 0) as votes
from
comments
left join comment_vote_counts on comments.id = comment_vote_counts.comment_id
)
select
posts.id,
posts.title,
posts.content,
json_group_array(
json_object(
'id',
comments_with_vote_counts.id,
'content',
comments_with_vote_counts.content,
'votes',
comments_with_vote_counts.votes
)
) as comments
from
posts
join comments_with_vote_counts on comments_with_vote_counts.post_id = posts.id
group by posts.id```
Post.includes(comments: :votes)
```
Similar stackoverflow: https://stackoverflow.com/a/24397716
If you can count in the database itself it's a big win. Although no doubt your solution is cleaner code.
[1] https://bhserna.com/tools-to-help-you-detect-n-1-queries.htm...
The author finds the simplest and efficient solution, but continues to over engineer for blog content :P
“Composing” is overrated in this case.
https://scoutapm.com/blog/how-to-start-using-counter-caches-...