MongoDB Is Abusing JSON

(smsohan.com)

43 points · sm_sohan · 13y ago · 58 comments

benologist · 13y ago · 7 in thread

You picked an ugly mongo query, and there are many. You compared it to a concise SQL query, and there are many that are not.

MongoDB's limit(x) and skip(y) are a shitload nicer than most of Microsoft's ideas about pagination. It was only in SQL Server 2012 that they came up with "OFFSET" instead of "google it".... http://stackoverflow.com/questions/2244322/how-to-do-paginat...

andrewmunsell · 13y ago

I literally just ran into the same issue with SQL Server. Since we're running an older version, I was very confused when I found the alternative to MySQL's "LIMIT"-- it wasn't pretty.

benologist · 13y ago

I like how they timed something everyone has wanted for like a million years with the switch to per-core pricing lol.

kogir · 13y ago

What was wrong with ROW_NUMBER()? It worked just fine for me, and partitioning was frequently useful.

corresation · 13y ago

There are extremely few modern cases where pagination at the database layer is a good approach.

kogir · 13y ago

How about any cases where you don't want to transfer all the rows you're skipping over the wire? As in, nearly all cases?

alanctgardner2 · 13y ago

This is an interesting statement to me. My goal in retrieving data is normally to reduce the result set as fast as possible. If I only need a subset of the data, I would do the bare amount of joining, aggregation, etc. before paginating.

I assume the alternative is paginating server-side, which wastes some network bandwidth and processing time on the server.

dpe82 · 13y ago

How else do you do it? Select the entire table and filter it at the application layer?

mglukhovsky · 13y ago · 5 in thread

Here's what this query looks like in RethinkDB (also based on JSON documents):

  r.table('orders')
   .pluck('cust_id','ord_date','price')
   .groupBy('cust_id','ord_date', r.sum('price'))
   .filter(r.row('reduction').gt(250))

We use the hard-coded attribute 'reduction' because groupBy automatically gets compiled to our distributed map-reduce infrastructure. There is currently no as command (though it could easily be simulated with map). I'll add a GitHub issue for this shortly, since we should add sugar for it.

greendestiny · 13y ago

The beauty of JSON is that you could very easily write a jQuery-like chained expression to generate the Mongo query. JSON is a good ASCII representation of structured data that's compact, quite readable, and reasonably human-editable.

It's not an abuse of JSON to use it as a way of representing queries, but it's probably a shame that Mongo hasn't provided a better way of generating queries.
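
A minimal sketch of what such a chained expression could look like, assuming the grouped-orders query discussed in this thread (sum price per customer and order date, keep totals above 250). `PipelineBuilder`, `groupSum`, and `greaterThan` are hypothetical names, not part of any MongoDB driver; the builder only assembles the plain array that would be passed to `aggregate`.

```javascript
// Hypothetical chainable builder; none of these names exist in a MongoDB driver.
class PipelineBuilder {
  constructor() {
    this.stages = [];
  }

  // Group by the given fields and sum `field` into the output key `as`.
  groupSum(keys, field, as) {
    const id = {};
    for (const key of keys) id[key] = "$" + key;
    this.stages.push({ $group: { _id: id, [as]: { $sum: "$" + field } } });
    return this;
  }

  // Keep only groups whose `field` is greater than `value`.
  greaterThan(field, value) {
    this.stages.push({ $match: { [field]: { $gt: value } } });
    return this;
  }

  // Return the plain array of stages.
  build() {
    return this.stages;
  }
}

const pipeline = new PipelineBuilder()
  .groupSum(["cust_id", "ord_date"], "price", "total")
  .greaterThan("total", 250)
  .build();

console.log(JSON.stringify(pipeline, null, 2));
```

Because the output is ordinary data, the builder needs no string escaping or parsing; it is only a thin layer over object literals.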

sm_sohan (OP) · 13y ago

Totally agree with you.

istvanp · 13y ago

What are the main differences between MongoDB and RethinkDB? Things that you hear people care about? I don't use Mongo but I've seen projects that use it. I ask because I want to store & retrieve some analytics.

After seeing the screencast of RethinkDB it seems to me that it's just like Mongo but with a much better API and easier handling of sharding and replication. The only con I see of using RethinkDB is that it's very new (so is Mongo...) and it's probably a little too early for production. Basically I hope not to find worse surprises down the road than the ones I hear about with Mongo :P

coffeemug · 13y ago

We're going to publish some comparisons in a few days, but basically you nailed it (both pros and con). We're trying to build a stellar data-oriented development environment/query language, and a stellar scalability infrastructure. All the main pieces are there, and Rethink is a lot of fun to use. The rough edges will be polished out over the next few months.

Goopplesoft · 13y ago

Most ORMs use dot notation, even for mongo (e.g. mongo engine) to build queries. I think the mongo queries are more true to what it should be since dot notation in most languages implies a function/method is being called and order matters in those cases. Here it's just acting like a query builder that runs with .run().

Under the same effect I'm not a fan of mongo style query({}).limit(count).skip(num) format and usually I move the limit and skip into the initial query.

cheald · 13y ago · 3 in thread

Aggregation is absolutely one of Mongo's weaknesses. It's not great at ad-hoc aggregation like MySQL or whatnot is, and the fact that it tends to lend itself to denormalized data makes SQL-style reporting clunky at best.

It does a lot of things better than SQL, too. Consider, for example, the query "Give me a list of all posts with all of these tags, by any of these authors, sorted by post date descending"

    db.posts.find({
      tags: {$all: ["foo", "bar", "baz"]},
      author: {$in: ["Joe", "Jane"]}
    }).sort({post_date: -1})

In SQL, you'd end up with something like:

    SELECT posts.* FROM posts
      INNER JOIN post_tags t1 ON t1.tag = "foo" AND t1.post_id = posts.id
      INNER JOIN post_tags t2 ON t2.tag = "bar" AND t2.post_id = posts.id
      INNER JOIN post_tags t3 ON t3.tag = "baz" AND t3.post_id = posts.id
      WHERE posts.author = "Joe" or posts.author = "Jane"
      ORDER BY post_date DESC;

Its strength is denormalization; since you can denormalize entire lists or maps of data into a document, and then index and query on them, you can end up performing queries that would be ridiculously ugly and tedious in SQL.

djb_hackernews · 13y ago

not to take away from any of your great points in this post but isn't it the same as:

    SELECT posts.* FROM posts
      INNER JOIN post_tags pt ON pt.post_id = posts.id AND pt.tag IN ('foo', 'bar', 'baz')
      WHERE posts.author IN ('Joe', 'Jane')
      ORDER BY post_date DESC;

My SQL is rusty, I could be missing something but they seem essentially equivalent if you use the SQL helpers such as IN.

cheald · 13y ago

No, because that'll select any post that contains any of those three tags, not the posts that contain all three tags. AND vs OR.

It's worth noting that I threw the SQL query a bone by denormalizing post_tags into one table. In a properly relational DB, you'd have a tags table, a posts table, and a post_tags join table, so the query gets even hairier (or you have to do two queries).

    SELECT posts.* FROM posts
      INNER JOIN post_tags pt1 ON pt1.post_id = posts.id
      INNER JOIN tags t1 ON t1.tag = "foo" and pt1.tag_id = t1.id
      INNER JOIN post_tags pt2 ON pt2.post_id = posts.id
      INNER JOIN tags t2 ON t2.tag = "bar" and pt2.tag_id = t2.id
      INNER JOIN post_tags pt3 ON pt3.post_id = posts.id
      INNER JOIN tags t3 ON t3.tag = "baz" and pt3.tag_id = t3.id
      WHERE posts.author = "Joe" or posts.author = "Jane"
      ORDER BY post_date DESC;

Yikes.

It's also worth noting that this generates a massive temp table to be sorted, which is very likely going to end up causing you to have to do a filesort. In practice, you'd probably break this down into 3 queries (forgive my mixing languages):

    $tag_ids = SELECT id FROM tags WHERE tag IN ("foo", "bar", "baz")

    $post_ids = SELECT posts.id FROM posts
      INNER JOIN post_tags pt1 ON pt1.post_id = posts.id and pt1.tag_id = $tag_ids[0]
      INNER JOIN post_tags pt2 ON pt2.post_id = posts.id and pt2.tag_id = $tag_ids[1]
      INNER JOIN post_tags pt3 ON pt3.post_id = posts.id and pt3.tag_id = $tag_ids[2]
      WHERE posts.author = "Joe" or posts.author = "Jane"

    $posts = SELECT posts.* FROM posts WHERE id IN ($post_ids) ORDER BY post_date DESC;

Easily doable in both languages, but Mongo's denormalized structure makes this sort of use case a ton simpler.
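
To make the "AND vs OR" distinction concrete, here is a small in-memory sketch. It does not talk to any database; the `posts` array and its contents are made up for illustration, and `matchesAny` / `matchesAll` are hypothetical helpers that mimic what a single `IN` join and `$all` respectively select.

```javascript
// Illustrative in-memory data; field names mirror the example query above.
const posts = [
  { id: 1, author: "Joe", tags: ["foo", "bar", "baz"] },
  { id: 2, author: "Jane", tags: ["foo"] },
  { id: 3, author: "Ann", tags: ["foo", "bar", "baz"] },
];

const wanted = ["foo", "bar", "baz"];
const authors = ["Joe", "Jane"];

// "Any" semantics: what a single join with tag IN (...) selects.
const matchesAny = (post) => post.tags.some((tag) => wanted.includes(tag));

// "All" semantics: what $all (or one join per tag) selects.
const matchesAll = (post) => wanted.every((tag) => post.tags.includes(tag));

const byAuthor = (post) => authors.includes(post.author);

const anyIds = posts.filter((p) => byAuthor(p) && matchesAny(p)).map((p) => p.id);
const allIds = posts.filter((p) => byAuthor(p) && matchesAll(p)).map((p) => p.id);

console.log(anyIds); // [ 1, 2 ]
console.log(allIds); // [ 1 ]
```

Post 2 carries only one of the three tags, so it passes the "any" filter but not the "all" filter, which is the difference between the two SQL queries.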

gumbo · 13y ago

Hashes are very handy, and that's a strength of Mongo. However, as someone using Mongo on a daily basis (and often from within the console), I have to admit that writing queries is hard. If you are using it from an app you are building, though, there are many libraries that make it easier.

mattparlane · 13y ago · 3 in thread

I've been through the hassle of programmatically piecing together complex SQL queries, and I'd far rather be able to just put together hashes that represent my query.

SQL was originally designed so that people who were savvy but not necessarily developers were able to query databases, but I can't think of the last time my boss would have wanted to run some random query against our production database.

msarchet · 13y ago

This is a daily occurrence at some companies

cerales · 13y ago

Surely by the time you're programmatically generating SQL queries you should be using either an ORM or some other kind of SQL expression language embedded in a more expressive language (such as SQLAlchemy or the myriad Lisp DSLs for relational databases)?

sm_sohan (OP) · 13y ago

I am not sure of its origin. But these days you'll find good ORMs that would craft the query for you. My point is, the Mongo API seems to be more of a machine readable API with the $ keys used somewhat in a hacky way.

byoung2 · 13y ago · 3 in thread

There is always http://querymongo.com/ which will convert SQL to a MongoDB query.

cheald · 13y ago

I actually dislike this tool a fair bit, because it gets people to continue thinking of Mongo as "MySQL plus WEB SCALE" or whatever. It's a totally different database and doesn't do well with highly-normalized relational-style data, and the idea of an "automated converter" seems to reinforce the idea that it's just a drop-in for MySQL that automatically solves all your scaling woes, when that's just utterly and completely false.

Trying to shove a MySQL square into the Mongo triangle just isn't going to work out that well.

Goopplesoft · 13y ago

You mean SQL not MySQL.

Also that might be one thing it does but it also allows people to transition from SQL queries they know to mongodb queries. It helps the learning process.

aioprisan · 13y ago

this is great, I've been looking for something like this

samarudge · 13y ago · 2 in thread

JSON is used as a query language because it's fast, easy to parse and easy to generate dynamically. If you have a query interface for users, SQL is probably a better choice, but Mongo chose JSON for performance reasons.

If you don't like dealing with it directly, use something like MongoEngine so you're not working with the raw queries, or if having readable, easy to understand queries is important, use a SQL database.

Everything is a compromise, with Mongo's query language you're sacrificing readability for performance.

( This is not a comparison of a SQL database to Mongo, just the time it takes for a SQL engine to parse the query into an execution plan )

marshray · 13y ago

> JSON is used as a query language

Maybe a little more accurate to say JSON is used as a base layer for the query language.

JSON is "JavaScript Object Notation". But the "meaning" of the query is in the objects being denoted, not the notation used to represent them as text. So comparing Mongo's use of JSON to SQL is apples-to-oranges.

We could encode SQL as JSON too:

    {"query": "SELECT * FROM things;"}

    {"query": [
        {"SELECT": "*"},
        {"FROM": "things"} ] }

without affecting the expressive power of the SQL language one bit.
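
As a rough illustration of that point, the structured encoding can be flattened back into the same SQL text with a few lines. `toSql` is a hypothetical helper written for this example; it handles only the toy shape shown above and does no escaping or validation.

```javascript
// The structured encoding from above: an ordered list of single-key clauses.
const encoded = { query: [{ SELECT: "*" }, { FROM: "things" }] };

// Hypothetical helper: turn each {KEYWORD: value} clause back into SQL text.
function toSql(doc) {
  const clauses = doc.query.map((clause) => {
    const [keyword, value] = Object.entries(clause)[0];
    return `${keyword} ${value}`;
  });
  return clauses.join(" ") + ";";
}

console.log(toSql(encoded)); // SELECT * FROM things;
```

The JSON wrapper changes the notation only; the query that reaches the database is the same.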

sm_sohan (OP) · 13y ago

This doesn't have to be this way. The underlying machine friendly API can be hidden under a human friendly API abstraction.

pestaa · 13y ago · 2 in thread

TL;DR: JSON sucks for representing queries.

davidlumley · 13y ago

It's not JSON, but rather that MongoDB's query language sucks. I'm not entirely sure how to fix it - perhaps make it slightly more verbose and meaningful?

pestaa · 13y ago

Probably there are better ways to express a query in JSON compared to how MongoDB does it, but I'd take a step back and ask whether a nested map is the best approach to think about it.

andrewmunsell · 13y ago · 2 in thread

MongoDB is a NoSQL-type database, so it wouldn't make sense to have a SQL query interface... I think they did a good job with the API for not using SQL.

Plus, the API isn't really abusing JSON. It isn't pretty, but it's not abuse.

Firehed · 13y ago

People seem to be confusing "NoSQL" and "non-relational". Mongo happens to be both, but there's nothing fundamental that puts the two together. You may not get the full capabilities of SQL with non-relational data (JOINs, etc), but there's no reason that non-relational data stores couldn't parse normal SQL and execute the appropriate queries.

You can make a relational database that doesn't support the SQL syntax, and you can use SQL syntax to interact with schemaless data (for added fun, try throwing JSON in a mysql/postgres text field).

I'd agree with the article saying this is an abuse of JSON, though. It's a format to represent data; more accurately, potentially-nested key:value stores, arrays, and scalar types. A query is not data (unless you're one of those "my database has a 'queries' table" types)

nslater · 13y ago

How is a query not data?

gumbo · 13y ago · 1 in thread

I feel the same about the query language of Mongo. The first goal of a query language should be ease of use. With Mongo, having to type all those extra characters (quotes, brackets, square brackets, colons) is very annoying. You need to type a lot to get any reasonable output.

woah · 13y ago

You might like CoffeeScript.

    db.orders.aggregate [
      $group:
        _id:
          cust_id: "$cust_id"
          ord_date: "$ord_date"

        total:
          $sum: "$price"
    ,
      $match:
        total:
          $gt: 250
    ]

deepinsand · 13y ago · 1 in thread

I think it's great. I've spent too much time parsing SQL strings into well typed data structures, and you get it for free with Mongo.

anko · 13y ago

I kinda agree (about the free part), but I just wanted to say that if you are parsing SQL strings you really should get an ORM. Plug in a well-tested, optimised library and never think about parsing SQL strings again. It also makes things easier to test because you get nice separation of concerns.

aidos · 13y ago

I'm just finishing off a project that was built using Mongo and I've run into this as well.

There are other gotchas too, like feeling that you can store any old JSON structure in your db when you can't.

Dots are reserved because they're part of the query syntax. Fair enough, but it's pretty crappy to have to unpick a whole data structure because it was fine until a random bit of UGC was entered (that's where my last few hours just went).

It does feel like the data and the query syntax are too crossed over to me.
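
One way to catch this before the data is stored is to scan user-supplied structures for problem keys. The sketch below is an assumption-laden example, not something from the project described: `findUnsafeKeys` is a hypothetical helper, and treating keys that start with `$` as unsafe (in addition to keys containing dots) is an extra assumption based on how the query operators are spelled.

```javascript
// Hypothetical pre-insert check: collect the path of every key that contains "."
// or starts with "$", the two patterns the query syntax gives special meaning to.
function findUnsafeKeys(value, path = []) {
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => findUnsafeKeys(item, [...path, String(i)]));
  }
  if (value === null || typeof value !== "object") {
    return [];
  }
  const found = [];
  for (const [key, child] of Object.entries(value)) {
    const here = [...path, key];
    if (key.includes(".") || key.startsWith("$")) {
      found.push(here.join(" > "));
    }
    // Keep descending so nested offenders are reported too.
    found.push(...findUnsafeKeys(child, here));
  }
  return found;
}

// Made-up document with user-generated keys.
const doc = {
  title: "ok",
  links: { "example.com": { clicks: 3 } },
  meta: [{ $note: "user supplied" }],
};

console.log(findUnsafeKeys(doc));
```

Reporting the full path to each offending key makes it easier to decide whether to reject the input or rename the key before saving.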

taylorbuley · 13y ago

I don't think it's fair to take JSON out of context from the rest of the query API. Or maybe it's being overly generous, I'm not sure. Either way, this is the same way object literals are constructed in JavaScript. So is the beef w/JavaScript?

For me the Mongo shell is just enough so-called "richness" and "expressiveness" (Try it yourself: http://try.mongodb.org/). There's a certain magic to passing objects to functions (and being able to, say, read the body of a function by typing that object into the CLI).

maxharris · 13y ago

Meh. I count roughly the same number of tokens either way. I really don't think this is that big of a deal.

coenhyde · 13y ago

MongoDB queries are much easier to build dynamically than SQL, and that is because MongoDB uses JSON for queries.
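
A small sketch of what building a query dynamically can look like when the query is just an object. `buildPostFilter` and its parameters are hypothetical; the field names (`tags`, `author`, `post_date`) are borrowed from the posts example earlier in the thread, and the result is only the filter object that would be handed to `find`.

```javascript
// Hypothetical search-form input; any field may be missing or empty.
function buildPostFilter({ tags, authors, since } = {}) {
  const filter = {};
  // Each condition is added only when the caller supplied it.
  if (tags && tags.length > 0) filter.tags = { $all: tags };
  if (authors && authors.length > 0) filter.author = { $in: authors };
  if (since) filter.post_date = { $gte: since };
  return filter;
}

console.log(buildPostFilter({ tags: ["foo", "bar"] }));
console.log(buildPostFilter({ authors: ["Joe", "Jane"], tags: [] }));
console.log(buildPostFilter());
```

The equivalent SQL string assembly would have to track whether a `WHERE` or `AND` is needed before each optional clause, which is the bookkeeping the object form avoids.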

gummydude · 13y ago

Have you seen the Elasticsearch API?
