MongoDB and Realm make it easy to work with data, together – MongoDB

(mongodb.com)

43 points · metheus · 7y ago · 18 comments

12 comments · 5 top-level

lkrubner · 7y ago · 6 in thread

At some point I'll write up my notes about how I've been using MongoDB. I've basically given up on SQL databases. Every architecture, for every scale, from small to Enterprise, is really better handled by MongoDB, sometimes in conjunction with Kafka (since any sufficiently large operation is automatically heterogeneous and polyglot, with different database technologies).

When you're a small startup and you're just starting up, you can create a single MongoDB instance (ignore everything you've heard about Web Scale) and stuff data into it as needed, without thinking much about the structure. You can add in contracts on your database functions, which slowly specify the schema, as you learn more about what your project is really about. To get a sense of that style of development, please see what I wrote in "How ignorant am I, and how do I formally specify that in my code?"

http://www.smashcompany.com/technology/how-ignorant-am-i-and...
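A minimal sketch of the "contracts on database functions" idea, in Python, with an in-memory list standing in for a MongoDB collection. The names `save_user`, `check_contract`, and `REQUIRED_FIELDS`, and the field names, are hypothetical and not taken from the linked essay; the only point is that the contract can start nearly empty and tighten as you learn more.

```python
# In-memory stand-in for a MongoDB collection (assumption: no real database).
users = []

# The contract starts loose and grows over time.
# Maps field name -> expected type. Hypothetical fields for illustration.
REQUIRED_FIELDS = {"email": str}


def check_contract(doc, required):
    """Raise ValueError if doc lacks a required field or has the wrong type."""
    for field, expected_type in required.items():
        if field not in doc:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(doc[field], expected_type):
            raise ValueError(f"field {field} must be {expected_type.__name__}")


def save_user(doc):
    """Validate against the current contract, then store the document as-is."""
    check_contract(doc, REQUIRED_FIELDS)
    users.append(dict(doc))  # extra, unspecified fields are still allowed
    return len(users)


# Early on, only "email" is specified; anything else is accepted untouched.
save_user({"email": "a@example.com", "nickname": "al"})

# Later, once the project's shape is clearer, tighten the contract.
REQUIRED_FIELDS["signup_year"] = int
```

After the last line, a document without `signup_year` is rejected, while documents saved earlier stay in the collection unchanged, so old data may need a backfill before the stricter contract is applied to reads.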

MongoDB is great for ETL. You can pull JSON from 3rd party APIs and store it in its original form, then later transform it into the different forms you need.
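As a rough illustration of that store-raw-then-transform pattern, here is a sketch in Python that uses plain lists in place of MongoDB collections. The payload shape, the `partner_api` source name, and the functions `store_raw` and `transform` are assumptions made up for the example.

```python
import json

# Stand-ins for two MongoDB collections (assumption: plain lists, no database).
raw_responses = []  # payloads exactly as the third-party API returned them
orders = []         # the shape the application wants to query


def store_raw(source, payload_text):
    """Keep the original JSON untouched, plus minimal provenance."""
    raw_responses.append({"source": source, "payload": json.loads(payload_text)})


def transform(raw):
    """Derive the application's form from a raw document; safe to rerun."""
    payload = raw["payload"]
    return {
        "order_id": payload["id"],
        "total_cents": round(float(payload["amount"]) * 100),
        "source": raw["source"],
    }


# Extract and load first, without deciding on a final structure.
store_raw("partner_api", '{"id": "A-17", "amount": "12.50", "extra": {"x": 1}}')

# Transform later; the raw copy still holds fields the transform ignores.
orders = [transform(r) for r in raw_responses]
```

Because the raw documents are never modified, a different `transform` can be written later and run over the same stored payloads.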

In large Enterprises, you will inevitably be trying to get multiple services and databases to work together. The old style for dealing with this was the ESB (Enterprise Service Bus) or SOA (Service Oriented Architecture) but in recent years most of the big companies I've worked with have moved toward something like a unified log, as Jay Kreps wrote about in "The Log: What every software engineer should know about real-time data's unifying abstraction". If you haven't read that yet, go read it now:

https://engineering.linkedin.com/distributed-systems/log-wha...

In this context, MongoDB can offer a flexible cache for the most recent snapshot your service has built, based off of what it read from Kafka.
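One way to picture the "most recent snapshot" idea is a fold over the log. The sketch below is plain Python: a list stands in for a Kafka topic and a dict stands in for a MongoDB collection. The event shape and the name `build_snapshot` are hypothetical.

```python
# Stand-in for a Kafka topic: an ordered list of events (hypothetical shape).
log = [
    {"offset": 0, "user_id": "u1", "field": "email", "value": "a@example.com"},
    {"offset": 1, "user_id": "u2", "field": "email", "value": "b@example.com"},
    {"offset": 2, "user_id": "u1", "field": "email", "value": "a2@example.com"},
]


def build_snapshot(events, snapshot=None, from_offset=0):
    """Fold events into {user_id: document}; later events overwrite earlier ones."""
    snapshot = {} if snapshot is None else snapshot
    for event in events:
        if event["offset"] < from_offset:
            continue  # already applied to this snapshot
        doc = snapshot.setdefault(event["user_id"], {"_id": event["user_id"]})
        doc[event["field"]] = event["value"]
        doc["last_offset"] = event["offset"]
    return snapshot


# The cache can be thrown away and rebuilt from the log at any time.
user_cache = build_snapshot(log)
```

The log stays the source of truth here; the snapshot only records, per document, the last offset it has absorbed.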

Some people are sabotaged by MongoDB, and they start treating canonical data as a cache. Obviously that leads to disaster. I believe this is what happened to Sarah Mei. Her experiences caused her to write "Why You Should Never Use MongoDB"

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...

The one rule I would suggest is that you always need to be clear, in your own head, which collections are canonical and which are cache. When I talk to teams who are new to this, I tell them to use a naming convention, such as adding a "c_" to the start of every collection that is canonical. All other collections can be assumed to be caches. And the great thing is, it is very cheap to create caches. You can have 20 caches for the same data, in slightly different formats. You can have one cache where the JSON is optimized to what the Web front-end needs, and another cache where the JSON is optimized for the mobile app, and another cache where the JSON is optimized for an API for external partners. Just don't fall into the trap that Sarah Mei mentions, where you treat everything as a cache. You need to be clear in your head which data is canonical. If you are using Kafka the way Jay Kreps mentions, then the data in Kafka is canonical and everything in MongoDB is a cache. But at smaller operations, I've used MongoDB to hold both the canonical data and the caches, in different collections.
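The naming convention and the "many cheap caches" point can be sketched as follows, in Python with a dict standing in for a database. The collection names other than the `c_` prefix, the product fields, and the helpers `is_canonical` and `rebuild_caches` are all invented for illustration.

```python
CANONICAL_PREFIX = "c_"

# Stand-in for a database: collection name -> list of documents.
db = {
    "c_products": [
        {"_id": 1, "name": "Lamp", "price_cents": 1999, "internal_cost_cents": 800},
        {"_id": 2, "name": "Desk", "price_cents": 12900, "internal_cost_cents": 7000},
    ],
}


def is_canonical(collection_name):
    """Canonical collections are marked by the naming convention alone."""
    return collection_name.startswith(CANONICAL_PREFIX)


# One projection per consumer; every cache is derived only from canonical data.
CACHE_BUILDERS = {
    "products_web": lambda p: {
        "_id": p["_id"],
        "name": p["name"],
        "price": f"${p['price_cents'] / 100:.2f}",
    },
    "products_mobile": lambda p: {"_id": p["_id"], "n": p["name"]},
    "products_partner_api": lambda p: {"id": p["_id"], "price_cents": p["price_cents"]},
}


def rebuild_caches(database):
    """Regenerate every cache collection from c_products, replacing old copies."""
    for name, build in CACHE_BUILDERS.items():
        if is_canonical(name):
            raise ValueError(f"refusing to overwrite canonical collection: {name}")
        database[name] = [build(p) for p in database["c_products"]]


rebuild_caches(db)
```

In this sketch writes go only to `c_products`; anything without the prefix can be deleted and regenerated, which is the property that keeps a cache from quietly becoming canonical.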

tmountain · 7y ago

This strategy seems like it forgoes what I consider an important step in any project, which is, thinking critically about your data model and getting that right before you start building code on top of that structural foundation.

I could see doing what you're describing to build a prototype, which I would then extrapolate my learnings from, and subsequently toss out, but this seems like a dangerous way to get started with something that will end up in production (and potentially maintained for years to come), as it glosses over the importance of coming up with a really coherent data model, and let's face it, data is the heart and soul of most projects.

Am I wrong?

lkrubner · 7y ago

"I could see doing what you're describing to build a prototype"

It's very much for prototypes, and especially greenfield projects. If I was, instead, doing something like building a new service, inside an Enterprise that was already using something like the unified log architecture that Jay Kreps has described, then I would certainly think hard about what the schema would be for the particular service I was building -- after all, in such situations you're never going to pull all of the data out of Kafka, so you automatically have to figure out what part of the data you want. LinkedIn currently stores 900 terabytes of data in its Kafka instance, and I'm unlikely to write a new service that actually needs all of the 900 terabytes of data. So merely by thinking about the question "What of this data do I need?" I'm already implicitly thinking about a schema.

Having said all of that, how often have you written a service where you got the schema 100% correct on your first try, and no further changes to the schema were needed? Possibly you are smarter than I am, but I personally have never done that. All of my first attempts need later adjustment.

manigandham · 7y ago

Every architecture? I don't think so. You're conflating a whole bunch of topics here.

Document-stores have their uses but they are not the best for everything. Thinking about structure isn't hard and all modern RDBMS have JSON fields now if you need that flexibility, while still giving you ACID, transactions and the power of SQL.

SOA has nothing to do with ESB/distributed logs/event sourcing, and none of that has to do with MongoDB or document-stores. Event-sourcing is unnecessary for most, MongoDB is not a good event-sourcing system, and the point of materialized views on a stream is that they can be modeled in whatever database works best, not to just accept what the stream gives you.

The last part about treating a database as a cache is also strange. Use a cache if you need one, but that's a much more complex topic than just having a few collections that are caches. And again, it has nothing to do with MongoDB or document-stores being the correct architecture for everything.

lkrubner · 7y ago

I am not sure how you were confused by this:

"In large Enterprises, you will inevitably be trying to get multiple services and databases to work together. The old style for dealing with this was the ESB (Enterprise Service Bus) or SOA (Service Oriented Architecture)"

You write "SOA has nothing to do with ESB" yet both are attempts at "you will inevitably be trying to get multiple services and databases to work together" which was literally the sentence before the one you are reacting to.

As to this:

"SOA has nothing to do with ESB/distributed logs/event sourcing, and none of that has to do with MongoDB or document-stores"

The point of my comment is that MongoDB is so flexible it can replace other approaches to the problem of "get multiple services and databases to work together" including ESB and SOA.

At this point, I cannot think of any reason to ever use an SQL database. Either your canonical data will be in Kafka, or it can go in MongoDB. There is no need for SQL databases, ever.

About this:

"MongoDB is not a good event-sourcing system"

Obviously, which is why I included a link to the Jay Kreps essay. Kafka is better for an event-sourcing system. I'm not sure how you misread that part.

About this:

"The last part about treating a database as a cache has nothing to do with databases"

It feels like you are almost deliberately trying to misread what I wrote. My whole point was that MongoDB is so flexible, it can work as a cache, and also as a store for canonical data (in those circumstances when you are not storing your canonical data in something like Kafka).

Is that more clear?

mercer · 7y ago

Would you say your advice equally applies to using Postgres with a JSONB column, which might make normalizing the data later on a bit easier (assuming it'd be kept in Postgres)? Or is there something specific to MongoDB that would make it a better choice?

krenoten · 7y ago

I make a lot of money fixing problems in storage layers that these kinds of ideas create. Thanks :]

zubairq · 7y ago · 1 in thread

Can someone confirm that Realm raised USD 40 M and was acquired for USD 39 M?

ljhaywar · 7y ago

TechCrunch is reporting those stats: https://techcrunch.com/2019/04/24/mongodb-to-acquire-open-so...

slau · 7y ago

As an ex-Realmer, I'd like to congratulate everyone involved in Realm in the past few years. Great engineering talent, extremely dedicated and motivated people. This is the company that has helped me understand the value of good marketing. I've also been inspired by what a good product owner can do.

Kudos to everyone. It's been a long road, and I'm glad that the codebase that initially started as a text editor finally found a new home.

jinjin2 · 7y ago

This is awesome! Two of my favorite products joining together. Now I just hope they keep their promise and keep investing in Realm.

VWWHFSfQ · 7y ago

I'm sure MongoDB is fine now but it's too late. I don't care. I'll never use it again or recommend it to anyone.
