Startups should use a relational database (opens in new tab)

(raycmorgan.com)

57 pointsraycmorgan12y ago75 comments

75 comments

65 comments · 16 top-level

HorizonXP12y ago· 13 in thread

Serious question: what are NoSQL databases really good for? I'm only really used to relational DBs, and I'm unclear about which problems a NoSQL database is useful for.

BadassFractal12y ago

Take Redis for example: simple KV database with in-memory perf and a very comfortable API with option to flush data to disk periodically. Addresses a lot of interesting scenarios such as session storage, request throttling by a certain key etc. Not used as a replacement for a RDBMS most of the time, but rather as a specialized tool for a certain use case. At some point your RDBMS is already hammered hard enough, you don't need to dump everything into it, but ultimately you could, especially at first. Comes with clustering for free as long as you're aware of the risks and use it appropriately.

This is not to say you couldn't turn Postgres into something similar. Put the data on a ramfs, relax various write guarantees (lots of knobs in PG) etc.

nl12y ago

which problems a NoSQL database is useful for

It depends in which NoSQL database. Depending on your problem, you can probably find a NoSQL DB optimised for it. It will often be unclear if that NoSQL DB is actually better than a relational database until you try it.

Examples:

High write throughput: Cassandra

Simple key-value: Redis

Text search: Solr/Elastic Search

etc..

twic12y ago

Redis does more than simple key-value - rather than just reading and writing values, since values can be complex types like lists and dictionaries, it can insert into them, append to them, etc. It's still key-value, it's just not simple key-value!

That said, i would hesitate to describe Redis as a database at all. A key characteristic of databases is that they store every write in a durable way. Redis can checkpoint its state periodically, but as i understand it, it either can't or typically isn't used to safely keep every write. Redis is something in between a database and memcached. I doubt there's ever a situation where you have to choose between PostgreSQL/Cassandra/CouchDB and Redis; Redis is something you would use in addition to a database.

As for text search - RDBMSs have full text search, and at least in the case of PostregSQL, it seems pretty good - see slide 49 in http://es.slideshare.net/billkarwin/full-text-search-in-post... from 2009. You might not want to be leaning on your database for text search when you're at scale (for operational reasons more than performance ones), but it's a plausible way to start.

SiVal12y ago

Somewhat similar to the difference between C/C++ or enterprise Java and various dynamic or "scripting" languages in terms of structure, the "S" in SQL or NoSQL. The former tends to require more up-front structural design to do anything, which is a nuisance at first when you're not sure what you will end up needing. The latter tends to be quicker to throw together and repeatedly rearrange in the early days of its evolution, which is a huge advantage to a small startup trying to figure out what business it's in, but often runs into trouble later, once the needs have stabilized and the structure becomes more of a benefit than a cost.

For areas other than structure, the difference can be more like the difference between a general purpose programming language and a domain-specific language. SQL databases are general purpose, so they offer a large, nicely integrated package of features, while NoSQL databases often sacrifice some of those features to optimize for some niche. From this perspective, the SQL database can be the best choice for a startup trying to find its niche with the NoSQL chosen as an optimization once that niche has been identified.

The fact that a new business can't be sure what its business will end up being is an argument both for and against using a SQL database.

AznHisoka12y ago

I have been using redis as a write though cache(not just a read cache). So I used to have 1000 concurrent writes to my Postgres db. Obviously this had a negative impact on performance even with modern specs(64gb ram, 8 cores). I realized that a lot of those writes we're updating the same column for the same row but in different threads. So I basically use redis to buffer these queries and perform 1 single write instead of multiple writes for the same row.

robbles12y ago

What's the algorithm you use for consolidating those repeated writes? I've thought about doing something like this in the past, but was put off by the worry that there might be edge cases where data would be lost (e.g. insufficient writes to trigger a write through, many concurrent writes, etc.)

AznHisoka12y ago

So I actually store the rows as serialized JSON objects in my redis cache, and when a write occurs, I update the JSON representation in the cache (if it's not in the cache, I read it from the DB and store it there). After X hours, I have a daemon that goes through all expired keys, deserializes the JSON, and executes a transaction, updating 500 objects at a time. To prevent possible race conditions, you need a mutex to allow only 1 thread to modify the cache for that row.

Each object has a field that keeps track of what fields were updated, so I can construct an update query from that.

When I read a row from the DB, I first check if it's in the write through cache, if so I retrieve it there.

Of course, this means that all writes go through my cache. There can't be any other process that updates that table that avoids writing to the cache, or else we have database inconsistency.

This improved the load in my DB server by more than ten-fold, as well as the indexing time to my ElasticSearch server.

1 more reply

marcosdumay12y ago

When you remove non-local constrains (like foreign keys) you gain scalability, that enables companies like Google to query their data better.

Now, if you are doing anything more mundane (like, for example, logging every bank operation in a country) that's overkill, and non-local constrains are a great thing to have. Thus, I'd advise not using it unless you have a clear necessity.

And always keep in mind that changing yor data is hard, whatever technology you are usig. Relational databases making the change itself hard, while schemaless databases make maintaining the software that reads it hard. Schemaless does give you a small upfront benefit, for a huge cost later.

VLM12y ago

Via analogy:

RDBMS = swiss army chainsaw, can do anything although maybe painfully and slowly.

NoSQL = 5/16th inch torque pre-set wrench for tightening sma connectors to 10 inch-lbs torque. Can't do much else, but its near perfect at its tiny little area of expertise.

On a meta level its a battle between hardware and applications, where app demands have (momentarily, of course) fallen behind commodity hardware ability.

microtonal12y ago

For example, XML databases are handy when you have large XML datasets that you like to query. The database indexes the XML, allowing you to execute most XPath queries and XQuery programs quickly.

taspeotis12y ago

I had a quick Google and MSSQL [1], DB2 [2] and Oracle [3] all support indexing XML.

[1] http://technet.microsoft.com/en-us/library/ms191497.aspx

[2] http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.j...

[3] http://docs.oracle.com/cd/B28359_01/appdev.111/b28369/xdb_in...

microtonal12y ago

It's not about indexing paths you indicate, XML databases are about indexing the whole structure (remember, XML is not relational, but are graphs). Besides that they provide XPath/XQuery processing and optimization. You can query large XML documents or sets of documents, like you'd query them with e.g. XQilla.

Of course, it is possible to implement all of this on top of existing database technology. E.g. Oracle's Berkeley DB XML is implemented on top of Berkeley DB. But, a relational database with some indexing of XML does not provide the same functionality as an XML database.

2 more replies

daliusd12y ago

Map-reduce style scalability. SQL databases are not good for that but can be used for that as well. However then your DB will be not normalized most probably.

But again different NoSQL DBs have different purpose.

robconery12y ago· 9 in thread

One thing that could likely get you fired rather quickly is running analytics on your live transactional system. Yes, your business needs to make decisions based on data, this is not terribly new. To think that you only have one data store is a bit short-sighted.

Many businesses (including startups) have moved to using document stores for high read environments and scraping nightly drops to their backend analytics systems. This is smart - you don't want to run summing/aggregation on a live transactional system for (hopefully) obvious reasons.

EDIT: it's also worth noting that map/reduce is typically much more powerful when aggregating large datasets. When trying to run analytics on top of a transactional system, developers like Ray here would end up with multiple joins and groupings - all of which slow everything down. Map/reduce certainly isn't perfect, but the author dismisses it as difficult witchcraft when, in practice, parallel execution of MR queries can greatly decrease resources and time to information.

I sort of think we've moved beyond this discussion.

nostromo12y ago

The disconnect between your comment and the article is the term "startup" now means giant companies like Airbnb and tiny two person companies that haven't yet created an MVP.

I think this article is targeted at the latter: pre-MVP and just post-MVP. For those startups, having two databases with one dedicated to a backend analytics system reeks of premature optimization.

karterk12y ago

Having taken this path from a 2-person running a "nobody cares about this" app to an app with some decent traction, there are always other things more worthwhile to do in the early stages than trying to get insights from the invariably small amounts of data in the DB. A 2-person company should be talking to users rather than trying to analyze patterns from database tables. Sample size is just too small, and you will be evolving so fast that the trends will almost be meaningless.

If you grow to be more than 2-people, then taking a mirror dump of the prod db to run queries against is pretty trivial effort.

vidarh12y ago

Think of it this way: You really ought to have a failove/disaster recovery copy of your database, and an easy way of making offsite backups.

Set up a slave of your MySQL or Postgres database (no the slave is not a backup, but dumping it regularly is an easy first step to a very basic backup setup), hosted in a different data centre, and if you're a two person company you now also have somewhere to run analytics whenever you feel ready.

It can be <1 hour effort and a few tens of dollars a month in extra costs for a small system, yet makes a tremendous difference in resilience and gives you that db to run analytics against "for free" whenever you do need it.

VLM12y ago

"having two databases with one dedicated to a backend analytics system reeks of premature optimization."

Its free software, you don't have to pay for two instances of Oracle.

One thing that will quickly kill a biz is combining the functions of PROD and DEV/TEST. Making the DEV/TEST box the DEV/TEST/REPORTS box is not a big deal, and you can't run a (real) biz without a DEV/TEST box.

waps12y ago

Having run the technical side of ~5 "small" businesses now (no more than $1.5M revenue), I disagree.

Eventually, nothing can match the performance of storing binary blobs on a cluster. But that only becomes worthwhile if you database is significantly larger than a terabyte. And I'm only talking about the operational "core" database, not your "data warehouse" (the log dumping ground, which should be split off when your database gets to be a few dozen gigs).

Meanwhile, mysql has big advantages :

1) can do basic optimization with "ALTER TABLE", even (mostly) live.

2) you can mix PROD and DEV/TEST (though obviously you need to use good judgement). Obviously you should also have a DEV/TEST instance for actual testing. Sometimes you want to run a test quickly against PROD though. Adding a slave, having it sync and then running against the slave is a joy.

3) creating reports is quick, customizable and everything you want.

4) It's "idiot-friendly". Employees can ramp up to the structure in a mysql db in 2 weeks flat. Try that with custom document stores.

5) It's typesafe and relational safe (if correctly designed), with the advantages that brings : significantly less weirdness in the database.

6) Phpmyadmin. Mysql workbench. Django. Php ...

I'm even going to argue that the GP's argument, that running analytics on PROD can get you fired, is not just wrong, it's actually an advantage of using mysql. (And the open source SAP database can run "live" analytics. You just can't believe how great that is for dashboards)

robconery12y ago

True - but the backend analytics system can be Excel :) which it often is. In fact in my "two person startup" (that was alive happily for 5 years) this is exactly what I did :). I do take your point, however. I still think it's wise to not have an admin backend that runs rollups on your live system (which we've all done).

hedgehog12y ago

It would be unusual that someone would run slow queries against the production master, typically you either use a replica or a backup restored into a separate environment. In my experience SQL has been pretty good for exploratory querying. By the time a company grows out of that setup they'll probably understand what tools will answer the questions they've got at scale.

marcosdumay12y ago

What I don't get about those discussions is why people assume databases that can ONLY do map/reduce are a good thing? SQL is perfectly able to express map/reduce operations, and they main relational databases handle it almost as well as it's possible.

robconery12y ago

Completely agree here. Your live transactional system needs to export reportable data - this can come in the form of a DB backup/download (which I've done many times) so you can run queries locally or by pushing CSVs using a cron job.

You can use a SQL query for this or a simple map/reduce - either way my argument is to focus on transactional design that works best for your system - don't conflate reporting needs with your live system (i.e. BA queries).

pkolaczk12y ago· 8 in thread

He forgot one of the very important reasons to use (some) NoSQL databases: high availability. Relational database systems are very poor at providing that. Most often the availability options are limited to resistance to node failures. RDBMSes have several SPOFs and must use failover which is not dependable, hard to test, and in many times needs manual intervention. Forget resistance to network partitions.

jahewson12y ago

CAP theorem tells us that you can't have availability without sacrificing consistency or partition tolerance, which means that there isn't a NoSQL database which can do that either.

It is not true that relational databases must have a single point of failure (SPoF) or must use failover: MySQL Cluster is a sharded multi-master distributed database without a SPoF.

On the other hand Redis, for example, is a master-slave failover NoSQL datastore.

pkolaczk12y ago

CAP theorem says it cannot be done at the same time. But it is perfectly fine to sacrifice consistency for availability at the time partition happens and restore consistency once the partition is fixed. Still better than nothing if revenue counts. Financial institutions do like that all the time.

gaius12y ago

None of what you wrote is true.

pkolaczk12y ago

And you provided no arguments at all. Sure, there are things like multi-master replication in RDBMS world, but I yet have to see a scalable system that utilizes it and at the same time is fully ACID compliant (which rules out async replication). We tried multimaster HA replication once, but the write throughput was terrible.

gaius12y ago

There is no possible argument to make. There are a million - probably more - highly available RDBMS systems all over the world handling real money and real goods and services, 24/7. You can argue if you wish that tables are not a good way to store particular data structures, fine. But the claim that RDBMSs are unreliable is just ludicrous and had been for 30 years.

1 more reply

coolsunglasses12y ago

Rarely matters for a startup.

pkolaczk12y ago

For the startup I once worked for, it mattered much more than we had thought at the beginning. The investors were smart enough to notice we had some considerable periods of downtime. Additionally, once we got first million of users (not really that much and nowhere near the scale of Google or FB) we ran into performance problems which couldn't be easily solved just by indexing, optimizing queries or adding more hardware, and "buying" a beefy Oracle superserver was not an option as we didn't have enough revenue yet. So we had to dump joins, relax transactions, denormalize a lot and ended with a half-baked, bug-ridden NoSQL store on top of PostgreSQL, that couldn't even do horizontal partitioning well. I wished we had a proper solution like Cassandra right from the start. It would save us lots of pain.

coolsunglasses12y ago

First million users. Come on. 99% of your audience is never going to have that problem.

Especially if they spend their early days fucking with a Cassandra cluster instead of talking to customers.

And it should be noted, you made it anyway.

When you make it by the skin of your teeth, that means you probably timed it right.

Preempting a problem far-ahead of time in startups means time and effort was wasted, especially if it was done before the existence of the problem was established.

It is unreal to me that people still can't figure out how to apply Maslow's hierarchy to startups.

2 more replies

lmm12y ago· 7 in thread

The unhappy truth is that for many startups, relational integrity and transaction safety are simply not very valuable. Customers of an early-stage startup are by definition willing to take a risk on whatever they're getting from that startup. So simply not thinking about these problems - accepting that occasionally a partial write will happen, or two writes will collide, or a migration will not quite work correctly and your pages will crash until it's fixed - is a worthwhile sacrifice to increase development speed.

stiff12y ago

This is just a very bad excuse for doing ridiculously shitty systems that stay shitty way after the startup phase. If you at all bother to understand your problem domain, and are not just a monkey at the typewriter, writing down a good domain model and adding the constraints is not going to decrease development speed, quite to the contrary, it's going to increase it, web developers often spend whole workdays just tracking down in the logs the "story" of some now angry customer that happened to violate some unformalized assumption of the system and got mishandled later in the process, especially if this customer happened to pay already. Not to mention that with a good domain model, the code for the individual functionality flows out naturally, while with a shitty one, you might end up with three times as much code for the same thing.

Also, those mistakes in modelling the domain and in enforcing the constraints are often there to stay and slowly become impossible to fix, once you have 10000 records that do not fall into a few well specified states, it's hard to go through all of them, find some common denominators, and migrate the database. Not to mention that with the mess people can do in the code, and with the messy stack in use today, it's easy to introduce bugs that might be hard for anyone to notice but seriously harm your business.

The amount of fashionable nonsense in software engineering seems to be higher than ever, unfortunately.

raverbashing12y ago

> If you at all bother to understand your problem domain

This is a startup. There is no problem domain. There is no spec. There's none for a startup. You may not even have code.

If you're doing a "startup" but have this ironed out, great, everything you said then applies, but it's not a startup, since you found your business model, it's a small company.

And I would love to enforce constraints on the DB but unfortunately, I already had "primary keys" that repeat, unbeknownst to the project customers.

marcosdumay12y ago

> This is a startup. There is no problem domain.

Is the startup just a game developers do on their free time? Because if it is trying to solve a problem, there is a problem domain, even if it's the wrong problem.

lmm12y ago

If you understand the problem well enough to write a good domain model then you're basically already there - which is to say, you can only know you're modelling the right domain after you've found product/market fit. But to be able to find that out you need some level of product, which often means having working code before you fully understand what your product really is. At that stage, a schemaless database is very helpful; working out your domain model in detail and typing it out twice would be a waste of time when it's likely to change quickly.

memracom12y ago

Why do you call it "software engineering"? Do they try to follow SEMAT? SWEBOK? If not then they probably are just software hackers and do not deserve to be called software engineers. A real software engineer is someone who follows rigourous engineering processes, does not reinvent the wheel and who does their work by applying the lessons of those who have gone before. It requires mental discipline to do this, in particular because you have to resist the urge to reinvent the wheel.

Reinventing the wheel is great fun. Without constraints you can appear to be amazingly productive. But it is all a fantasy because without constraints and a rigorous process you are probably piling up technical debt faster than anything else.

memracom12y ago

And what is wrong with pointing out that "software engineering" is not used to talk about engineering?

raverbashing12y ago

Exactly

Relational DBs may even be the wrong choice for some specific problems.

And about this quote from the article: "At some point, you will need to ask your primary database questions. If you chose the wrong database, this is where things get tricky. "

Yes, this is correct. However, I know how to read the manual of whatever db I'm using and maybe code something simple to process the output to the format of my liking.

RyanZAG12y ago· 7 in thread

Depends what your startup is doing. If you are only using your database to store some basic transactions, then a relational database is a very good fit. This is really the case for most startups tackling common problems. However, if your startup is tackling a problem with unique technical challenges, then you can't just ignore the issue. For example, a geo-location startup tracking the location in real time of users with a free app is simply not going to be able to use a relational database.

robbles12y ago

I don't see why not - use a relational database for storing long-term data, such as users, friendships, preferences, etc. and store ephemeral data such as location in a key-value store like Redis. Store summary statistics of the location data in your primary data store in scheduled background tasks.

Just because you're storing a huge amount of one specific type of data, that doesn't prevent you from taking advantage of the features of a relational database.

Sanddancer12y ago

Precluding yourself from using a relational database means that you're not going to be able to use some of the best tools available for geographical data. Things like PostGIS are built around, and heavily dependent on, the fact that it is Postgres. Now, you may not want to use an RDBMS for everything, but at the same time, you don't want to pull it completely out of the system either.

charliesome12y ago

> For example, a geo-location startup tracking the location in real time of users with a free app is simply not going to be able to use a relational database.

Why not?

nl12y ago

Rapidly growing, infrequently queried data is not the ideal scenario for most relational databases.

1) Relational databases typically aren't optimised for write-throughput. It's quite possible to do it, but you'll need fast and large disks (eg, FusionIO in a SAN or something).

2) Location-tracking applications typically don't require interactive queries - generally it is more a batch-based system that can be run offline.

Saying you are not going to be able to use a relational database is overstating it a bit in my view.

Clearly you can make it work, but something like Cassandra will give you better write thoughput, won't force you to rely on a SAN/NAS for data storage and will let you use Map/Reduce to batch process the data.

VLM12y ago

Total disagree. The most important job of a startup at startup time is probing the market.

If you have a billion customers each with a million records you are the Google Maps location thingy and are not a startup anymore and a relational solution may, or may not, work.

If you have ten users the ideal database is probably ten interns and some whiteboards. I'm not kidding.

I observe there is a common claim brought up every week on HN about fake it till you make it. You don't automate (insert menial task here) until you have ten customers.

I observe there is another common claim brought up every week on HN about how you need a massively scalable design which can never change from day one, because your schema, either formal or informal, is perfect and unchanging, LOL.

Those two stylistic outlooks are not compatible.

Start with something that has too many features, too many abilities, too much room for expansion, like a relational DB, and then later on if you need to, put some stuff on another platform. IF you need to. IF your company survives. Lots of IF.

RyanZAG12y ago

Fake it till you make it is why you'd use a quick and dirty mongodb with json dumped straight from your clients, though. Maybe we have different ideas on the speed of development on mysql vs mongodb though.

VLM12y ago

Also probably different outlooks on which letters to emphasize in CRUD, and probably different assumptions about what "all" startups are doing with a database, anyway.

yeukhon12y ago· 4 in thread

I don't know. It's a hard question. MongoDB is pretty much used in any hackathons simply because it's easy to setup, driver support is good, and schemaless. The last one is really why people use MongoDB over SQL DBMS. For startup, there might be a concern that schema migration is tough.

But one can argue that not careful with schema design can break api and make codebase messy.

I guess I will stick with the hard work now... I guess not careful with schema will definitely bite me.

microtonal12y ago

I find that a poor argument. One can use an ORM that automatically creates a schema based on classes. E.g. I like Ebean with DDL generation. You just write classes and add @Entity annotations. Ebean automatically creates the schema. Combine this with an embedded database, such as h2, and there is virtually nothing to set up.

Once you are out of the rapid iteration phase, you can take the latest Ebean generated schema and use to proper migrations for later changes.

est12y ago

Another reason to choose MongoDB is built-in array and nested dict support, with good enough indexing.

So you don't have to create bullshit m2m tables with tedious joins for a fucking tagging system

mwhite12y ago

Obligatory JSON and hstore in Postgres comment.

est12y ago

stable version, every value is string. No atomic incremental operations, no nesting, shitty index.

1 more reply

VLM12y ago· 1 in thread

There can only be one DB much like the LOTR can only have one ring. Why? Thats the only area the linked article falls down on. Its a pretty good article other than that.

So you properly normalized your entire system, customer billing transaction records all the way up to article tags. Then article tags gets too huge. So next version looks at RDBMS and Redis, and the next version after that only looks at Redis. Customer billing transactions remains on a "real" DB and the tag cloud lives on redis. And the problem with that is... what exactly?

Its obsolete thinking. I can't have two databases because we're a poor startup and the only databases that exist are DB2 and Oracle and everyone knows they're super expensive so super expensive times two is unaffordable. Dude, its almost 2014 not 1980, Postgres/mysql/redis its all free.

raycmorganOP12y ago

Thank you for your comment. I agree that there is nothing wrong with using a plurality of systems when needed. Your example of moving from one to another is great! Start with a simple system, and once you find bottlenecks, optimize with specialized stores.

dclara12y ago

I'm totally with you. We've experienced to have Object database or XML database and NoSQL database. Now we understand that relational database is just the right way to go for web applications, because it deals with structured data so well, keeps querying and sorting, filtering seamlessly and effortlessly. It's a must.

It is the same thing for choosing Linux distribution and JDK mode. See the references here:

http://bingobo.info/blog/table-of-contents.jsp

BTW, your title should have "relational" instead of "relation".

Gulthor12y ago

People too often forget about graph databases when talking about NoSQL solutions. Graph databases offer an interesting and elegant alternative to relational databases and I could definitely see a startup decide to use this kind of technology.

As far as I know, most graph databases support transactions and offer great scalability. Such databases are also schema-less and can be queried with Gremlin, a powerful graph traversal language (see www.tinkerpop.com).

With respect to scalability and transactions, Titan (http://thinkaurelius.com/) looks very promising: it supports various backends for storage (Cassandra, HBase, etc.) and indexing (currently Elastic Search and Lucene). Graph analytics can be done via Faunus (http://thinkaurelius.github.io/faunus/), backed by Hadoop.

There are other vendors out there (Neo4J, OrientDB, etc.) which offer interesting solutions worth looking at - I'm just a bit less familiar with them.

The major downside I see with graph databases is that most of them are fairly recent and their ecosystem is tiny (though growing). Should a startup venture on such young technologies, or stick to mature and battle-tested solutions (ie. relational databases)?

Could startups use this kind of graph "NoSQL" databases? I don't see why not. If your startup is some kind of social network, graph databases are certainly an option worth considering. If I were to create a startup, I'd hardly use a document database like MongoDB but I will really consider using a graph database. In the end, it's all about having the right tool in hand, and knowing how to assert what is "right" for you.

adamnemecek12y ago

A blog post titled "Startups should use NoSQL databases" in 3, 2, 1, ...

jwilliams12y ago

This comes up every now and then on HN. There are plenty of NoSQL horror stories.

Thing is. Most SQL database at scale is a bit of a horror too. Have you seen real-life production relational databases? Gawd. Hacks on hacks. Then you add another database. And another analytics database. And a bunch of point to point data feeds. Argh.

But hey. That's data.

If you think choosing SQL will solve your analytics woes down the line -- it's just not true. You're in for some pain no matter what you do.

... That's unless you get a porcelain schema first time. Which, if you're in a startup, probably means you're working on the wrong problem.

That's not an argument for using NoSQL (I used MongoDB daily, but I've got plenty of love for PostgreSQL). It's a rebuttal that SQL magically solves a different problem.

luzero12y ago

Startups should use the right tool.

A relational database might be just as wrong as a nosql one if all you need is redis.

sedlich12y ago

Strongly disagree with the article as simplification always looks shiny. Start-Ups should sit back for a few hour and days and invest the work to answer some serious questions as these http://nosql-database.org/select-the-right-database.html (there are other cataloges like this one).

Then you get a little closer to the truth.

lampe312y ago

I often read that argument that NoSQL Databases are Schemaless and yes the Database is but your Data is or it isn't. YOU must know your Data.

"All the while moving work onto the developers to standardize how they handle different migration cases."

I know a startup is fast and bla bla... BUT your team should know the tools that you are using... For me SQL DB's force me to add a new field and some kind of value and i don't like to be forced to a solution.

"In document stores, you have two choices: store related data as sub-documents, or store related data as separate documents with references. It is up to the developers to understand the trade-offs of both approaches. Selecting one over the other can lead to performance gains or issues, scalability issues and above all, make asking certain questions of the data a lot harder."

Again know the tools you are using. And for example MongoDB has good ORM's too.

"But that takes much more forethought and is dependent on a particular problem."

If your startup is doing something new and shiny you don't have the knowledge and forethought and you often dont know what particular problem will come at you.

Most of the point's look like: You learned at your University SQL now you know it(but in really life you don't) and now use it because you know how to normalize a Database. This argumentation is often used to say why java is so great or why javascript is bad.

I personally started with php then moved to rails and now to meteor(uses MongoDB) and we never before meteor could make so fast a good prototype which for a startup is very important.

So yeah if you are comfy with SQL use it if your comfy with NoSQL use it.

guard-of-terra12y ago

Don't they? For some tasks relational databases are good, for some they are worse. Call me captain.

However relational databases will have hard time with big data because your dataset is bigger than your database and you have no relational integrity.

joshguthrie12y ago

"Blog titles should stop using should."

j / k navigate · click thread line to collapse

75 comments

65 comments · 16 top-level

HorizonXP12y ago· 13 in thread

Serious question: what are NoSQL databases really good for? I'm only really used to relational DBs, and I'm unclear about which problems a NoSQL database is useful for.

BadassFractal12y ago

This is not to say you couldn't turn Postgres into something similar. Put the data on a ramfs, relax various write guarantees (lots of knobs in PG) etc.

nl12y ago

which problems a NoSQL database is useful for

Examples:

High write throughput: Cassandra

Simple key-value: Redis

Text search: Solr/Elastic Search

etc..

twic12y ago

SiVal12y ago

The fact that a new business can't be sure what its business will end up being is an argument both for and against using a SQL database.

AznHisoka12y ago

robbles12y ago

AznHisoka12y ago

Each object has a field that keeps track of what fields were updated, so I can construct an update query from that.

When I read a row from the DB, I first check if it's in the write through cache, if so I retrieve it there.

Of course, this means that all writes go through my cache. There can't be any other process that updates that table that avoids writing to the cache, or else we have database inconsistency.

This improved the load in my DB server by more than ten-fold, as well as the indexing time to my ElasticSearch server.

1 more reply

marcosdumay12y ago

When you remove non-local constrains (like foreign keys) you gain scalability, that enables companies like Google to query their data better.

VLM12y ago

Via analogy:

RDBMS = swiss army chainsaw, can do anything although maybe painfully and slowly.

NoSQL = 5/16th inch torque pre-set wrench for tightening sma connectors to 10 inch-lbs torque. Can't do much else, but its near perfect at its tiny little area of expertise.

On a meta level its a battle between hardware and applications, where app demands have (momentarily, of course) fallen behind commodity hardware ability.

microtonal12y ago

For example, XML databases are handy when you have large XML datasets that you like to query. The database indexes the XML, allowing you to execute most XPath queries and XQuery programs quickly.

taspeotis12y ago

I had a quick Google and MSSQL [1], DB2 [2] and Oracle [3] all support indexing XML.

[1] http://technet.microsoft.com/en-us/library/ms191497.aspx

[2] http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.j...

[3] http://docs.oracle.com/cd/B28359_01/appdev.111/b28369/xdb_in...

microtonal12y ago

2 more replies

daliusd12y ago

Map-reduce style scalability. SQL databases are not good for that but can be used for that as well. However then your DB will be not normalized most probably.

But again different NoSQL DBs have different purpose.

robconery12y ago· 9 in thread

I sort of think we've moved beyond this discussion.

nostromo12y ago

The disconnect between your comment and the article is the term "startup" now means giant companies like Airbnb and tiny two person companies that haven't yet created an MVP.

I think this article is targeted at the latter: pre-MVP and just post-MVP. For those startups, having two databases with one dedicated to a backend analytics system reeks of premature optimization.

karterk12y ago

If you grow to be more than 2-people, then taking a mirror dump of the prod db to run queries against is pretty trivial effort.

vidarh12y ago

Think of it this way: You really ought to have a failove/disaster recovery copy of your database, and an easy way of making offsite backups.

VLM12y ago

"having two databases with one dedicated to a backend analytics system reeks of premature optimization."

Its free software, you don't have to pay for two instances of Oracle.

waps12y ago

Having run the technical side of ~5 "small" businesses now (no more than $1.5M revenue), I disagree.

Meanwhile, mysql has big advantages :

1) can do basic optimization with "ALTER TABLE", even (mostly) live.

3) creating reports is quick, customizable and everything you want.

4) It's "idiot-friendly". Employees can ramp up to the structure in a mysql db in 2 weeks flat. Try that with custom document stores.

5) It's typesafe and relational safe (if correctly designed), with the advantages that brings : significantly less weirdness in the database.

6) Phpmyadmin. Mysql workbench. Django. Php ...

robconery12y ago

hedgehog12y ago

marcosdumay12y ago

robconery12y ago

pkolaczk12y ago· 8 in thread

jahewson12y ago

CAP theorem tells us that you can't have availability without sacrificing consistency or partition tolerance, which means that there isn't a NoSQL database which can do that either.

It is not true that relational databases must have a single point of failure (SPoF) or must use failover: MySQL Cluster is a sharded multi-master distributed database without a SPoF.

On the other hand Redis, for example, is a master-slave failover NoSQL datastore.

pkolaczk12y ago

gaius12y ago

None of what you wrote is true.

pkolaczk12y ago

gaius12y ago

1 more reply

coolsunglasses12y ago

Rarely matters for a startup.

pkolaczk12y ago

coolsunglasses12y ago

First million users. Come on. 99% of your audience is never going to have that problem.

Especially if they spend their early days fucking with a Cassandra cluster instead of talking to customers.

And it should be noted, you made it anyway.

When you make it by the skin of your teeth, that means you probably timed it right.

Preempting a problem far-ahead of time in startups means time and effort was wasted, especially if it was done before the existence of the problem was established.

It is unreal to me that people still can't figure out how to apply Maslow's hierarchy to startups.

2 more replies

lmm12y ago· 7 in thread

stiff12y ago

The amount of fashionable nonsense in software engineering seems to be higher than ever, unfortunately.

raverbashing12y ago

> If you at all bother to understand your problem domain

This is a startup. There is no problem domain. There is no spec. There's none for a startup. You may not even have code.

If you're doing a "startup" but have this ironed out, great, everything you said then applies, but it's not a startup, since you found your business model, it's a small company.

And I would love to enforce constraints on the DB but unfortunately, I already had "primary keys" that repeat, unbeknownst to the project customers.

marcosdumay12y ago

> This is a startup. There is no problem domain.

Is the startup just a game developers do on their free time? Because if it is trying to solve a problem, there is a problem domain, even if it's the wrong problem.

lmm12y ago

memracom12y ago

And what is wrong with pointing out that "software engineering" is not used to talk about engineering?

raverbashing12y ago

Exactly

Relational DBs may even be the wrong choice for some specific problems.

And about this quote from the article: "At some point, you will need to ask your primary database questions. If you chose the wrong database, this is where things get tricky. "

Yes, this is correct. However, I know how to read the manual of whatever db I'm using and maybe code something simple to process the output to the format of my liking.

RyanZAG12y ago· 7 in thread

robbles12y ago

Just because you're storing a huge amount of one specific type of data, that doesn't prevent you from taking advantage of the features of a relational database.

Sanddancer12y ago

charliesome12y ago

> For example, a geo-location startup tracking the location in real time of users with a free app is simply not going to be able to use a relational database.

Why not?

nl12y ago

Rapidly growing, infrequently queried data is not the ideal scenario for most relational databases.

1) Relational databases typically aren't optimised for write-throughput. It's quite possible to do it, but you'll need fast and large disks (eg, FusionIO in a SAN or something).

2) Location-tracking applications typically don't require interactive queries - generally it is more a batch-based system that can be run offline.

Saying you are not going to be able to use a relational database is overstating it a bit in my view.

VLM12y ago

Total disagree. The most important job of a startup at startup time is probing the market.

If you have a billion customers each with a million records you are the Google Maps location thingy and are not a startup anymore and a relational solution may, or may not, work.

If you have ten users the ideal database is probably ten interns and some whiteboards. I'm not kidding.

I observe there is a common claim brought up every week on HN about fake it till you make it. You don't automate (insert menial task here) until you have ten customers.

Those two stylistic outlooks are not compatible.

RyanZAG12y ago

VLM12y ago

Also probably different outlooks on which letters to emphasize in CRUD, and probably different assumptions about what "all" startups are doing with a database, anyway.

yeukhon12y ago· 4 in thread

But one can argue that not careful with schema design can break api and make codebase messy.

I guess I will stick with the hard work now... I guess not careful with schema will definitely bite me.

microtonal12y ago

Once you are out of the rapid iteration phase, you can take the latest Ebean generated schema and use to proper migrations for later changes.

est12y ago

Another reason to choose MongoDB is built-in array and nested dict support, with good enough indexing.

So you don't have to create bullshit m2m tables with tedious joins for a fucking tagging system

mwhite12y ago

Obligatory JSON and hstore in Postgres comment.

est12y ago

stable version, every value is string. No atomic incremental operations, no nesting, shitty index.

1 more reply

VLM12y ago· 1 in thread

There can only be one DB much like the LOTR can only have one ring. Why? Thats the only area the linked article falls down on. Its a pretty good article other than that.

raycmorganOP12y ago

dclara12y ago

It is the same thing for choosing Linux distribution and JDK mode. See the references here:

http://bingobo.info/blog/table-of-contents.jsp

BTW, your title should have "relational" instead of "relation".

Gulthor12y ago

There are other vendors out there (Neo4J, OrientDB, etc.) which offer interesting solutions worth looking at - I'm just a bit less familiar with them.

adamnemecek12y ago

A blog post titled "Startups should use NoSQL databases" in 3, 2, 1, ...

jwilliams12y ago

This comes up every now and then on HN. There are plenty of NoSQL horror stories.

But hey. That's data.

If you think choosing SQL will solve your analytics woes down the line -- it's just not true. You're in for some pain no matter what you do.

... That's unless you get a porcelain schema first time. Which, if you're in a startup, probably means you're working on the wrong problem.

That's not an argument for using NoSQL (I used MongoDB daily, but I've got plenty of love for PostgreSQL). It's a rebuttal that SQL magically solves a different problem.

luzero12y ago

Startups should use the right tool.

A relational database might be just as wrong as a nosql one if all you need is redis.

sedlich12y ago

Then you get a little closer to the truth.

lampe312y ago

I often read that argument that NoSQL Databases are Schemaless and yes the Database is but your Data is or it isn't. YOU must know your Data.

"All the while moving work onto the developers to standardize how they handle different migration cases."

Again know the tools you are using. And for example MongoDB has good ORM's too.

"But that takes much more forethought and is dependent on a particular problem."

If your startup is doing something new and shiny you don't have the knowledge and forethought and you often dont know what particular problem will come at you.

I personally started with php then moved to rails and now to meteor(uses MongoDB) and we never before meteor could make so fast a good prototype which for a startup is very important.

So yeah if you are comfy with SQL use it if your comfy with NoSQL use it.

guard-of-terra12y ago

Don't they? For some tasks relational databases are good, for some they are worse. Call me captain.

However relational databases will have hard time with big data because your dataset is bigger than your database and you have no relational integrity.

joshguthrie12y ago

"Blog titles should stop using should."

j / k navigate · click thread line to collapse