Understand companies like Amazon have shifted to NoSQL to manage their data. In short, I do think that SQL is for high processing, low disk space while NoSQL is vice versa in layman terms. Im pretty new to programming and appreciate a healthy discussion on this.
Please do not draw lines and have hate posts and all. Keep the discussion clean and intelligent.
Thanks!!
I'm baffled. I'm a programmer and i rather like SQL. Most programmers i know use SQL quite happily. Indeed, i often hear them wishing they could just write SQL to solve some problem rather than use some API that purports to save you from writing SQL.
I certainly prefer SQL to MongoDB's query language, which is verbose and unexpressive by comparison.
Am i in a tiny minority? Or is this SQL hatred a popular myth?
It's also very, very easy, and almost completely fat-free as a mathematical expression of what you might want to do with data. I learned almost everything I needed to know about it within an afternoon ~10 years ago, and the dividends paid since on that investment have been extraordinary.
The use of "non-language" here just reflects the extremely narrow view of how languages must look amongst programmers of late.
Every time I write a query of reasonable complexity, I have to figure out what I want the database to do (in an imperative sense -- read from this table first, then look up this in that table, then do this other thing with this other table for each row), and then figure out how to translate this into a declarative SQL statement, just so the database's query interpreter can turn that back into an imperative program it can run.
The whole thing is just a huge unnecessary round-trip of thinking, and SQL is limited in that there are many simple things it can't do, and you never really have any guarantee that the database's query analyzer will execute your query imperatively in the manner you hope for (which can lead to terrible performance problems, and hacks like index "hints" still don't always solve it). I'd MUCH rather be able to program against the database imperatively, directly.
So despite using SQL day-in, day-out, I totally agree that it is weird, obtuse, and a kind of "non-language" (in the sense that any query is limited to a single declarative statement, with a host of limitations, instead of an imperative programming language that allows you to write whole programs of execution).
Maybe it really only should be used for simple queries with simple where clauses. But because it can be used for more, we choose to do so.
It's based on relational algebra making it fairly predictable (query optimizer aside which is completely unpredictable). It operates on relations and returns relations.
"Select * from MyRelation;" returns a relation. So we could also write "select * from (select * from MyRelation);" and get the same result. You can continue that nested select as many times as you like and with as much complexity as you like and it would still work. Though undoubtedly it gets difficult to follow.
Adding to the obtuseness, SQL is not typed. So for "select * from MyRelation where SomeField = 1;" you could replace the 1 with another select statement, which would return a relation. But as long as that relation had a single column and a single row, it would be interpreted as a single value and successfully replace the 1.
Also adding to the obtuseness as you get more complex is that it variable scope for table aliases can be difficult to follow. Table alias names can be used across various select, where, from and other clauses to link them together through a common filter but it happens in the single SQL statement.
I think it's that common issue of reducing a "program" into a so gel statement that causes the issues I run into. At times, I feel that I am writing what basically amounts to a stored procedure in a single statement. That's when I start getting mad at SQL. Though, it's really my own choices that made it a problem.
(Granted, that implies both columns have the same name iirc)
SQL is a weird hack of relational algebra with a huge variation of syntax between different providers. Also, it's pretty inflexible if you need to change schema, you need to solve crazy transactional problems if you write your own server, not mentioning the problems arising from clustering.
Try modeling trees in SQL - if your data is mostly static, you will be saved by nested sets or their fractal successor, however most SQL servers mislead you in thinking their tree-functions are anywhere as performing. If you think you don't need trees, imagine modeling categories or an org chart in SQL, and making sure decisions at one level propagate correctly to all levels underneath and can be overriden only with correct permissions, and this must be a part of each query.
But yes, writing a noSQL server should be way easier than a SQL one. And yes, you have a point about trees, as do people complaining about queries results having no structure. Thigs could be better, but schema enforcement and data security in face of massive parallelism are quite worth it.
I'd venture to say NoSQL has gained it's popularity because it's Magic(TM), and it works well with equally magical frameworks. For frameworks that don't have generators like ActiveRecord, NoSQL replaces the need to boringly write your models and database logic. It's like every Getting Started document on the Internet is having a race to the bottom, and people are falsely equating a quick start with a good framework. I have never picked a technology to work with because I could have a blog written in 10 minutes.
No, it's because to a first approximation, they aren't magic. Writing your queries in a more imperative style means that most developers can more easily reason about them, doubly so once they get more complicated.
You don't have to rely on generators or magic frameworks to generate SQL for you with a good document database. All your queries can quite reasonably be right there, and things like "get the document with the ID of foo" are already just one method call away.
Of course, most NoSQL databases have tradeoffs that are not obvious, while I believe all SQL databases stick to ACID. That's probably more of the problem than what query language is used.
Years ago, when I took a database class in college, I hated it. Several years later, I began working at a consulting firm on the support team. My primarily role was defect/enhancement tracking and resolution in the product we had developed for them as well as their satellite apps.
The first time I really got my hands dirty with SQL was pulling out a bunch of VBA and Access "SQL" they were using to generate reports and to rewrite it into real SQL so it could be run on the backend, which cut times down from hours to milliseconds. It was okay (I'd hate those queries now) but it's where I really picked it up.
Then the same client started calling me to write (or edit) queries to pull out some information from their database I don't think anyone had anticipated they'd ever need. That's when I started to really appreciate how beautiful and expressive SQL can be. Stop thinking imperatively - SQL will let you do it but don't. No cursors unless you absolutely have to do. You probably don't. Dynamic SQL has its place, but try to stay away from that too.
There are a few queries I still remember with some fondness. One was to determine the number of days particular classes of employees worked on particular classes of projects. Unfortunately, the way their data was modeled made this not-at-all obvious - and the date information was stored in the "start" and "end" date type fields.
There was a lot more to it, but after pulling my hair out - the customer generally needed these in under half an hour - I suddenly realized I could use a recursive CTE to generate a dataset of all days that I could use to join with, filter out any duplicates, and then use aggregation functions to get what I needed. There were a lot of other details because of the complexity of their model, but it worked and it just seemed so elegant at the end.
I've moved on to another job now and honestly, writing SQL is what I miss most. I would have never pictured myself saying that before I worked there.
SQL is an abstraction with myriad problems in both its core design and its disjointed implementations. ORMs and similar interfaces are crappy abstractions on top of a crappy abstraction. You get the problems of SQL combined with the problems of the ORM.
Preferring to remove a set of problems and be left with only the problems of SQL is not the same thing as liking SQL, it is merely preferring the lesser of two evils.
The biggest annoyance for what I do, is that I don't have a clean way to edit a select list; that is "select foo., ... from (...) foo" works to add* columns, but I can't remove or rename or replace existing columns without listing all of them.
The second biggest annoyance, is the hoops I have to jump thru to join a parent to two (or more) sets of child rows without either multiplying everything or dropping data.
This post is just filled with naive strawmen arguments, I work on a NoSQL database, show me a SQL database that is implemented embedded inside every browser and allows offline operations to sync between masters transparently ...
Trying to define or dismiss 'NoSQL' as a singular movement is not going to work, there are lots of tools, I use different ones for different things, some of them even have a SQL interface.
SQL has been astonishingly successful in recent years and it's a testament to its dominance that the term NoSQL even exists. It's kind of like calling all meats except for beef "NoBeef".
I'm really happy to see some different database technologies getting some attention these days. They each have their own niche. There's no SQL vs redis vs MongoDB vs cassandra vs whatever "winner", just databases with different strengths which can benefit us in different ways.
Databases that predate Codd's work, or long-established ones like dbm and BDB, may appear similar to modern-day NoSQL databases in how they operate, but they surely aren't the same.
Those systems couldn't use relational theory or SQL because they didn't exist yet, or at least didn't reject them outright as one of their main goals.
NoSQL databases, on the other hand, are completely about rejecting relational theory and rejecting SQL. That's at the very core of their philosophy.
True NoSQL databases have been developed as a reaction to several things:
1. A very, very, very small number of situations where relational DB systems cannot easily scale.
2. The far more widespread ignorance of the basics of relational theory, and a lack of willingness to learn about it.
3. The far more widespread ignorance of SQL, and a lack of willingness to learn about it.
4. An urge to be "different" solely for the sake of being different, even if it brings no technological benefit.
5. Unmitigated hype surrounding the term "NoSQL".
Of those, 1) is perhaps the only legitimate reason for using a NoSQL approach today. The number of times this sort of a situation actually arises is remarkably small.
The other four are why those databases have become more widely used, especially within the web development community. As anyone who has dealt with such systems knows, they're rarely about safely and reliably storing and managing data, and they're rarely about doing this efficiently. They're merely a shortcut that some developers use to avoid learning how to use a RDMS.
I guess that databases must be the same for some people.
Funnily enough it's always the PostgreSQL community that wants to create this artificial war. It's like they think everything in terms of one or the other when in fact most systems these days are heterogeneous.
Still, he is quite correct. You still need a Structured Data Query Language.
We don't need a database if we can just load the application state in RAM and save it back via serialization. This works well in surprisingly many cases.
[1] Common Lisp has a particularly nice system for doing it transparently, http://common-lisp.net/project/elephant/, and I believe HN is implemented using a DB-via-serialization strategy as well
I think the NoSQL movement was created for practical reasons. Since SQL didn't scale properly for certain applications, new kinds of databases had to be invented. What often happened is that a software used a typical SQL database with locks, and as that didn't scale, the locks were decreased to a point ACID wasn't theoretically guaranteed anymore. And it worked.
If the database in production doesn't guarantee consistency, you might aswell design a new database which is based on that idea, which is a core reason NoSQL databases were created. There's other reasons, too, of course..
There are a lot of people out there who have a quasi-religious attitude toward NoSQL databases.
It's a cause for them to rally around. In some ways, it gives them something to "fight" for.
Objective consideration about whether it's the best technology to use in a given situation is often disregarded.
There are a very small number of very rare situations where such technology is the best or most feasible approach. Otherwise, it's merely something that a lot of people get involved with in order to intentionally avoid learning how to use SQL and relational databases, or to feel like they "belong" to some greater cause.
SQL the query language is great, the concept of set theory applies really well to data. SQL can be and is used on non-SQL database, impala for example implements SQL to query data burried in hadoop.
NoSQL stores become popular because they scale horizontally (able to use more than one server) naturally. Once the SQL servers can do automatic partitioning, I suspect people will start migrating back.
The truth is that there is no pixie dust, at the end of the day you need to index. I see nosql proponent having the same strugles sql people have with indexing, but right now they have the upper hand because they can spread the work over several servers.
Of course, moving out of this phase becomes a massive headache, since basing your product on essentially unstructured data is a very good definition of "technical debt". And if you're using structured data in your rapidly prototyped object, why not just a RDBMS in the first place? :)
Of course, it's possible to use things like JSON-Schema to validate your data if you choose to.
> And if you're using structured data in your rapidly prototyped object, why not just a RDBMS in the first place?
Because where's the RDBMS with a "natural" query language that is well-suited to complex, dynamic queries and document structures? How do people model a document store, with the ability to point to differently-shaped data, in an RDBMS without giving up all that safety?
Great way to make people close the page immediately after reading the title. (I personally don't need SQL for my small web app => the title's lying => close tab).
Stored procedures can help with some of this, but then you've got to remember to update them every time you need to access data in a slightly different way, and it's hard to figure out when you can get rid of old stored procedures. Tools for managing them aren't up to scratch with tools for managing "real" programs.
Personally, I'm keen on RethinkDB. It's a document-oriented NoSQL database which doesn't lose your data, has built-in clustering, and an extremely strong query language built on chaining method calls. While it doesn't have transactions, the query language is powerful enough that you can easily model most forms of deterministic data transformation within a single query.
NoSQL:
- reading is fast, writing is expensive (if all data are pre-processed/denormalized for reading during the writing phase)
- often schema-less
- low latency (for key-value storage)
- offline batch processing (classical Map Reduce)
- no ACID, choose 2 of 3 in CAP
- demanding on SW engineers to get client-side conflict resolution, tricky in general
- Petabytes of data can be suddenly processed
- huge variation of different paradigms, key-value, document, graph, batch etc.
- haywire indexing
SQL:
- writes are fast (normalization), reads are expensive (JOINs)
- ACID (well, only to some extent, clustering messes up many ACID properties unfortunately and conflicts arise in corner cases)
- set operations and a neat math theory behind them
- stable indexing, easily constructable real-time JOINs
- OLTP
- easier for developers
- non-flexible schema
- tradition, well-known recipes on how to do things
Basically, if you want to have low-latency access, your concurrency model allows eventual consistency, or you have a need to store your data in non-standard structure such as graphs/trees, use NoSQL and pre-process all data to be exactly in the format you require for reading.
If you need 99.999% guarantee of consistent data, amount of data you need to handle is under 50TB, you can put your data into a fixed schema and latency doesn't matter that much, use SQL.
I would recommend you to ask yourself a question - is your app/business read-heavy or write-heavy and decide accordingly.