How does a relational database work?

(coding-geek.com)

377 points · mgachka · 10y ago · 60 comments

38 comments · 18 top-level

mgrennan · 10y ago · 5 in thread

Not that Cassandra and Hadoop don't have a place. But because NoSQL is hot, I see lots of young coders (I'm an old DBA) trying to turn document store systems into relational databases. They should all be made to read this post.

mgachka (OP) · 10y ago

(I'm the author of the article) I'm 28 and I’m currently a Big Data developer (I use Hadoop, HBase, Hive …) and I don’t understand the buzz surrounding Big Data and NoSQL.

With a relational database the complexity is hidden (more or less…) whereas with Big Data and NoSQL the developer needs to deal with this complexity himself/herself. As a result, most of the Big Data applications I’ve seen don’t work well.

I really like Big Data because it’s more complex but, to be honest, most of the time my work does not require the “Big Data scale”.

jchrisa · 10y ago

At Couchbase we did a survey of developers (this was ages ago) and the biggest motivator for NoSQL was schema flexibility. Not having to coordinate migrations is seen as a productivity boost. [1]

The other thing document databases can offer that relational databases struggle with is taking subsets (which we use for offline sync.) [2]

[1] http://blog.couchbase.com/nosql-adoption-survey-surprises

[2] http://developer.couchbase.com/mobile/

duaneb · 10y ago

The buzz around NoSQL is you don't have to worry about scaling the database. There are many, many more options now for e.g. multi-master, sharding, no-downtime copy-on-write migrations, etc., but just the idea of being able to run a tiny subset of queries or writes without having to worry about running out of resource capacity is a HUGE plus.

kodablah · 10y ago

"With a relational database the complexity is hidden"

That is my main issue. I use Cassandra over relational firstly for its linear scalability and multi-master-esque HA. But even ignoring those, I understand exactly what is being scanned and what is not, and I don't have to fight with an optimizer that makes decisions at runtime based on several parameters.

threeseed · 10y ago

Not really sure what you are talking about.

Teradata, Oracle, PostgreSQL for example are reasonably complex databases to cluster and manage yourself. Just as easy/hard as setting up HDFS and installing Hive. In all cases people who are at big data scale are buying OTS solutions e.g. Cloudera appliance. They aren't rolling their own.

And if you are using Hive then I can understand why you are not feeling the buzz. But play around with Spark for a while and it's easy to see the future. Being able to write Scala/Python/SQL/R against a data set that can be anywhere from 100MB to 100PB without any changes is pretty compelling.

faragon · 10y ago · 3 in thread

Be careful with theoretical asymptotic complexity (big O) when reasoning about execution time. E.g. if your algorithm's time complexity is nominally O(1) but it internally calls a function with higher complexity, e.g. a malloc() implemented in O(log n), then your algorithm's time complexity is O(log n) and not O(1). It can be even worse: an algorithm that is constant time on average or in the typical case can in reality be O(n) in the worst case, e.g. when a hash table has to be reindexed (that's why many big data structures that require real-time behavior, including most SQL databases, are implemented as trees, tree hierarchies/division/clustering, instead of big hash tables).
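To make the hash table point concrete, here is a minimal sketch in Python. It is a toy chained hash table, not how any particular database or language runtime implements one; the class name `ToyHashTable`, the helper `rehash_costs`, the starting capacity of 8, and the "double when full" policy are all assumptions chosen for illustration. Each insert reports how many existing entries had to be moved, so the occasional O(n) insert hiding behind the "O(1) on average" label becomes visible.

```python
class ToyHashTable:
    """Toy chained hash table (illustrative only) that doubles when full."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def insert(self, key, value):
        """Insert a key and return how many existing entries a resize moved."""
        moved = 0
        # Assumed policy: grow as soon as entries would outnumber buckets.
        if self.size + 1 > len(self.buckets):
            moved = self._resize(2 * len(self.buckets))
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite, size unchanged
                return moved
        bucket.append((key, value))
        self.size += 1
        return moved

    def _resize(self, new_capacity):
        """Rehash every entry into a larger bucket array: O(n) work."""
        old_entries = [entry for bucket in self.buckets for entry in bucket]
        self.buckets = [[] for _ in range(new_capacity)]
        for key, value in old_entries:
            self.buckets[hash(key) % new_capacity].append((key, value))
        return len(old_entries)


def rehash_costs(n):
    """Insert the keys 0..n-1 and return the per-insert rehash cost."""
    table = ToyHashTable()
    return [table.insert(i, i) for i in range(n)]


if __name__ == "__main__":
    costs = rehash_costs(100)
    spikes = {i: c for i, c in enumerate(costs) if c}
    print(spikes)  # most inserts move nothing; a few move every entry
```

Averaged over all inserts the cost stays small, which is why the amortized bound is constant, but a single unlucky insert still pays for the whole table. That latency spike is the behavior the comment warns about for systems that need predictable response times.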

duaneb · 10y ago

I don't think that the `n` in the case of malloc would always be relevant to the semantics of the query. In that case, it would still be appropriate to refer to it as constant time.

For instance, you don't typically look at the size of the literals in the query when evaluating query complexity. If it's really unbounded, you probably shouldn't use a relational database.

faragon · 10y ago

Not necessarily, but it could be the case. Typical malloc implementations manage at least small and big memory requests differently, using different pools, in order to reduce fragmentation. Also, operations involving virtual address remapping are expensive (a realloc on a small block is faster with a full memory copy to a different location than with the OS kernel remapping virtual addresses).

The "problem" with the malloc() function (or any other equivalent allocator) is that it internally manages free blocks (it is a middleman between the OS and the user process, and its purpose is to reduce OS calls); if you have lots of them, dynamic memory operations can take time. By "malloc" I meant malloc/realloc/free, the whole kit. Those operations are not free (in most cases you're not going to have millions of allocations in one process; that was just an example of hidden things that can make your algorithm not behave as expected).
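As a rough illustration of why free-block bookkeeping can cost time, here is a toy first-fit free list in Python. This is an assumption-laden sketch, not how any real malloc works (real allocators use size classes, separate pools, and other structures precisely to avoid this kind of scan); the class name `FirstFitFreeList` and the block sizes are hypothetical.

```python
class FirstFitFreeList:
    """Toy allocator bookkeeping: a flat list of free block sizes."""

    def __init__(self, free_block_sizes):
        self.free_blocks = list(free_block_sizes)

    def allocate(self, size):
        """Carve `size` bytes out of the first block that is big enough.

        Returns (index_of_block_used, blocks_inspected); the index is
        None when no free block can satisfy the request.
        """
        for inspected, block_size in enumerate(self.free_blocks, start=1):
            if block_size >= size:
                # Shrink the chosen block by the amount handed out.
                self.free_blocks[inspected - 1] = block_size - size
                return inspected - 1, inspected
        return None, len(self.free_blocks)


if __name__ == "__main__":
    # 1000 small fragments followed by one large free block.
    heap = FirstFitFreeList([16] * 1000 + [4096])
    print(heap.allocate(1024))  # has to walk past every small fragment
    print(heap.allocate(8))     # satisfied by the very first block
```

In this toy model the cost of one allocation depends on how many free blocks precede a suitable one, so the same call can be cheap or expensive depending on the state of the heap, which is the kind of hidden cost described above.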

mgachka (OP) · 10y ago

You're right. In fact, in the optimizer part I say (in a simple way) that big O (i.e. asymptotic complexity) is not the same as CPU cost, but big O is easier for me to use because the real cost of an operation depends on the CPU architecture.

Someone told me the same on the article comments and here is the answer I gave him:

You’re right and I agree with you. When I wrote this part, I REALLY hesitated to give the real asymptotic definition and what it means for the number of operations but I chose a simpler explanation since the aim of this post is not to become an expert but to have a good idea. I hope that this won’t mislead people but I thought the real definition was too hard for a “newcomer” and not important to understand a database. This is also why I added in this part “The time complexity doesn’t give the exact number of operations but a good idea.” and said at the end of the part “I didn’t give you the real definition of the big O notation but just the idea” with a link to the real definition.

emehrkay · 10y ago · 3 in thread

Am I crazy for wanting to write a database after reading this? Nothing too serious, just to flex that dev muscle.

Daishiman · 10y ago

Modern databases are probably some of the most sophisticated software in existence. That being said, you can pick a minimal subset of functionality and roll with that.

ddorian43 · 10y ago

Try this:

Use lmdb for db library.

Use redis for the protocol.

Use twitter.gizzard for replication+sharding.

Boom! Your own webscale nosql!

collyw · 10y ago

All NoSQL is webscale, that's why it was invented after all....

why-el · 10y ago · 2 in thread

Good write up. Another excellent resource straight out of the UC Berkeley Database Group that I keep close by is "Architecture of a Database System"[1] by three researchers in the field. It is very readable.

[1] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

mikethedyke · 10y ago

how do you find these articles?

why-el · 10y ago

Honestly for this one I do not remember since it was two years ago, but I'd start over at http://arxiv.org/, which is a treasure. Granted most are research papers, but even those will link to overview papers, of which the one I link to is an example.

otis_inf · 10y ago · 2 in thread

> When it comes to relational databases, I can’t help thinking that something is missing. They’re used everywhere. There are many different databases: from the small and useful SQLite to the powerful Teradata. But, there are only a few articles that explain how a database works.

That's because the inner workings are really old, as in: emerged before blogging etc. was popular, hell before the internet was invented.

In the 'before/early internet days', we read books like 'An Introduction to Database Systems' by C.J. Date (I had to blow the dust off my copy to read the exact title ;)), which are more in depth than this article, but I like the article better, because it's more to the point and easier to understand. Well done!

jeffdavis · 10y ago

"An Introduction to Database Systems" is more about the relational model, whereas this article is more about the implementations.

otis_inf · 10y ago

Only chapters 5 and 6; the rest isn't. Are we talking about the same book?

0xCMP · 10y ago · 2 in thread

Wow, I wanted this recently. Anyone know stuff related to graph databases and how those work?

vkat · 10y ago

Take a look at Cayley - https://github.com/google/cayley

balquhidder · 10y ago

Is there a good way of actually inserting data using it?

Cakez0r · 10y ago · 1 in thread

> Nowadays, many developers don’t care about time complexity … and they’re right!

That's a pretty bold statement...

Very thorough explanations though!

michaelmior · 10y ago

I think the important thing is knowing when not to care. Unfortunately, a lot of developers don't care because time complexity isn't even on their radar. So in the times when it does matter, they get burned.

njharman · 10y ago · 1 in thread

I believe that to be the best technical document I've ever read. Surely biased as I learned so much.

mgachka (OP) · 10y ago

Hi, I'm glad to read this comment.

If you liked this article, maybe you'll like my article on Shazam. I used the same pattern: I start from the basics of sound processing and computer science and finish with an in-depth explanation of Shazam.

0xCMP · 10y ago · 1 in thread

Anyone know of any Rust-based databases being worked on? Relational or otherwise...

jamii · 10y ago

The database part of http://witheve.com/ is written in Rust. It's not very technically interesting yet (e.g. no query optimiser) but the basics all work.

jandrewrogers · 10y ago

Good overview of traditional OLTP architectures. As complicated as they look from the article, it is just scratching the surface of a sophisticated implementation. There are many internals common to more advanced designs that are not even mentioned, and the article is already quite long!

The thing I love most about database engines is that there is probably more hardcore computer science per line of code than any other software system of similar scope. It is a very rich ecosystem for an algorithms and data structures geek.

brudgers · 10y ago

Because I am interested in databases, I found SE-Radio's 2013 interview with Michael Stonebraker [1] interesting, particularly in regard to traditional database design and more recent ideas:

http://www.se-radio.net/2013/12/episode-199-michael-stonebra...

[1]: http://www.theregister.co.uk/2015/03/25/mike_stonebraker_win...

n0us · 10y ago

Actually this is the best post I have ever seen on this website.

jlees · 10y ago

This is also a pretty accessible quick intro to complexity and data structures, nicely done. Definitely the sort of thing I would include as further reading in a beginner course -- some beginners love to understand "why" and this post answers pretty much all the "why" possible.

codezero · 10y ago

I decided to spend some time digging into SQLite. I highly recommend the overviews of their architecture and the details about each part of the puzzle.

It's really understandable and very straightforward. Even if a lot of it refers to SQLite v2, it still seems very relevant.

http://www.sqlite.org/arch.html

aikah · 10y ago

Great article. I wish a book was written where a simple database with a query language was implemented from start to finish, even a NoSQL one. I always wanted to implement my own.

buckbova · 10y ago

I've read SQL Server Internals cover to cover and in many respects this is a much better read. Thank you.

mgrennan · 10y ago

Good read. How long do you keep your transaction logs and how often do you make backups?

beenpoor · 10y ago

Great article!
