[1] https://en.wikipedia.org/wiki/Object-relational_impedance_mi...
I think a more correct analogy would be that table are like classes, columns are the properties, and rows are instances. And so defining foreign keys is like setting a pointer to a parent instance.
There is not direct analogy for methods, but you can use function/trigger to do the same job.
PostgreSQL is actually an object-oriented RDMBS, it's not because you are meant to manipulate these objects through SQL that they are less powerful. And SQL is actually Turing Complete with PostgreSQL.
It's clearly not convenient for general programming, but as soon as data manipulation is involved you benefit from a lot of built-in optimization.
Around this time, NoSQL databases started popping up, and a lot of my colleagues moved on to them since it fit so well with the trend toward denormalizing everything. Some loved them and dove in full-force.
Personally, I kept working with normalized data, used a caching layer to handle the denormalized versions of the data and learned more about scaling with Master / Slave configurations, and honestly felt very much like I was being left behind.
In order to see what the fuss was about, I tried a small personal project with MongoDB (this is pre-redis, I think), and honestly enjoyed the simplicity of it. And then my toy project got a sudden popularity bump from 20 users to 40k in a day and my project just died. I spent two weeks trying to keep up and then kinda got it working, but couldn't keep the site alive for more than a day. And since it was just a toy project, I just gave up completely.
I've only used non-relational stores for portions of projects that explicitly warrant them, since.
Another aspect of "too complex" is sometimes the true data structure of the problem is correct and also too complex. Some programmers bite off small chunks and chew on them, then push back that the entire data structure of the problem should become the small successfully chewed up chunk. A model that encompasses all of the concept of "number" is very complicated, so a programmer starts off writing a small simple integer library and pushes back that the definition of number should become the simple integer... then reality impacts and as the system evolves and demands are made, the concept of "number" needs floats, rats, complex, base conversions, maybe worse, and the original very complicated design is the only successful way to implement the business requirement of "number". Whoops. If you baked into the cake at the start how to handle rats with zero denominators life would be a lot easier and safer than bolting it on later or praying the app level code handles it, for example.
Loved the vampire/garlic analogy!
Fyi... the "SQL vs noSQL" means at least 2 different ideas and some of your replies are highlighting one aspect but not the other.
1) "SQL vs noSQL" can mean "SQL syntax (e.g. joins) vs object syntax to save/retrieve documents (e.g. Javascript JSON/BSON)" . This is probably the main driver of MongoDB adoption. (They don't care about Mongodb's scaling aspect; they just like the easier syntax.) The counterpoint to this idea is that "people are using terrible db engines like MongoDB because they are unwilling to learn SQL syntax."
2) "SQL vs noSQL" can mean "OLTP RDBMS engine (e.g. MySQL/Postgre/Oracle) vs distributed db engine (e.g. Hbase/Cassandra/etc)". The counterpoint to this idea is that "people are deploying to distributed db engines when their use case actually fits in a 1-node RDBMS engine."
From your wording, it looks like you're talking about #1. To that point, many programmers don't like the compexity of 10-way joins of a dozen normalized tables to reconstruct a customer data entry screen. With a document-oriented noSQL db, you just retrieve the entire denormalized "document" with no joins. However, there are tradeoffs to the "easier" noSQL engine such as performance.
Historically databases got used a lot where the short term cost of normalization was low and the long term cost of denormalized data was extremely high, like accounting or bank records or medical records.
The world has some data store applications in completely different environments, where the cost ratios are wildly different and the bandwidth load is extremely high. Like feeding the whole logging output of a webserver cluster into a DB for "data mining" or whatever. So minimum total cost in those weird new applications involves doing the opposite of what usually is remains correct in the older, still profitable applications.
Inevitably resume stuffing being what it is, neophillia, you end up with people trying to pound square pegs into round holes to boost their resume, look cool, gain experience, or for the sheer joy of trying something new. There's also the rush of transgressive behavior, short term thinking, etc. So naturally you get the extreme over-reaction of people arguing that corporate financial records should be nosql or your bank balance or credit record should be based on nosql, etc.
1. Bad experiences with cumbersome ORMs and refusal to learn how databases work: basically fighting with performance or normalization problems and concluding the problem was the concept rather than using it poorly (“JOINs/subselects are hard so I'm doing 10k queries. SQL isn't web scale!”). A related component of this was optimizing for only part of the problem – e.g. someone's writing a web form and they found it appealing to slap arbitrary key:value pairs into a NoSQL store because they hadn't gotten around to writing the validation, reporting, etc. code which actually needed structure.
2. Hype, hype, hype: people would look at papers coming out of large places like Google, Yahoo, etc. with impressive numbers and think they needed the same infrastructure to impress the other cool kids, ignoring the fact that those papers mentioned traffic / data volumes many orders of magnitude higher and the huge companies could hire more engineers to make up for the extra time needed to hit that kind of scale. A lot of that thought was influenced by the pre-SSD era so the assumptions about when you exceed the limits of a single server really don't hold up to serious consideration now, or often even back then.
Another problem is trying to store dynamic schemas in SQL, e.g. data for a CMS with user-defined entities. You either have to expose a version of SQL to the client, which is dangerous and/or a huge hassle (see above), or you have to implement SQL-in-SQL with rows masquerading as columns. Neither is ideal.
For me it was a much different problem though that sold me: SQL has no concept of revision control or conflict resolution, and all update tracking must be done in-band, with manually incrementing versions and timestamps. Doing master-master style synchronization between two SQL databases (e.g. server and client) is a pain. Doing it with e.g. CouchDB was a breeze, because of its git-like revision tracking. Being able to shunt JSON in and out was a huge benefit, as simple arrays and hashes do show up everywhere. Being able to suck in data from production into a developer database using its built-in replication features was great too.
If SQL serves your purposes well today, that's great, but there are plenty of reasons to want to move beyond it. I'd like whatever Post-SQL is to handle nested data types, particularly if it comes with functional-programming-style algebraic closure of the resulting constructs. SQL-in-SQL should just be SQL.
In the meantime, I will just design data-first and acknowledge that how I _pull out_ the data is the main constraint anyway, SQL or not. For arbitrary queries, there's dedicated indexing and searching solutions like ElasticSearch that can actually deliver on doing that efficiently, without lots of careful babysitting.
To speculate, noSQL key-value stores became very popular because they allow you to model imperfectly defined situations, and to update that model quicker than in traditional relational-database type situations.
Consider the difference between writing some classes in Java, or bunging everything into a dict in Python. The Java solution could be more formal and well documented, but a pain in the ass if the model changes significantly. The Python solution is a bit more janky and probably sluggish, but can adapt quicker.
I think there are more people in things like research, self-teaching, iterative game dev, and exploratory startups that value the adaptability than there are in things like old school business consultancy for whom the underlying model might be static and well described.
Many sites and apps found themselves in a situation where they would gladly trade strong consistency, for more performance and eventual consistency.
Further, schema changes for relational databases with billions of rows have to be very carefully orchestrated to avoid downtime. Databases with more flexible schemas allow schema changes more easily (though obviously the code most accommodate the fluid schemas).
Various NoSQL solutions were developed to fill these needs.
Sure, they were overused for a bit (though that fad has basically passed), but they definitely arose to fill a real need.
Of course, the reality of the situation is all that matters, and most database engines at that time did not offer such features. So, I don't necessarily fault people for trying to find solutions that simply worked for their needs.
I don't believe this is the best way to decide on a technology.
"noSQL" should be read as "not-only sql" and I think sql in this acronym should be interpreted as "relational databases". I think the relational model is convenient for a lot of purposes but at some point (maybe ~10 years ago) some companies where trying to use it for everything even when those things didn't actually fit in that model. So, "noSQL" for me actually means "alternatives" for different kinds of data time-series, graph, key-value, documents, etc. with redis, neo4j, graphite, mongodb, etc.
What I would like to see in the future is some ecosystem based on small pluggable modules that you can use to build the embedded store for a micro-services. Instead of using a full-feature database system (relational or not) you assemble your store like Legos with only the stuff you need. Something like this https://github.com/level/levelup/wiki/Modules
everything you work on in school is not like this. you spend a lot of time studying in-memory data structures and doing tiny projects (ranging from 100 lines of code homework all the way up to 5000 lines of code midterm project) that implement in-memory data structures and algorithms to work with them.
so you get some inexperienced developer who only has their schooling to inform them and they're all stuck thinking in terms of arrays, lists, hashmaps, and trees. learning a new data model has a lot of cognitive overhead so the allure of a data persistence tool that gives you almost the same data structures you already know is very strong. "Why would I learn about relational data modeling when everything is just a hashmap anyway?" thus MongoDB as extremely popular with 23 year olds and extremely scorned by 33 year olds. 10 years of job experience drives the point home.
Actually, pre-RDBMS those were the dominant systems. The concepts invented from RDBMS were the new kid on the block that was hot and cool and more powerful that supplanted the original "NoSQL" system and had remained dominant until we hit "Internet scale" and people wanted to 1) scale to huge data volumes requiring distributed databases and processing cheaply and 2) wanted to return to finding alternatives to the dominating SQL platforms.
SQL supports linear ordered or even branching versioning fairly easily (though only recently have many SQL-based DBs had decent tools for temporal versioning), and SQL-based object-relational databases (Postgres and Oracle, for example) have supported both document-oriented views over classical relationally-structured data and document-oriented data storage since well before the NoSQL craze.
I still remember the response of a Firebase team member to my forum question about it - "These days, storage and computing power is cheap - just duplicate the child table as nodes in the main table and do it all that way for every parent/child relationship you have. Don't worry that you have to duplicate the same set of data for every parent that is related to the same child...That's how NoSQL works..." <shudder>
Even though I use ORMs in my project these days, every time I have to test a complex query, I write it in raw SQL first and check it before trying to make the ORM duplicate the same query.
Granted, NoSQL has its place and its advantages, but for me, when it comes to "money code", I will stick to SQL.
The second annoyance I have is the push for schemaless. Schemaless does not exist. There is always a schema, except in a schemaless data store, the schema has been moved to app code and then my earlier comment applies.
One thing I do agree with is that most ORMs are not very good. The best ones I have used are very thin layers over the sql, like jOOQ. I think the lack of understanding of SQL and bad ORM experiences (Hibernate WTF) is what led people to think SQL/RDMBs were the problem when in fact they were not.
This is a great point. It's like types for programming languages. They always exist; it's just a matter of whether you have the compiler managing it or whether you need to keep it in your head when you're hacking.
> I think the lack of understanding of SQL and bad ORM experiences (Hibernate WTF) is what led people to think SQL/RDMBs were the problem when in fact they were not.
Again, like types in programming languages, I think the ability to iterate quickly without having to predict how the data will be used in the future leads to 'schemaless' approaches which can be kicked out the door more quickly (maybe) than ponderous schema laden tables.
I get what you're saying but keep in mind that "schemaless" isn't a philosophical statement about denying ontology[1]. Instead, it's an industry label for avoiding database schema operations. E.g. a rapidly-evolving app doesn't know ahead of time all the rigid columns they need so they use "schemaless" db strategy to avoid "ALTER TABLE ADD COLUMN X" or avoid an export/import to new v2,v3,etc tables. From the relative point-of-view of the db engine, it doesn't see a "schema" when the so-called "columns" are embedded as strings inside of generic fields.
I also think so but the issue has several important aspects which justify the use of the separate term "schemaless". One of them is that schema elements (say, column names) can be stored as normal data. For example, instead of having normal columns like Name, Age, Department, we could introduce a column storing these strings in 3 rows. As a result, DBMS is simply unaware of the schema - the schema exists only in our head (and in the app). As a consequence, DBMS cannot help us too much in managing data, and instead our app becomes responsible for these tasks, hence we get problems you mentioned. But the major problem is that currently there is no technology that allows us to say that this table column stores actually column names which can be used in queries and have to be treated as normal columns.
Fairly or not, he suggested the current fight between noSQL/Schemaless vs SQL/RDBMS was being fought in ignorance of all that went on in the 70s.
The first thing I do for almost any new system is design the data model. Once that satisfies all the requirements, building an application and UI on top of that is usually pretty straightforward.
Just like a table without a schema... because after all, it's still likely objects all the way down.
There is no such thing as working without a "schema", because software simply can't function without data and corresponding metadata to describe it.
It has a name. Schema-on-write (static schema) vs. schema-on-read (dynamic schema).
For complex SQL, it's better to write it by hand. You get to tune it and ensure the indexing is correct (either the query usage, or create/modify your indexes), and you get to see exactly what it's doing.
For even simple conditional joins, there's something nice about knowing exactly what's going on, and knowing it's not going to be doing something that results in a nested join loop (eg: the type of slow query you don't notice until you're using it in production and performance suddenly drops).
I'm a big fan of Dapper for this, so a lot of my data layer code looks like:
public IEnumerable<Order> LoadOrders(int customerId)
{
using (var db = GetConnection())
{
return db.Query<Order>("SELECT * FROM orders WHERE customerId = @customerId", new { customerId });
}
}
The one exception I make to ORM-generated code is INSERT/UPDATE queries. For most cases, the ORM doesn't have much it can screw up, and since it's essentially just mapping code ("Name = @Name, Address1 = @Address1, etc") it's more likely I'll make a typo or copy/paste error than anything else.I've spent too many hours debugging crappy ORM queries, and I find it generally takes twice as long (with quadruple the frustration) to get the ORM to generate the complex SQL you want it to vs just writing the SQL.
Firstly, have to write out the SQL, then convert it to the ORM format, only to find the ORM doesn't support something you are using, so you rewrite the query in a more long winded way, then convert it to the ORM.
It still sort of generates SQL for you, but really you end up writing the SQL yourself except in a type-safe way.
The first one is always demonstrated to be a terrible argument when someone creates a SQL alike for whatever NoSQL we're talking about.
The second one has been a more durable argument. But transactional semantics clearly are independent of query language! And technologies like Lustre and Spanner show that one can still have some measure of traditional transactional semantics in distributed systems (filesystems and databases, respectively). There are many applications where some degree of "eventually-consistent" is a great tool for making them scale, but this is often a function of what can be done in the event that inconsistencies create problems (e.g., a store selling more items of some product than it has available, a case where the store can refund the customer or delay delivery).
I'm extremely skeptical of NoSQLs, as you can tell. I would say that NoSQLs have NoPLACE. (Certainly as to that first argument.)
I'm also extremely skeptical of ORMs. So far as I can tell ORMs only ever add a layer of headaches. They seem to aim for simplifying the DB experience for developers and users, but the moment you step out of the small world of queries made simpler by the ORM... you're in for a world of hurt.
The first complaint derives largely from the SQL language being intended to read link English text. Languages like this show up regularly, seemingly based on the misconception that syntax is what's hard about programming. Most of them fail, as they should. SQL didn't, because it was tied to the exceptionally useful relational database.
The second is best illustrated by showing an alternative approach as an example: Korma for Clojure (http://sqlkorma.com/). It's easy to compose queries from simpler queries, just as we compose functions from simpler functions. Views provide some of this.
The third, if it's solvable would make using relational databases much nicer. Instead of schemas being manually-created diffs full of imperative statements like create and alter, make them declarative and have software generate the diff. If transformations on the data itself are required between versions, require it to include a pure function describing said transformation. By default, require a function to transform it back as well unless it explicitly says what data is to be discarded.
It's not a replacement for SQL, but an entirely different product (and has no doubt come a long way since V1, especially with Firebase Functions out recently). You have to make tradeoffs, like data duplication, for the advantages. From the sound of it, the advantages probably aren't relevant to you, but I'm confident they exist for us.
- We only have to store a comparatively small amount of user data for the lifetime value of each user, so the actual database cost is marginal (i.e. data duplication is not expensive to us, as long as reads are fast). (Development) time is (very) expensive.
- Rapid development and iteration. I can make additions and make fields redundant with ease in response to customer feedback, my own development choices, and changing product needs. There's no API update needed, as reads are client side. I have data model classes which store the reference in the database, some constructors, and getters and setters (About the same as any other data model + API would have). The release goes live and data just starts being manipulated in a new way. Alterations can have simple defaults with || in JS and ?? in Swift (even if it's just an empty string).
- Minimal backend. Our application doesn't require much backend logic, and that which we do have is mostly event driven with Firebase Functions, upon certain database writes. Works out quite cheap too (for now - most usages don't trigger any functions).
- Caching, and offline first. We take the approach that it doesn't matter if there's a short delay in 95% of our data updating - either due to patchy mobile signal or the data taking time to sync down. Firebase has made on device caching and real time updates to that data a breeze. With GeoFire, we've got instant map search (keys are cached by Firebase and contain sufficient information to filter, etc).
- Declarative security rules are very powerful, and can even reference values in the database itself for security control - effectively. We store a permissions tree in the database for our more complex security logic.
As a bonus, additional features like Authentication, Analytics, Push Notifications, etc are all convenient to have bundled up.
As you can see, most of this isn't about the database tech, but the development speed for us. The perceived costs and inefficiencies may exist in the database itself, but there are massive advantages for the application development as a whole.
This was very early days too, and I believe Firebase has improved considerably since then, and indeed I did a little "fun" project in it recently [0] (directly related to HN actually) and enjoyed the process.
I may still revisit it sometime, and see if it will suit another project.
[0] - https://hackernoon.com/tophn-a-fun-side-project-built-with-v...
In today's words, "flowcharts" means "code" and "tables" means "data structures".
I don't care if it is:
- tables in a relational database
- nested structs (records) and lists
- nested dictionaries (hash tables) and lists
- JSON
- XML
- ... whatever
What I do care is that I can see the data structures, and not just the code. Static typing often gets that job done fairly well. If you don't use that, please use at least type annotations. This is the most important part of your documetation, and the compiler ensures it remains correct over time. Most of the time, this (+ the function name) is the only documentation I really need.
Show me your well defined normalized tables and I have all I need.
But for the love of $entity please don't show me tons of business logic in stored procedures. I actually won't work for a company that expects me to build or maintain a product based on stored procs.
There are advantages to that, but pretty big disadvantages too. For maintaining and developing a non-trivial app, SQL, or a particular vendor's SQL variant, is definitely not my preferred platform. I definitely don't find that something easy to understand coming to an existing project that's been developed like that. I guess it _could_ be a matter of taste and experience, if it is yours!
For example too many times I've updated a record with a new column value only to find out that the value I've updated because of some trigger caused the value to be set to null.
I can't think of disadvantages to putting your business logic in the RDBMS. Can you elaborate?
For example: I've seen people rely on a unique index in the code. But if you set a unique index in the DB you know it will be unique always. Even when the code is replaced some day.
Too often, when you have a "wide open" DB and multiple "users" of that DB (think groups within an org) decide to query it (whether just for reading, or worse, updating), the queries can often turn out to be radically different for the same business logic.
Maybe one development group thinks that a query should be done one way, while another thinks that to get the same data it should be done another, and now you have two or more groups disseminating data to other groups in totally different ways.
You may ask why the organization has multiple development groups, but it occurs. I worked for one company I won't name because it isn't important - as a web developer in their marketing department; our group was considered as separate from the IT group, which handled the main "DB" which was based around AS/400 systems and Apache SOLR - we worked together as best as possible, but also bumped heads, in that they wanted us to only work with their "DB" (I don't really consider SOLR to be a DB, though it can and is used like one by a variety of orgs) through their interface (which at times didn't work like they documented it - and many times changed without us knowing about it until our stuff broke mysteriously - usually around the end-of-year holiday push) - whereas we needed to store a lot of the stuff "nearline" in a MySQL DB (we used MySQL, PHP, and AWS for much of our development) to make things more responsive for our end users (ie - the people buying the products the company made off the websites we were creating). In essence, we a bit of a "split personality" going on - but our main boss was only two levels removed from the CEO of the company, so our stuff generally was tolerated - but it is a similar situation.
Where you have multiple dev groups (whether by design or because "ad-hoc" things occur - like Bob in accounting figuring out how to use ODBC with Excel macros to query the database for data for his and other departments), you can have this kind of chaos. When the DB is heavily controlled by IT, with the only "views" of it through tightly controlled business logic which is part of the DB, this kind of difference in data can be controlled.
But it does have many more downsides, which also means it probably shouldn't be used or done in that manner. Instead, the interface should probably be through a single interface (RESTful or similar), with the business logic in the code of that interface, and only the barest needed other logic in the DB to tie it together. Provided that the security on the DB is tightly controlled, and the only way other groups can access the data is via the exposed interface API, you can achieve the same results I think.
In a way, that's how we had the access at that one company - we had a RESTful interface API to the backend SOLR store; we could send a formatted "query object" and get back one or more "records" as a stream of data (we'd then usually take the data, parse through it and store parts of it into our MySQL DB - because the query/response time of that SOLR DB was horrible from a web development perspective; I don't think this was the fault of IT, but rather the fact that the datastore on the backend was vast, holding information about products dating back 60 or more years - I'm sure there was likely some COBOL in the mix somewhere).
That did have the downside of the fact that if we wanted a particular means to query for something that didn't exist in the existing API, we either had to make due with what we could do "locally" (thru code and/or mysql "buffering"), or we had to put in a request for a change to the IT group (which may or may not get accepted, and might take weeks for the turnaround time before we could use it). Furthermore, as I mentioned before, there were more than a few times that we rolled out a particular feature, only to have the website(s) that relied on that feature break because the backend API changed "behind our backs" (and we usually saw this over a holiday period, when our sales would peak of course). In many cases, we couldn't do anything about this (not even storing the SOLR information - it was too vast, plus there was a nagging idea that if we tried that someone's head would roll for not using the implemented interfaces and data that already existed - we could "buffer" or "cache" things, but we couldn't wholesale transfer the data over).
At project onset, usually only redis or memcache are in use.
It's a model that has served me well, apart from bugs that cause divergence between redis and relational. These are usually the class of bugs that would corrupt data were I relying on redis alone, so I just think of them as a lesser evil.
There's an obvious bottleneck on writes to the relational DB but I've so far managed to keep from hitting that ceiling.
I'm really becoming a big fan of CQRS/event sourcing where we have two read models - one NoSql for live data and one relational for BI/Reporting. Events are stored separately in JSON because that's how they come in anyway through the API.
Do you read them back? Does your code expect property "X" to be on a "Player" object? Do you put a check everywhere for when it doesn't? Do you have default value for objects that were created before new property was added?
All that work could've been done by your database (which is probably has some kind of JSON implementation anyways), but somehow it's a "no-brainer" to implement data consistency at app-level. Okay.
That'd be forever extensible and if you made the primary key a combination of playerId and key, you'd offload duplicate handling to the database.
Seems simple to me.
However, data production is still a mix of batch and sql systems.
Later when we needed a bit more control over search, we started using Elasticsearch. The way we do it now is to perform joins and get a nice serialized object with all the possible search/filter/sort fields, and put it in ES. All the data still remains in postgres, though.
Zuckerberg may have had to rely on a typical LAMP stack but FB's newer user-facing features still use SQL. For example, the Timeline feature released in 2012 was built on MySQL/InnoDB: https://www.facebook.com/note.php?note_id=10150468255628920
And last year Facebook released MyRocks, which is a space/write optimized replacement for InnoDB, and is being used for their "user database tier" https://code.facebook.com/posts/190251048047090/myrocks-a-sp...
I guess the things I've read about FB using NoSQL [1] is for other parts of their infrastructure, particularly Messages. It sounds like they had considered using MySQL, even though Cassandra was built for the Inbox feature. They ended up building a new system for Messages [2].
Now that I've spent a little time in that rabbit hole, it looks like one answer to your question is Facebook's Inbox Search, which used MySQL originally to store inbox data (7TB for over 100M users, which seems laughably small with respect to Facebook's scale today):
http://docs.datastax.com/en/articles/cassandra/cassandrathen...
> Before launching the Inbox Search application we had to index 7TB of inbox data for over 100M users, then stored in our MySQL[1] infrastructure, and load it into the Cassandra system. The whole process involved running Map/Reduce[7] jobs against the MySQL data files, indexing them and then storing the reverse-index in Cassandra. The M/R process actually behaves as the client of Cassandra. We exposed some background channels for the M/R process to aggregate the reverse index per user and send over the serialized data over to the Cassandra instance, to avoid the serialization/deserialization overhead. This way the Cassandra instance is only bottlenecked by network bandwidth.
The acknowledgments section makes it more clear that the MySQL data was indeed migrated over to Cassandra:
> Cassandra system has benefited greatly from feedback from many individuals within Facebook. In addition we thank Karthik Ranganathan who indexed all the existing data in MySQL and moved it into Cassandra for our first production deployment.
[1] https://news.ycombinator.com/item?id=7891316
[2] https://www.facebook.com/notes/facebook-engineering/the-unde...
Not only is it fairly fast, but it lets us tie JSON blobs that really don't need schema enforcement to be still tagged with the rest of the data and laid out relationally to still allow complex queries.
1) The pervasive use of artificial keys. USE NATURAL KEYS. Unfortunately probably 99% of real-world databases were designed with artificial keys. I wish I could point to some literature on this topic. It is very rare and I only came to learn about this from a DBA who is well-versed in designing with natural keys. I'm trying to get him to publish more on this topic.
2) ORMs. This is just a bad practice. Their use in part derives from the awful schemas designed with artificial keys, requiring another layer of complexity to get a more intuitive model of the data. Fortunately for me over the past 3 years I've been doing almost all my application I/O with F#'s SQL Type Provider, SqlClient, http://fsprojects.github.io/FSharp.Data.SqlClient/ , which strongly types native query results, functions, and SPROCS. Just does not work if you need to construct dynamic SQL. I've been trying to goad the author into also providing meta data retrieval. That would be the icing on the cake.
3) SQL does not seem to be a required topic for undergrads. There are really no unsolved problems (of note), so it's not interesting to academics.
4) Most app programmers don't get much practice writing difficult queries or tuning problem queries, so that one time every 9 months when you do something hard, it is hard. (And again, often compounded by the complexity introduced by artificial keys.)
While I really enjoy the concept of Natural Keys, I just see so few places where they are applicable. If I was designing a banking Db schema, I could see using an account number as a primary key as long as I am not exposing my Db to anything but internal systems. However, if I'm designing a social networking platform, what can I use as a natural primary key for a user object? I can't use a name because not all names are unique and they can change. I can't use an email because they can change and I feel like Email addresses can get pretty large(more space and slower lookups). I could maybe use a username if I enforce that a username can never change and must be unique. But, we then run into the same issues as emails where a username can be fairly long and thus cause slower lookups. I also don't like the idea of using strings as primary keys either because I would need to take into account implementation details like string encoding (utf8, utf16, utf32, ascii, latin-1) and make sure to encode/decode on every lookup/insert.
So, I can see some use cases where natural primary keys make sense. However, I believe that for most use cases, artificial keys are a better choice. Integers don't take up a lot of space, it's easy to enforce uniqueness, they are a natural sequence, and they rarely, if ever need to change[1].
[1] In fact, I would make the argument that if you're altering primary keys at all, you're doing something wrong.
2) ORM's are good practice, they reduce massive amounts of duplicated code in applications and massively lessen development time which generally dwarfs the cost of execution time. If you don't use an ORM you are pissing money down the drain, most businesses are reluctant to do that. Using an ORM doesn't mean not using SQL when appropriate, it means using the ORM in the 90% of cases where it works best. Complex SQL belongs in views, which are then trivially mapped with the ORM giving you the best of both worlds. If you're doing complex queries with the ORM, the ORM isn't the problem, you are.
3) Should be.
4) Nor should they, programmers are expensive, paying them to do what a library can do better is a waste of money and time. You need a guy or two who knows SQL well enough to tune query and do indexing, that's your db guy.
For anyone who likes SQL and is looking for a research topic, read about temporal databases and suggest some ways to handle DDL changes over time (e.g. adding a new NOT NULL column, or changing a relationship from one-to-many to many-to-many).
Here is your starting bibliography:
Richard Snodgrass, Developing Time-Oriented Database Applications in SQL.
Hugh Darwen & C.J. Date, "An Overview and Analysis of Proposals Based on the TSQL2 Approach".
Krishna Kulkarni & Jan-Eike Michels, "Temporal features in SQL:2011".
Tom Johnston, Bitemporal Data
Magnus Hagander, "A TARDIS for Your ORM": https://www.youtube.com/watch?v=TRgni5q0YM8
I would love to read what you come up with!
I've read Darwen/Date, Snodgrass, and Johnston's (two books) on the subject. Johnston's seems the best practical choice, but they patented the ideas :/ That said, one could probably implement Johnston's model without violating patents if you ignore explicitly modeling the episode structure and just follow the theory.
Date and Darwen make decent points about wanting to use the relational model rather than some baked-in concept, but ultimately they do almost no legwork on practically putting any of their notions to use. This seems to be par for the course on Date books.
Weis and Johnston handle the problem more directly. They also tackle a harder problem overall (BiTemporal, vs just Temporal). Also Johnston is just easier to read than Date. (Side note, I feel like a blog where I just read chapters of Date writings and condense the content basic points would end in most of Date's arguments fitting in 50 pages.
That said, Weis and Johnston still punted on schema evolution. Anchor Modeling sort of starts at supporting an evolving schema and moves outward from there. In Anchor modeling there isn't the traditional Temporal/Bi-temporal notion, but rather positors than have varying degrees of certainty about posits. There is basically a concurrence of facts that can be retracted or changed over time. Useful for modeling varying degrees of certainty or alternate perspectives on data.
The downside of Anchor Modeling is that the datamodel is basically 6th normal form with a bunch of table valued functions, triggers, and views to aid in making it palatable for devs. Johnston sort of acknowledges this style in one of his books but I believe argues against the concept of an entities information being spread throughout various tables. In his mind, a table is a type, that is made of attributes (columns). Anchor modeling is more along the lines of 'Attributes are types', and you can relate them to form larger types.
I wish there was more work on the ORM side supporting some of these concepts (Schema evolution and temporality).
Note I will disqualify anything that has a reasonable chance of changing, like the primary email address of an account, a persons name, a persons day of birth or the 'public id' of a bank account
The biggest pain points of SQL (IMHO) are:
- lack of statically typing guarantees (for example, no guarantee that a certain table has certain column)
- bad capability to abstract over parts of the data model (for example, queries have to specify the table that they query)
Both of these can be resolved with use of good enough functional language. There are projects like that in the FP/Haskell community (e.g. Ermine), but it's fragmented.
> "- lack of statically typing guarantees (for example, no guarantee that a certain table has certain column)"
I would have described SQL as statically typed because columns are given a type when created. The issue you described seems more akin to you wanting auto-complete features for SQL (which do exist in some DB management tools) because while you can specify in code a column that doesn't exist, it will fail to compile on the RDBMS in much the same way that you can still write code that fails to compile in imperative and functional languages.
> "- bad capability to abstract over parts of the data model (for example, queries have to specify the table that they query)"
You don't need to specify the table name for each column if you're only querying one table. The issue arises with multiple tables. I see this a bit like having to specify the namespace in Go packages or class name in Java where you don't want to import every property and method into the global namespace. I'm not sure what the functional solution you envisage would be but there are workarounds in imperative languages such as the `using namespace` declaration in C++ and the `with` block in VB (I can't believe I just referenced VB in a serious discussion!). Generally though I've found the pain of referencing table names in SQL to be somewhat mitigated through aliasing them via their acronyms. eg
select * from people p, borrowed_books bb where p.name = "laumars"
SQL definitely isn't a pretty language though so I am very interested in your thoughts for how a more intuitive, functional-inspired, query language might read.> I would have described SQL as statically typed because columns are given a type when created.
Both statically and dynamically is a misnomer because it's akin to something more in-between - like a dynamic language which can create, say, an empty list and declare it can only contain integers.
Just quickly though - I believe static typing helps productivity, because you don't have wait for some edge case in your program (in this case infrequently invoked query, since SQL is often embedded in other language) to fail. There are other benefits, like Quickcheck.
I think better term (which would explain the distinction between schema creation and query compilation) would be static typechecking of queries against the schema.
> The issue arises with multiple tables
In general, SQL is quite adhoc in what it lets you do, and anything more complicated you need to resolve with code generation (for example running same query over different tables). It's very hard to reuse SQL code. That's ugly, and functional languages offer a better solution - high-order functions. I think what I really want is to deal with schema (and other things, such as result set) as first-class types.
I think the language should be functional, but also total (every function will finish, no recursion allowed), to make it easy for compilers (query engines).
The biggest problem is that SQL has no well defined "compile time". For all practical purposes, it's always an interpreted language. I can't check my code against the production database without running it.
> You don't need to specify the table name for each column if you're only querying one table.
I do think the GP was about doing stuff like that:
SELECT * FROM function_that_returns_table();int will be an int. You can't store string there. What else do you need? Something along the lines of SQL's `check` or Postgres's domains?
> queries have to specify the table that they query
How do you imagine not doing that, something like "pull some things and do stuff with it"? I think we are quite far from that kind of reality.
> certain table has certain column
I don't get it. It either has, or it doesn't. In one case you get a value, in other one — SQL error. If that's so vital, pull up some schema information and check before running an app.
Me personally: sum types. (Some languages call these tagged unions. Rust calls it an enum, but note that enum here does not mean "an integer under the hood")
We have the case where we have an entity who, as part of its primary key, has a value that is either a valid integer or a sort of "Empty" value. It's part of the key, so I can't use a nullable integer to describe this column, as doing so would prevent me guaranteeing uniqueness (the "empty" value is only valid once, unlike a NULL in a unique constraint). I can use (bool, int) column pair and some check constraints, but it leaves the integer exposed to poorly formed queries, such as SUM(the_integer_part). If I can dedicate a "special value" in the integer, I can use just the integer; that's similarly brittle. (SUM — and most other arithmetic — is valid, but ONLY if the column isn't the "Empty" value.) It'd be nice to be able to model the table as something like,
CREATE TABLE (
...
count_or_empty union { int | empty } NOT NULL,
...
PRIMARY KEY (..., count_or_empty, ...)
);
If the DB supported sum types, the type of count_or_empty here could deliberately be a union, which would not be compatible with SUM. You'd need to "unwrap" the union prior to doing such operations on it (check if it's Empty or not) and then do the appropriate thing: trying to blindly SUM on it would be a static error."The relational model is dead, SQL is dead, and I don’t feel so good myself". [PDF] - https://sigmodrecord.org/publications/sigmodRecord/1306/pdfs...
> Both of these can be resolved with used of good enough functional language
Yes, functional aspects is exactly what is missing in RM in general and SQL in particular. Yet, I do not think this can be easily added - the whole paradigm has to be changed.
But I do completely agree with the pain points you pointed there. I'd go further on the first point, client software should have an easy way to check its guarantees at connection time too.
So what other approach to designing a declarative language do you want to take?
I propose total functional language because it's extremely simple and systematic approach. You add your primitive types, primitive functions, primitive type operators (like algebraic types) and voila - you are pretty much set. That way, you get lot of reusability which would be hard to do in an adhoc designed declarative language. And a very uniform syntax as a nice bonus.
My question is not rhetorical. Building declarative languages based on typed lambda calculus is very well understood.
what do you mean by static typing guarantees? tables have a schema already (static type declarations). table schemas are essentially all objects of the same type though (a collection of column types, and the possible values of those types are enumerated). are you referring to a meta-data level of type checking? like "schema of type Foo always contains a column named FOO with type varchar" or something like that?
what do you mean by "queries have to specify the table that they query"? do you mean that there ought to be templated or generic queries? I can't see a way out of specifying which data sets you want to query on though. I'm hoping you can use more detailed language to clarify your intent here.
Its still amazing how far you can go with a single table and few tweaks to a postgres instance.
Did you meant to write it that way round?
Nope. Just start thinking a bit more if I need to continue using RDMS or switch to NoSQL now. I can when the time is right for the project. It varies for projects. While learning phoenix (elixir) I stuck with postgres because the tutorials were easier to follow. While creating a fancy blog engine I switched to Mongo
just look at the nightmare MongoDB has created in most startups a year later.
Personally I use NoSQL options for the "replacement of ad-hoc disk files" they mention on that page. Like many of the comments here, anything more advanced than that and I'm using a relational database.
I'm not sure about the mongoDB nightmare, for me its done everything as the documentation claims it'll do.
What _are_ the "correct" use cases for NoSQL? Everything has always been relational data for me
there are many and the different use cases for it that I know of aren't really connected by a theme or any criterion that can be generalized. this is my own observations only, and not a statement based on theory. the wisdom of experience. take it for what it's worth.
1. application state persistence. persisting the app state in something closer to its native representation (JSON, for example) is convenient sometimes. if all you really want to do is save the state and then reload it later in another session you've got a strong use case for some kind of NoSQL. Note that application state persistence is very different than object state persistence. The app state is specific to a single user at a single point in time. Object state might have multiple users (anyone that can manipulate that object) and multiple points in time (e.g. objects that are persisted for long periods of time).
2. Key:Value store for cacheing. this is the most basic, canonical use case. Memcached, redis, etc.
3. data structures that SQL is poorly suited for. Graphs and Trees come to mind. if you find your SQL schema suddenly becoming bogged down in M2M through tables, you probably have a good use case for a Graph oriented datastore.
4. Time series. arguably this is just a subset of relational data, or a domain specific language extension over traditional SQL, but still, it feels a little different in practice.
Depends on which noSQL. GraphDBs make sense when your application needs efficient graph queries. Eventually-consistent, distributed stores make sense when your application needs very high availability and can tolerate eventual consistency.
NoSQL can also be justified on non-technical grounds as a temporary workaround for poor administrative policy regarding SQL databases, it most likely once the DBA group realizes that you've built a critical non-SQL datastore, the policies will be extended to them as well if the basic problem hasn't been corrected.
Still, it's not automatically a good choice just because the above applies. It's guaranteed only to not be an horrible choice.
The real answer is that nearly nobody has applications where NoSQL is a better choice than SQL databases. And those few that do will know it very well, so they don't need to ask around on internet forums.
When you need a highly available and reliable DB for your application, then you need a cluster approach for data replication. Most popular SQL DBs are single-node, with some application-level clustering solutions, so the only option is to scale vertically. However, a lot of no-SQL DBs like Redis and DynamoDB do clustering/replication closer to the iron.
Specifically a page had a list of comments, and a comment was a date/time and some text.
There was no cross-referencing or nesting, just a basic ordered list.
There is a relationship there, but it's a simple 1:1 relationship.
I've gotten plenty of mileage out of using key-value stores to complement relational databases. They're good at acting as task queues and caches, for example.
ProPublica just published a bunch of data-related jobs and positions. The phrase "Proficiency in SQL is a must" makes an appearance: https://www.propublica.org/atpropublica/item/propublica-is-h...
[0] https://news.ycombinator.com/item?id=8505000 https://news.ycombinator.com/item?id=10585009
I tried doing something regarding browser hits from Akamai data in Pandas and 3 different SQL databases (mysql, postgres, sqlite) and nothing came close to pandas for holding 150m hits (one day's worth across our properties) in memory as well as Pandas. Especially with Dask Dataframes mixed in. No competition for the effort involved.
def getdataframe(sql):
try:
connection = pyodbc.connect("DSN=myDSN", autocommit=True)
return pandas.read_sql(sql, connection)
except Exception as e:
print repr(e)
finally:
connection.close()You may want to check Oracles MODEL clause. Not as powerful as a spreadsheet but you can already do a lot of things in the middle of your SQL.
Either a more modern looking language or some easy tools to analyze and build queries would help a lot in my view. Also refactoring tools that can analyze the impact of table changes in all stored procedures would be nice.
What you are lacking is:
a) knowledge of your chosen database system in sufficient detail to know the trade-offs and pros/cons, and
b) sufficient knowledge of SQL specific to your chosen database system in order to get the most out of it.
This is a bit like knowing C++ in sufficient detail to be able to cope with a complex bit of code that you see, and also knowing your compiler sufficiently well so that you know it has foibles/bugs in certain areas.
What you need is more knowledge, not more tools. Even if I have a hammer and a chainsaw, both are useless to me if I don't know enough about wood to build what I want.
The experience you describe is exactly like any other programming language. If I look at a Haskell program, it will look like gibberish to me, because I don't know the syntax.
Another common ground between regexes and SQL is that you should always have the manual of the tool in question open when you write a regex / SQL query because their syntaxes are all subtly different. For example for regexes:
Perl: /^(?:aa|bb\b)+/
vim: /^\%(aa\|bb\>\)\+/Having said that I agree with you that there is plenty of room for improvement in the SQL-language itself.
The end goal is eliminating impedance mismatch, because your current in-memory state, your RPCs and your big database would be described by the same schema language, somewhat strongly typed but still allowing for evolution. I have no idea if something like that already exists, though.
Sometimes you join an order to supplier, other times it will be to customer and other times it will be to a list of products. Sometimes, all of them need to be included.
With nested structs the parent/child relationship is fixed. If your query needs to invert this relationship, you essentially have to search through your entire database which will be incredibly slow and resource intensive.
Either that, or you just store your data and their relationships separately and then allow for highly optimised searches to be conducted between these relationships according to the query and statistics you hold about the data ... but then you have just reimplemented an RDBMS..
That's actually one my goal with https://github.com/jimktrains/drsdb. All communication would be done via protocol buffers and schemas could be created or loaded at runtime to be verified and used.
The primary goal begin a distributed RDBMS written in rust.
HTSQL has some great examples of where SQL could be better: http://htsql.org/doc/overview.html#why-not-sql. I would love to get that goodness in a language not tied to the rest of HTSQL.
There is space for statistical and search-based (like Prolog) languages, but those are very niche. Proof of that is that they exist, but yet nobody here is talking about them.
1 - Ok, object store query languages are not a strict subset of standard SQL. But it's only a matter of adding one or two commands, like Postgres does.
Everyone should spend a day learning SQL if for no other reason they the ability to think logically about data.
I suppose we all have our favorite tools.
What I want to know is why nobody ever came up with a better query interface. Every abstraction I've ever seen was built on top of SQL.
What don't you like about SQL?
Instead, I've been toying with the idea of a language where queries are expressed as a pipeline of operations on result sets, like this:
FROM employee
/* result set: all employees */
LEFT JOIN department ON employee.DepartmentID = department.DepartmentID
/* result set: all employees + their department */
ORDER BY employee.last
/* result set: all employees + their department, sorted */
WHERE department.name IN ('Sales', 'Engineering')
/* result set: Sales and Engineering employees + their department, sorted */
FIELDS employee.first, employee.last, department.name
/* result set: Sales and Engineering employees' first, last, and department names, sorted */
This is a SELECT, but you can for example turn it into an update just by adding a clause to the end: UPDATE employee.salary = employee.salary * 1.05
This is a much more regular language; there's nothing enforcing you to write these operations in a certain order, and you can add or remove clauses as necessary. For complex queries, I find that I write in this style anyway, using a chain of WITH clauses as a pipeline with a final SELECT on the end to get the results.I've known some very, very prickly DBA's in my time who referred to the databases they looked after as "My database". So they would say things like "Don't put junk in my database"
And would give you endless grief over how you wrote your queries or asked you a million questions about why you needed a new table and why your proposed design was shit.
As a result, many of us devs tended to view SQL Databases as some sort of dark art. So in this regard, NoSQL is freeing at first glance.
But if I'm honest, once I got over my fear, the "pros" of NoSQL solutions in comparison to good ol' SQL seem to be relatively feeble.
I think it's easier to get up and running with a NoSQL solution because there is far less friction when it comes to rapid prototyping of ideas, but things get complex pretty quickly.
I'd also say that for the vast majority of applications out there, the difference between the two will mostly be a wash.
My own personal experience is DBA's/Linux server administrators have authority issues. I had one situation where I had a encrypted file on one of the Linux server and the server administrator requested for months to have the decryption key so he could inspect the file.
SQL, relational theory, and set theory are a great place to start understanding how to work with data. And a great way to start understanding software.
All software deals with data. If you don't have a good understanding of data, you are never going to have a good understanding of software.
One of the best books I've ever read was Applied Mathematics for Database Professionals by Lex de Haan, and Toon Koppelaars. I think that's the database equivalent of SICP. You need to read it and understand it if you want to seriously deal with data. And you want to if you want to write software.
I'm obviously biased because of the way I got into things, but I look at things as a top-down vs a bottom-up point of view. I was a violinist and music theorist before I got into technology, and the bottom-up approach has always resonated with me.
In classical music, pretty much everything bubbles up from a baseline foundation and a structure; the stuff at the top that you actually see is a result of that structure. You don't start with some notes that you want to play on an instrument and then go and try to find a structure that supports those notes. You go bottom-up. You lay the foundation and build on that.
It was easy for me to map that idea of musical theory onto a database early in my career. And I moved up in the stack as I needed to. I started by building things entirely in SQL. You want complex statistical analysis? Sure, I'll do that . . . in SQL. Because I didn't know any better.
Then I found out that there are actually other languages that can do certain types of things much better. R, Python, C#, etc.
11 years later, I'm now very capable in a number of languages, and I don't suck. Along the way, I've had to put a lot of effort into learning the things I would have got from a comp sci degree program, and I'm probably not the best at certain types of software challenges.
I use noSQL stuff for caches and data warehouses, I use some of that for offloading traffic and keeping the reads separate from the writes. But there isn't a project that I touch that doesn't involve SQL in some way.
SQL is incredibly useful every day. Learning it, knowing it, understanding it, is a bare minimum for people I hire.
If you have a comp sci degree, and you don't know SQL, I'm going to probably write you off. If you have a liberal arts degree of some kind, and you do know SQL, I'm probably going to hire. You can learn everything else on the job.
None of that is an excuse for the totally shitty article linked here.
We use SQL today because it's good and it works. Not many languages can say that these days.
Now as a Scala developer, my team has actually decided to use PostgreSQL, and I'm still in the process of trying to convince them to fully embrace SQL instead of just using it as a dumb cache to ease memory pressure on our backend.
(And if you're hiring, I have a math degree and know some SQL. :)
The language is too heavy of an abstraction away from what's really going on under the hood. In a way it suffers from the same issue as functional programming. Not saying functional programming/sql is bad but... it has issues like almost everything.
> Instead wouldn't it be better for the language to explicitly allow the user to apply algorithms or procedures to make things more efficient
That...is hacking.
By explicit I mean BinarySearch(Table, x = name) rather than "SELECT * FROM Table WHERE x = name"
Let me explain to you why "explicit" is better... Why should "SELECT column_name1, column_name2 FROM table" be more efficient than "SELECT * FROM table"? The abstraction is so leaky that in order to make a query better you resort to a language hack that only makes sense when you understand the instructions SQL compiles down into. This is bad. Leaky abstractions are bad. I shouldn't have to know what the SQL query is doing to optimize....
In web development your application servers use languages like go or python that are closer to the metal which allow us to explicitly deploy certain algorithms without this strange layer of SQL expressions that compiles to imperative code. This leads to faster applications that are easier to optimize at the expense of using terse highly abstract expressions such as those found in SQL.
Here's the strange part of web development. Everyone knows that the bottleneck for most websites are in the database. Yet why do we deploy easily optimizable imperative languages in the application server while putting a highly inefficient SQL expression language over the main bottleneck (the database)?
Shouldn't it be the other way around? Shouldn't we have Web application servers written in highly abstract functional languages while Database languages written in easily optimizable imperative code that is closer to the metal?
I think it's important to look at why SQL is still the best tool for the job 43 years later, especially in the current climate of going to production with 6-week old JS frameworks.
In my work with Django's ORM I have run into problem queries as often as I have with doing SQL, and Django's ORM has never let me down.
ORM keeps your thinking in line with your object oriented code, and I find it very easy and natural to use. Who cares if there are a bunch of artificial keys underneath, imo caching queries is a better solution to slow queries than trying to optimize SQL. So using the ORM is never a pain point.
The other advantage is automatic escaping to prevent SQL injection, which is still a top contender on OWASP's list. I never have to worry ahout SQL injection when using the ORM.
Maybe other ORMs are poor solutions, but at least with Django I have been very happy using it.
I find it to be extremely simple to work with. It also does a fantastic job generating efficient queries due to the way it forces you to structure your models. The `F`, `Q`, and aggregation functions are top-notch as well. If none of the ORM features suit you for a particular query, Django ORM allows you to write the raw query yourself in a secure manner. Couple this with the Migration system, and I doubt you'll find a better ORM suite anywhere else.
I find Python has some of the best ORMs out there between Django ORM, SQLAlchemy, Peewee, and Pony.
Here's the gif: http://imgur.com/K5a7U9O.gif
I see a bunch of repeated items in many SQL queries, things that would be functions in another language.
One of my colleagues pointed out that this is indicative of poorly formed queries. What do you all think?
create view red_shirts as
select *
from shirts
where color = 'red'
;
Then you can just say: select * from red_shirts;
This is a simple example. Views are normally much more intricate and useful. Basically any select-statement could be saved as a view.Databases also let you define functions.
It's a nice, functional, declarative language in the vein of prolog and such. You just tell it the shape of the data you want, where to materialize it from, filter, aggregate, calculate the window of, etc... and the system figures out how to execute it as efficiently as it can. It beats out procedurally munging data by a long shot. It's more concise for many operations than ML-like variants.
It's a great tool to have. And understanding the underlying maths, relational algebra, is beautiful. I've found trying to implement your own rick-shod relational database is a good way to try to mechanically understand the theory. Then move on to implementing datalog... etc. The reason why SQL continues to stick around is that the fundamental theories are quite sharp! I'd appreciate a more concise syntax some days but overall I can't say I'm displeased. It's great!
Codd didn't invent SQL. Donald Chamberlin and Raymond Boyce did.
> SQL is originally based on relational algebra and tuple relational calculus
Maybe "originally based on", but not "an implementation of". For example, it is perfectly possible for an SQL query to return duplicate rows, which isn't possible under relational algebra/calculus, a relation being by definition a set of rows (or, more precisely, of tuples).
Wow, that's a subtle, almost unnoticeable promotion of MailChinp. /s
I think LINQ with F#'s type providers would probably be what I have in mind (which works like JOOQ with a tighter integration).
Contrast this with most webcrap. Or most of the NoSQL databases.
I often don't understand why people try to avoid SQL by any costs instead of just learning and applying it properly. I don't understand those "we do SQL for everything" teams either.
RDBMS in conjuction with NoSQL solutions can be a very powerful combination. We do a lot of Postgres + Redis + CouchDB in my projects.
Each service should have a separate data source and being the sole responsable of a specific part of the data.
In this environment however a full RDBMS is a little an overkill.
The solution I am working on is RediSQL: https://github.com/RedBeardLab/rediSQL
It is a Redis module that embed SQLite.
Redis is nowadays a common piece in any infrastructure.
The little module plugs into Redis and expose a new command REDISQL.EXEC that provide the ability to run SQL statement.
It is multithread, does not impact the performance of Redis, and very simple to use.
Great write performance, I got 50k inserts per second on my machine, that should be enough for most microservices.
I would love any kind of feedback on the module or if you need any help to get you started just open an issues.
KSQL and Oracle towers: http://www.bayareapilot.com/IMG_0317%20Large%20Web%20viewnea...
Two of the things I still use to this day, though, are:
* Postgres
* Emacs
SQL is just a language, it specifically stands for Structured Query Langauge. It has nothing to do with the underlying database.
Relational databases all implement SQL because that's what the language was originally created for but it's just an interface. Relational databases can also implement other interfaces like mysql with its X protocol. Other database types like key/value, document, graph, columnar, time-series, RDFs, etc can also implement SQL and many are starting to for easier interoperability, like Cassandra with CQL.
There is definitely potential for a better query language and there are examples like ReQL and GraphQL but SQL is still just fine for most use-cases.
That being the case, people rarely even think about whether SQL is a good language or a bad language, whether it's lacking something etc.
But once you actually try thinking about it, it turns out that it's a pretty well designed language and any alternatives you can think of are usually much inferior to it.
Some of our competitors used to be 'SQL-like', and even they swapped to using full SQL.
I think the fact that SQL is based on solid mathematical principles really helps it stay relevant.
;-)
Personally, I thought REQL was a really interesting take on query languages. As a developer, it allowed you write much cleaner code. You barely need an ORM. REQL kinda sucks for analysts at first, but in the long run, it makes writing complicated queries much, much easier.