SQLite is fast, small, portable, easy & simple to maintain and backup, AND reliable. And unless you are running a high traffic site (or application) it could handle everything a small (even medium) business would need.
Why small companies get talked into running MSQL or Oracle or MySQL is beyond me. And even if (and that's a big IF) they needed more "power", there's Postgres.
PS: Sorry for hijacking this thread. I'm a big fan boy of both SQLite and Postgres.
All of these differences are actually assets for the embedded DB market. They could be fixed, but you would end up with a database that was a winner in neither space.
I think the killer problem with SQLite for businesses is that it essentially locks their data inside the application. With a full SQL server, the data is trivially exposed for use / integration with other systems.
1) Very limited ALTER TABLE support. 2) Very limited JOIN support. 3) No real multiuser/multiprocess concurrency support. Limited concurrency in-process with WAL. 4) Poor query optimizers, compared to PostgresSQL and even MySQL. Poor index analysis in complex queries.
4 is really a big one. It's surprisingly easy to hit situations where SQLite is orders of magnitude slower than real databases, fails to make proper use of available indexes to narrow range queries, does terabytes more write traffic than was necessary, etc. And unlike MySQL/PostgresSQL, the query planner inspection tools are horrible, too.
On top of that, some SQLite features (R-tree, slightly less bad index analysis, ...) must be enabled and aren't compiled in by default. This complicates deployment.
Of course, that's my speculation, but what's the downside to going with PostgreSQL/MySQL in the first place unless you never intend on getting bigger?
Sure it has its issues, but the objections people have against it mainly seem to be prejudice from the MyISAM days along with not appreciating how it shines on the workloads it's optimized for.
It sucks at reporting queries, but it shines when you have a large cluster of machines, multi-level replication chains, and mostly do queries that end up being primary key lookups or primary key range lookups with relatively simple constraints. That's the sort of thing you're likely to do for most of your traffic with any database once you scale up.
PostgreSQL also didn't have some of the features that made MySQL really fast until relatively recently, e.g. being able to entirely resolve a query on indexes without ever looking up the actual data rows.
I'd say the biggest problem MySQL has at scale is that replicated changes are applied in a single thread whereas updates on master servers are multi-threaded.
I'm very excited by recent improvements in PostgreSQL, and I wish it were the database I worked with professionally, but don't be so quick to dismiss MySQL.
This is speculation on my part, but I believe its because those companies have huge engineering organizations, and see their engineering talent as a competitive advantage. So their priorities are very different from other large enterprises.
Organizations like Google, etc., are going to have huge architectural diagrams, and then use whatever tools fit most nicely and perform the best as a component of that architecture. And they have the engineering resources to shoehorn it in there, and work around all of the bugs, misfeatures, caveats, and usability problems.
In other words, such companies are never looking for a complete system, because they are the ones building the complete system.
But for organizations where engineering talent is more of a supporting role, even at very large enterprises, the equation changes. Those companies simply can't afford to hire google's engineering team and put it to work in a supporting role. So these organizations are looking for something a little more complete, safe-by-default, extensible, adaptable to their environment, robust, low-maintenance, etc.
I believe it's a big mistake to misjudge what kind of company you are. For instance, blindly following Google's technical choices may be a disaster if engineering is not the central focus of your business.
An interesting selection of companies given that all three are known for their use of home-grown databases (BigTable, Dynamo, Cassandra) for their primary offerings that are not of the SQL variety at all.
Though I think it is still a good question. It may have something to do with the ease of setting up MySQL when you are a young startup trying to get something working as quickly as possible, leaving it often hard to justify a change after you've hit the big leagues.
That is: if you're running on an older kernel, you probably won't see quite as much gain.
To be specific, the change reduced read I/O (http://i.imgur.com/L8NWO.png) and load average (http://i.imgur.com/7793A.png) both by an order of magnitude, and the variance is much tighter than before. (The system is a dual Xeon quad-core X5355/2.66GHz with 32GB RAM and RAID5.)
That improvement was fairly miraculous — a factor of 10x just by upgrading a kernel is not something that happens every day. Still, I would not be surprised if Postgres 9.2 pushes performance even higher.
1. http://kernelnewbies.org/Linux_3.2#head-fbc26b4522e4e990a9ea...
Hey Rosser! Didn't know you hung out here.
http://rhaas.blogspot.se/2011/08/linux-and-glibc-scalability...
FreeBSD probably works as well as Linux >= 3.2 when it comes to llseek, but there might have some other scalability issue which is not present in Linux.
EDIT: Here is a link to the Linux kernel patch, PostgreSQL uses SEEK_END to check the file sizes.
http://www.oracle.com/us/corporate/pricing/technology-price-...
"The number of required licenses shall be determined by multiplying the total number of cores of the processor by a core processor licensing factor specified on the Oracle Processor Core Factor Table"
http://www.oracle.com/us/corporate/contracts/processor-core-...
They give you a big discount for buying Sun servers (.25 factor). Either way, it's a huge amount of money, the standard edition costs a cool $17,500 per processor so with 64 cores and the best .25 multiplier you're still looking at 16 x $17,500 or $280,000 for the DB processor license (that doesn't cover support or anything else). The Enterprise edition runs an astounding $47,500 per CPU, so you can easily run north of a million dollars per server if you're running a lot of cores.
Sure, it's going to be expensive, but only schmucks pay full price for a 64-core license.
Still, it's good to see the best open source database out there delivering cutting edge performance. Great work!
There is: the TPC family[1] and their opensource dopplegangers, the OSDL-DBT family[1].
I don't think they've been applied to non-relational databases as yet.
[1] http://www.tpc.org/information/benchmarks.asp [2] http://sourceforge.net/apps/mediawiki/osdldbt/index.php?titl...
For OLAP work, it seems to be the primary bottleneck.
I was in your situation, where client wanted SQL Server since they already have the license. During development, I use PostgreSQL instead, to "support Postgres as well".
At the end, roughly one-third [1] of the total development effort was spent on overcoming SQL Server's limitations, things that you would never have to think about in PostgreSQL.
So, try telling the client that they already have PostgreSQL license as well, with unlimited future upgrade.
[1] This figure was pulled from ass. The actual productivity loss could be more due to similar reasons outlined in http://news.ycombinator.com/item?id=3784750
This kind of performance optimization isn't new, concurrency is the name of the game. Erlang is a language built around concurrency and it has some databases written in it (couchdb) that scale with more cores due to erlangs inherent capabilities. So has this kind of performance increase been seen before, yes.