Postgres 9.2 will feature linear read scalability up to 64 cores

(rhaas.blogspot.com)

262 points · chanks · 14y ago · 107 comments

50 comments · 8 top-level

vladev14y ago· 18 in thread

To me, Postgres is the most underestimated database. Not sure if this is a bad thing...

crag14y ago

Let me add another database that's "underestimated" (by mainstream corporate America): SQLite3.

SQLite is fast, small, portable, easy & simple to maintain and backup, AND reliable. And unless you are running a high traffic site (or application) it could handle everything a small (even medium) business would need.

Why small companies get talked into running MSSQL or Oracle or MySQL is beyond me. And even if (and that's a big IF) they needed more "power", there's Postgres.

PS: Sorry for hijacking this thread. I'm a big fanboy of both SQLite and Postgres.

justinsb14y ago

SQLite is a great embedded database, but is deliberately not designed for replacing a 'full' database server. It doesn't support highly concurrent usage, it can't be run as a network service, and it has a comparatively weak type model.

All of these differences are actually assets for the embedded DB market. They could be fixed, but you would end up with a database that was a winner in neither space.

I think the killer problem with SQLite for businesses is that it essentially locks their data inside the application. With a full SQL server, the data is trivially exposed for use / integration with other systems.

gcp14y ago

Reasons not to use SQLite?

1) Very limited ALTER TABLE support. 2) Very limited JOIN support. 3) No real multiuser/multiprocess concurrency support. Limited concurrency in-process with WAL. 4) Poor query optimizers, compared to PostgreSQL and even MySQL. Poor index analysis in complex queries.

4 is really a big one. It's surprisingly easy to hit situations where SQLite is orders of magnitude slower than real databases, fails to make proper use of available indexes to narrow range queries, does terabytes more write traffic than was necessary, etc. And unlike MySQL/PostgreSQL, the query planner inspection tools are horrible, too.

On top of that, some SQLite features (R-tree, slightly less bad index analysis, ...) must be enabled and aren't compiled in by default. This complicates deployment.
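For readers who want to check index usage on a range query themselves, here is a minimal sketch using Python's standard sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement. The table `events` and index `idx_events_ts` are made-up names for illustration, and the exact wording of the plan text differs between SQLite versions, so treat the output as something to read rather than parse.

```python
import sqlite3

# Hypothetical schema, used only to have something to plan against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")

# EXPLAIN QUERY PLAN returns one row per plan step; the human-readable
# description is the last column of each row.
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM events WHERE ts BETWEEN ? AND ?",
    (10, 20),
).fetchall()
plan_details = [row[-1] for row in rows]
conn.close()

for detail in plan_details:
    print(detail)
```

If the index name does not show up in the printed plan for a query you expected to be indexed, that is the kind of situation the comment above describes.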

jeltz14y ago

SQLite even provides real MVCC (multi-version concurrency control) with decent read/write concurrency if you run it in WAL mode. Run in WAL mode, SQLite is a very competent database.

http://www.sqlite.org/wal.html
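As a rough illustration of how WAL mode gets switched on, here is a sketch using Python's standard sqlite3 module. The helper name `enable_wal` and the file name `demo.db` are invented for this example; the `PRAGMA journal_mode=WAL` statement itself is what the linked page documents.

```python
import os
import sqlite3
import tempfile

def enable_wal(db_path):
    """Switch a SQLite database file to WAL journaling and return the reported mode."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns a single row naming the journal mode now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()
    return mode

# WAL needs an on-disk database; an in-memory one cannot use it.
with tempfile.TemporaryDirectory() as tmp:
    wal_mode = enable_wal(os.path.join(tmp, "demo.db"))

print(wal_mode)
```

The setting is stored in the database file, so it only has to be applied once per database rather than on every connection.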

ceol14y ago

I'd imagine these small companies dream of being big some day, so why would they invest the time to use SQLite when, should they hit it big, they would have to completely switch databases?

Of course, that's my speculation, but what's the downside to going with PostgreSQL/MySQL in the first place unless you never intend on getting bigger?

Cieplak14y ago

"SQLite usually will work great as the database engine for low to medium traffic websites (which is to say, 99.9% of all websites). The amount of web traffic that SQLite can handle depends, of course, on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."

http://www.sqlite.org/whentouse.html

lobster_johnson14y ago

SQLite is indeed a very good piece of software. It's also a fairly idiosyncratic piece of software: for example, table schemas are essentially untyped, and you can store any type of value in a column. The lack of good support for date/time data is problematic. And the complete lack of support for GIS is pretty serious; for many people, PostGIS is the "killer app" for Postgres.
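A small sketch of the typing behaviour described above, using Python's standard sqlite3 module. The table `t` and column `n` are made up for the example. A column declared INTEGER accepts a string, and SQLite's `typeof()` function shows what was actually stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# An integer, a numeric-looking string, and a string that is not a number,
# all inserted into the same INTEGER column without any error.
conn.executemany(
    "INSERT INTO t (n) VALUES (?)",
    [(42,), ("42",), ("not a number",)],
)

stored_types = [
    row[0] for row in conn.execute("SELECT typeof(n) FROM t ORDER BY rowid")
]
conn.close()

print(stored_types)
```

The numeric-looking string is converted to an integer by the column's type affinity, while the non-numeric string is kept as text in the same column.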

ryandvm14y ago

Agreed. I get that MySQL is incredibly simple to set up, so I can sort of understand why people use it for pet projects or whatever. But what I never understood was why doesn't PostgreSQL see better adoption from the big player (Google, Amazon, Facebook, etc.)?

avar14y ago

MySQL is not just some toy database. It's used by a lot of big installations that need a lot of performance, massive replication etc.

Sure it has its issues, but the objections people have against it mainly seem to be prejudice from the MyISAM days along with not appreciating how it shines on the workloads it's optimized for.

It sucks at reporting queries, but it shines when you have a large cluster of machines, multi-level replication chains, and mostly do queries that end up being primary key lookups or primary key range lookups with relatively simple constraints. That's the sort of thing you're likely to do for most of your traffic with any database once you scale up.

PostgreSQL also didn't have some of the features that made MySQL really fast until relatively recently, e.g. being able to entirely resolve a query on indexes without ever looking up the actual data rows.

I'd say the biggest problem MySQL has at scale is that replicated changes are applied in a single thread whereas updates on master servers are multi-threaded.

I'm very excited by recent improvements in PostgreSQL, and I wish it were the database I worked with professionally, but don't be so quick to dismiss MySQL.

jeffdavis14y ago

> why doesn't PostgreSQL see better adoption from the big player (Google, Amazon, Facebook, etc.)

This is speculation on my part, but I believe it's because those companies have huge engineering organizations, and see their engineering talent as a competitive advantage. So their priorities are very different from other large enterprises.

Organizations like Google, etc., are going to have huge architectural diagrams, and then use whatever tools fit most nicely and perform the best as a component of that architecture. And they have the engineering resources to shoehorn it in there, and work around all of the bugs, misfeatures, caveats, and usability problems.

In other words, such companies are never looking for a complete system, because they are the ones building the complete system.

But for organizations where engineering talent is more of a supporting role, even at very large enterprises, the equation changes. Those companies simply can't afford to hire Google's engineering team and put it to work in a supporting role. So these organizations are looking for something a little more complete, safe-by-default, extensible, adaptable to their environment, robust, low-maintenance, etc.

I believe it's a big mistake to misjudge what kind of company you are. For instance, blindly following Google's technical choices may be a disaster if engineering is not the central focus of your business.

falcolas14y ago

Speaking only in broad generalities, it's because PostgreSQL was not suitable for large scale businesses until lately. The lack of replication, limited performance on multiple cores, etc. drove businesses to MySQL/Oracle in the past, and inertia keeps them there now.

randomdata14y ago

> But what I never understood was why doesn't PostgreSQL see better adoption from the big player (Google, Amazon, Facebook, etc.)?

An interesting selection of companies given that all three are known for their use of home-grown databases (BigTable, Dynamo, Cassandra) for their primary offerings that are not of the SQL variety at all.

Though I think it is still a good question. It may have something to do with the ease of setting up MySQL when you are a young startup trying to get something working as quickly as possible, leaving it often hard to justify a change after you've hit the big leagues.

fdr14y ago

Postgres was just not popular enough, I think; it's still considerably less popular than MySQL, and familiarity and operations count for a lot. So I think the number of "marquee" names falls about in line with what one would expect. Facebook is very nearly out of the running because of its PHP lineage, and Amazon has very close ties to Oracle. Google could probably have easily gone either way, but MySQL is popular, so it probably got there first.

notatoad14y ago

MySQL gets used because phpMyAdmin gives people an easy stepping stone to get started, and then it keeps getting used just because it's what everybody is familiar with.

mhurron14y ago

PostgreSQL 7 and earlier kind of sucked regarding performance. Noticeably, that is, especially compared to MySQL. With PostgreSQL 8 and on, that changed. Unfortunately, a lot of those well-known projects started before PostgreSQL 8 came along, and so it was sort of inappropriate for the level of performance they required.

rdl14y ago

Heroku?

leftnode14y ago

I just switched from MySQL to Postgres for everything (as a result of seeing how powerful and stable it is at my full time job) and it is simply amazing. Easily one of the most impressive pieces of software built.

stesch14y ago

It won a lot of awards: http://en.wikipedia.org/wiki/PostgreSQL#Awards

rosser14y ago· 10 in thread

As much as I love PostgreSQL (and I do; it's put food on my table for the last decade), I have to stress that this linear-ish scalability needs both pg >= 9.2 and a Linux kernel >= 3.2. It seems to be a combination of the lseek(2) changes in the kernel, and the lock contention/handling changes in the db.

That is: if you're running on an older kernel, you probably won't see quite as much gain.

lobster_johnson14y ago

We upgraded from 2.6 to 3.2 recently and our Postgres has been flying ever since. The scheduler changes (and presumably the lseek changes) make a huge difference in load. We have not done any performance timings, though, so we don't know if the changes translate to better performance.

To be specific, the change reduced read I/O (http://i.imgur.com/L8NWO.png) and load average (http://i.imgur.com/7793A.png) both by an order of magnitude, and the variance is much tighter than before. (The system is a dual Xeon quad-core X5355/2.66GHz with 32GB RAM and RAID5.)

That improvement was fairly miraculous — a factor of 10x just by upgrading a kernel is not something that happens every day. Still, I would not be surprised if Postgres 9.2 pushes performance even higher.

jeltz14y ago

There is another change of note in Linux 3.2 which could matter to PostgreSQL: the major changes to how writebacks of dirty pages work.[1] Now in your case the changes were mostly to read performance, so I do not think the writeback changes mattered here.

1. http://kernelnewbies.org/Linux_3.2#head-fbc26b4522e4e990a9ea...

kingmanaz14y ago

This is going to sound pathetic and off topic, but do you have pointers toward any good materials on database performance benchmarking? Your screenshots display exactly the kinds of presentation I am after, yet I've never managed to progress beyond Windows Performance Monitor and a pile of unorganized spreadsheets. I keep getting more and more DBA work dumped on me and it seems like somewhere I've missed the principles of the art of DBA (if there are any). Other than some Celko and Date, everything I come in contact with has market-speak and corporate buzzwords written all over it.

justauser14y ago

That's a tremendous improvement. Obviously, I don't know the particulars of your situation, but are you planning to update to more current hardware? The 5300 Clovertown was a 65nm process and 2006 era. I'd be curious about the effect of using the 3.2+ kernel on the new E5-2600 Sandy Bridge processors.

andyzweb14y ago

what distros are offering a 3.2 kernel?

shanemhansen14y ago

At my last company one of the biggest complaints against PostgreSQL (when being compared to Oracle) was that it "could only scale to 20 cores", which made it unsuitable for "enterprise use". Nice to see that non-issue removed.

Hey Rosser! Didn't know you hung out here.

rosser14y ago

Shane, I think most of the complaints about postgres at said company were made out of a pre-existing bias towards the expensive commercial solution, rather than against the (at the time, admittedly) somewhat less capable FOSS project. When I was asked about vertical scalability, I told them "16-20 cores right now, but at the rate they're improving things, it'll be 32, and then 64 in the next 3-5 years." Nice to see things tracking so closely with my predictions.

jeltz14y ago

The lseek scalability issue was not very noticeable on PostgreSQL 9.1, but first became obvious after work had started on fixing the scalability problems in PostgreSQL itself.

http://rhaas.blogspot.se/2011/08/linux-and-glibc-scalability...

ssmoot14y ago

Does that mean that FreeBSD will become a bit of a second-class-citizen with the release of 9.2? Or are similar optimizations available and planned there?

jeltz14y ago

The llseek scalability issue was caused by the Linux kernel taking a lock on a data structure when no such lock was necessary. While it is possible that FreeBSD has the same problem with llseek, I see no reason to assume that is the case.

FreeBSD probably works as well as Linux >= 3.2 when it comes to llseek, but it might have some other scalability issue which is not present in Linux.

EDIT: Here is a link to the Linux kernel patch. PostgreSQL uses SEEK_END to check the file sizes.

https://lkml.org/lkml/2011/9/15/401
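To make the SEEK_END remark concrete, here is a sketch in Python of finding a file's size by seeking to its end. This is not PostgreSQL's code, only the same system-call pattern shown through `os.lseek`; the helper name `size_via_seek_end` is invented for the example.

```python
import os
import tempfile

def size_via_seek_end(fd):
    """Return the file size in bytes by seeking to the end; no data is read."""
    # os.lseek returns the new offset, which for SEEK_END with offset 0
    # is the current length of the file.
    return os.lseek(fd, 0, os.SEEK_END)

with tempfile.TemporaryFile() as f:
    f.write(b"x" * 8192)
    f.flush()  # push buffered bytes to the OS so the size is visible
    seek_size = size_via_seek_end(f.fileno())

print(seek_size)
```

Because many backends issue this call against the same files, a lock taken inside the kernel for each call becomes a point of contention on machines with many cores, which is the issue the linked patch addresses.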

ww520 14y ago· 3 in thread

Database vendors typically charge by the number of cores. A 64-core machine can get really expensive with database licenses. The hardware cost has decreased drastically over time, but database licensing is still in the dark ages. Postgres 9.2 has a real competitive advantage here. Hopefully it will force the other vendors' licensing costs down.

jonknee14y ago

Evidence:

http://www.oracle.com/us/corporate/pricing/technology-price-...

"The number of required licenses shall be determined by multiplying the total number of cores of the processor by a core processor licensing factor specified on the Oracle Processor Core Factor Table"

http://www.oracle.com/us/corporate/contracts/processor-core-...

They give you a big discount for buying Sun servers (.25 factor). Either way, it's a huge amount of money: the Standard Edition costs a cool $17,500 per processor, so with 64 cores and the best .25 multiplier you're still looking at 16 x $17,500 or $280,000 for the DB processor license (that doesn't cover support or anything else). The Enterprise Edition runs an astounding $47,500 per CPU, so you can easily run north of a million dollars per server if you're running a lot of cores.
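The arithmetic above can be sketched in a few lines of Python. The prices and the .25 factor are the ones quoted in the comment; the function names are invented, rounding fractional licenses up is an assumption of this sketch, and the 0.5 factor at the end is hypothetical, included only to show how the total moves with the factor.

```python
import math

# List prices quoted in the comment above (USD per processor license).
STANDARD_PRICE = 17_500
ENTERPRISE_PRICE = 47_500

def required_licenses(cores, core_factor):
    """Cores multiplied by the core factor; rounding up is an assumption here."""
    return math.ceil(cores * core_factor)

def license_cost(cores, core_factor, price_per_license):
    """Total list price for the processor licenses, excluding support."""
    return required_licenses(cores, core_factor) * price_per_license

standard_total = license_cost(64, 0.25, STANDARD_PRICE)      # 16 licenses
enterprise_total = license_cost(64, 0.25, ENTERPRISE_PRICE)  # 16 licenses
# Hypothetical 0.5 factor, not taken from the comment.
enterprise_half_factor = license_cost(64, 0.5, ENTERPRISE_PRICE)

print(standard_total, enterprise_total, enterprise_half_factor)
```

At the .25 factor the 64-core Enterprise figure comes to $760,000, and it passes a million dollars once the factor is higher.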

etrain14y ago

Oracle tends to be pretty opaque in their pricing, and part of that is because with any sale this big, there's always going to be a lot of negotiation.

Sure, it's going to be expensive, but only schmucks pay full price for a 64-core license.

Still, it's good to see the best open source database out there delivering cutting edge performance. Great work!

Ecio78 14y ago

Microsoft SQL Server used to be licensed per socket and not per core (and they used to use it to market their product against Oracle). Guess what? The new 2012 version will be licensed per core too.

gtaylor14y ago· 3 in thread

Yikes, that's a lot of cores. Glad to see the Postgres team keep pushing the scalability envelope.

TylerE14y ago

Is it? I mean, bargain basement budget desktops have more cores than a typical server of 10 years ago. A 24 or 32 core server can't really be considered that exotic these days, can it?

wmf14y ago

On HN anything that EC2 doesn't provide is considered exotic. And cheap servers still only have 16 cores/32 threads.

__alexs14y ago

The number of places that run e.g. 4U $10,000 HP servers with 4 socket, 16 core Opteron servers is reasonably low I think.

robomartin14y ago· 2 in thread

Is there such a thing as a suite of standardized performance tests for large-scale, multi-core databases? How are people comparing Cassandra, PostgreSQL, MySQL and other options against each other for raw performance?

jacques_chester14y ago

> Is there such a thing as a suite of standardized performance tests for large-scale, multi-core databases?

There is: the TPC family[1] and their open source doppelgangers, the OSDL-DBT family[2].

I don't think they've been applied to non-relational databases as yet.

[1] http://www.tpc.org/information/benchmarks.asp

[2] http://sourceforge.net/apps/mediawiki/osdldbt/index.php?titl...

zorked14y ago

There is no such thing as "raw performance". It's all very, very application-specific.

bsg75 14y ago· 2 in thread

With this improvement, how much is Postgres hampered by a lack of a parallel query processor?

For OLAP work, it seems to be the primary bottleneck.

einhverfr14y ago

A good project to watch in this regard is Postgres-XC. Maybe not quite ready for production but it's close. (http://sourceforge.net/projects/postgres-xc/)

petrohi14y ago

+1 on that.

ww520 14y ago· 2 in thread

That's awesome. I have to admit I have always known Postgres is great and toyed with it, but never used it in a real project, due to the availability of MySQL or client preference. I'll try to put it into the current project. The client wants Oracle since they already have an Oracle license, but I will change the requirement to support Postgres as well.

j-kidd14y ago

If possible, try not to go down the "support Postgres as well" route.

I was in your situation, where the client wanted SQL Server since they already had the license. During development, I used PostgreSQL instead, to "support Postgres as well".

In the end, roughly one-third [1] of the total development effort was spent on overcoming SQL Server's limitations, things that you would never have to think about in PostgreSQL.

So, try telling the client that they already have a PostgreSQL license as well, with unlimited future upgrades.

[1] This figure was pulled from ass. The actual productivity loss could be more due to similar reasons outlined in http://news.ycombinator.com/item?id=3784750

xradionut14y ago

Maybe the requirement is Oracle or SQL Server because the client has the resources (DBA, support contracts, etc) to support those platforms in house?

verminoth14y ago· 2 in thread

I'm new to all of this, so does this mean that other databases don't have this kind of performance?

gauravk92 14y ago

Performance is subjective, but let's dig into the performance optimization this patch includes. The update addresses an issue where, to do an llseek on a database file, the Linux kernel would take a lock, so only one such call could proceed at a time. The patch removes the lock because it was unnecessary, and thus the performance scales concurrently without a lock creating contention.

This kind of performance optimization isn't new; concurrency is the name of the game. Erlang is a language built around concurrency, and it has some databases written in it (CouchDB) that scale with more cores due to Erlang's inherent capabilities. So has this kind of performance increase been seen before? Yes.

jeltz14y ago

The bulk of the fixes were in the locking in PostgreSQL though, and before those fixes were made the llseek problem was not apparent to many, since in almost all cases people hit a bottleneck in the PostgreSQL code before hitting the Linux kernel one.
