To make the responses as constructive as possible, please also specify:
* Whether you currently use postgres, and what other systems you use (presumably ones that do have the feature that is missing in postgres)
* Whether you currently need the feature, or whether you anticipate the need in the future, or whether you expect that other people will need it
* Whether the missing feature is preventing you from using postgres, or just a significant pain point (which could include ease-of-use issues)
I'm using postgresql right now.
My use case: I could have a table like:
create table items (id uuid primary key, attributes json);
I'd have a webapp using javascript for adding/editing new items. If the user wants to add a new item, the client code would generate the uuid.Editing and adding new rows would both use the same UPSERT/MERGE code -- no need for separate insert or update statements -- if the uuid already exists, it's assumed to be an update, if it doesn't, it's inserted as a new row.
While we're at it, here's a "find or create" in a single atomic statement: https://gist.github.com/paul/75ec84d131e36492b17b
create table people (uuid varchar, name varchar);
with new_rows as (select (unnest(ARRAY[('xxx','John Smith'), ('yyy', 'Ben Bitdiddle')]::people[])).*), updated as (UPDATE people SET name=new_rows.name FROM new_rows WHERE people.uuid=new_rows.uuid RETURNING people.uuid) insert into people select new_rows.* from new_rows left join updated using (uuid) where updated.uuid is null; UPDATE Table SET ... WHERE id=$1
if (rows_updated == 0) {
INSERT INTO Table ... WHERE NOT EXISTS (SELECT 1 FROM Table WHERE id=$1)
if (rows_inserted == 0) {
throw "Conflict!"
}
}
It's a bit of a hack, but works well enough. I agree would be nice to have a more transparent way of doing it.Another feature I would wish to see is ORDER BY + skip for update (or any equivalent solution) to easily implement queues with multiple consumers.
EDIT: Just ORDER BY (without skipping locked entries) for UPDATE would be excellent too to avoid deadlocks when doing multi row updates.
A minor feature I'd like to see is to allow ORDER BY on UPDATE statements to avoid deadlocks. If you have two multi-row UPDATEs that hit the same rows but in different order, you've got a classic deadlock setup. Apparently you can sometimes control UPDATE ordering by persuading Postgres to use certain query plans if you really know what you're doing, but that seems like an undesirable requirement for most users.
To answer your bullet points, Postgres is my #1 database choice, but I've also used Oracle and MySQL many times and sometimes still do. Neither of these features are things that are preventing me from using Postgres, but I've sure wished they were there. In particular bitmap indexes seem useful to many people.
[1] http://www.postgresql.org/message-id/20090111104442.GA3503@e... [2] http://www.postgresql.org/message-id/20081101000154.GO27872@...
That would be reasonable to do even if you don't know the internals very well. The index types (called "index access methods") are well-isolated (API-wise) from most of the rest of the code. Technically, you could even add a new index access method as an extension without even touching the core.
The exception to both of those things is the WAL recovery code, which can't be written as an extension and also requires more knowledge of the rest of the system.
But the WAL recovery code is not too bad for those that know it. So if you did the rest, then I wouldn't be surprised if someone jumped in to write it or help you write it.
"A minor feature I'd like to see is to allow ORDER BY on UPDATE statements to avoid deadlocks."
One solution here is to do a "SELECT ... FOR UPDATE" first. Unfortunately, that is redundant (and does redundant work).
I'm curious about what the performance ramifications of that would look like. I recently had to disable bitmap scans in a data warehouse-type-thing I run because filesystem fragmentation was apparently misleading Postgres into thinking it would be cheaper than an index scan. The bitmap scan plan takes ~15 minutes to run cold, and the index scan takes ~20 seconds cold. (Really odd thing is that index scan takes ~1.5 seconds warm, versus ~0.4 seconds warm with the bitmap scan).
One major stopping point for us is the fact that on SQL Server we use a Case Insensitive, Accent Insensitive collation. Postgres doesn't support this, and apparently has no plans to.
We have many stored procedures in TSQL which are built around the assumption of CI,AI. Converting these will be a lot of work, but the lack of CI,AI collation would be by far the biggest pain point.
citext: http://www.postgresql.org/docs/9.2/static/citext.html
unaccent: http://www.postgresql.org/docs/9.2/static/unaccent.html
I think you can make it work.
The main feature which I'd like the PG development team to look at next is implementing a more comprehensive partitioning solution.
As far as remaining competitive with the commercial database vendors, the one thing that is the primary reason I have seen for enterprises choosing to reluctantly pay for Oracle over using Postgres for free is RAC (Real Application Clusters). Active/active clustering of DB nodes seems to hit a sweet spot for businesses wanting uptime and scalability without some of the downsides of physical replication.
This is being developed as Postgres-XC. They released version 1.0 in June 2012.
"Postgres-XC is an open source project to provide a write-scalable, synchronous multi-master, transparent PostgreSQL cluster solution. "
And it's fast.
Oracle-like partitioning syntax support (including composite) would be great - although possible through inheritance and triggers/rules, it is the only bigger feature missing in PostgreSQL in my opinion. The current form requires a lot of overhead but is definitely worth the effort when working with 1B+ rows.
What I really miss in postgres is row level security like it's implemented in oracle.
To implement RLS in postgres I need to either install veil and go through the pain of configuring it or create a lot of views and work on them (but some scenarios can not be implemented like that).
json_merge('{"a":5,"b":7}', '{"b":6,"c":8}') -> '{"a":5,"b":6,"c":8}'
Some of the things I miss most frequently are:
- Easy table partitioning.
- Support for "phrase search" in fts.
- MERGE/UPSERT support.
- Columnar storage (and related optimizations) for "OLAP like" workloads.
- EXPLAIN ANALYSE that traces functions.
* we currently use postgresql, oracle, mysql, teradata, mssql
There's active development on this feature, so now's the time to influence it to better suit your needs. A separate blog post might be a better forum.
A well supported, polished multi-master implementation would definitely bring PostgreSQL into the 21st century. It's been far too long not to have this. And look to databases like Cassandra for ease of setup and maintenance.
We are doing some pretty obscure stuff to it and it's serving us well. Oracle on the other hand requires so much attention and special handling with it's type system, that if there were an alternative, I wouldn't even bother with it.
http://www.postgresql.org/docs/devel/static/functions-json.h...
Previously these remained locked into the database and to get programmatically-useable info (for logging, better error messages or to translate in terms of e.g. your ORM) you had to parse error messages which were potentially localized and usually lacked half the necessary information.
And even better, for python developers, psycopg2 already supports these fields[1][2]
[0] http://www.postgresql.org/docs/devel/static/libpq-exec.html#...
[1] http://psycopg.lighthouseapp.com/projects/62710/tickets/149
[2] http://initd.org/psycopg/docs/extensions.html#psycopg2.exten...
edit: the underlying connector might be a better idea to start with, as it'll need to expose the fields somehow before an ORM can take advantage of them.
Interesting point. Foreign key failures are usually the result of a program bug. Unique and exclusion[1] constraints depend on concurrent actions, and can't be caught ahead of time as easily. So I assume you are mostly talking about CHECK constraints.
Also, we're talking about a single statement, which probably means a single table.
It does sound like a good idea to validate all of the check constraints on a table at once, and report all of the failures.
[1] http://www.postgresql.org/docs/current/static/ddl-constraint...
Speaking of books a few of the best I'm aware of are:
- PostgreSQL 9 Admin Cookbook (Simon Riggs, Hannu Krosing)
- PostgreSQL 9.0 High Performance (Gregory Smith)
- Instant PostgreSQL Backup and Restore How-to (Shaun M. Thomas)
all them from Packt Publishing
[1] http://www.postgresql.org/docs/devel/static/
[2] http://wiki.postgresql.org/wiki/Main_Page
I'm lucky enough to work at a company that tends to use Pg as our default RDBMS; speaking strictly as a dev it's a heck of a lot more ergonomic and sophisticated than some of the things I've dealt with.
> This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
> It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;...
Algorithms for updates of materialized views exist since at least 20 years but maybe they don't cover all these cases efficiently.
How does that work from a computer science point of view?
UPDATE: Found this talk about it from PGCon last year: http://lanyrd.com/2012/pgcon/schdzf/
[1] http://www.postgresql.org/docs/devel/static/release-9-3.html
[2] http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9...
I wonder if it would be possible to plug a python/node interpreter into that.
Sure, why not?
This is a great feature because it is a significant extensibility improvement. And that's what postgres is really about: the idea that any user/developer with an idea has many powerful APIs that can help them achieve that without waiting for anyone else to agree that the idea is good.
Looks like it just takes a C function pointer, if you can call whatever from C you're done (you can embed Python, no idea for Node).
Looks like this is mostly for postgres extension modules though (e.g. foreign data wrappers), it's not for applications (postgres clients).
PostgreSQL was always an outsider which I had ignored, but in my recent position PostgreSQL is the main database engine for our whole stack. It took my about a month to get used to the differences in the syntax and PostgreSQL related stuff and I'm amazed about the features and quality of it, especially when you remember that this is a free and open source database engine.
On the other hand, if PostgreSQL devs would start a Kickstarter project for PGSQL specific version of Microsoft SQL Server Management Studio, my both hands (and feet) would pay. I had tried Database Visualiser (paid version), PGAdmin and few others, but none of those as is even close to the capabilities of Management Studio.
Does anybody maybe know any alternative or have any tips to share?
Will pg have less time for YC and HN since he'll be checking so many database connections now?