I don't mind that; sure, there are other RDBMS servers that support SQL as their main or exclusive language, but unless you are talking about ancient Sybase products (for which there is a very good reason for the shared branding), "SQL Server" is a clear, exclusive Microsoft product identity, and no worse a label than, say, "FTP" (yes, there are other file transfer protocols).
OTOH, what does bug me is when people say "SQL" to mean "Microsoft SQL Server".
Also bear in mind that almost nobody pays list price for any of this.
Also while we can do what we like with our newer SaaS offerings, getting some of our on-prem clients to use PG over SQL Server would be an uphill struggle so not considering other options isn't entirely our decision.
That $15K sounds like Enterprise Edition pricing; IIRC Standard is significantly cheaper. Most don't need Enterprise, especially since 2016 SP1, when a lot of previously Enterprise-only features became available in all editions (including the free Express edition in many cases). Developer edition is currently free (licensed for development use only, of course) and is essentially Enterprise edition with different licensing terms.
* SSMS - SQL Server Management Studio
* The MS BI stack: SSRS (reporting), SSAS (analysis), SSIS (integration)
I believe there were also updates to index seek/scan behavior in that time.
I'll update this thread when I find out why.
begin;
alter table foos add answer int not null default 42;
alter table foos drop column plumbus;
update foos set name = upper(name);
create table bars (t serial);
drop table dingbats;
rollback; -- or, of course, commit
What's the benefit? Atomic migrations. You can create, alter, and drop tables, update data, etc. in a single transaction, and it will either commit completely if all the changes succeed, or roll back everything. This is not possible in MySQL, or almost any other database [1], including Oracle; DDL statements aren't usually transactional. (In MySQL, I believe a DDL statement implicitly commits the current transaction without warning, but I could be wrong.)
Beyond that, I'd mention: PostGIS, arrays, functional indexes, and window functions. You may not use these things today, but once you discover them, you're bound to.
[1] https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr...
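A hedged sketch of a few of the features mentioned above (arrays, expression indexes, window functions); the schema and names are invented:

```sql
create table posts (
  id serial primary key,
  author text,
  title text,
  tags text[],                       -- array column
  created_at timestamptz default now()
);

-- GIN index for fast array containment queries
create index posts_tags_idx on posts using gin (tags);
select * from posts where tags @> array['postgres'];

-- Functional (expression) index for case-insensitive lookups
create index posts_title_lower_idx on posts (lower(title));

-- Window function: number each author's posts without collapsing rows
select id, author,
       row_number() over (partition by author order by created_at) as nth_post
from posts;
```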
I don't know if it accomplishes anything truly new (other than ideas that aren't very useful in practice, like having multiple test runs going in parallel), but it's a pretty neat way to do it and it works well.
Lastly, if a test fails you'd typically like to leave the data behind so that you can inspect it. A transactional test that rolls back on failure won't allow that.
Without this built-in feature, I'd have used filesystem snapshots, if I didn't mind the time it'd take to stop and start Pg.
----
1: https://www.postgresql.org/docs/current/manage-ag-templatedb...
Might want to mention the downside of using MySQL as well. (Am also interested to know as a daily MySQL user.)
- JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)
- Window functions (available in MySQL 8.x only, while these have been available since Postgres 9.x)
- Materialized views: views that are physical like a table; they can be used to store aggregated, pre-calculated data like sums, counts...
- Indexing on function expression
- Better query plan explanation
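For the materialized-view item above, a minimal Postgres sketch (table and view names are invented):

```sql
-- Precompute an aggregate once, then query it like a table
create materialized view daily_order_totals as
  select order_date, sum(amount) as total, count(*) as orders
  from orders
  group by order_date;

-- Queries hit the stored result, not the base table
select * from daily_order_totals where order_date = current_date;

-- Data is refreshed explicitly (optionally CONCURRENTLY, given a unique index)
refresh materialized view daily_order_totals;
```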
For indexing on function expressions in particular, the workaround we use is to add a generated column and index that.
MySQL 5.7 fully supports this. See https://dev.mysql.com/doc/refman/5.7/en/create-table-generat... and https://dev.mysql.com/doc/refman/5.7/en/create-table-seconda...
> JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)
Actually MySQL 5.6 doesn't support this, but 5.7 does, quite well: https://dev.mysql.com/doc/refman/5.7/en/json.html
Additionally, an ALTER TABLE blocks access to the table, whereas indexes can be created concurrently while other transactions still read and write the table.
But MySQL doesn't support indexing the complete JSON value for arbitrary queries. You can only index specific expressions by creating a computed column with that expression and indexing that.
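A sketch of that computed-column workaround (table and column names are invented; the `->>` shorthand needs MySQL 5.7.13+):

```sql
-- MySQL 5.7: materialize one JSON field as a generated column, then index it
CREATE TABLE events (
  id INT AUTO_INCREMENT PRIMARY KEY,
  payload JSON,
  user_id INT GENERATED ALWAYS AS (payload->>'$.user_id') STORED,
  INDEX events_user_id_idx (user_id)
);

-- Postgres, by contrast, can index the whole jsonb document for arbitrary
-- containment queries:
-- CREATE INDEX events_payload_idx ON events USING gin (payload jsonb_path_ops);
-- SELECT * FROM events WHERE payload @> '{"user_id": 42}';
```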
It's an excruciating process though.
Meanwhile, for high-volume OLTP workloads, MySQL (with either InnoDB or MyRocks on the storage engine side) has some compelling advantages... this is one reason why social networks lean towards MySQL for their OLTP product data, and/or have stayed on MySQL despite having the resources to switch.
As with all things in computer science, there are trade-offs and it all depends on your workload :)
I really get that MySQL is good at what it does. From an engineer's point of view, though, it is an absolute piss-poor excuse for a database prior to v8.0.
So what's wrong with MySQL (again, prior to v8.0, but no one seems to use the damn current version)?
- Not ANSI SQL compliant (unlike Postgres)
- No CTEs/WITH clause (?!)
- No window functions (?!?!?!?)
- "Schemas are called databases," which makes for bizarre interpretation of `information_schema` queries, which behave the same across all other DBs except MySQL. What I mean to say is that MySQL calls each schema its own database. This results in having to connect the same DB multiple times to other programs/APIs/inputs which accept JDBC.
- Worse replication options than Postgres, and not ACID compliant by default
- Don't know the programming term for this, but the horrendous `select col1, col2, col3... colN, count(<field>) from table group by 1` implicit GROUP BY. The system takes your invalid query and does things under the hood to return a result. Systems should enforce correct syntax (you must group by all non-aggregated columns; MySQL implicitly does this under the hood).
- On a tangentially related note, MySQL returns NULL instead of a divide-by-zero error when you divide by zero. Divide-by-zero is one of the few things that should ALWAYS return an error, no matter what
- MySQL doesn't support EXCEPT clauses
- Doesn't support FULL OUTER JOIN
- Doesn't support generate_series
- Poor JSON support
- Very limited, poor array/unnest support
- Postgres-style standalone `insert ... VALUES ()` lists not supported
- Lack of consistent pipe-operator (`||`) concatenation
- Weird datatype support, and no in-query `::cast` syntax
- Doesn't support `select t1.*, t2.field1, t2.field2 from t1 join t2 on t1.id = t2.id`; that is, you cannot select * from one table and only certain fields from the other
- Case sensitivity in field and table names when not escape-quoted (MySQL uses backticks, Postgres uses double quotes for escaping names). What the fuck is this? SQL is a case-insensitive language, yet the creators built in case sensitivity?
- As I mentioned above, MySQL uses backticks to escape names. This is abnormal for SQL databases.
- MySQL's LIKE is case-insensitive (what the hell, it's case-sensitive everywhere else). Postgres has LIKE, and ILIKE (insensitive LIKE).
- Ugly and strange INTERVAL syntax: intervals, despite being strings, give a syntax error in MySQL. For example, in Postgres or Redshift you would write `select current_timestamp - interval '1 week'`; in MySQL you'd have to write `select current_timestamp - interval 1 week`. The '1 week' (which could be '7 month' or '2 day'...) is a string and should be in single quotes, but MySQL doesn't do this.
- MySQL doesn't even support the normal SQL comment `--`. It uses `#` instead. No other database does that.
- Probably the worst EXPLAIN/EXPLAIN ANALYZE plans I've ever seen from any database, ever
- This is encapsulated in the prior points, but you can't do something simple like `select <fields>, row_number() over () as rownum from table`. Instead you have to declare variables and increment them in the query.
- Did I mention it's just straight-up not SQL standard compliant?
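Two of the complaints above are easiest to see side by side (a sketch; `employees` and its columns are made up). The first query is the ambiguous implicit GROUP BY that older MySQL (without ONLY_FULL_GROUP_BY) accepts, silently picking an arbitrary `name` per department, where Postgres rejects it. The second is the user-variable trick for emulating row_number(), whose behavior in SELECT was never officially guaranteed:

```sql
-- Ambiguous GROUP BY: Postgres errors out; older MySQL returns
-- an arbitrary name for each department.
select department, name, count(*)
from employees
group by department;

-- Pre-8.0 MySQL: fake row_number() with a user variable
set @rownum := 0;
select e.*, (@rownum := @rownum + 1) as rownum
from employees e
order by id;

-- MySQL 8.0 / Postgres: the real thing
select e.*, row_number() over (order by id) as rownum
from employees e;
```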
At least MySQL 8.0 supports window functions and CTEs (seriously, it's a death knell for a data analyst not to have these). They are the absolute #1 biggest piece of missing functionality for an analyst, in my opinion.
This entire post focused on "MySQL have-nots" rather than "Postgres haves", so I do think there are actually _even more_ advantages to using Postgres over MySQL. I understand MySQL is very fast for writes, but to my understanding Postgres isn't exactly slow for writes either, and on the querying side of the coin it's a universe of difference.
If you ever use MySQL in the future and there will be a data analyst somewhere downstream of you, I implore you, for their sake, to use MySQL v8.0 and nothing older, at any cost.
- PLV8/PLPython/C functions/etc (with security!)
- TimescaleDB
- Better JSON query support
- Foreign Data Wrappers
- Better window function support
- A richer extension ecosystem (IMO)
Honestly, at this point I wouldn't use MySQL unless you only care about slightly better performance for very simple queries and simpler multi-master scaling/replication. Even saying that, if you don't need that simple multi-master scaling RIGHT NOW, improvements to the Postgres multi-master scaling story are not too far off on the roadmap, so I would still choose PG in that case.
Many of the largest tech companies rely on MySQL as their primary data store. They would not do so if it were unreliable at persisting data.
There are many valid reasons to choose Postgres over MySQL, or vice versa -- they have different strengths and weaknesses. But there are no major differences regarding data reliability today, nor have there been for many years now.
Is it still the case?
Where I work, we chose MySQL back in 2012 due to production quality async replication. I think (but am never sure) that that is now good in Postgres land.
PG has a lot of SQL features I'd love to use and can't. OTOH MySQL's query planner is predictably dumb, which means I can write queries and have good idea about how well (or not) they'll execute.
EDIT: Apparently 11.1 is available in beta as of April 9th.
EDIT: I'll try again. Looks like it was added April 9th
Any complications or hiccups I need to worry about moving from 10 to 11?
Per Heroku Docs: By supporting at least 3 major versions, users are required to upgrade roughly once every three years. However, you can upgrade your database at any point to gain the benefits of the latest version.
How do you manage failover and replication? At my previous job this was done by a consultant. Is this doable on a self hosted setup?
Thank you in advance.
I know of BDR, but there hasn't much news about it lately, especially with more recent versions of Pg.
We like Galera for our simple needs: we use keepalived to do health checks, and if they pass, the node participates in the VRRP cluster. If one node goes down or goes bad, another takes over.
If you're just looking for a hot standby and don't need a multi-master setup, you can set that up with plain PG. https://www.postgresql.org/docs/9.4/hot-standby.html
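A minimal sketch of the moving parts, using the pre-PG12 layout that matches the 9.4 docs linked above (hostnames and the replication role name are placeholders):

```
# primary's postgresql.conf
wal_level = hot_standby        # spelled 'replica' on 9.6 and later
max_wal_senders = 3

# standby's recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'

# standby's postgresql.conf
hot_standby = on
```

On PG 12+, recovery.conf is gone: those settings move into postgresql.conf and the standby role is signaled with a standby.signal file.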
Queries done through pgbouncer just pause, as if the query were really really slow, while the DB is down; then, when pglookout does the failover, the bash script switches pgbouncer's config and those pending queries are sent on immediately.
For a complex web-app, would you suggest an ORM (looking at SQLAlchemy) or a custom module with hand written queries and custom methods for conversion to python objects?
My app has a lot of complex queries, joins, etc. and the data-model is most likely to change quite a bit as the app nears production. I feel using an ORM is an unnecessary layer of abstraction in the thinking process. I feel comfortable with direct SQL queries, and in some cases, want to directly get JSON results from PGSQL itself.
Would that be a good idea, and more importantly, scalable?
Note : My app will be solely developed by me, not expecting to have a team or even another developer work on it.
For advanced queries, you can write raw SQL
The way I see it, an ORM has three useful features:
- A migration/seed mechanism (you will need it anyway)
- A schema definition for mapping tables to objects
- A query builder
If you feel that an ORM is too heavy, you can use just the query builder.
* For normal queries (select cols from table where id = ..., etc.) we just used the plain Django ORM. Even for weird joins, the Django ORM makes it a lot easier than using raw SQL.
When we needed raw speed, we just wrote raw SQL and delegated to Django's SQL layer; that way we leverage everything the framework has alongside raw SQL power. It maps pretty much 1:1 to SQL, and for me it beats the alternative (using text interpolation for composing queries).
If you're doing anything more complex than these basic sorts of queries and subqueries, or your developers are proficient in sql, using even a very good ORM like sqlalchemy is going to be a step down.
Since you say you're doing this all yourself, and SQL is probably the most ubiquitous programming language (in terms of percentage of jobs requiring it, not total LOC), the learning opportunities there are more valuable, so I would go direct.
Since you're probably used to dealing with and migrating your tables manually, I would keep custom SQL for all your complex operations, and use SQLAlchemy for doing basic insert/update/select. Django also has an "unmanaged" mode where you can create a model and it will avoid trying to create a migration to create the table.
Of course, you have to manually update the model if you manually change your DDL.
Watch out for differences in how you serialize data from Django/SQLAlchemy models vs. raw dicts from psycopg2.
I like to organize my SQL by keeping each query in a separate .sql file and writing a little wrapper that fetches the file (+1 for caching it) and then executes it. I'm not a fan of lots of inline SQL mixed with Python.
Overall I think it's a great + powerful setup!
cur.execute(query, {'foo': bar})
Passing values directly into cur.execute is the best way to prevent SQL injection as well, since it sanitizes the input params when the query runs.
I'm developing a web application that uses SQLAlchemy. The ORM has been a huge boon for CRUD functionality. We also have some very complicated reporting features and use SQLAlchemy's query builder almost exclusively. I find that the query builder maps very cleanly to SQL, so I can still "think" in SQL while writing and reading it. And the query builder makes complex query composition easier to manage.
SQLAlchemy provides more than just the ORM... I actually wish the docs were structured differently to better emphasize that in search results, etc.
Hard to say, but don't forget about migration support, which is quite helpful.
With Aurora, the storage layer is swapped out entirely for a distributed storage engine that I believe is based upon DynamoDB.
The wire protocol and server interface are much the same as regular Postgres, though there are some additional benefits as well as caveats, as you might expect.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
This has really put us off using other AWS managed products and was a major factor in us deciding against using Amazon Elasticsearch Service.
I know of BDR, earlier versions of which are open source, but there hasn't been much movement with Pg 10 or 11 AFAICT.
We don't do anything complicated, but simply want two DBs (with perhaps a quorum system) that has a vIP that will fail-over in case one system goes down (scheduled or otherwise).
Galera provides this in a not-too-complicated fashion.
> This is the Git repo of the Docker "Official Image" for postgres (not to be confused with any official postgres image provided by postgres upstream)
Supporting five versions is no more than MS do: currently SQL Server versions 2017, 2016sp2, 2016sp1, 2014sp3, 2014sp2, 2012sp4, 2008R2sp2 and 2008sp3. 2008sp3, 2008R2sp2, and 2016sp1 will hit their final EOL in a couple of months, taking SQL Server's supported list back down to 5 too.
I expect other significant DB maintainers have similar support life-time requirements for much the same reasons, though I'll leave researching who does[n't] as an exercise for the reader.
Similar with PG I assume. You could always pay someone an expensive contracting fee to support your use of an older version than is publicly supported.
Note the recent versioning change: 9.4, 9.5, 9.6 were the previous 3 major versions bases, and the last two are 10 and 11.
2) It's horrifically high-risk because downgrading is usually not a thing.
3) It usually requires downtime.
1. Test the upgrade: set up an additional secondary (9.3), break the replication link (promote it to a master). Test the upgrade on that. It was really fast, under 30 seconds to shut down the old DB, run the in-place upgrade, and start up the new DB.
2a. In production: set up an additional secondary (9.3). Make the primary read-only. Promote the new secondary to a master. Shut down, upgrade to 11.2, restart. Point applications at it.
2b. Backout plan: leave the applications pointing at the original database server, make it read-write.
There are other options, including with only seconds of downtime, but <1 minute with pg_upgrade was simple and very acceptable for us.
Consider the situation when you're adding thousands of new records per second, and the database is in use every second (quite literally: to compute per-second statistics).
A better solution is to have triggers on the old master that do the same inserts on the new master (after copying the data/promoting a replica/whatever), and similar triggers on the new master for when the IP is not pointing at the old master (to be able to back out to the old server).
Then both the new and the old master run "in parallel" with the same data, and you can have the apps use the new server (on a new domain name, new IP, new port, whatever) whenever you want, on an app-by-app basis if you want. You can keep both until you decide to decommission the old master.
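A rough sketch of the Postgres side of that mirroring, assuming postgres_fdw is used to reach the other server (dblink would also work); every name here is invented:

```sql
-- On the old master: mirror new rows to the new master via a foreign table
create extension if not exists postgres_fdw;
create server new_master foreign data wrapper postgres_fdw
  options (host 'new-master.example.com', dbname 'app');
create user mapping for current_user server new_master
  options (user 'app', password 'secret');
create foreign table orders_remote (
  id bigint, created_at timestamptz, amount numeric
) server new_master options (table_name 'orders');

create or replace function mirror_order() returns trigger as $$
begin
  insert into orders_remote values (new.id, new.created_at, new.amount);
  return new;
end $$ language plpgsql;

create trigger orders_mirror after insert on orders
  for each row execute procedure mirror_order();
```

Note the trade-off: every local insert now waits on a remote round trip, and a failure on the remote side fails the local write, so in practice you'd want to think about queueing or error handling here.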
Because it mirrors and supports the reality of the business world.
Every organization, large or small, that manages its business makes a 'Grow/Invest', 'Maintain', or 'Disinvest' decision every year for each of its product/service lines.
It does not matter if it is software or making kielbasa. Postgres is exceptional, and it supports the first two.
There's legacy crap everywhere, in all languages, DBs, versions, etc., sometimes supported for 10+ years.
Security updates should push the upgrade path a little harder, but there are still cases where a database can be completely isolated from the network and that might not even matter.