Citus Unforks from PostgreSQL, Goes Open Source

(citusdata.com)

763 points · jamesheroku · 10y ago · 153 comments

93 comments · 28 top-level

exhilaration · 10y ago · 8 in thread

AGPL license if anyone's curious: https://github.com/citusdata/citus/blob/master/LICENSE

gtrubetskoy · 10y ago

Which means there is no chance this would ever become part of PostgreSQL proper.

rch · 10y ago

That's correct, and with such a significant license change I think the term 'unfork' is being used inappropriately in the title.

Edit: the PostGIS extension is GPL, and that license choice has been very successful. Hopefully the AGPL works out at least as well for Citus; I'm just not familiar enough to know what the implications will be in this context.

oconnore · 10y ago

AGPL actually makes a lot more sense for a database (except for SQLite, which is typically redistributed).

davidfetter · 10y ago

Citus owns the copyrights, and the CLA ensures that they will continue to. That means that they can reassign it under a different license if they so choose, and they could at some point choose a liberal license.

polskibus · 10y ago

Does that mean that whatever connects to this database needs to be AGPL too?

crudbug · 10y ago

No, it means any changes you make to CitusDB should be made public.

[0] https://en.wikipedia.org/wiki/Affero_General_Public_License

kodablah · 10y ago

No, it means if you make changes to the code and use it over a network, you have to be AGPL too (think of "network" as "distribution" in the GPL sense).

rkrzr · 10y ago

GNU AGPL to be precise (GNU Affero General Public License).

no1youknowz · 10y ago · 7 in thread

This is awesome. I have experience running a CitusDB cluster, and it pretty much solved a lot of the scaling problems I was having at the time. Its going open source now is of huge benefit to my future projects.

> With the release of newly open sourced Citus v5.0, pg_shard's codebase has been merged into Citus...

This is fantastic, sounds like the setup process is much simpler.

I wonder if they have introduced the Active/Active Master solution they were working on? As I understand it, previously there was one Master and multiple Worker nodes, and the solution was to keep a passive backup of the Master.

If they release the Active/Active Master later this year, that's huge. I can pretty much think of my DB solution as done at this point.

ozgune · 10y ago

(Ozgun from Citus Data)

We're working on making Citus masterless. In all openness, we evaluated two different approaches to this in the past six months, and wrapped up the design for one. This design works well on the cloud, and we already demonstrated a working version: https://youtu.be/_nun2S6EdWo?t=411

For on-premise deployments, the primary challenge is set-up complexity. We're now prototyping one of those designs to know more: https://github.com/citusdata/citus/issues/389

We expect to share all the details and a concrete timeline in April.

Florin_Andrei · 10y ago

Would it be possible (eventually) to use Citus for sharding within the datacenter, and BDR for master/master replication between datacenters?

Or is Citus taking over the master/master replication? (or is it doing something different?)

cm3 · 10y ago

What does "works well on the cloud" mean specifically? Is there some difference when run on your own hardware?

batbomb · 10y ago

Can Citus handle geospatial sharding?

enesunal · 10y ago

good work! congrats =)

also in Turkish: kolaylıklar dilerim (roughly, "wishing you an easy time of it") :)

no1youknowz · 10y ago

Excellent news! Really looking forward to this one! :)

heme · 10y ago

Can you elaborate on the scaling problems you were having?

gtrubetskoy · 10y ago · 6 in thread

If anyone from Citus is reading this: how does this affect your business model? I remember asking at Strata conf a couple of years ago why your stuff isn't open source, and the answer then was "because revenue". So what has changed since then?

umur · 10y ago

Umur from Citus here. Adding to Craig's comments:

Several things have changed over the last two years that allowed us to make this happen: Most importantly, we've continued building out the product for a broader user base, grown with more customers and users, received further funding as validation, and expanded both our team and product to offer additional revenue generating services. All put together, open sourcing Citus is something we've always wanted to do, and we are excited to continue building on it for many years to come, with the help of the community and our enterprise customers.

craigkerstiens · 10y ago

Hi, Craig from Citus here. We have some premium features in our Enterprise edition. Many of these are features that larger enterprises will want to pay for such as security features around roles, a tool for automated cluster resizing, and enhanced load balancing tools, and of course support.

Beyond that, we have a few other things in the works for the future that will cover other revenue models.

atonse · 10y ago

My hunch is that the two are not really related.

Companies of any appreciable size will be happy to pay for support if they choose to make Citus a part of their critical infrastructure. And the industry has reached an inflection point where enough companies want as much of their infrastructure as possible to be open source that you can run a company where most of your stuff is open source while still making a ton of money (like RedHat, CoreOS, Docker, etc).

SwellJoe · 10y ago

I know Red Hat is making a ton of money. But, CoreOS and Docker, are they at the "making a ton of money" stage, or merely well-funded by investors?

ascetone · 10y ago

It doesn't.

AGPL means the only people using it have to be licensed the same way.

jgreen10 · 10y ago

Well, people use MongoDB...

AGPL only requires open sourcing any modifications you make to the software when you give users direct access to a server running the software, which seems like something you would never want to do in case of a database.

You can use database servers running AGPL software in a closed source SaaS: http://www.gnu.org/licenses/why-affero-gpl.en.html

onRoadAgain23 · 10y ago · 6 in thread

Having been burned before, I will never use an open source infrastructure project that has enterprise features you need to pay for. They always try to move you to the paid version and make the OSS version unpleasant to use over time, as soon as the bean counters take over to milk you.

"For customers with large production deployments, we also offer an enterprise edition that comes with additional functionality"

sergiosgc · 10y ago

This is the model used by many companies backing OSS. The fact that you have been burned before means the actor in that case (or cases) acted badly, not that the model is wrong.

Software isn't free to produce, and the need to make money off software isn't something companies should be ashamed of. In fact, nowadays I'm leaning towards trusting OSS with clear financial sustainability over software whose long term existence seems shaky.

gdulli · 10y ago

Do you often make big decisions based on extrapolation from so few data points?

I use a major open source system with enterprise features and support but don't pay for any of those options. I've used it for 3 years and it's been invaluable. No pressure to start paying for anything. Some of its premium features have actually become free over that period. But I wouldn't decide that all open source systems with premium features are safe based on that experience.

onRoadAgain23 · 10y ago

If this cost roughly a million dollars, then yes, especially if you're locked in, as you are with a DB. I use nginx because, even though it has this model, it would be easy to replace with something else.

xenophonf · 10y ago

I definitely view "open core" products with greater skepticism than truly open source ones, but I think it comes down to the community surrounding (and engendered by) the sponsoring company/foundation. These Citus guys seem to be really enthusiastic about contributing their work to the community. That attitude mitigates any concerns I have, because to me it seems that they are really a part of the larger PostgreSQL community---not just trying to take advantage of it like certain companies whose names we won't mention.

gtaylor · 10y ago

What are some alternatives for paying their employees to develop these products that are 100% free? There are some excellent commercial organizations that drive some of the tech that many or even most of us have come to rely on. This goes all the way down to Linux and BSD itself.

jamespo · 10y ago

Feel free to code your own

TY · 10y ago · 5 in thread

This is awesome! Tebrikler (congrats) on the release of 5.0 and on going open source, definitely great news.

Can you publish competitive positioning of Citus vs Actian Matrix (nee ParAccel) and Vertica? I'd love to compare them side by side - even if it's just from your point of view :-)

peatmoss · 10y ago

Second the request for comparison to Vertica. I've recentishly become a user at work, and I wonder how this compares. A quick googling didn't yield anything too informative.

flavor8 · 10y ago

...and Redshift. I love what Amazon provides, but it gets expensive.

umur · 10y ago

(Part 3/3 - please see two comments below as the starting point) Aside from the different use-cases they address, there is one other, important difference between Citus and Redshift (and any other distributed database in the world, for that matter). Citus does not fork the underlying database, PostgreSQL. Instead, Citus extends PostgreSQL to transform it into a parallel processing, distributed database. We use PostgreSQL's powerful extension APIs to accomplish this (you simply CREATE EXTENSION Citus on PostgreSQL's latest version, 9.5, to get your distributed PostgreSQL database).

While this might appear as an implementation piece at first, it has important product implications, and might even impact how you might want to think about your database stack. By not forking the core database, you are choosing to always stay with the core PostgreSQL product. For starters, you get the uber-cool (and uber-fast) JSONB type that came with 9.4, or the recently checked in UPSERTs, or the popular PostGIS extension for geospatial capabilities. More philosophically, the moment you use a fork of a database, you know you'll be diverging over time. And when you introduce new databases and/or piece together many different ones to build one application, your development cycles will only get costlier and more complex over time.

This was a long answer to a short question, but hopefully useful. Let me know if you have questions, or any feedback using Citus – would love to hear your thoughts!

umur · 10y ago

(Part 2) Now comes the storage engine, and cstore_fdw as it relates to PostgreSQL. Built by the Citus Data team, cstore_fdw is entirely a separate component from the Citus product above. It enables columnar storage for your vanilla, single-node PostgreSQL to provide data compression for faster analytics. As such, cstore_fdw does not come with any of the parallelism I've described above that Citus (or Redshift, Vertica etc.) provides.

Precisely because cstore_fdw is built for PostgreSQL, and Citus is PostgreSQL (see Part 3), however, you can still choose to use cstore_fdw as the storage engine for your Citus cluster. Citus will still parallelize the queries as you'd expect it to, but instead of hitting row-based tables, they will hit columnar ones. cstore_fdw has certain limitations; importantly, it is not updatable, so we don't consider it an alternative to a data warehouse. Rather, it is useful if you are archiving your quickly growing timeseries / event data on PostgreSQL or Citus.

umur · 10y ago

Umur from Citus here. For purposes of this question, I’ll bucket traditional data warehousing (DWH) solutions like Redshift, Vertica, Greenplum together, although there are many nuances among each of them of course.

First, Citus is not a traditional data warehouse. We position Citus as the real-time, scalable database that serves your application under a mix of high-concurrency short requests and ad-hoc SQL analytics (i.e. think both random and sequential scans for a customer-facing analytics app). The default storage engine for Citus is the PostgreSQL storage engine, which is row-based. This is in contrast to many data warehouses, which often use a column store and/or batch data loads, and are focused purely on analytics. The trade-offs you get are:

- Citus vs. DWH performance: DWH and Citus both have similar parallelization for analytics queries (multi-core, multi-machine), but most data warehouses use a columnar storage engine instead of a row-based one. Columnar storage is designed for faster analytics queries, which makes a columnar DWH generally faster on longer-running analytics queries. However, this comes at the expense of (1) concurrency and (2) short-request performance (think simple lookups, updates, real-time data ingest) vs. Citus' row-based storage. If you've tried having 10s of concurrent connections to Redshift for short lookups, or performing 100s/1000s of inserts/updates to power your application, these limitations will be familiar. This is to be expected, as Redshift is not designed as a real-time operational database, but as an offline data warehouse.
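To make the row-based vs. columnar trade-off concrete, here is a toy Python sketch with a hypothetical `Event` table; it is not Citus, Redshift, or cstore_fdw code. Summing one field reads a single column in the columnar layout but every whole record in the row layout, while a point lookup is one record fetch in the row layout but has to touch every column in the columnar one.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record, used only for illustration."""
    event_id: int
    user_id: int
    bytes_sent: int


def to_columns(rows: list[Event]) -> dict[str, list[int]]:
    """Pivot a row-oriented table into one list per column."""
    return {
        "event_id": [r.event_id for r in rows],
        "user_id": [r.user_id for r in rows],
        "bytes_sent": [r.bytes_sent for r in rows],
    }


def total_bytes_row(rows: list[Event]) -> int:
    # Row layout: every whole record is visited even though one field is needed.
    return sum(r.bytes_sent for r in rows)


def total_bytes_columnar(cols: dict[str, list[int]]) -> int:
    # Columnar layout: only the single column being aggregated is read.
    return sum(cols["bytes_sent"])


def lookup_row(index: dict[int, Event], event_id: int) -> Event:
    # Row layout plus an index: a short request fetches one complete record.
    return index[event_id]


def lookup_columnar(cols: dict[str, list[int]], event_id: int) -> Event:
    # Columnar layout: the record has to be reassembled from every column.
    pos = cols["event_id"].index(event_id)
    return Event(cols["event_id"][pos], cols["user_id"][pos], cols["bytes_sent"][pos])


rows = [Event(i, i % 3, 100 * i) for i in range(1, 6)]
cols = to_columns(rows)
index = {r.event_id: r for r in rows}
print(total_bytes_row(rows), total_bytes_columnar(cols))  # 1500 1500
print(lookup_row(index, 4) == lookup_columnar(cols, 4))   # True
```

Both layouts return the same answers; what differs is how much data each operation has to touch, which is the concurrency and short-request cost described above.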

In essence, the two classes of products are more complementary than substitutes, even while they have some overlaps in their analytic capabilities. Something like Redshift will give you fast offline analytics, after you move your data in batch (via S3); Citus will directly power your analytic apps in real time, without ETL'ing your event/user data back and forth between separate OLTP and OLAP databases. Both can be extremely fast: Redshift can run complex data warehousing queries that take an hour in a few minutes, Citus can scan and aggregate 100 million records in a few seconds, while simultaneously ingesting your events in real time.

I hope that provides some clarification on the workloads. There is a lot more, including columnar storage and product approach (re: implications of extending Postgres 9.5 vs. forking Postgres 8.x), and I’ll dive into those in separate comments as well.

rkrzr · 10y ago · 5 in thread

This is fantastic news! Postgres does not have a terribly strong High Availability story so far and of course it also does not scale out vertically. I have looked at CitusDB in the past, but was always put off by its closed-source nature. Opening it up seems like a great move for them and for all Postgres users. I can imagine that a very active open-source community will develop around it.

hblanks · 10y ago

We've been running CitusDB for a couple years now at CloudFlare for serving aggregated analytics to customers (cf. https://blog.cloudflare.com/scaling-out-postgresql-for-cloud...).

It's a good product, and it was even fairly easy to do a major version upgrade / cluster relocation. At least as easy as such a thing can be. :-)

rkrzr · 10y ago

Are there any limitations you have run into? E.g. can you still use all index types that Postgres offers or are there any special distributed index types that CitusDB adds perhaps?

tlarkworthy · 10y ago

nit: Postgres doesn't scale horizontally, it only scales vertically.

andrewflnr · 10y ago

I guess if you're putting more servers in a rack, it does tend to be a vertically-oriented process. :)

manigandham · 10y ago

It doesn't scale vertically either. Postgres is single-threaded, meaning it can't make use of multiple CPU cores on the same machine. There have been some slow improvements to this, and 9.6 seems to hint at some parallel aggregation changes, but overall Postgres is strong in features but weak in scaling (in any direction).

devit · 10y ago · 4 in thread

I've been unable to find any clear description of the capabilities of Citus and competing solutions (postgres-x2 seems the other leader).

Which of these are supported:

1. Full PostgreSQL SQL language

2. All isolation levels including Serializable (in the sense that they actually provide the same guarantees as normal PostgreSQL)

3. Never losing any committed data on sub-majority failures (i.e. synchronous replication)

4. Ability to automatically distribute the data (i.e. sharding)

5. Ability to replicate the data instead of, or in addition to, sharding

6. Transactionally-correct read scalability

7. Transactionally-correct write scalability where possible (i.e. multi-master replication)

8. Automatic configuration that only requires specifying some sort of "cluster identifier" the node belongs to

ozgune · 10y ago

(Ozgun from Citus Data)

On PostgreSQL language support, we're updating our FAQ to have more information: https://www.citusdata.com/frequently-asked-questions. Since the PostgreSQL manual (and its feature set) spans 4K+ pages, we found that the best way to think about Citus' capabilities is from a use-case standpoint. If your workload needs distributed transactions that span multiple machines, or large ETL jobs, Citus currently isn't the best fit.

Citus supports sharding and replication out of the box (#4, #5). On #6, reads go through a master node (metadata server) and you see what you write.
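For readers new to #4 and #5, here is a toy Python sketch of how hash-based sharding with replication generally works. It is not Citus's actual metadata or placement logic: the shard count, worker names, replication factor, the use of `zlib.crc32` as the hash, and the round-robin placement rule are all assumptions made for illustration.

```python
import zlib

# Hypothetical cluster layout, chosen only for illustration.
SHARD_COUNT = 8
WORKERS = ["worker-1", "worker-2", "worker-3", "worker-4"]
REPLICATION_FACTOR = 2
HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit integer


def shard_for_key(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard using equal-width hash ranges."""
    h = zlib.crc32(key.encode("utf-8"))  # stable across runs, unlike built-in hash()
    range_size = HASH_SPACE // shard_count
    return min(h // range_size, shard_count - 1)


def placements(shard_id: int, workers: list[str] = WORKERS,
               rf: int = REPLICATION_FACTOR) -> list[str]:
    """Round-robin replicas: shard s is stored on workers s, s+1, ... (mod len(workers))."""
    return [workers[(shard_id + k) % len(workers)] for k in range(rf)]


shard = shard_for_key("customer-42")
print(shard, placements(shard))
```

Routing keys to a fixed set of shards, rather than directly to workers, means the key-to-shard mapping never changes when shards are moved between machines; only the shard-to-worker placement does.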

We don't have #7. The way in which we implement this also has implications on your other questions. Multi-master (no single metadata server) is by far the biggest feature request that we receive: https://news.ycombinator.com/item?id=11353866

If we go with the approach in https://github.com/citusdata/citus/issues/389, you will be able to configure #3, #6, #7 through PostgreSQL's streaming replication settings. We still won't support distributed transactions that span across multiple machines.

On #8, could you elaborate a bit more? Do you mean a logical identifier for the node?

Also, it's hard to write a concise reply on a topic that requires so much context. I'd love to grab coffee with anyone who's interested in diving deep into distributed databases. Feel free to shoot me an email at ozgun@citusdata.com

gorodetsky · 10y ago

Thanks for awesome product!

Do you know when you're planning to release Citus 5.0 deb/rpm packages?

devit · 10y ago

Does this mean that distributed transactions are not supported at all?

cowardlydragon · 10y ago

But if they answer those questions, you won't buy support/use it...

Have a donut and look at our marketing spreadsheets.

I'm so tired of "seamless", "effortless", "simple" distributed database lies. There are mathematical theorems as to why there is no free lunch.

faizshah · 10y ago · 4 in thread

So this sounds similar to Pivotal's Greenplum which is also open source, can anyone compare the two?

frn · 10y ago

Greenplum is based on postgres 8.2, with the featureset you'd expect from pg 8.2 - basically none of the additions after 2006 have merged to GP.

faizshah · 10y ago

Ok, and what's the process like for disaster recovery with citus?

ioltas · 10y ago

Since the move to open source, more recent upstream changes have slowly been merged into the code base, though they still seem to be on an 8.3 base, with a couple of years' worth of code still to go through.

amitlan · 10y ago

Greenplum is a fork of the Postgres codebase; Citus is not. It's an extension that leverages community Postgres's extensibility APIs. This point seems to be highlighted in their post.

azinman2 · 10y ago · 4 in thread

I want it to be called citrus, which is what I always read it as....

BinaryIdiot · 10y ago

Yikes, I keep calling it citrus. The word citrus is so ingrained that I'm actually having a hard time pronouncing citus when I see the word. I didn't even notice I was doing it wrong until I saw your comment.

jrochkind1 · 10y ago

Then I'd get confused and think it had something to do with Citrix.

azinman2 · 10y ago

A cute lemon logo might help :)

mrgreenfur · 10y ago

Only after reading this did I realize it's NOT citrus. Thanks!

X86BSD · 10y ago · 4 in thread

AGPL? This is dead in the water :( It will never be integrated into PG. What a shame. It should have been a 2 clause BSDL. Sigh.

tspiteri · 10y ago

The BSDL does not make much economic sense to the company open sourcing their code; a new competitor would fork the code, make closed improvements, and merge any changes from the open source code. That means that the competitor is always gaining by a one-way flow of improvements.

To use open source code, the more permissive the license the better. But to actually open your own code, BSDL is a very tough sell.

That's also why they use the AGPL. With database systems, even if they were under the GPL, some competitors could just modify the system and run it on their own server with improvements, and offer just the service to their clients. Again, the improvements go one way only: since the competitor would not distribute the modified system, as it's running on their servers, they would not need to distribute source changes. With the AGPL, that loophole is closed.

gtrubetskoy · 10y ago

>> The BSDL does not make much economic sense to the company open sourcing their code

If this were true then Cloudera, Horton and a whole bunch of other companies would be out of business, yet in reality they are doing really well. All that AGPL is doing for Citus is:

1. Turning away people (customers) who are religious about licenses.

2. Eliminating any possibility of this code ever being integrated into PostgreSQL

X86BSD · 10y ago

So you take a BSDL codebase, fork it, close it, make proprietary changes, profiting from the BSDL codebase, then slap the PG community in the face by open sourcing it under a more restrictive license hoping to benefit from the community you just slapped in the face but restricting competition.

They are of course free to release their code under any license they wish. I just think releasing code under the *GPL when you profited from a liberal BSDL is a douche nozzle thing to do. But knock yourself out! This tells me all I need to know about the company.

andreasklinger · 10y ago

might be intentional

satygeek · 10y ago · 4 in thread

Does CitusDB fit OLAP analytical workloads that aggregate hundreds of millions of records over dimensions of varying order and size (e.g. Druid), with a maximum response time of 3 seconds, using as few boxes as possible? Or do other techniques have to be used along with CitusDB? Can you shed some light on your experience with CloudFlare in terms of cluster size and query performance?

mslot · 10y ago

Yes, Citus may be a good fit, for a complete example see:

https://www.citusdata.com/blog/15-marco-slot/402-interactive...

greggyb · 10y ago

Hundreds of millions of records in <=3 seconds is not really a big challenge with a good data model and proper indexing on even a single server.

I work for a BI consultancy and we don't even bat an eye until we hit billions of records in a primary fact table.

Certainly the DB server does need to scale vertically to some extent as you pass through the orders of magnitude > 10M. A good columnstore engine is also worthwhile to consider.

lfittl · 10y ago

I'll leave the initial questions to the Citus team, but re: CloudFlare this link might be helpful:

https://blog.cloudflare.com/scaling-out-postgresql-for-cloud...

satygeek · 10y ago

Thanks. I went through it but couldn't find info about their cluster size, data size, or query response times.

Someone · 10y ago · 3 in thread

One must thank them for open sourcing this, and cannot blame them for using a different license, but using a different license makes me think calling this "unfork" is bending the truth a little bit.

anarazel · 10y ago

The "unfork" part is primarily about not forking the postgres codebase anymore, as done before citus 5.0 (i.e. we modified parts of postgres, to make it citus). Citus now entirely works as an extension to postgres, using the extension facilities postgres provides.

takeda · 10y ago

Perhaps I'm missing something, but this is just an extension that works with standard postgres, there are no code changes in postgres itself, so it doesn't look like it ever was a fork.

jasonmp85 · 10y ago

(Jason from Citus here)

Yes, that's what you're seeing right now, but in the past Citus (used to be "CitusDB") was a superset of the entire PostgreSQL codebase. During the lead-up to the open source release, we removed the use of any static methods or internal machinery and rewrote the installation process to use the PostgreSQL CREATE EXTENSION command. Additionally, we moved all of pg_shard's DML functionality into Citus to unify the product line.

So ultimately CitusDB was a fork but is now entirely an extension.

voctor · 10y ago · 2 in thread

Citus can parallelize SQL queries across a cluster and across multiple CPU cores. How does it compare with the upcoming 9.6 version of PostgreSQL, which will support parallel sequential scans, parallel joins, and parallel aggregates?

lfittl · 10y ago

AFAIK all the parallel work done in 9.6 refers to parallel operations on a single node (but multiple cores).

This would be complementary to what Citus does, which is distributing the load across multiple shard instances (each with its own cores, benefiting from the parallel work in 9.6).
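A minimal sketch of the general idea of spreading an aggregate across shards, assuming hypothetical in-memory shards: each worker computes a partial result, here (sum, count) for an average, and a coordinator combines them. This is the textbook partial-aggregation technique, not Citus's planner; the threads only stand in for separate shard instances (in CPython they would not give CPU parallelism for pure-Python work).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each inner list stands in for one shard's values of a column.
shards = [[10, 20, 30], [40, 50], [60]]


def partial_avg(values: list[int]) -> tuple[int, int]:
    """Worker-side step: return (sum, count) instead of a finished average."""
    return sum(values), len(values)


def distributed_avg(shards: list[list[int]]) -> float:
    """Coordinator-side step: fan out the partials, then combine them once."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(partial_avg, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("average of zero rows is undefined")
    return total / count


print(distributed_avg(shards))  # 35.0
```

Shipping (sum, count) rather than per-shard averages matters: averaging the three shard averages here (20, 45, 60) would give about 41.67 instead of 35.0, because the shards hold different numbers of rows.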

voctor · 10y ago

Yes, but Citus can also parallelize on multiple cores when used on a single machine ("If you’re running Citus on a single machine, this will scale queries across multiple CPU cores. and create the impression of sharding across databases."). Will this functionality become obsolete with 9.6?

jjawssd · 10y ago · 2 in thread

My guess is that Citus is making enough money from consulting that they don't need to keep this code closed source when they can profit from free community-driven growth while they are expanding their sales pipeline through consulting.

craigkerstiens · 10y ago

Hi, Craig from Citus here. In addition to the open source Citus, we have some premium features in our Enterprise edition. Many of these are ones that larger enterprises will want to pay for, such as security features around roles, a tool for automated cluster resizing, enhanced load balancing tools, and of course support. Beyond that, we have a few other things in the works that will speak to various revenue models for the future.

onRoadAgain23 · 10y ago

They offer an enterprise paid version with more functionality.

"for customers with large production deployments, we also offer an enterprise edition that comes with additional functionality"

ccleve · 10y ago · 1 in thread

I'd very much like to see what algorithm these systems use to enable transactions in a distributed environment. Are they just using straight two-phase commit, and letting the whole transaction fail if a single server goes down? Or are they getting fancy and doing some kind of replication with consensus?
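For reference, the "straight two-phase commit" option can be sketched in a few lines of Python. This is a generic, in-memory illustration with hypothetical `Participant` nodes, not how Citus or any particular system implements distributed transactions: every participant must vote yes in the prepare phase, otherwise every participant aborts.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical shard node holding staged and committed writes in memory."""
    name: str
    healthy: bool = True
    committed: dict[str, int] = field(default_factory=dict)
    staged: dict[str, int] = field(default_factory=dict)

    def prepare(self, writes: dict[str, int]) -> bool:
        # Phase 1: stage the writes and vote. A real node would make the
        # staged state durable before voting yes.
        if not self.healthy:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2, success path: make the staged writes visible.
        self.committed.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        # Phase 2, failure path: discard the staged writes.
        self.staged = {}


def two_phase_commit(plan: list[tuple[Participant, dict[str, int]]]) -> bool:
    """Apply each participant's writes on all participants, or on none of them."""
    votes = [node.prepare(writes) for node, writes in plan]
    if all(votes):
        for node, _ in plan:
            node.commit()
        return True
    for node, _ in plan:
        node.abort()
    return False


a, b = Participant("shard-a"), Participant("shard-b", healthy=False)
print(two_phase_commit([(a, {"x": 1}), (b, {"y": 2})]))  # False
print(a.committed)  # {}
```

The sketch also shows the classic weakness behind the consensus question: if the coordinator died between the two phases, prepared participants would be left holding staged writes with no decision, which is one reason some systems layer consensus-based replication on top.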

wmfiv · 10y ago

I believe transactions must process against a single node.
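If transactions are indeed limited to a single node, a router has to check up front that every row a transaction touches lands on the same shard. A hypothetical Python sketch of that check, assuming crc32 hash ranges and a fixed count of 8 shards (neither detail is taken from Citus):

```python
import zlib
from typing import Optional

SHARD_COUNT = 8  # hypothetical shard count
RANGE_SIZE = 2 ** 32 // SHARD_COUNT  # equal-width ranges over crc32's 32-bit output


def shard_for_key(key: str) -> int:
    """Map a distribution-column value to a shard by hash range."""
    return zlib.crc32(key.encode("utf-8")) // RANGE_SIZE


def single_shard(keys: list[str]) -> Optional[int]:
    """Return the one shard every key maps to, or None if the keys span several."""
    shards = {shard_for_key(k) for k in keys}
    return shards.pop() if len(shards) == 1 else None


print(single_shard(["order-1", "order-1"]))
```

A transaction whose keys come back as `None` would have to be rejected (or handled by something like the two-phase commit discussed above) rather than run as a local transaction on one node.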

erikb · 10y ago

Unforking is a very smart decision. Postgres also has gained a lot of favour since MySQL was bought by Oracle. Altogether Citus has earned a lot of kudos for that move, at least with me, for all that may count!

lobster_johnson · 10y ago

This is great!

One thing I'm having trouble with is finding information about transactional semantics. If I make several updates (to differently sharded keys) in a single transaction, will the transaction boundaries be preserved (committed "locally" first, then replicated atomically to shards)? Or will they fan out to different shards with separate begin/commit statements? Or without transactional boundaries at all?

In fact, I can't really find any information on how CitusDB achieves its transparent sharding for queries and writes. Does it add triggers to distributed tables to rewrite inserts, updates and deletes? Or are tables renamed and replaced with foreign tables? I wish the documentation was a bit more extensive.

signalnine · 10y ago

Congrats from Agari! We've been looking forward to this and continue to get a lot of value from both the product and the top-notch support.

ahachete · 10y ago

Congratulations, Citus.

Since I heard last year at PgConfSV that you will be releasing CitusDB 5.0 as open source, I've been waiting for this moment to come.

It augments 9.5's awesome capabilities with sharding and distributed queries. While this targets real-time analytics and OLAP scenarios, being an open source extension to 9.5 means that a whole lot of users will benefit from it, even in more OLTP-like scenarios.

Now that Citus is open source, ToroDB will add a new CitusDB backend soon, to scale-out the Citus way, rather than in a Mongo way :)

Keep up the good work!

BinaryIdiot · 10y ago

I don't have a ton of experience scaling out and using different flavors of PostgreSQL but I had run across Postgres-XL not long ago; does anyone know how this compares to that?

ismail · 10y ago

Any thoughts on using something like postgres+citus vs hadoop+hbase+ecosystem vs druid for olap/analytics with very large volumes of data?

uberneo · 10y ago

Great product! It would be nice to have an admin interface like RethinkDB's, where you can clearly define your replication and sharding settings. Is there any documentation on how to do this from the command line?

albasha · 10y ago

I recently switched back to MariaDB because I didn't see a clear/easy path to Postgres scalability in case the project I am working on takes off. I am under the assumption there are at least two fairly simple approaches to scaling MySQL: master-master replication using Galera, and Aurora from AWS. What do you guys think? Am I right in thinking MySQL is easier to scale, given I want to spend the least amount of time maintaining it?

Dowwie · 10y ago

would a natural evolutionary path for startups be to start out with postgresql and grow into requiring citusdb?

ioltas · 10y ago

Congrats to all for the release. That's a lot of work accomplished.

ksec · 10y ago

Does anyone know how Citus compares to Postgres-XL?

lambdafunc · 10y ago

Any benchmarks comparing CitusDB against Presto?

Dowwie · 10y ago

is it correct to compare citusdb with pipelinedb?

j / k navigate · click thread line to collapse

153 comments

93 comments · 28 top-level

exhilaration10y ago· 8 in thread

AGPL license if anyone's curious: https://github.com/citusdata/citus/blob/master/LICENSE

gtrubetskoy10y ago

Which means there is no chance this would ever become part of PostgreSQL proper.

rch10y ago

That's correct, and with such a significant license change I think the term 'unfork' is being used inappropriately in the title.

1 more reply

oconnore10y ago

AGPL actually makes a lot more sense for a database (except for SQLite, which is typically redistributed).

davidfetter10y ago

1 more reply

polskibus10y ago

Does that mean that whatever connects to this database needs to be AGPL too?

crudbug10y ago

No .. it means any changes you make to CitusDB should be made public.

[0] https://en.wikipedia.org/wiki/Affero_General_Public_License

kodablah10y ago

No, it means if you make changes to the code and use it over a network you have to be AGPL too (think "network" as "distribution" in the GPL sense).

2 more replies

rkrzr10y ago

GNU AGPL to be precise (GNU Affero General Public License).

no1youknowz10y ago· 7 in thread

> With the release of newly open sourced Citus v5.0, pg_shard's codebase has been merged into Citus...

This is fantastic, sounds like the setup process is much simpler.

If say, they released the Active/Active Master later on this year. That's huge. I can pretty much think of my DB solution as done at this point.

ozgune10y ago

(Ozgun from Citus Data)

For on-premise deployments, the primary challenge is set-up complexity. We're now prototyping one of those designs to know more: https://github.com/citusdata/citus/issues/389

We expect to share all the details and a concrete timeline in April.

Florin_Andrei10y ago

Would it be possible (eventually) to use Citus for sharding within the datacenter, and BDR for master/master replication between datacenters?

Or is Citus taking over the master/master replication? (or is it doing something different?)

1 more reply

cm310y ago

What does "works well on the cloud" mean specifically? Is there some difference when run on your own hardware?

1 more reply

batbomb10y ago

Can Citus handle geospatial sharding?

1 more reply

enesunal10y ago

good work! congrats =)

also in Turkish: kolaylıklar dilerim ("may it go smoothly for you") :)

no1youknowz10y ago

Excellent news! Really looking forward to this one! :)

heme10y ago

Can you elaborate on the scaling problems you were having?

gtrubetskoy10y ago· 6 in thread

umur10y ago

Umur from Citus here. Adding to Craig's comments:

craigkerstiens10y ago

Beyond that, we have a few other things in the works for the future that will cover other revenue models.

atonse10y ago

My hunch is that the two are not really related.

SwellJoe10y ago

I know Red Hat is making a ton of money. But are CoreOS and Docker at the "making a ton of money" stage, or merely well-funded by investors?

1 more reply

ascetone10y ago

It doesn't.

AGPL means the only people using it have to be licensed the same way.

jgreen1010y ago

Well, people use MongoDB...

You can use database servers running AGPL software in a closed source SaaS: http://www.gnu.org/licenses/why-affero-gpl.en.html

1 more reply

onRoadAgain2310y ago· 6 in thread

"For customers with large production deployments, we also offer an enterprise edition that comes with additional functionality"

sergiosgc10y ago

This is the model used by many companies backing OSS. The fact that you have been burned before means the actor in that case (or cases) acted badly, not that the model is wrong.

gdulli10y ago

Do you often make big decisions based on extrapolation from so few data points?

onRoadAgain2310y ago

If this cost roughly a million dollars, then yes. Especially if you're locked in, as you are with a DB. I use nginx because, even though it follows this model, it would be easy to replace with something else.

xenophonf10y ago

gtaylor10y ago

jamespo10y ago

Feel free to code your own

TY10y ago· 5 in thread

This is awesome! Tebrikler (congrats) on the release of 5.0 and going open source, definitely great news.

Can you publish competitive positioning of Citus vs Actian Matrix (née ParAccel) and Vertica? I'd love to compare them side by side, even if it's just from your point of view :-)

peatmoss10y ago

Second the request for comparison to Vertica. I've recentishly become a user at work, and I wonder how this compares. A quick googling didn't yield anything too informative.

flavor810y ago

...and Redshift. I love what Amazon provide, but it gets expensive.

umur10y ago

This was a long answer to a short question, but hopefully useful. Let me know if you have questions, or any feedback using Citus – would love to hear your thoughts!

1 more reply

umur10y ago

1 more reply

umur10y ago

1 more reply

rkrzr10y ago· 5 in thread

hblanks10y ago

We've been running CitusDB for a couple years now at CloudFlare for serving aggregated analytics to customers (cf. https://blog.cloudflare.com/scaling-out-postgresql-for-cloud...).

It's a good product, and it was even fairly easy to do a major version upgrade / cluster relocation. At least as easy as such a thing can be. :-)

rkrzr10y ago

Are there any limitations you have run into? E.g. can you still use all index types that Postgres offers or are there any special distributed index types that CitusDB adds perhaps?

2 more replies

tlarkworthy10y ago

nit: Postgres doesn't scale horizontally; it only scales vertically.

andrewflnr10y ago

I guess if you're putting more servers in a rack, it does tend to be a vertically-oriented process. :)

manigandham10y ago

1 more reply

devit10y ago· 4 in thread

I've been unable to find any clear description of the capabilities of Citus and competing solutions (Postgres-X2 seems to be the other leader).

Which of these are supported:

1. Full PostgreSQL SQL language

2. All isolation levels including Serializable (in the sense that they actually provide the same guarantees as normal PostgreSQL)

3. Never losing any committed data on sub-majority failures (i.e. synchronous replication)

4. Ability to automatically distribute the data (i.e. sharding)

5. Ability to replicate the data instead of, or in addition to, sharding

6. Transactionally-correct read scalability

7. Transactionally-correct write scalability where possible (i.e. multi-master replication)

8. Automatic configuration that only requires specifying some sort of "cluster identifier" the node belongs to

ozgune10y ago

(Ozgun from Citus Data)

Citus supports sharding and replication out of the box (#4, #5). On #6, reads go through a master node (metadata server) and you see what you write.

On #8, could you elaborate a bit more? Do you mean a logical identifier for the node?

gorodetsky10y ago

Thanks for awesome product!

Do you know when you're planning to release Citus 5.0 deb/rpm packages?

2 more replies

devit10y ago

Does this mean that distributed transactions are not supported at all?

cowardlydragon10y ago

But if they answer those questions, you won't buy support/use it...

Have a donut and look at our marketing spreadsheets.

I'm so tired of "seamless" "effortless" "simple" distributed database lies. There are mathematical theorems as to why there is no free lunch.

faizshah10y ago· 4 in thread

So this sounds similar to Pivotal's Greenplum which is also open source, can anyone compare the two?

frn10y ago

Greenplum is based on postgres 8.2, with the featureset you'd expect from pg 8.2 - basically none of the additions after 2006 have merged to GP.

faizshah10y ago

Ok, and what's the process like for disaster recovery with citus?

1 more reply

ioltas10y ago

Since the move to open source, more recent upstream changes have slowly been merged into the codebase, though it still seems to be on an 8.3 base, with a couple of years' worth of code still to go through.

amitlan10y ago

Greenplum is a fork of the Postgres codebase; Citus is not. It's an extension that leverages community Postgres's extensibility APIs. This point seems to be highlighted in their post.

azinman210y ago· 4 in thread

I want it to be called citrus, which is what I always read it as....

BinaryIdiot10y ago

jrochkind110y ago

Then I'd get confused and think it had something to do with Citrix.

azinman210y ago

A cute lemon logo might help :)

mrgreenfur10y ago

Only after reading this did I realize it's NOT citrus. Thanks!

X86BSD10y ago· 4 in thread

AGPL? This is dead in the water :( It will never be integrated into PG. What a shame. It should have been a 2-clause BSDL. Sigh.

tspiteri10y ago

To use open source code, the more permissive the license the better. But to actually open your own code, BSDL is a very tough sell.

gtrubetskoy10y ago

>> The BSDL does not make much economic sense to the company open sourcing their code

If this were true then Cloudera, Horton and a whole bunch of other companies would be out of business, yet in reality they are doing really well. All that AGPL is doing for Citus is:

1. Turning away people (customers) who are religious about licenses.

2. Eliminating any possibility of this code ever being integrated into PostgreSQL

1 more reply

X86BSD10y ago

4 more replies

andreasklinger10y ago

might be intentional

satygeek10y ago· 4 in thread

mslot10y ago

Yes, Citus may be a good fit, for a complete example see:

https://www.citusdata.com/blog/15-marco-slot/402-interactive...

greggyb10y ago

Hundreds of millions of records in <=3 seconds is not really a big challenge with a good data model and proper indexing on even a single server.

I work for a BI consultancy and we don't even bat an eye until we hit billions of records in a primary fact table.

Certainly the DB server does need to scale vertically to some extent as you pass through the orders of magnitude > 10M. A good columnstore engine is also worthwhile to consider.

lfittl10y ago

I'll leave the initial questions to the Citus team, but re: CloudFlare this link might be helpful:

https://blog.cloudflare.com/scaling-out-postgresql-for-cloud...

satygeek10y ago

Thanks. I went through it but couldn't find info about their cluster size, data size and query response times.

Someone10y ago· 3 in thread

One must thank them for open sourcing this, and cannot blame them for using a different license, but using a different license makes me think calling this "unfork" is bending the truth a little bit.

anarazel10y ago

takeda10y ago

Perhaps I'm missing something, but this is just an extension that works with standard postgres, there are no code changes in postgres itself, so it doesn't look like it ever was a fork.

jasonmp8510y ago

(Jason from Citus here)

So ultimately CitusDB was a fork but is now entirely an extension.

1 more reply

voctor10y ago· 2 in thread

lfittl10y ago

AFAIK all the parallel work done in 9.6 refers to parallel operations on a single node (but multiple cores).

This would be complementary to what Citus does, which is distributing the load across multiple shard instances (each with their own cores, benefiting from the parallel work in 9.6).

voctor10y ago

1 more reply

jjawssd10y ago· 2 in thread

craigkerstiens10y ago

onRoadAgain2310y ago

They offer an enterprise paid version with more functionality.

"for customers with large production deployments, we also offer an enterprise edition that comes with additional functionality"

ccleve10y ago· 1 in thread

wmfiv10y ago

I believe transactions must process against a single node.

erikb10y ago

lobster_johnson10y ago

This is great!

signalnine10y ago

Congrats from Agari! We've been looking forward to this and continue to get a lot of value from both the product and the top-notch support.

ahachete10y ago

Congratulations, Citus.

Ever since I heard last year at PgConfSV that you would be releasing CitusDB 5.0 as open source, I've been waiting for this moment to come.

Now that Citus is open source, ToroDB will add a new CitusDB backend soon, to scale-out the Citus way, rather than in a Mongo way :)

Keep up the good work!

BinaryIdiot10y ago

I don't have a ton of experience scaling out and using different flavors of PostgreSQL but I had run across Postgres-XL not long ago; does anyone know how this compares to that?

ismail10y ago

Any thoughts on using something like postgres+citus vs hadoop+hbase+ecosystem vs druid for OLAP/analytics with very large volumes of data?

uberneo10y ago

albasha10y ago

Dowwie10y ago

Would a natural evolutionary path for startups be to start with PostgreSQL and grow into needing CitusDB?

ioltas10y ago

Congrats to all for the release. That's a lot of work accomplished.
