CQL: Categorical Databases

(categoricaldata.net)

111 points · noworriesnate · 27d ago · 39 comments

26 comments · 10 top-level

bob1029 · 24d ago · 6 in thread

> CQL is not a database management system: it neither stores nor updates data.

The same could be said for SQL. How does CQL differ from SQL? If I squint my eyes just a tiny amount, these ideas become really difficult to separate. I was always under the impression that the relational model is based upon many concepts studied in category theory. To my mind, all of the following things are overlapping parts of the exact same monster:

  Set theory
  Category theory
  Graph theory
  Type theory
  Discrete mathematics 
  Relational algebra
  Relational calculus
  Relational modeling
  An actual SQL schema

mamcx · 23d ago

> How does CQL differ from SQL?

Most attempts to replace or improve SQL derive from the fact that SQL was a poorly conceived and designed interface. It was originally meant to be a very small DSL for end users but, unfortunately, was allowed to become a complicated, confusing mess for app developers:

https://cacm.acm.org/research/50-years-of-queries/

    SQL is not an orthogonal language... This is because, in the early days, Ray Boyce and I did not think we were designing a language for programmers... As it turned out, Ray and I were wrong about the predominant usage of SQL...

(Same problem as with JS, PHP, etc.: creators don't anticipate that developers will suffer trying to meet their needs with such anemic ideas!)

---

So the #1 thing for any actual replacement or alternative to SQL is to become a good language for development: one that is actually composable, can be reasoned about, has minimal footguns, etc.

There is a lot of misunderstanding and pushback, similar to how people in the past fought improvements over JS/C/C++ until TypeScript and Rust came along.

But, oh boy, SQL needs its TypeScript!

jeltz · 23d ago

The main innovation here seems to be compile-time checking that foreign keys are respected, but that is something that can be added to SQL, and there is at least one proposal for doing so. So I do not really see anything fundamentally different from SQL.

https://keyjoin.org/

Full disclosure: I am one of the co-authors of this paper and of an associated patch implementing it in PostgreSQL that we have proposed.

I am happy to see more people than us think this is useful.

JoelJacobson · 23d ago

Here is a tl;dr as well: https://keyjoin.org/tldr.html

Groxx · 23d ago

Since it took a bit of time and multiple broken links to find: https://categoricaldata.net/examples has links to a lot of info about CQL for learning/comparing/etc.

(edit: in retrospect, this is just the "getting started" header link, despite the URL)

Unfortunately, none of the links I've followed does much to describe anything except the math foundations. They might cover pieces of syntax, but not how to use or think about them, and they seem rather excited about data-generating features that I can't imagine anyone using outside tech demos (unless that's the only insert method? what about update?) :/

(edit 2: apparently you can't even rely on the manual. I'm fairly confident https://categoricaldata.net/help/Demo.html (manual -> examples -> demo) won't work because it has no schema definition for `Animal`. And it, like all the others I've checked, is little more than syntax and output, with no real explanation.)

js8 · 24d ago

> How does CQL differ from SQL?

SQL is like Java, CQL is like Haskell. SQL has been around and used in production. CQL is a research language, possibly cleaner foundation but YMMV.

The math fields you list are connected, but whether they are the same monster - again it's kinda like claiming all programming languages and implementations are the same (Turing-complete?) monster.

randomNumber7 · 24d ago

SQL is not an imperative programming language.

mattstir · 23d ago · 3 in thread

> Reduce risk of failure through artificial intelligence. CQL contains an embedded automated theorem prover that guarantees the correctness of CQL programs.

Man, it's a rough environment right now marketing-wise. I don't know if they're contractually obligated to say the funny magic words, but the term AI is nearly entirely meaningless at this point. Akin to saying "behold my mighty calculator app: it prevents divisions by zero through artificial intelligence!"

ianhorn · 23d ago

Huh, I read the pitch differently. As "reduce risk of (failure through artificial intelligence)," not as "(reduce risk of failure) through artificial intelligence."

Maybe that's my bias since that's what I'm working on, but it's a big benefit to have stronger compiler guarantees of correctness so that an LLM can't screw things up as much. No BSing that it works when the compiler requires proof.

layer8 · 23d ago

I had to read it twice to come to that conclusion. Maybe the prima facie ambiguity is intentional.

PaulHoule · 23d ago

It's what we used to call AI back in the 1970s and 1980s, a field that has advanced a whole lot without many people noticing. That is, people thought 10,000 rules was a lot of rules in 1998 and now you can work with 10,000,000 rules. And theorem provers, SAT and SMT all got vastly better.

randomNumber7 · 24d ago · 2 in thread

Since Codd's paper showed that the relational model dominates other approaches (for data storage), I would expect a paper that shows categorical databases are not affected by this and what benefits they have.

js8 · 24d ago

My (amateur) take: the CDB model (based on functions) has three advantages over the RDB model (based on relations):

1. Easier modelling of sum types (inheritance) due to duality.

2. Better handling of nulls thanks to labelled nulls.

3. A better foundation for elementary types (they're just another table's IDs). (Column stores often do that already, if your question is about storage.)
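A minimal Python sketch of the labelled-null idea from point 2, under the assumption that each unknown value gets its own label and is equal only to itself. This is not CQL syntax; `LabelledNull`, `fresh_null`, and the `manager` table are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from itertools import count

_labels = count()  # source of fresh, never-repeating labels


@dataclass(frozen=True)
class LabelledNull:
    """Placeholder for an unknown value; two nulls are equal only if they share a label."""
    label: int


def fresh_null() -> LabelledNull:
    """Create a new unknown that is distinct from every other unknown."""
    return LabelledNull(next(_labels))


# Hypothetical table: employee -> manager, where some managers are unknown.
unknown_boss = fresh_null()
manager = {
    "alice": "carol",
    "bob": unknown_boss,
    "dave": unknown_boss,  # known to share bob's (unknown) manager
    "erin": fresh_null(),  # a different unknown manager
}


def same_manager(a: str, b: str) -> bool:
    """Compare managers; unknowns compare by label instead of collapsing to one NULL."""
    return manager[a] == manager[b]
```

Here `same_manager("bob", "dave")` holds while `same_manager("bob", "erin")` does not, a distinction that a single unlabelled SQL `NULL` cannot express.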

adrian_b · 24d ago

While the relational model is claimed to be based on relations, the vast majority of the "relations" used in practice are functions, not general relations.

A general relation exists only between the columns of a table that are included in a multi-column primary key.

All columns that are not part of the primary key are functions of the primary key.

Most tables used in practice use a single column as the primary key, which is frequently just a number or a UUID. Most databases contain only tables that are functions, without any table that contains general relations.

The most frequently used kinds of joins are just function compositions.
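As a rough illustration of the last point, here is a small Python sketch in which each table with a single-column primary key is modelled as a dict (a function from key to value), and a foreign-key join is just composition. The table names and data are hypothetical.

```python
# Hypothetical tables, each keyed by a single-column primary key.
employee_dept = {"alice": "eng", "bob": "ops"}   # employee -> department
dept_city = {"eng": "Berlin", "ops": "Lisbon"}   # department -> city


def compose(f: dict, g: dict) -> dict:
    """Return 'g after f': maps each key k of f to g[f[k]].

    Raises KeyError if some value of f is not a key of g,
    i.e. if the foreign key is dangling.
    """
    return {k: g[v] for k, v in f.items()}


# Roughly: SELECT e.id, d.city FROM employee e JOIN dept d ON e.dept = d.id
employee_city = compose(employee_dept, dept_city)
```

The result maps each employee directly to a city, which is what the equivalent two-table join would return.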

mattsouth · 24d ago · 2 in thread

Not to be confused with https://cql.hl7.org/ which is the CQL I know of.

gengstrand · 23d ago

For me, CQL will always mean Cassandra Query Language https://cassandra.apache.org/doc/4.0/cassandra/cql/

gourabmi · 23d ago

This was my first thought too!

joshsh · 23d ago · 1 in thread

If this sort of thing is interesting to you -- using type theory and category theory to add compositionality to programming and databases -- also check out CQL's cousin, the graph programming language Hydra: https://github.com/CategoricalData/hydra.

snthpy · 22d ago

Interesting, thanks!

iron_fever · 23d ago · 1 in thread

I know very little about databases, but I work heavily with category theory, so fwiw: I think the main benefit is composition. The edge over SQL shows up when you combine schema mappings - a mapping is a functor, so when you migrate data along it the constraints come with it by construction, and you don't end up writing ETL and hoping integrity held.

As best I can tell (but I really don't know much about databases) it's probably a narrow advantage - storage and everyday queries still go to Codd's model - but for stitching schemas together it seems like it could work.

I'm using similar math for automated formal verification, where this approach is what makes it tractable.
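To make the schema-mapping point above concrete, here is a hedged Python sketch of pulling data back along a mapping: tables are sets of IDs, foreign keys are dicts, and the mapping sends each source foreign key to a path of target foreign keys. This only illustrates the idea of migrating by composition; it is not CQL syntax, and every schema and name in it (`Person`, `livesNear`, `migrate`, and so on) is hypothetical.

```python
# Target schema T: Employee --worksIn--> Dept --locatedIn--> City
rows = {
    "Employee": {"alice", "bob"},
    "Dept": {"eng", "ops"},
    "City": {"Berlin", "Lisbon"},
}
fks = {
    "worksIn": {"alice": "eng", "bob": "ops"},
    "locatedIn": {"eng": "Berlin", "ops": "Lisbon"},
}

# Source schema S: Person --livesNear--> Place
source_fks = {"livesNear": ("Person", "Place")}

# Mapping F: S -> T sends tables to tables and each foreign key to a path in T.
table_map = {"Person": "Employee", "Place": "City"}
fk_map = {"livesNear": ["worksIn", "locatedIn"]}


def follow(path, x):
    """Follow a path of target foreign keys starting from row x."""
    for fk in path:
        x = fks[fk][x]
    return x


def migrate():
    """Build the S-instance by precomposing the T-instance with F."""
    new_rows = {s: set(rows[t]) for s, t in table_map.items()}
    new_fks = {
        name: {x: follow(fk_map[name], x) for x in new_rows[src]}
        for name, (src, _dst) in source_fks.items()
    }
    return new_rows, new_fks


new_rows, new_fks = migrate()
```

Because `livesNear` is computed by following valid foreign keys in the target instance, every value it produces is already a row of `Place`; no separate integrity check is needed after the migration.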

solomonb · 23d ago

I haven't used CQL but this is how the advantages have been described to me as well.

flying_sheep · 24d ago · 1 in thread

Thanks for sharing. It looks interesting, but I did not dive deep into it. I just wonder how it differs from SQL triggers, which can also ensure integrity.

js8 · 24d ago

It's not much different, really: CDBs are based on foreign key relationships as a fundamental building block, rather than on relations.

The difference is more in theory than in practice.

srean · 24d ago

There was a good blog post on how the category-theoretic ideas behind this apply to data frames:

What Category Theory Teaches Us About DataFrames https://mchav.github.io/what-category-theory-teaches-us-abou...

Discussed on HN at (67 comments)

https://news.ycombinator.com/item?id=47561426

Syzygies · 23d ago

We have not yet converged on a best (or even adequate) way to present to AI a structured mountain of information that is not already in its training corpus.

AI agents fielded by major AI players still fail at the basic task of providing immediate and correct support for use of the current versions of their products. If a programming language is too new to have adequate representation in the training corpus, there isn't an accepted standard way to provide a reference manual targeting AI agents. Even the best way to include documentation in a large project so new AI agents can take over is controversial. A pile of linked markdown files really isn't an answer: it is less structured than a codebase itself, which AI is good at navigating.

Other HN posts have discussed using SQL as a backbone for the AI "mind mapping" support we need for AI more critically than for ourselves.

I was hoping that CQL could be an answer to this. Perhaps, but that is not its current primary goal.

deterministic · 22d ago

> that has revolutionized several areas of computer science

That is quite a claim. I will argue that Categorical Databases most definitely haven't. Any areas where it is true?
