At face value, I shudder at the syntax.
Example from their tutorial:
EmployeeName(name:) :- Employee(name:);
Engineer(name:) :- Employee(name:, role: "Engineer");
EngineersAndProductManagers(name:) :- Employee(name:, role:), role == "Engineer" || role == "Product Manager";
vs. the equivalent SQL:
SELECT Employee.name AS name
FROM t_0_Employee AS Employee
WHERE (Employee.role = "Engineer" OR Employee.role = "Product Manager");
SQL is much more concise, extremely easy to follow.
No weird OOP-style class instantiation for something as simple as just getting the name.
As already noted in the 2021 discussion, what's actually the killer though is adoption and, three years later, ecosystem.
SQL for analytics has come an extremely long way with the ecosystem that was ignited by dbt.
There is so much better tooling today when it comes to testing, modelling, running in memory with tools like DuckDB or Ibis, Apache Iceberg.
There is value to abstracting on top of SQL, but it does very much seem to me like this is not it.
Datalog is not really a query language, actually. But it is relational, like SQL, so it lets you express relations between "facts" (the rows) inside tables. But it is more general, because it also lets you express relations between tables themselves (e.g. this "table" is built from the relationship between two smaller tables), and it does so without requiring extra special case semantics like VIEWs.
Because of this, it's easy to write small fragments of Datalog programs and then stick them together with other fragments without a lot of planning ahead of time, which makes it a very compositional language. This is one of the primary reasons why many people are interested in it as a SQL alternative, aside from avoiding the typical weird SQL quirks through better language design (those are annoying, but not really the big picture).
If I understand you correctly, you can easily get the same with ephemeral models in dbt or CTEs generally?
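For comparison, here is a minimal sketch of the CTE route in plain SQL, run through Python's built-in sqlite3 module. The Employee rows are made up for illustration, and the CTE names simply mirror the tutorial predicates quoted at the top of the thread; this is hand-written SQL, not anything Logica or dbt generated.

```python
import sqlite3

# In-memory database with a hypothetical Employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Alice", "Engineer"), ("Bob", "Product Manager"), ("Carol", "Sales")],
)

# Each CTE is a small named fragment; the last one is built from the first two.
QUERY = """
WITH Engineer AS (
    SELECT name FROM Employee WHERE role = 'Engineer'
),
ProductManager AS (
    SELECT name FROM Employee WHERE role = 'Product Manager'
),
EngineersAndProductManagers AS (
    SELECT name FROM Engineer
    UNION
    SELECT name FROM ProductManager
)
SELECT name FROM EngineersAndProductManagers ORDER BY name
"""

names = [row[0] for row in conn.execute(QUERY)]
print(names)
```

Note that the CTEs only exist inside this one statement. Reusing Engineer in another query means a view, a dbt model, or copy-paste, which seems to be the point under discussion.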
> Because of this, it's easy to write small fragments of Datalog programs, and then stick it together with other fragments, without a lot of planning ahead of time, meaning as a language it is very compositional.
This can be a benefit in some cases, I guess, but how can you guarantee correctness with flexibility involved?
With SQL, I get either table or column level lineage with all modern tools, can audit each upstream output before going into a downstream input. In dbt I have macros which I can reuse everywhere.
It's very compositional while at the same time perfectly documented and testable at runtime.
Could you share a more specific example or scenario where you have seen Datalog/Logica outperform a modern SQL setup?
Genuinely curious.
I am not at all familiar with the Logica/Datalog/Prolog world.
But: "Logica compiles to SQL".
With the caveat that it only kind of does, since it seems constrained to three database engines, probably the one they optimise the output to perform well on, one where it usually doesn't matter and one that's kind of mid performance wise anyway.
In light of that quote it's also weird that they mention that they are able to run the SQL they compiled to "in interactive time" on a rather large dataset, which they supposedly already could with SQL.
Admittedly I'm not very good with Datalog and have mostly used Prolog, but to me it doesn't look much like a Datalog. Predicates seem to be variadic with named parameters, making variables implicit at the call site, so to understand a complex predicate you need to hop away and look at how the composite predicates are defined to understand what they return. Maybe I misunderstand how it works, but at first glance that doesn't look particularly attractive to me.
Can you put arithmetic in the head of clauses in Datalog proper? As far as I can remember, that's not part of the language. To me it isn't obvious what this is supposed to do in this query language.
"Anyone who knows the system can easily learn it" he said with a sniff.
Yes, the similarity to Prolog lets you draw on a vast pool of Prolog programmers out there.
I mean, I studied a variety of esoteric languages in college and they were interesting (I can't remember if we got to Prolog tbh, but I know first-order logic pretty well and that's related). When I was thrown into a job with SQL, its English-language syntax made things really easy. I feel confident that knowing SQL wouldn't conversely make learning Prolog easy (I remember Scala later and not being able to deal with its opaque verbosity easily).
Basically, SQL syntax makes easy things easy. This gets underestimated a lot, indeed people seem to have contempt for it. I think that's a serious mistake.
I understand the desire to not waste your time, but I think you're missing the big idea. Those statements define logical relations. There's nothing related to classes or OOP.
Using those building blocks you can do everything that you can with SQL. No need for HAVING clauses. No need for GROUP BY clauses. No need for subquery clauses. No need for special join syntax. Just what you see above.
And you can keep going with it. SQL quickly runs into the limitations of the language. Using the syntax above (which is basically Prolog) you can construct arbitrarily large software systems which are still understandable.
If you're really interested in improving as a developer, then I suggest that you spend a day or two playing with a logic programming system of some sort. It's a completely different way of thinking about programming, and it will give you mental tools that you will never pick up any other way.
Goes on the holidays list.
That said. I like Logica and Datalog. For me the main use case is "recursive" queries as they are simpler to express that way. PRQL has made some progress there with the loop operator but it could still be better. If you have any ideas for improvement, please reach out!
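To make the "recursive queries" point concrete, here is a minimal sketch of a transitive closure written as a SQL recursive CTE and run through Python's built-in sqlite3 module. The Parent table and its rows are invented for illustration. The two Datalog-style rules in the comment are generic textbook notation for the same relation, not verified Logica syntax.

```python
import sqlite3

# Datalog-style statement of the same relation (generic notation, an assumption):
#   Ancestor(x, y) :- Parent(x, y).
#   Ancestor(x, z) :- Ancestor(x, y), Parent(y, z).

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parent (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO Parent VALUES (?, ?)",
    [("ann", "bob"), ("bob", "cid"), ("cid", "dee")],
)

QUERY = """
WITH RECURSIVE Ancestor(ancestor, descendant) AS (
    -- base case: every parent is an ancestor
    SELECT parent, child FROM Parent
    UNION
    -- recursive case: extend a known ancestor chain by one generation
    SELECT a.ancestor, p.child
    FROM Ancestor AS a
    JOIN Parent AS p ON p.parent = a.descendant
)
SELECT ancestor, descendant FROM Ancestor ORDER BY ancestor, descendant
"""

pairs = conn.execute(QUERY).fetchall()
for ancestor, descendant in pairs:
    print(ancestor, "->", descendant)
```

The SQL works, but the base case, the recursive case and the join condition all have to be spelled out inside one WITH RECURSIVE block, whereas the rule form states each case as its own line.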
I.e. I understand now that it's seemingly about more than simple querying, so, coming very much from an analytics/data-crunching background, I am wondering what a use case would look like where this is arguably superior to SQL.
Wait, does Logica factorize the number passed to this predicate when unifying the number with a * b?
So when we call Composite(100), does it automatically try all a's and b's that give 100 when multiplied?
I'd be curious to see the SQL it transpiles to.
The way I read these rules is not from left-to-right but from right-to-left. In this case, it would say: Pick two numbers a > 1 and b > 1, their product a*b is a composite number. The solver starts with the facts that are immediately evident, and repeatedly applies these rules until no more conclusions are left to be drawn.
"But there are infinitely many composite numbers," you'll object. To which I will point out the limit of numbers <= 30 in the line above. So the fixpoint is achieved in bounded time.
Datalog is usually defined using what is called set semantics. In other words, tuples are either derivable or not. A cursory inspection of the page seems to indicate that Logica works over bags / multisets. The distinct keyword in the rule seems to have something to do with this, but I am not entirely sure.
This reading of Datalog rules is commonly called bottom-up evaluation. Assuming a finite universe, bottom-up and top-down evaluation are equivalent, although one approach might be computationally more expensive, as you point out.
In contrast to this, Prolog enforces a top-down evaluation approach, though the actual mechanics of evaluation are somewhat more complicated.
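A minimal sketch of the bottom-up reading in plain Python, assuming set semantics and the 1..30 universe from the example. The function and variable names are hypothetical, and this is a naive fixpoint loop for illustration, not how Logica (which compiles to SQL) actually evaluates anything.

```python
def naive_fixpoint(rules, facts):
    """Apply every rule to the known facts until no new fact is derived."""
    facts = set(facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        if derived <= facts:      # nothing new: fixpoint reached
            return facts
        facts |= derived


def composite_rule(facts):
    # Composite(a * b) :- Number(a), Number(b), a > 1, b > 1
    numbers = [value for (pred, value) in facts if pred == "Number"]
    return {("Composite", a * b) for a in numbers for b in numbers if a > 1 and b > 1}


LIMIT = 30
# Number(x + 1) :- x in Range(30)
base = {("Number", x + 1) for x in range(LIMIT)}

facts = naive_fixpoint([composite_rule], base)
composite = {value for (pred, value) in facts if pred == "Composite"}

# Negation is applied only after the Composite facts are complete (stratification).
primes = sorted(n for (pred, n) in base if n > 1 and n not in composite)
print(primes)
```

Because facts live in a Python set, the many ways of deriving 24 (2*12, 3*8, 4*6, ...) collapse into one tuple, which is the set-semantics behaviour described above. Whether that matches exactly what the distinct keyword does in Logica is not something this sketch establishes.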
I found a way to look at the SQL it generates without installing anything:
Execute the first two cells in the online tutorial Colab (the Install and Import). Then replace the 3rd cell content with the following and execute it:
%%logica Composite
@Engine("sqlite"); # don't try to authorise and use BigQuery
# Define numbers 1 to 30.
Number(x + 1) :- x in Range(30);
# Defining composite numbers.
Composite(a * b) distinct :- Number(a), Number(b), a > 1, b > 1;
# Defining primes as "not composite".
Prime(n) distinct :- Number(n), n > 1, ~Composite(n);
Look at the SQL tab in the results.
To use SQLite, use the @Engine("sqlite") imperative. You can then connect to your database file with the @AttachDatabase imperative.
For example, if you have an example.db file with a Fruit table that has a col0 column, then you can count fruits with this program:
@Engine("sqlite"); @AttachDatabase("example", "example.db");
CountFruit(fruit) += 1 :- Fruit(fruit);
Then run the CountFruit predicate.
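For readers more at home in SQL, the following is a rough hand-written equivalent of what that predicate computes, run against an in-memory SQLite database from Python. The sample rows and the output column names (fruit, n) are assumptions made for illustration; this is not the SQL that Logica itself emits.

```python
import sqlite3

# Stand-in for example.db: a Fruit table with a single col0 column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fruit (col0 TEXT)")
conn.executemany(
    "INSERT INTO Fruit VALUES (?)",
    [("apple",), ("pear",), ("apple",)],
)

# "+= 1" per matching row amounts to a grouped count.
QUERY = """
SELECT col0 AS fruit, COUNT(*) AS n
FROM Fruit
GROUP BY col0
ORDER BY col0
"""

counts = dict(conn.execute(QUERY).fetchall())
print(counts)
```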
Google is pushing the new language Logica to solve the major flaws in SQL - https://news.ycombinator.com/item?id=29715957 - Dec 2021 (1 comment)
Logica, a novel open-source logic programming language - https://news.ycombinator.com/item?id=26805121 - April 2021 (98 comments)
The basic selling point is a compositional query language, so that over time one may have a library of re-usable components. If anyone really has built such a library I'd love to know more about how it worked out in practice. It isn't obvious to me how those decorators are supposed to compose and abstract on first look.
It's also not immediately obvious to me how complicated your library of SQL has to be for this approach to make sense. Say I had a collection of 100 moderately complex and correlated SQL queries, and I was to refactor them into Logica, in what circumstances would it yield a substantial benefit versus (1) doing nothing, (2) creating views or stored procedures, (3) using dbt / M4 or some other preprocessor for generic abstraction?
The author discusses Logica vs. plain SQL vs POSIX.
I’d always start with dbt/SQLMesh.
The library you’re talking about exists: dbt packages.
Check out hub.getdbt.com and you’ll find dozens of public packages for standardizing sources, data formatting or all kinds of data ops.
You can use almost any query engine/ DB out there.
Then go for dbt Power User in VS Code or use Paradime and you have first-class IDE support.
I have no affiliation with any of the products, but from a practitioner perspective the gap between these technologies (and their ecosystems) is so large that the ranking of value for programming is as clear as they come.
Curious to hear battle stories from other teams using this.
Having been in quite a few data teams, and supported businesses using dashboards, a very large chunk of the time, the requests do align with the composable feature: people want “the data from that dashboard but with x/y/z constraints too” or “<some well defined customer segment> who did a|b in the last time, and then send that to me each week, and then break it down by something-else”. Scenarios that all benefit massively from being able to compose queries more easily, especially as things like “well defined customer segment” get evolved. Even ad-hoc queries would benefit because you’d be able to throw them together faster.
There are a number of tools that claim to solve this, but solving this at the language level strikes me as a far better solution.
That is to say, you have to define the jobs that do the aggregations, as well. Knowing that you can't just add historical records and have them immediately on current reports.
I welcome the idea that a support team could use better tools. I suspect polyglot to win. Ad hoc is hard to do better than SQL. DDL is different, but largely difficult to beat SQL, still. And job description is a frontier of mistakes.
Each can do the other, to a limited extent, but it becomes increasingly difficult with even small increases in complexity. For instance, you can do inferencing in SQL, but it is almost entirely manual in nature and not at all like the automatic forward-inferencing of Prolog. And yes, you can store data (facts) in Prolog, but it is not at all designed for the "storage, retrieval, projection and reduction of trillions of rows with thousands of simultaneous users" that SQL is.
I even wanted to implement something like Logica at the moment, primarily trying to build a bridge through a virtual table in SQLite that would allow storing rules as mostly Prolog statements and having adapters to SQL storage when inference needs facts.
It's trivial to convert stuff like web server access logs into Prolog facts by either hacking the logging module or running the log files through a bit of sed, and then you can formalise some patterns as rules and do rather nifty querying. A hundred megabytes of RAM can hold a lot of log data as Prolog facts.
E.g. '2024-11-16 12:45:27 127.0.0.1 "GET /something" "Whatever User-Agent" "user_id_123"' could be trivially transformed into 'logrow("2024-11-16", "12:45:27", "127.0.0.1", "GET", "/something", "Whatever User-Agent", "user_id_123").', especially if you're acquainted with DCGs. Then you could, for example, write a rule that defines a relation between rows where a user-agent and IP does GET /log_out and shortly after has activity with another user ID, and query out people that could be suspected to use several accounts.
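A minimal sketch of that transformation in Python rather than sed or a DCG, assuming every log line has exactly the layout of the example above. The regular expression and the to_fact helper are hypothetical names for illustration; a real access log format would need its own pattern.

```python
import re

# Assumed layout: date time ip "METHOD path" "user agent" "user id"
LOG_LINE = re.compile(
    r'(?P<date>\S+) (?P<time>\S+) (?P<ip>\S+) '
    r'"(?P<method>\S+) (?P<path>[^"]*)" '
    r'"(?P<agent>[^"]*)" "(?P<user>[^"]*)"'
)


def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_fact(line):
    """Turn one log line into a logrow/7 fact, or None if it does not match."""
    match = LOG_LINE.fullmatch(line.strip())
    if match is None:
        return None
    return "logrow(" + ", ".join(quote(v) for v in match.groups()) + ")."


example = '2024-11-16 12:45:27 127.0.0.1 "GET /something" "Whatever User-Agent" "user_id_123"'
fact = to_fact(example)
print(fact)
```

Lines that do not fit the pattern come back as None instead of producing a malformed fact, so they can be counted or logged separately before the facts file is loaded.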
And, of course the relational model of data is based on first-order logic, so one could say that SQL is a declarative logic programming language for data.
> Malloy is an experimental language for describing data relationships and transformations.
https://www.linkedin.com/posts/medriscoll_big-news-in-the-da...
Also for those playing along at home - a few other related tools for “doing more with queries”.
- AtScale - a semantic layer not dissimilar to LookML but with a good engine to optimize pre building the aggregates and routing queries among sql engines for perf.
- SDF - a team that left Meta to make a commercial offering for a sql parser and related tools. Say to help make dbt better.
(No affiliation other than having used / been involved with / know some of these people at work)
Typically the program runs efficiently, but when optimization is needed, you can do it by breaking up the predicate.
It should be easy enough if you're somewhat fluent in both languages, and has the perk of not being some Python thing at a megacorp famous for killing its projects.