PartiQL: One query language for all your data

(aws.amazon.com)

241 points · portmanteaufu · 6y ago · 84 comments

67 comments · 20 top-level

lwansbrough6y ago· 10 in thread

    PartiQL> SELECT * FROM [1,2,3]
       | 
    ===' 
    <<
      {
        '_1': 1
      },
      {
        '_1': 2
      },
      {
        '_1': 3
      }
    >>
    --- 
    OK! (86 ms)

Jeez. 86ms for this query on this data set? Hope that's not representative of the general performance!

ignoramous6y ago

Other queries involving joins, aggregates, unrolls, and pivots on schemaless, nested, multi-path documents performed way better than the example cherry-picked from the main blog post: queries completed in 5ms to 25ms, albeit on a toy dataset.

https://partiql.org/tutorial.html

justicezyx6y ago

You missed the point. The modern infrastructure's primary value is scalability. This number is of course bad for this data, but it will be more impressive when the data is a million times bigger.

jiggawatts6y ago

You missed the point. Modern infrastructure's scalability is irrelevant if even one user's experience is poor.

In the era of 64-core processors, scaling horizontally is meaningless for 99.9% of architecture designs. Latency matters to everyone, always.

Trivial queries taking nearly 1/10th of a second on modern kit is absolutely atrocious, and shows a total lack of awareness of performance as a feature.

therapon6y ago

It looks like the first query run in the REPL takes more time (startup cost). Subsequent runs of that query return in around 5ms. Even a query like `SELECT * FROM 1`, executed immediately after starting the REPL, will take longer than usual, and so will a non-query expression such as `1 + 1`.

The time reported by the REPL can be misleading.

There is work that we definitely need to do on performance as we develop PartiQL. Performance is something we have been considering since inception and we will keep considering as we go forward.

justicezyx6y ago

I mean modern infrastructure's user experience is very poor compared to the PC-based software of decades ago...

monsieurbanana6y ago

> Latency matters to everyone, always.

I used to do a lot of BigQuery for analytics. Latency in BigQuery is crap and clearly not its selling point; we're not talking ms here, we're talking seconds at a minimum. Yet it's a really nice database for its use cases.

dlurton6y ago

This is on the JVM so the JIT's optimizations probably haven't kicked in yet.

banachtarski6y ago

Who are we kidding? Even without JIT optimizations that seems absurdly slow.

dlurton6y ago

Also, the PartiQL compiler makes heavy use of closures, each of which becomes a class, so the first time a query executes the JVM has to load a few dozen classes--this probably explains the 86ms more than a lack of JIT optimizations alone.

manigandham6y ago

Did you just copy/paste from the tutorial?

It's an early reference implementation and demo program to show off the language syntax, it doesn't have much to do with whatever engine actually executes the query and that will be the majority of any real query's timing.

Regardless, even query parsing and compilation should get much faster if it moves to a lower-level language like C++ or Rust.

breck6y ago· 7 in thread

This is neat. Anyone want to add support for TreeBase/Tree Notation? http://treenotation.org/treeBase/. Querying TreeBases in SQL without first converting them to SQL is currently on the back burner. Seems like it would be relatively straightforward to use this to do that.

topicseed6y ago

Most of your recent comments link to your project's website. Please stop with the abusive promotion.

krapht6y ago

I have real problems with TreeNotation advertising itself as "a software-less database system". It's basically abusing your filesystem to store a tree data structure and using git to handle concurrent updates. That's great, except for the part where they ask "does it scale"; and the answer is yes, but should be no.

A database is so much more than just a schema and validator, but this is being advertised as a database replacement.

And I want to stress that this doesn't mean I think it isn't useful. I bet there are lots of times when you want to enforce some sort of structure on a bunch of folders with files in them. That's not a database, though.

breck6y ago

Thanks for the feedback! I did not expect that confusion. I just made an update (and will push shortly) to be more explicit that it scales for collaborative knowledge bases. But I'm not talking about something like real time transactional DBs, etc.

I have not used TreeBase for anything other than collaborative knowledge bases. Haven't even thought much beyond that. Thanks for letting me know that wasn't clear.

shkkmo6y ago

I don't find it abusive; the comments seem relevant and it's an open source library.

preommr6y ago

FWIW this guy does actually spam his project a lot. I recognize him from reddit where people have also complained about his constant promotion of this project.

breck6y ago

It's extremely relevant to the OP.

majewsky6y ago

Still, it is always appropriate to add a disclosure that you're affiliated with the project you're linking to.

zellyn6y ago· 5 in thread

Anyone know how this compares to Presto and zetasql?

manigandham6y ago

Presto is a distributed query engine that can run queries across different datasources. It accepts a basic ANSI SQL syntax.

ZetaSQL is a custom SQL dialect, along with parser and analyzer, that Google uses for products like BigQuery and Spanner.

PartiQL is a new query language extended from SQL to work with various non-relational data sources and schemaless data formats in a more natural and idiomatic way.

dlurton6y ago

One big difference is native support for nested data that's built right into the syntax of the language. Most other SQL implementations allow support for nested data through functions which have non-intuitive syntax.
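To make "nested data built into the syntax" concrete: in PartiQL a path such as `c.address.city` navigates into nested values directly, and the spec describes permissive-mode navigation as yielding MISSING, rather than an error, when an attribute is absent. A rough Python emulation of that behaviour over JSON-like dicts; the `MISSING` sentinel and the `navigate` helper are illustrative assumptions, not part of any PartiQL library:

```python
from typing import Any


class _Missing:
    """Sentinel standing in for PartiQL's MISSING (distinct from None/NULL)."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()


def navigate(value: Any, path: str) -> Any:
    """Follow a dotted path such as "address.city" through nested dicts.

    An absent attribute, or a step into something that is not a struct,
    yields MISSING instead of raising (permissive-mode style).
    """
    for step in path.split("."):
        if isinstance(value, dict) and step in value:
            value = value[step]
        else:
            return MISSING
    return value


customers = [
    {"id": 1, "address": {"city": "Seattle"}},
    {"id": 2, "address": {"zip": "98101"}},
    {"id": 3},
]

# Roughly: SELECT c.id, c.address.city FROM customers AS c
for c in customers:
    print(c["id"], navigate(c, "address.city"))
# prints:
# 1 Seattle
# 2 MISSING
# 3 MISSING
```

Keeping MISSING separate from None mirrors the language's distinction between an absent attribute and an attribute that is present with a NULL value.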

cmollis6y ago

We generally build views to unnest the arrays, maps, and structs and query from them (or build other tables from the views in Hive), but something like this is certainly a bit easier.
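What those unnesting views do, and what PartiQL expresses directly in the FROM clause with a correlated iteration such as `FROM orders AS o, o.items AS i`, boils down to a nested loop that emits one row per array element. A minimal Python sketch of those inner-join semantics; the `orders` data, field names, and `unnest` helper are made up for illustration:

```python
from typing import Any, Iterator


def unnest(rows: list[dict[str, Any]], array_field: str) -> Iterator[tuple[dict[str, Any], Any]]:
    """Yield (parent, element) for each element of parent[array_field].

    Inner-join flavour: a parent whose array is empty or absent
    produces no output rows.
    """
    for parent in rows:
        for element in parent.get(array_field, []):
            yield parent, element


orders = [
    {"id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"id": 2, "items": []},
    {"id": 3, "items": [{"sku": "A", "qty": 5}]},
]

# Roughly: SELECT o.id, i.sku, i.qty FROM orders AS o, o.items AS i
flat = [{"id": o["id"], "sku": i["sku"], "qty": i["qty"]} for o, i in unnest(orders, "items")]
for row in flat:
    print(row)
```

Order 2 drops out because its array is empty; keeping it would need outer-join semantics instead.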

cwyers6y ago

I also am trying to figure out how this stacks up against Apache Drill.

manigandham6y ago

It's just a language, not a query engine. You can add PartiQL to Drill and Presto so that they can support a richer querying syntax over the unstructured/schemaless data sources they handle.

manigandham6y ago· 5 in thread

This is pretty nice, if only because it lets you use SQL dotted syntax seamlessly with JSON data.

smt886y ago

Postgres has had this for years. It's arrows instead of dots, but that's the only visual difference.

manigandham6y ago

It's not the same at all, and it gets much more verbose with minor complexity and lacks functionality.

PG is working on adding SQL/JSON support for JSON Path queries for the next version. It'll be a major improvement but still not as nice as what PartiQL has here.

dangoor6y ago

The SQL standard includes JSON support: https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016

manigandham6y ago

The SQL standard is different from what databases actually support. No RDBMS supports this yet. Only Elasticsearch, Couchbase, and managed services like Rockset and CosmosDB.

sixbrx6y ago

Oracle 12.2+ reportedly supports the ISO SQL/JSON though I haven't tried it. Which btw looks like mostly just renames of Postgres functions.

https://docs.oracle.com/en/database/oracle/oracle-database/1...

agentultra6y ago· 5 in thread

Interesting that they opted for a relational rather than a categorical one; the latter is proving to be more flexible [0].

[0] https://www.categoricaldata.net/

cwyers6y ago

Interesting that they opted for the single-most popular query language on the planet, versus somebody's hobby project? Why is that interesting?

frenchman996y ago

Looks like this package is maintained by a single person: https://github.com/CategoricalData/CQL

Not sure it's comparable to something like Amazon, which probably has dedicated funding.

cheez6y ago

How is it proving to be more flexible?

wisnesky6y ago

I'd be happy to showcase our recent progress. But let's connect offline, so as not to hijack the conversation. Feel free to drop me a line at ryan@conexus.ai.

breck6y ago

Interesting project. Thanks for sharing.

k__6y ago· 4 in thread

Is this a GraphQL alternative or more for accessing DBs in the backend?

manigandham6y ago

It's a querying language like SQL, but designed to handle more unstructured and complex data models natively. You wouldn't want to expose this publicly for the same reasons you wouldn't expose a SQL interface directly to your database.

GraphQL has some similarities in handling complex queries across multiple data sources, but it also has lots of functionality and a large ecosystem around offering it as a public API to clients.

manojlds6y ago

This is not a GraphQL alternative. APIs that expose GraphQL can potentially use this in the backend to fetch the data, however.

girvo6y ago

Correct me if I'm wrong, but I believe the answer is "both", at least as far as I've read.

TheDong6y ago

GraphQL is intended for public APIs which are interacted with by arbitrary, possibly malicious, queries.

This appears to be for known queries. Unless it is designed for arbitrary queries, DoS is a likely problem.

kodablah6y ago· 3 in thread

Is there a specification in anything besides PDF easily available to link to?

yannisGP6y ago

(I'm a member of the PartiQL team.) The language spec source will be open-sourced, as well, early next week (week of Aug 5). As I said above, to @ahl: Overall, we look forward to a community effort and participants that are interested in making significant investments to achieve the project's goals. We invite diverse opinions and viewpoints. As PartiQL grows towards a diverse community, we expect to add maintainers (for code and spec) that have non-Amazon affiliations and explore more formalized methods of governance.

throwawayoo6y ago

here you go. https://partiql.org/assets/PartiQL-Specification.pdf

majewsky6y ago

OP asked for anything besides PDF.

ahl6y ago· 2 in thread

@dlurton since you seem to be speaking for the PartiQL team on this (congrats on the launch!): The reference implementation is open source; what's the plan for the language spec? Is that something that AWS is going to own and control? The website references the PartiQL Steering Committee -- is that just AWS folks or is the intention to make it more broadly composed of members of the community you build?

I'm interested in adopting PartiQL for our product, but would we get to participate in the evolution of the language or would we purely be downstream of the decisions made to benefit AWS products and services?

yannisGP6y ago

hi @ahl, I'm a member of PartiQL's steering committee and glad to see your interest to participate in PartiQL's evolution. The language spec source will be open-sourced, as well, early next week (week of Aug 5). Overall, we look forward to a community effort and participants that are interested in making significant investments to achieve the project's goals. Diverse opinions and viewpoints, both on the language and on the process, are very welcome.

At this point, the maintainers/committee are only Amazon members. As PartiQL grows towards a diverse community, we expect to add maintainers/committee members (for code and spec) who have non-Amazon affiliations and to explore more formalized methods of governance, as they emerge from our community discussions.

Please email us at partiql-committee@amazon.com to further coordinate.

_msw_6y ago

Disclosure: I work for AWS and provided some opinions and non-prescriptive advice to the team behind PartiQL about Open Source.

The same question was raised on Twitter, and I put my thoughts there: https://twitter.com/_msw_/status/1157405984823758848

TL;DR, my advice is that successful open source projects and open specifications usually have diverse communities. You will have a hard time attracting people to your community if they do not share goals with the rest of the community. We should have some bounding boxes around how the spec evolves through clear tenets. Otherwise, welcome diverse opinions, experience, and problems to solve collaboratively.

ohnoesjmr6y ago· 2 in thread

I wonder how this deals with nested parquet data, and whether it's able to optimise on the things parquet provides.

dlurton6y ago

It would be possible to integrate parquet data with PartiQL.

Here is an example of integrating PartiQL with CSV files: https://github.com/partiql/partiql-lang-kotlin/blob/master/e.... Integrating with Parquet would of course be more complex than that.
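The linked example is Kotlin and uses the reference implementation's own APIs, which are not reproduced here. As a language-neutral sketch of the general idea only: each CSV row becomes a struct, and the file as a whole becomes a bag of structs that a PartiQL-style engine could iterate. The `csv_to_structs` helper below is hypothetical:

```python
import csv
import io
from typing import Any


def csv_to_structs(text: str) -> list[dict[str, Any]]:
    """Turn CSV text into a list of dicts, one per row (a "bag of structs").

    CSV carries no type information, so every value stays a string; a real
    integration would need a schema or a type-inference step.
    """
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


data = "name,team\nalice,db\nbob,infra\n"
rows = csv_to_structs(data)

# Roughly: SELECT r.name FROM rows AS r WHERE r.team = 'db'
print([r["name"] for r in rows if r["team"] == "db"])  # ['alice']
```

Parquet is harder largely because its nested, typed columns have to be mapped onto structs and arrays rather than flat strings.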

yannisGP6y ago

(PartiQL team member) AWS Redshift Spectrum has supported PartiQL on Parquet since last year, except that the language did not have a name yet and was referred to as "SQL extensions for nested data."

xpe6y ago· 1 in thread

A common query language, while appealing, is unlikely to fully abstract over different types of databases with different features and performance trade-offs. It will be a leaky abstraction.

Now, in practice, perhaps with sufficient adoption and integration, PartiQL might be good enough for 80% of use cases.

yannisGP6y ago

(I'm part of the PartiQL effort.) You are right about the challenge you point out and we are realistic about it. Thus this line in the charter: {{{ While the adopting query engines generally may not support all features of PartiQL, a database engine that “supports PartiQL” is expected to be consistent with the PartiQL specification in the syntax subset it supports. }}}

jnordwick6y ago· 1 in thread

'SQL’s ORDER BY orders the output data. Similarly, the PartiQL ORDER BY is responsible for turning its input bag into an array.'

That is the most important thing for my uses. I deal mostly in time series data, and SQL windowing queries are too slow. Turning the set into an array to allow indexing and easy time series queries is enough for me to use it.
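The bag-versus-array distinction quoted above is what makes positional time-series logic possible: a bag has no positions, so "the previous reading" is undefined until ORDER BY produces an array. A small Python sketch of the idea, using `collections.Counter` as a stand-in for a bag and a sorted list as the array; the readings and the delta calculation are hypothetical and not PartiQL API:

```python
from collections import Counter

# A bag is an unordered multiset: duplicates count, positions do not exist.
readings_bag = Counter([(3, 10.5), (1, 10.0), (2, 10.25), (4, 11.0)])

# "ORDER BY ts" turns the bag into an array, so element i has a neighbour i-1.
ordered = sorted(readings_bag.elements(), key=lambda r: r[0])

# Positional access makes a lag-style delta a simple index lookup.
deltas = [(ts, value - ordered[i - 1][1]) for i, (ts, value) in enumerate(ordered) if i > 0]
print(deltas)  # [(2, 0.25), (3, 0.25), (4, 0.5)]
```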

dlurton6y ago

ORDER BY is still in the works: https://github.com/partiql/partiql-lang-kotlin/issues/47

pushingice6y ago· 1 in thread

Will this be integrated into AWS Athena? The blog post doesn't mention it.

manigandham6y ago

AWS Athena is basically managed Presto so AWS will have to modify Presto to support it. They might, and hopefully upstream the changes.

mehh6y ago· 1 in thread

So I assume this is a rebranding of some other open source project with the amazon brand stuck on it, or is it actually something distinct?

danso6y ago

The posted article says it was designed and built in house, where it is currently dogfooded, and the specification doc is dated today (2019-08-01):

https://partiql.org/assets/PartiQL-Specification.pdf

manojlds6y ago

What does this offer over Hive SQL, which Spark also supports?

Below are the reasons given in the blog post, which I am trying to compare with Hive SQL + Spark:

SQL compatibility - I need to check this as I am not a SQL expert, but Hive SQL seems compatible

First-class nested data - supported

Optional schema and query stability - supported

Minimal extensions - Hive SQL seems to have the same goals

Format independence - yes

Data store independence - yes.

rdsubhas6y ago

There is one phrase that every vendor hates: "vendor agnostic". Minor differences in SQL dialects are not bugs; they are features for most vendors.

Most customers running on Amazon (or any cloud) want to move from having to maintain their own databases (which takes a lot of effort) to paying someone else to do it. Amazon knows this.

This move looks like Amazon has everything to win and every other vendor has everything to lose. Even if they say the opposite (you can switch from Amazon to your own), they know that extremely few customers have the will to operationalize their own databases. So they know that only the opposite will happen: customers will switch from self-hosted to Amazon services. They have also been openly predatory towards other open source databases (e.g. AWS Elasticsearch and Mongo). No wonder all Amazon services already support this.

In that context, who is the target audience and what is the deployment model here? Are vendors going to integrate this directly into their databases? Or users have to run their own proxy instances? Or is it compiled into the application as a library?

AtlasBarfed6y ago

AWS is all-in on data lock-in.

This may be powerful and useful, but it is proprietary, nontransparent, unstandardized, and nonportable.

I get that every database has some platform lock-in, but it's getting ridiculous. At least Amazon's relational offerings need to adhere to binary driver protocols.

pawelduda6y ago

Love the codebase, I never wrote any Kotlin (and very little Java) and was able to (hopefully) complete a good first issue very quickly.

whoevercares6y ago

Awesome! I’d be very interested to see when DynamoDB supports this language and a MongoDB-like query builder. Then I might sell all my MDB shares...

unnouinceput6y ago

Quote: "PartiQL requires the Java Runtime (JVM) to be installed on your machine."

And that right there is where they lost me. Nooo thank you.

benburleson6y ago

https://xkcd.com/927/
