PartiQL> SELECT * FROM [1,2,3]
|
==='
<<
{
'_1': 1
},
{
'_1': 2
},
{
'_1': 3
}
>>
---
OK! (86 ms)
Jeez. 86ms for this query on this data set? Hope that's not representative of the general performance!
In the era of 64-core processors, scaling horizontally is meaningless for 99.9% of architecture designs. Latency matters to everyone, always.
Trivial queries taking nearly 1/10th of a second on modern kit is absolutely atrocious, and shows a total lack of awareness of performance as a feature.
The time reported by the REPL can be misleading.
There is work that we definitely need to do on performance as we develop PartiQL. Performance is something we have been considering since inception and we will keep considering as we go forward.
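To illustrate why a single REPL timing can mislead, here is a minimal Python sketch. It rests on an assumption the thread does not state: that a first-run measurement also picks up one-off start-up costs, so the median of many repeated runs is a fairer number. `run_query`, `time_once`, and `time_repeated` are hypothetical names, and `run_query` is only a stand-in that mimics the `'_1'` output shown above, not the PartiQL evaluator.

```python
import statistics
import time


def run_query(values):
    # Hypothetical stand-in for a query evaluator: wraps each scalar in a
    # single-field struct, mirroring the '_1' output of the REPL example.
    return [{"_1": v} for v in values]


def time_once(fn, *args):
    # Time one call and return (result, elapsed milliseconds).
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


def time_repeated(fn, *args, runs=200):
    # Time many calls and return the median elapsed milliseconds.
    samples = []
    for _ in range(runs):
        _, elapsed_ms = time_once(fn, *args)
        samples.append(elapsed_ms)
    return statistics.median(samples)


result, first_ms = time_once(run_query, [1, 2, 3])
median_ms = time_repeated(run_query, [1, 2, 3])
print(f"first call: {first_ms:.3f} ms, median of 200 runs: {median_ms:.3f} ms")
```

The same idea applies to any REPL: compare one cold measurement against the median of repeated runs before drawing conclusions about engine speed.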
I used to do a lot of BigQuery for analytics. Latency in BigQuery is crap and clearly not its selling point; we're not talking ms here, we're talking seconds at a minimum. Yet it's a really nice database for its use cases.
It's an early reference implementation and demo program to show off the language syntax, it doesn't have much to do with whatever engine actually executes the query and that will be the majority of any real query's timing.
Regardless, even the query parsing and compilation should be much faster if it moves into a lower-level language like C++ or Rust.
A database is so much more than just a schema and validator, but this is being advertised as a database replacement.
And, I want to stress, this doesn't mean I think it isn't useful. I bet there are lots of times where you want to enforce some sort of structure on a bunch of folders with files in them. That's not a database though.
I have not used TreeBase for anything other than collaborative knowledge bases. Haven't even thought much beyond that. Thanks for letting me know that wasn't clear.
ZetaSQL is a custom SQL dialect, along with parser and analyzer, that Google uses for products like BigQuery and Spanner.
PartiQL is a new query language extended from SQL to work with various non-relational data sources and schemaless data formats in a more natural and idiomatic way.
PG is working on adding SQL/JSON support for JSON Path queries for the next version. It'll be a major improvement but still not as nice as what PartiQL has here.
https://docs.oracle.com/en/database/oracle/oracle-database/1...
Not sure it's comparable to something like Amazon, which probably has dedicated funding.
GraphQL has some similarities in handling complex queries across multiple data sources, but also has lots of functionality and a large ecosystem around offering it as a public API to clients.
This appears to be for known queries. Unless it is designed for arbitrary queries, DoS is a likely problem.
I'm interested in adopting PartiQL for our product, but would we get to participate in the evolution of the language or would we purely be downstream of the decisions made to benefit AWS products and services?
At this point, the maintainers/committee consist only of Amazon members. As PartiQL grows towards a diverse community, we expect to add maintainers/committee members (for code and spec) that have non-Amazon affiliations and to explore more formalized methods of governance, as they emerge from our community discussions.
Please email us at partiql-committee@amazon.com to further coordinate.
The same question was raised on Twitter, and I put my thoughts there: https://twitter.com/_msw_/status/1157405984823758848
TL;DR, my advice is that successful open source projects and open specifications usually have diverse communities. You will have a hard time attracting people to your community if they do not share goals with the rest of the community. We should have some bounding boxes around how the spec evolves through clear tenets. Otherwise, welcome diverse opinions, experience, and problems to solve collaboratively.
Here is an example of integrating PartiQL with CSV files. https://github.com/partiql/partiql-lang-kotlin/blob/master/e.... Integrating with Parquet would of course be more complex than that.
Now, in practice, perhaps with sufficient adoption and integration, PartiQL might be good enough for 80% of use cases.
That is the most important thing for my uses. I deal mostly in time series data, and SQL windowing queries are too slow. Turning the set into an array to allow indexing and support easy time series queries is enough for me to use it.
Below are the reasons given in the blog post, and I am trying to compare them with Hive SQL + Spark:
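As a rough illustration of that point, here is a minimal Python sketch of why an ordered, indexable array makes a simple time series calculation easy: each element can read its predecessor by position instead of going through a window function. The `deltas` function and the sample readings are hypothetical, and this is plain Python, not PartiQL syntax.

```python
def deltas(series):
    # series: list of (timestamp, value) pairs, already sorted by timestamp.
    # Each output pair holds the timestamp and the change from the previous
    # reading, looked up by array index.
    return [
        (series[i][0], series[i][1] - series[i - 1][1])
        for i in range(1, len(series))
    ]


readings = [(1, 10.0), (2, 12.5), (3, 11.0)]
changes = deltas(readings)
print(changes)  # [(2, 2.5), (3, -1.5)]
```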
SQL compatibility - I need to check this as I am not a SQL expert, but Hive SQL seems compatible
First-class nested data - supported
Optional schema and query stability - supported
Minimal extensions - Hive SQL seems to have the same goal
Format independence - yes
Data store independence - yes.
Most customers running on Amazon (or any cloud) want to move from having to maintain their own databases (which takes a lot of effort) to paying someone else to do it. Amazon knows this.
This move looks like Amazon has everything to win and every other vendor has everything to lose. Even if they say the opposite (you can switch from Amazon to your own), they know that extremely few customers have the will to operationalize their own databases. So they know that only the opposite will happen: customers will switch from self-hosted to Amazon services. They have also been openly predatory towards other open source databases (e.g. AWS Elasticsearch and Mongo). No wonder all Amazon services already support this.
In that context, who is the target audience and what is the deployment model here? Are vendors going to integrate this directly into their databases? Or do users have to run their own proxy instances? Or is it compiled into the application as a library?
This may be powerful and useful, but it is proprietary, nontransparent, unstandardized, and nonportable.
I get that every database has some platform lock-in, but it's getting ridiculous. At least Amazon's relational offerings need to adhere to binary driver protocols.
And that right there is where they lost me. Nooo thank you.