I think this database is very interesting even if you don't care about the time-saving part of it: it claims to be a hybrid (OLAP and OLTP), it implements Postgres' wire protocol, and it claims to compile queries to machine code using LLVM [1].
[1]: https://www.youtube.com/watch?v=mzMnyYdO8jk (slideshow: http://www.cs.cmu.edu/~pavlo/slides/selfdriving-nov2016.pdf)
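To make the LLVM claim concrete, here is a toy sketch (my own illustration, not Peloton's code) of why compiling a query beats interpreting it: a Volcano-style interpreter pays a call and a predicate dispatch per tuple, while "compilation" generates one specialized function with the column index and constant baked into the hot loop. Python's exec stands in for LLVM here.

```python
# Illustrative sketch (not Peloton's actual code): interpreting a query
# plan tuple-at-a-time vs. generating a specialized function for it.

def interpreted_scan_filter_sum(rows, column, predicate):
    # Interpretation: a predicate call per tuple, column index looked up
    # each time.
    total = 0
    for row in rows:
        if predicate(row[column]):
            total += row[column]
    return total

def compile_scan_filter_sum(column, threshold):
    # "Compilation": emit one specialized function with the column index
    # and the constant baked in, so the hot loop has no dispatch.
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        v = row[{column}]\n"
        f"        if v > {threshold}:\n"
        "            total += v\n"
        "    return total\n"
    )
    ns = {}
    exec(src, ns)
    return ns["q"]

rows = [(i, i % 100) for i in range(1000)]
q = compile_scan_filter_sum(1, 50)
assert q(rows) == interpreted_scan_filter_sum(rows, 1, lambda v: v > 50)
```

A real engine does this with LLVM IR rather than Python source, but the shape of the win is the same: the per-tuple interpretation overhead disappears.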
He is on a short list of people I admire that fits on ten fingers. James Mickens is on it too.
His work on H-Store was great. I spent 6 years working on VoltDB, which is a commercial spinoff of H-Store, and it was a formative experience for me.
BTW, here are the video lectures for the graduate database course he mentioned in the presentation, where students developed features for Peloton as part of the coursework (they're great, IMO):
https://www.youtube.com/watch?v=MyQzjba1beA&list=PLSE8ODhjZX...
How much of an advantage does this give you? Are there really that many steps in the execution plan? The visible steps are usually fewer than 50, but what about the actual compiled internal steps? Unless compilation allows merging steps and further simplification, identifying redundant operations that get trimmed, I'm not sure where a 100x performance improvement would come from.
Though I remember seeing a Scala-based in-memory query engine that did this kind of simplification of the actual steps and did very well in benchmarks; maybe this is similar.
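One guess at where the speedup comes from (an assumption on my part, not something the talk states): it isn't the number of plan steps, it's the per-tuple overhead and the intermediate results between operators. A sketch of an unfused pipeline that materializes a list after every operator, versus the same four logical operators collapsed into one pass:

```python
# Sketch: operator fusion.  The unfused version materializes an
# intermediate result after each operator; the fused version is the
# single tight loop a compiling engine would effectively emit.

def unfused(rows):
    scanned   = [r for r in rows]                  # Scan
    filtered  = [r for r in scanned if r[1] > 10]  # Filter
    projected = [r[1] * 2 for r in filtered]       # Project
    return sum(projected)                          # Aggregate

def fused(rows):
    # Same four logical operators, one pass, no intermediates.
    total = 0
    for r in rows:
        v = r[1]
        if v > 10:
            total += v * 2
    return total

rows = [(i, i % 20) for i in range(100)]
assert unfused(rows) == fused(rows)
```

The plan still "has" four steps either way; what changes is that the fused version touches each tuple once and allocates nothing in between.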
This is a promising area of ongoing research. If you are curious about this kind of autonomous tuning of storage layout, you might want to check this out [1].
[1] https://www.cs.cmu.edu/~jarulraj/papers/2016.tile.sigmod.pdf
So, from my understanding, the learning part isn't about caching frequently used queries; it's (attempting to be) generalized workload learning, the kind of workload understanding that every DBA should do but usually doesn't.
If that is successful and even marginally able to predict workload skew, then the scheduling of operations can be significantly more efficient; you're essentially reducing the entropy in your database massively.
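A hypothetical sketch of the kind of prediction meant here (the names and the method are mine, purely for illustration): forecast per-table access rates with an exponentially weighted moving average and schedule heavy maintenance, such as index builds or layout migrations, on the table predicted to be quietest.

```python
# Toy workload forecaster: EWMA over per-interval access counts.

def ewma_forecast(history, alpha=0.5):
    # history: access counts per interval, oldest first.
    est = history[0]
    for x in history[1:]:
        est = alpha * x + (1 - alpha) * est
    return est

def pick_maintenance_target(access_log):
    # access_log: {table: [counts per interval]}; pick the table with
    # the lowest predicted load for the next interval.
    return min(access_log, key=lambda t: ewma_forecast(access_log[t]))

log = {"orders": [90, 100, 110], "audit": [5, 3, 2]}
assert pick_maintenance_target(log) == "audit"
```

A real self-driving system would use far richer models, but even this crude forecast is enough to stop you from rebuilding an index on a table mid-spike.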
All DB-based apps quickly exhaust the requirements for transactional code and move into "infinite reporting requests".
One ERP I worked on in the past had at least 300 reports in the base package. Most requests were for more reports, specialized for each customer. And additions to the transactional code were partly driven by the need to provide more data for the reports!
So I think having both styles is exactly what "everyone" wants. Even folks who got stuck with NoSQL databases.
---
I have thought about this a lot, and I consider the ideal architecture to be a relational DB with decoupled modules that work like this:
Write:
Commands -> WAL -> WaLProcessorAndRejector -> EventLog -> EventLogDispatchToOneOrMoreOf:
- Nothing (the EventLog is just history)
- Caches
- Relational Tables, for an up-to-date view of the data
- Columnar/Index, to speed up parts of the reports
Read:
ReadRequest -> ReadDispatchToOneOf:
- EventLog
- Caches
- Relational Tables
- Columnar/Index
The reason it needs to be modular is that what is needed can change over time.
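The write path above can be sketched roughly like this (all the names here are mine, just to make the pipeline concrete): commands hit a WAL first, get validated or rejected, land in an append-only event log, and the log fans out to whichever read models are currently enabled.

```python
# Rough sketch of the decoupled pipeline: Commands -> WAL ->
# processor/rejector -> EventLog -> dispatch to subscribed read models.

class EventLogDB:
    def __init__(self):
        self.wal = []          # durability first
        self.event_log = []    # accepted events, the source of truth
        self.subscribers = []  # e.g. cache, relational view, columnar index

    def subscribe(self, fn):
        self.subscribers.append(fn)

    def submit(self, command, validate):
        self.wal.append(command)        # Commands -> WAL
        if not validate(command):       # WALProcessorAndRejector
            return False
        self.event_log.append(command)  # -> EventLog
        for fn in self.subscribers:     # -> DispatchToOneOrMoreOf
            fn(command)
        return True

# One possible read model: an up-to-date relational-style table.
table = {}
def relational_view(cmd):
    op, key, value = cmd
    table[key] = value

db = EventLogDB()
db.subscribe(relational_view)
db.submit(("set", "k1", 1), validate=lambda c: True)
db.submit(("set", "k2", 2), validate=lambda c: False)  # rejected
assert table == {"k1": 1}
```

Adding or dropping a columnar index then just means subscribing or unsubscribing another consumer of the event log, which is the modularity being argued for.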
commit 35823950d500314811212282bd68c101e34b9a06
Author: jarulraj <jarulraj@cs.cmu.edu>
Date: Thu Dec 18 16:41:48 2014 -0500
Take a look at the different graphs on GitHub, like code frequency, to get a better idea: https://github.com/cmu-db/peloton/graphs/code-frequency