I think this database is very interesting even if you don't care about the time-saving part of it: it claims to be a hybrid (OLAP and OLTP), it implements Postgres' wire protocol, and it claims to compile queries to machine code using LLVM [1].
[1]: https://www.youtube.com/watch?v=mzMnyYdO8jk (slideshow: http://www.cs.cmu.edu/~pavlo/slides/selfdriving-nov2016.pdf)
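To make the LLVM claim concrete, here is a toy sketch (my own illustration, not Peloton's code) of why compiling a query beats interpreting it: a Volcano-style interpreter pays a call and a predicate dispatch per tuple, while "compilation" generates one specialized function with the column index and constant baked into the hot loop. Python's exec stands in for LLVM here.

```python
# Illustrative sketch (not Peloton's actual code): interpreting a query
# plan tuple-at-a-time vs. generating a specialized function for it.

def interpreted_scan_filter_sum(rows, column, predicate):
    # Interpretation: a predicate call per tuple, column index looked up
    # each time.
    total = 0
    for row in rows:
        if predicate(row[column]):
            total += row[column]
    return total

def compile_scan_filter_sum(column, threshold):
    # "Compilation": emit one specialized function with the column index
    # and the constant baked in, so the hot loop has no dispatch.
    src = (
        "def q(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        v = row[{column}]\n"
        f"        if v > {threshold}:\n"
        "            total += v\n"
        "    return total\n"
    )
    ns = {}
    exec(src, ns)
    return ns["q"]

rows = [(i, i % 100) for i in range(1000)]
q = compile_scan_filter_sum(1, 50)
assert q(rows) == interpreted_scan_filter_sum(rows, 1, lambda v: v > 50)
```

A real engine does this with LLVM IR rather than Python source, but the shape of the win is the same: the per-tuple interpretation overhead disappears.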
He is on a short list of people I admire that fits on ten fingers. James Mickens is on it too.
His work on H-Store was great. I spent 6 years working on VoltDB, which is a commercial spinoff of H-Store, and it was a formative experience for me.
BTW, here are the video lectures for the graduate database course he mentioned in the presentation, where students developed features for Peloton as part of the coursework (they're great, IMO):
https://www.youtube.com/watch?v=MyQzjba1beA&list=PLSE8ODhjZX...
How much of an advantage does this give you? Are there really that many steps in the execution plan? The visible steps are usually fewer than 50, but what about the actual compiled internal steps? Unless compilation allows merging steps and further simplification, identifying redundant operations that get trimmed, I'm not sure where a 100x performance improvement would come from.
Though I remember seeing a Scala-based in-memory query engine that did this kind of simplification of the actual steps and did very well in benchmarks; maybe this is similar.
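One guess at where the speedup comes from (an assumption on my part, not something the talk states): it isn't the number of plan steps, it's the per-tuple overhead and the intermediate results between operators. A sketch of an unfused pipeline that materializes a list after every operator, versus the same four logical operators collapsed into one pass:

```python
# Sketch: operator fusion.  The unfused version materializes an
# intermediate result after each operator; the fused version is the
# single tight loop a compiling engine would effectively emit.

def unfused(rows):
    scanned   = [r for r in rows]                  # Scan
    filtered  = [r for r in scanned if r[1] > 10]  # Filter
    projected = [r[1] * 2 for r in filtered]       # Project
    return sum(projected)                          # Aggregate

def fused(rows):
    # Same four logical operators, one pass, no intermediates.
    total = 0
    for r in rows:
        v = r[1]
        if v > 10:
            total += v * 2
    return total

rows = [(i, i % 20) for i in range(100)]
assert unfused(rows) == fused(rows)
```

The plan still "has" four steps either way; what changes is that the fused version touches each tuple once and allocates nothing in between.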
This is a promising area of ongoing research. If you are curious about this kind of autonomous tuning of storage layout, you might want to check this out [1].
[1] https://www.cs.cmu.edu/~jarulraj/papers/2016.tile.sigmod.pdf
So, from my understanding, the learning part isn't about caching frequently used queries; it's (attempting to be) generalized workload learning, the kind of workload understanding that every DBA should do but usually doesn't.
If that is successful and even marginally able to predict workload skew, then the scheduling of operations can be significantly more efficient; you're essentially reducing the entropy in your database massively.
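A hypothetical sketch of the kind of prediction meant here (the names and the method are mine, purely for illustration): forecast per-table access rates with an exponentially weighted moving average and schedule heavy maintenance, such as index builds or layout migrations, on the table predicted to be quietest.

```python
# Toy workload forecaster: EWMA over per-interval access counts.

def ewma_forecast(history, alpha=0.5):
    # history: access counts per interval, oldest first.
    est = history[0]
    for x in history[1:]:
        est = alpha * x + (1 - alpha) * est
    return est

def pick_maintenance_target(access_log):
    # access_log: {table: [counts per interval]}; pick the table with
    # the lowest predicted load for the next interval.
    return min(access_log, key=lambda t: ewma_forecast(access_log[t]))

log = {"orders": [90, 100, 110], "audit": [5, 3, 2]}
assert pick_maintenance_target(log) == "audit"
```

A real self-driving system would use far richer models, but even this crude forecast is enough to stop you from rebuilding an index on a table mid-spike.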
All DB-based apps quickly exhaust the requirements for transactional code and move into "infinite reporting requests".
One ERP I worked on in the past had at least 300 reports in the base package. Most requests were for more reports, specialized for each customer. And additions to the transactional code were partly driven by the need to provide more data for the reports!
So I think having both styles is exactly what "everyone" wants. Even folks who got stuck with NoSQL databases.
---
I have thought about this a lot, and I consider the ideal architecture to be a relational DB with decoupled modules that work like this:
Write:
Commands -> WAL -> WaLProcessorAndRejector -> EventLog -> EventLogDispatchToOneOrMoreOf:
- Nothing (the EventLog is just history)
- Caches
- Relational Tables, for an up-to-date view of the data
- Columnar/Index, to speed up parts of the reports
Read:
ReadRequest -> ReadDispatchToOneOf:
- EventLog
- Caches
- Relational Tables
- Columnar/Index
The reason it needs to be modular is that what is needed can change over time.
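The write path above can be sketched roughly like this (all the names here are mine, just to make the pipeline concrete): commands hit a WAL first, get validated or rejected, land in an append-only event log, and the log fans out to whichever read models are currently enabled.

```python
# Rough sketch of the decoupled pipeline: Commands -> WAL ->
# processor/rejector -> EventLog -> dispatch to subscribed read models.

class EventLogDB:
    def __init__(self):
        self.wal = []          # durability first
        self.event_log = []    # accepted events, the source of truth
        self.subscribers = []  # e.g. cache, relational view, columnar index

    def subscribe(self, fn):
        self.subscribers.append(fn)

    def submit(self, command, validate):
        self.wal.append(command)        # Commands -> WAL
        if not validate(command):       # WALProcessorAndRejector
            return False
        self.event_log.append(command)  # -> EventLog
        for fn in self.subscribers:     # -> DispatchToOneOrMoreOf
            fn(command)
        return True

# One possible read model: an up-to-date relational-style table.
table = {}
def relational_view(cmd):
    op, key, value = cmd
    table[key] = value

db = EventLogDB()
db.subscribe(relational_view)
db.submit(("set", "k1", 1), validate=lambda c: True)
db.submit(("set", "k2", 2), validate=lambda c: False)  # rejected
assert table == {"k1": 1}
```

Adding or dropping a columnar index then just means subscribing or unsubscribing another consumer of the event log, which is the modularity being argued for.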
commit 35823950d500314811212282bd68c101e34b9a06
Author: jarulraj <jarulraj@cs.cmu.edu>
Date: Thu Dec 18 16:41:48 2014 -0500
Take a look at the different graphs on GitHub, like code frequency, to get a better idea: https://github.com/cmu-db/peloton/graphs/code-frequency