Show HN: my database engine for GPU (opens in new tab)

(sourceforge.net)

364 pointsantonmks14y ago60 comments

60 comments

43 comments · 15 top-level

TobbenTM14y ago· 9 in thread

What is the point of using "å" instead of "a"?

Those are two different characters. They are not interchangeable. It looks like the guy's name is Anton, a Swedish name. Thus he will create names with "å"-s in them if he wants to.

Will this cause any problems? It might. That still does not mean that "a" is a proper substitute for "å".

Should he have tried making a name from just English letters? That might have been a good idea. But that is a different recommendation than the one you gave.

"STARGÅTE", on the other hand, is an example of gratuitous replacement of "A" by "Å".

beaumartinez14y ago

In Stargate, it's a nod to Earth's point-of-origin symbol being Å (well, it's not exactly Å, but it's very similar)[1].

[1] http://stargate.wikia.com/wiki/Point_of_origin

snaky14y ago

>It looks like the guy's name is Anton, a Swedish name

This Anton is Russian clearly. http://ru.linkedin.com/pub/anton-k/32/253/695

And Алёнка (Alyonka) is a diminutive form of Russian name Алёна (Alyona) http://ru.wikipedia.org/wiki/Алёнка

Tuna-Fish14y ago

What's the point of using y instead of u?

Å and A are not at all the same letter. Although I agree that using characters outside of the ascii set for naming projects on the internet is probably a bad idea. If I had to translitterate Ålenkå to Latin chars, it would probably be Oolenko, but even that's not really that close. The Latin alphabet has a horribly limited set of vowels.

qw14y ago

The transliteration of Å is usually "AA". In fact the "circle" above the A is really a lower case "a". It's just that as time passed, it "devolved" into a circle.

1 more reply

tricolon14y ago

One benefit: it's very Googleable.

1 more reply

nakkiel14y ago

I wouldn't say it's related in any way to the alphabet in use. Take the English 'a' as an example: skate and car produce two different sounds out of the same letter.

Alphabets transmit words, much less so sounds.

5 more replies

tzury14y ago

Google will handle it for you my friend.

http://www.google.co.il/search?q=alenka%20gpu

orcadk14y ago

But only if you know to append "gpu" to the search. Just searching for 'Alenka' doesn't yield useful results (yet): http://www.google.dk/search?sourceid=chrome&ie=UTF-8&...

mrich14y ago· 6 in thread

Congratulations, well done!

Our team has built an in-memory DB (hybrid, both column and row store available, also using advanced compression). We have investigated GPUs for scans, we found that while they can outperform CPUs, the bottleneck is the transfer of the data to the card. Since the card's memory is only in the GB's, and our databases are typically much larger this is kind of limiting, since you have to copy data back and forth.

Do you know what the maximum amount of GPU memory for a blade-like system currently is?

antonmksOP14y ago

It is 6 GB for Tesla GPU. PCI Express transfer speed is 5GB/s, with average compression rate averaging 4/1 the speed becomes 20GB/s so transfer speed is not a bottleneck anymore.

mrich14y ago

I see. So for a large dataset you would still hit a bottleneck, compared to a main mem store which also uses compression. But nonetheless it's a good idea to do certain operations on hot data on the GPU.

Which operations benefit the largest in your experience? I would guess scans with simple predicates and expressions. What about aggregations?

2 more replies

alexchamberlain14y ago

Do you compress per row/column or over a given set?

1 more reply

Symmetry14y ago

This makes me think of CPU/GPU architectures where both are on the same die, but of course that won't work with CUDA.

grundprinzip14y ago

can you share a link to the project?

mrich14y ago

You can find info here (unfortunately not just an easy download)

http://www.sap.com/hana/index.epx

1 more reply

beagle314y ago· 3 in thread

Super cool, and thanks for sharing!

Are you familiar with APL/K/Q/J ? These implement column store databases which a programming language significantly more expressive and usable. K and J are also significantly faster than any other SQL database -- it would be interesting to compare K's ultra-optimized in-memory processing to the GPU -- I'm not at all sure GPU is going to win here in practice.

See https://github.com/kevinlawler/kona for an open source K implementation (the wiki has docs). Same language, newer version (with database integration) and english language instead of ascii chars is called Q see http://kx.com/q/d/ (kdb1, q1 if you like longer, kdb, q if you like terser), and http://code.kx.com/wiki/Main_Page

vessenes14y ago

My first thought looking at this was "Ooooh, it would be so nice if my GPU could parse Q.."

In fact, Q thoroughly sold me on tightly coupling a small expressive language with a parallel-friendly datastore. So many data problems require just this combination, and most general computing languages are surprisingly bad at it in comparison.

silentbicycle14y ago

I'm not sure K's in-memory processing is incompatible with running on a GPU, either. I've been working on a fork of Kona that has a very different architecture (e.g. a bytecode VM, generational mark/sweep GC, lexical scope), and I plan on experimenting with CUDA/OpenCL once the fundamentals are set.

beagle314y ago

Oh, K's in memory processing model is surely compatible with modern GPUs - at last for most primitives.

The thing I'm not sure about is whether the transfer overhead (GPU, CPU, Memory) would still be worth it in the general case. There's a question of what you're actually comparing to - e.g. you can have a (relatively) naive implementation, like the first commit of Kona was. You can have Arthur's cache-optimized-but-still-pure-C-and-CPU implementation (I think the main kona branch is comparable these days, but I'm not sure), which is faster.

And then, you can compare it to an SSE2/SSE4 version, still on the CPU; such a version does not exist afaik, but is easier to code than you would expect if you use ispc (https://github.com/ispc/ispc/tree/master/examples) or cilk-plus -- simple things with no data dependencies, as many K primitives are, often get a 4x speedup with ISPC.

Oh, and by the way, thanks for your work on kona and this message -- I've only been following kevinlawler's branch, but I'll start following you too.

1 more reply

breckinloggins14y ago· 2 in thread

Excellent work! I have a very small (marketing-related) nitpick:

Would you consider hosting your project on github instead of SourceForge? I don't say this to be trendy, github just has a much more pleasing interface in my opinion. It's also faster, subjectively at least.

astrodust14y ago

I second this. I'd rather be at the dentist than on a SourceForge page.

I swear it's the "myspace.com" of open source hosting.

davvid14y ago

Another practical consideration: I clicked on the "manual.txt" file and SF redirected me to a download page. I would have preferred to read the file in-place (like on github).

corysama14y ago· 2 in thread

Very, very cool. I'm always happy to see more ways to utilize the tremendous performance/price of GPUs.

So, what are the pros and cons of this approach? For a given hardware and energy budget, what kinds of task would be better suited to this kind of GPU implementation? Conversely, what kind of tasks would still be faster to run on a Core i7?

liquidcool14y ago

Agreed, seems like it has a lot of potential.

I remember some time back when GPUs were first being looked at for HPC uses I suggested a client look into them, but they were quickly dismissed due to their floating point limitation (which may have changed).

But DBs don't have huge FP needs. Another place GPUs are making a big difference is video editing, utilizing CUDA or OpenCL.

alexchamberlain14y ago

I think their floating point performance is improving.

1 more reply

jrmg14y ago· 2 in thread

I love this kind of novel thinking. I'd definitely like to read more detailed articles on how this works.

jdf14y ago

If you look up columnar databases, you can find a whole host of research on how these things store data. Storing values column by column to do vector processing is pretty common, but I can't recall seeing one built for a GPU before. Kudos to the dev for diving in to implementing one, it's pretty neat stuff.

Try finding research papers about MonetDB, Vertica/C-Store, or Vectorwise for some background. Or follow the links here:

http://www.dbms2.com/category/database-theory-practice/colum...

cshesse14y ago

I was looking at this a number of weeks ago to see if it would be feasible to speed up any SQL queries. There is a paper about this for SQLite: http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpg...

joshu14y ago· 1 in thread

now this is the kind of thing i come to HN for... but why oh why sourceforge?

6ren14y ago

github would be better for browsing. But the source is only 117k - I'm surprised how much of a barrier a few clicks and a few seconds was for me. (Partly, I expected it to be bigger.) The 21 files total 18832 lines; there's bison, lex, and a bunch of cuda files. Some is structured as doing similar things for similar cases, so not that long. It seems to include a reasonably complete implementation of SQL in the bison etc (and there are two example files of SQL query sequences). I'm impressed.

There are some line comments, but no overviews. :(

chubs14y ago· 1 in thread

That looks absolutely brilliant. So is the database all stored in the GPU memory? Or in main memory? Or does your engine handle storing to disk as well?

Pretty novel use of a GPU though, and if it's quick, it could be really useful.

antonmksOP14y ago

Thank you ! The data are stored on a disk and only the columns necessary for transformations are brought to GPU. In case the data does not fit into a limited GPU memory, it is processed piece by piece. I tested it on terabyte sized files so the data does not have to fit into GPU memory .

grundprinzip14y ago· 1 in thread

Looks like a very interesting approach to me given the recent developments.

Still I have a few questions regarding your numbers. Can you specify how much data you actually stream from disk? Because when doing some back of the envelope calculations (8 columns out of a 1.8B rows assuming 4b each) you would need to stream ~300MB/s from disk. Which seems very unlikely for your setup (except you have a super fast SATA drive, or SSD).

Now, with only 4GB RAM, your are constantly filling your RAM and transporting data to GPU. Plus you need to compress the data somewhere (on GPU?).

Do I understand the manual.txt correctly, that you can only achieve the performance when the data is written compressed to disk before? (While sorting it?)

I may be wrong, but the group by and join looked like versions requiring sorted data.

Can you please give a little more details on that?

antonmksOP14y ago

Here the details : Integers and decimals are both stored as 8 bytes. So one record takes 42 bytes. After compression it takes about 8 bytes ( I use FOR - frame of reference) compression and it works really well with decimal numbers (check the range of decimal numbers in lineitem.tbl file - usually you encode them in just a few bits). So I can stream about 10 million records per second from my disk. Which makes it exactly 180 seconds for 1.8B rows ( CUDA calculations are done in the background process). Records are decompressed in GPU and processed there. Actually the main bottleneck is not the disk access but the latency of CUDA calls - each call takes about 15 ms and when you process the entire file in chunks of ,say, 6 million records, it really adds up.

pavelkaroukin14y ago· 1 in thread

antonmks, did you consider using OpenCL instead? If yes - why CUDA outweighed it?

mappu14y ago

Whether it's still relevant or not CUDA has a lot of mindshare in the academic community. Partially this is due to marketing and having their product out first, but also OpenCL has to go through some more abstractions. There was a quite popular forum post[1] a while ago now where someone claimed a 5x perf improvement from switching from OpenCL to CUDA and, while i havn't tested it myself, it's the sort of statistic that sticks in your mind.

______________________

1. http://forums.nvidia.com/index.php?showtopic=109684

foobarbazetc14y ago

This is awesome. :)

Someone did something similar for PostgreSQL here:

http://wiki.postgresql.org/wiki/PGStrom

ericflo14y ago

Interesting. I wonder how a GPU could best be put to use in a database engine. Joins, I suppose? Would be cool to read more about this implementation, however the readme is a bit sparse.

alecco14y ago

Column store, compare it with MonetDB (open source) and similar. This design is suitable for read-mostly databases. It's a trade-off, faster reads and much better compression. But the price is writes usually become an order of magnitude slower. It isn't good for transactional work (OLTP).

http://en.wikipedia.org/wiki/Column-oriented_DBMS

Also the author talks (other comments) about transfer rate when the real issue with GPUS is latency. Most GPUs don't have decent RAM caches and it's very hard to cover the latency.

Also: http://reddit.com/r/programming/comments/oxq6a/a_database_en...

toblender14y ago

Very nice. I'm glad people are using their GPU's for something more productive than farming bitcoins.

2AM14y ago

nvidia gpus only, right? does it work with ati as well? CUDA is probably more advanced, so it's no surprise.

j / k navigate · click thread line to collapse

60 comments

43 comments · 15 top-level

TobbenTM14y ago· 9 in thread

What is the point of using "å" instead of "a"?

maggit14y ago

Those are two different characters. They are not interchangeable. It looks like the guy's name is Anton, a Swedish name. Thus he will create names with "å"-s in them if he wants to.

Will this cause any problems? It might. That still does not mean that "a" is a proper substitute for "å".

Should he have tried making a name from just English letters? That might have been a good idea. But that is a different recommendation than the one you gave.

"STARGÅTE", on the other hand, is an example of gratuitous replacement of "A" by "Å".

beaumartinez14y ago

In Stargate, it's a nod to Earth's point-of-origin symbol being Å (well, it's not exactly Å, but it's very similar)[1].

[1] http://stargate.wikia.com/wiki/Point_of_origin

snaky14y ago

>It looks like the guy's name is Anton, a Swedish name

This Anton is Russian clearly. http://ru.linkedin.com/pub/anton-k/32/253/695

And Алёнка (Alyonka) is a diminutive form of Russian name Алёна (Alyona) http://ru.wikipedia.org/wiki/Алёнка

Tuna-Fish14y ago

What's the point of using y instead of u?

qw14y ago

The transliteration of Å is usually "AA". In fact the "circle" above the A is really a lower case "a". It's just that as time passed, it "devolved" into a circle.

1 more reply

tricolon14y ago

One benefit: it's very Googleable.

1 more reply

nakkiel14y ago

I wouldn't say it's related in any way to the alphabet in use. Take the English 'a' as an example: skate and car produce two different sounds out of the same letter.

Alphabets transmit words, much less so sounds.

5 more replies

tzury14y ago

Google will handle it for you my friend.

http://www.google.co.il/search?q=alenka%20gpu

orcadk14y ago

But only if you know to append "gpu" to the search. Just searching for 'Alenka' doesn't yield useful results (yet): http://www.google.dk/search?sourceid=chrome&ie=UTF-8&...

mrich14y ago· 6 in thread

Congratulations, well done!

Do you know what the maximum amount of GPU memory for a blade-like system currently is?

antonmksOP14y ago

It is 6 GB for Tesla GPU. PCI Express transfer speed is 5GB/s, with average compression rate averaging 4/1 the speed becomes 20GB/s so transfer speed is not a bottleneck anymore.

mrich14y ago

Which operations benefit the largest in your experience? I would guess scans with simple predicates and expressions. What about aggregations?

2 more replies

alexchamberlain14y ago

Do you compress per row/column or over a given set?

1 more reply

Symmetry14y ago

This makes me think of CPU/GPU architectures where both are on the same die, but of course that won't work with CUDA.

grundprinzip14y ago

can you share a link to the project?

mrich14y ago

You can find info here (unfortunately not just an easy download)

http://www.sap.com/hana/index.epx

1 more reply

beagle314y ago· 3 in thread

Super cool, and thanks for sharing!

vessenes14y ago

My first thought looking at this was "Ooooh, it would be so nice if my GPU could parse Q.."

silentbicycle14y ago

beagle314y ago

Oh, K's in memory processing model is surely compatible with modern GPUs - at last for most primitives.

Oh, and by the way, thanks for your work on kona and this message -- I've only been following kevinlawler's branch, but I'll start following you too.

1 more reply

breckinloggins14y ago· 2 in thread

Excellent work! I have a very small (marketing-related) nitpick:

astrodust14y ago

I second this. I'd rather be at the dentist than on a SourceForge page.

I swear it's the "myspace.com" of open source hosting.

davvid14y ago

Another practical consideration: I clicked on the "manual.txt" file and SF redirected me to a download page. I would have preferred to read the file in-place (like on github).

corysama14y ago· 2 in thread

Very, very cool. I'm always happy to see more ways to utilize the tremendous performance/price of GPUs.

liquidcool14y ago

Agreed, seems like it has a lot of potential.

But DBs don't have huge FP needs. Another place GPUs are making a big difference is video editing, utilizing CUDA or OpenCL.

alexchamberlain14y ago

I think their floating point performance is improving.

1 more reply

jrmg14y ago· 2 in thread

I love this kind of novel thinking. I'd definitely like to read more detailed articles on how this works.

jdf14y ago

Try finding research papers about MonetDB, Vertica/C-Store, or Vectorwise for some background. Or follow the links here:

http://www.dbms2.com/category/database-theory-practice/colum...

cshesse14y ago

joshu14y ago· 1 in thread

now this is the kind of thing i come to HN for... but why oh why sourceforge?

6ren14y ago

There are some line comments, but no overviews. :(

chubs14y ago· 1 in thread

That looks absolutely brilliant. So is the database all stored in the GPU memory? Or in main memory? Or does your engine handle storing to disk as well?

Pretty novel use of a GPU though, and if it's quick, it could be really useful.

antonmksOP14y ago

grundprinzip14y ago· 1 in thread

Looks like a very interesting approach to me given the recent developments.

Now, with only 4GB RAM, your are constantly filling your RAM and transporting data to GPU. Plus you need to compress the data somewhere (on GPU?).

Do I understand the manual.txt correctly, that you can only achieve the performance when the data is written compressed to disk before? (While sorting it?)

I may be wrong, but the group by and join looked like versions requiring sorted data.

Can you please give a little more details on that?

antonmksOP14y ago

pavelkaroukin14y ago· 1 in thread

antonmks, did you consider using OpenCL instead? If yes - why CUDA outweighed it?

mappu14y ago

______________________

1. http://forums.nvidia.com/index.php?showtopic=109684

foobarbazetc14y ago

This is awesome. :)

Someone did something similar for PostgreSQL here:

http://wiki.postgresql.org/wiki/PGStrom

ericflo14y ago

Interesting. I wonder how a GPU could best be put to use in a database engine. Joins, I suppose? Would be cool to read more about this implementation, however the readme is a bit sparse.

alecco14y ago

http://en.wikipedia.org/wiki/Column-oriented_DBMS

Also the author talks (other comments) about transfer rate when the real issue with GPUS is latency. Most GPUs don't have decent RAM caches and it's very hard to cover the latency.

Also: http://reddit.com/r/programming/comments/oxq6a/a_database_en...

toblender14y ago

Very nice. I'm glad people are using their GPU's for something more productive than farming bitcoins.

2AM14y ago

nvidia gpus only, right? does it work with ati as well? CUDA is probably more advanced, so it's no surprise.

j / k navigate · click thread line to collapse