Kudu – Fast Analytics on Fast Data

(getkudu.io)

63 points · strlen · 10y ago · 22 comments

bankim · 10y ago · 7 in thread

Curious what's the reason for implementing Kudu in C++ and not Java/Scala?

tlipcon · 10y ago

I spent a lot of time in 2011 or so struggling with GC on the JVM: http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-h... has some of the gory details. Even hacked a bit on G1: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-A...

With a lot of effort by many folks in the community, HBase has mostly tackled the full-GC problem, but still has occasional issues with some workloads.

So, GC was definitely one factor: not having GC means we can give 99th percentile numbers in the single-digit milliseconds, which is pretty nice. Our master process has actually shown <1ms 99.99th percentile latency for tablet location requests on an 80-node cluster. So again, those are numbers that are super difficult to get on the JVM unless you take an allocation-free approach like the HFT guys do.

Another factor was ease of integration of platform-specific code for performance reasons. For example, we make use of SSE prefetch instructions to improve scan speed in our concurrent B-tree by 30% or so. The B-tree itself would be difficult to implement in Java due to lack of control over object layout, etc. While you can eventually get the same performance with enough off-heaping and sun.misc.Unsafe, my feeling is that, by the time you've gone down that road, you might as well be using C++.
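
To make the prefetch point concrete, here is a minimal sketch of the general technique, not Kudu's actual B-tree code. The `Leaf` struct and `ScanSum` function are hypothetical names invented for illustration, and the sketch assumes a GCC or Clang toolchain, where `__builtin_prefetch` is available and, on x86, typically lowers to an SSE prefetch instruction. The idea is to ask the CPU to start loading the next leaf while the current one is still being processed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical leaf node for illustration: a fixed-size block of keys
// plus a link to the next leaf in scan order.
struct Leaf {
    static constexpr std::size_t kCapacity = 16;
    std::int64_t keys[kCapacity];
    std::size_t count = 0;
    const Leaf* next = nullptr;
};

// Sums every key in the leaf chain. Before processing a leaf, it issues a
// read prefetch for the following leaf so the memory fetch can overlap
// with the work on the current one.
std::int64_t ScanSum(const Leaf* head) {
    std::int64_t sum = 0;
    for (const Leaf* leaf = head; leaf != nullptr; leaf = leaf->next) {
        if (leaf->next != nullptr) {
            // GCC/Clang builtin: rw = 0 (read), locality = 3 (keep in cache).
            __builtin_prefetch(leaf->next, 0, 3);
        }
        for (std::size_t i = 0; i < leaf->count; ++i) {
            sum += leaf->keys[i];
        }
    }
    return sum;
}
```

Whether a prefetch like this helps depends heavily on node size and access pattern, so any gain would need to be measured on the real workload.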

I'll admit that, after many years of not writing native code, I was a bit nervous about diving back in. Segfaults are never fun. But we soon realized that the native code tooling has improved a _ton_ in the last decade. We run all of our tests precommit using the excellent Sanitizer tools from Google (ThreadSanitizer, AddressSanitizer, LeakSanitizer), and those make it nearly trivial to diagnose a leak or crash. We also have pretty strict guidelines around the use of pointers, based on the Google C++ guidelines. Many will complain that this is a neutered form of C++, and they're right. But it's also a relatively safe form of C++.
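
As a small illustration of the kind of bug ThreadSanitizer is built to catch, here is a sketch that is not taken from the Kudu codebase. `CountWithThreads` is a hypothetical function: several threads increment one shared counter. If the counter were a plain `int`, the concurrent increments would be a data race; assuming a GCC or Clang toolchain, building with `-fsanitize=thread -g -pthread` and running the test should make the tool report the conflicting accesses. The version below uses `std::atomic`, which removes the race.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Increments a shared counter from several threads and returns the total.
// With a plain `int` counter these increments would race; std::atomic
// makes each increment a well-defined atomic operation.
int CountWithThreads(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // Joining every worker guarantees all increments are visible below.
    for (std::thread& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

Relaxed ordering is enough here because only the final total matters and the joins provide the needed synchronization.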

I could probably write a lengthy blog post on our experiences of C++ vs Java, but hopefully the above gives you a taste. Overall I've been happy with the decision. Slightly more time spent on crashes. Less time spent on chasing hard-to-reproduce performance or memory consumption issues. And the thread checking tools are actually far superior, so I'd say less time spent chasing races.

random3 · 10y ago

I think writing a long post about your experiences of C++ vs Java would be great (I'd pay you in [choose your drinks and count] for it ;)).

tlipcon · 10y ago

Quick example of how TSAN makes it easy to understand and fix races -- here's a commit message which shows its output: https://github.com/cloudera/kudu/commit/e402d5ed79a9c98283b6...

Apparently Google has this same tool for Java internally, but hasn't open sourced it due to their litigation with Oracle over Java stuff.

acconsta · 10y ago

>I could probably write a lengthy blog post on our experiences of C++ vs Java

I'd be interested in reading that. There's still a lot of FUD around using C++ for new projects.

vitalyd · 10y ago

+1 on writing up a blog post on C++ vs Java. I suspect that, given your background in Java, people may heed your words a bit more than usual. There's definitely a lot of outdated thinking in Java land with regard to (modern) C++ and its accompanying toolchain. Many big data projects could benefit from being written in C++ rather than Java (or another JVM language).

bankim · 10y ago

Thanks for the response, Todd! Since most projects in the Hadoop ecosystem are written in Java/Scala on the JVM, I was pleasantly surprised by the choice of C++ :)

nfa_backward · 10y ago

From my experience and the experience of others ( https://www.eecs.berkeley.edu/~keo/publications/nsdi15-final... ), current big data solutions are more often CPU-bound than I/O-bound. I think we will see more and more big data architecture moving to C++. For example: http://www.scylladb.com/

tlipcon · 10y ago · 6 in thread

Todd from the Kudu team here. If anyone has any questions, feel free to ask them here; I'll try to check back throughout the day.

tlipcon · 10y ago

Should add that those looking for a technical deep dive might enjoy our draft paper: http://getkudu.io/kudu.pdf and/or browsing our source: http://github.com/cloudera/kudu

eanews · 10y ago

The FAQ mentions there are no security features at the moment, but do you have any thoughts on what the security goals are? In particular, will there be cell-level security à la Accumulo, or support for encrypting data at rest?

tlipcon · 10y ago

We haven't scoped out the security features. Cell level security can be difficult to implement efficiently, but if we see enough demand for it, I could imagine it happening.

My guess is that the first pass will be table and column level authorization, plus of course strong authentication. Row/cell/predicate-based security could be added in a later release, but it's a feature that's less commonly required.

As for encryption at rest, I imagine that will also be fairly high priority as we move towards GA or the first few releases after GA. But again, we haven't done the scoping exercise yet, so I'm cautious about throwing out dates :)

If you're interested in helping to contribute either feature, let us know! kudu-dev@googlegroups.com

random3 · 10y ago

Hey Todd,

The insert latency seems to be related to the random read latency (seems that the unique key constraint has that effect). Do you have some data on the insert latency distribution?

Thanks, Cosmin

tlipcon · 10y ago

We're working on running some more thorough YCSB benchmarks, but here are some of the percentiles on the uniform "workload A" running on a 9 node cluster for 1 hour:

Throughput: 28280 ops/sec
Read: 2821us avg, 467us min, 3519us 95p, 6843us 99p
Update: 1688us avg, 714us min, 1983us 95p, 8855us 99p

Workload D, which has some inserts (and reads recently written data):

Throughput: 36286 ops/sec
Read: 1765us avg, 491us min, 2537us 95p, 4259us 99p
Insert: 1614us avg, 838us min, 1595us 95p, 11575us 99p

Hope that helps. I'll try to push our latest YCSB bindings to github later this afternoon/evening if you'd like to reproduce on your own.
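
For readers less familiar with the notation, "95p" and "99p" above are latency percentiles. Here is a minimal sketch of one common definition (nearest-rank), assuming the raw per-operation latencies in microseconds are available. `PercentileUs` is a hypothetical helper written for illustration; YCSB's own computation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least
// `percent` percent of all samples are less than or equal to it.
// The vector is taken by value so the caller's data is not reordered.
long PercentileUs(std::vector<long> samples_us, int percent) {
    assert(!samples_us.empty() && percent >= 1 && percent <= 100);
    std::sort(samples_us.begin(), samples_us.end());
    const std::size_t n = samples_us.size();
    // rank = ceil(percent * n / 100), computed with integer arithmetic.
    const std::size_t rank =
        (static_cast<std::size_t>(percent) * n + 99) / 100;
    return samples_us[rank - 1];
}
```

With only a handful of samples the high percentiles collapse onto the maximum, which is one reason tail latencies are usually reported from long runs like the one-hour test above.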


nfa_backward · 10y ago

Does Kudu colocate data sets with identical keys? If so, are there plans to have Impala take advantage of this?

nfa_backward · 10y ago · 2 in thread

Kudu is being positioned as filling the gap between HDFS and HBase. After reading the overview I see this more as bringing features from HDFS+Parquet+HBase. Does that sound reasonable?

Super excited about this and even more so since it is open source. Thank you!

tlipcon · 10y ago

Yep, that's correct. HDFS+Parquet is more accurate but doesn't fit quite as well on slides and short descriptions.

The idea is to get the analytic scan performance of Parquet while still allowing for in-place updates and row-by-row access like HBase.

HDFS (with Parquet or other formats) will still be better for unstructured or fully immutable datasets. HBase will still be better when your top priority is ingest rate, random access, and semi-structured data. Kudu should be good when you've got tabular data as described above.
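
The combination described above (columnar scans plus keyed, in-place updates) can be pictured with a toy in-memory model. This is only a sketch under loud assumptions: it says nothing about Kudu's real storage format, and `ToyTable` with all of its methods is hypothetical. Each column lives in its own contiguous vector, so an aggregate touches a single column, while a primary-key index supports row lookups, in-place updates, and a uniqueness check on insert.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy table for illustration only: one vector per column, plus an index
// from primary key to row position.
class ToyTable {
 public:
  // Returns false if the key already exists (unique key constraint).
  // Note that every insert has to perform a key lookup first.
  bool Insert(std::int64_t key, const std::string& name, double value) {
    if (index_.count(key) != 0) return false;
    index_[key] = keys_.size();
    keys_.push_back(key);
    names_.push_back(name);
    values_.push_back(value);
    return true;
  }

  // In-place update of a single cell, located through the key index.
  bool UpdateValue(std::int64_t key, double new_value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    values_[it->second] = new_value;
    return true;
  }

  // Row-style point lookup; writes the value to *out when the key exists.
  bool GetValue(std::int64_t key, double* out) const {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *out = values_[it->second];
    return true;
  }

  // Analytic scan: reads only the `value` column, never keys or names.
  double SumValues() const {
    double sum = 0.0;
    for (double v : values_) sum += v;
    return sum;
  }

 private:
  std::vector<std::int64_t> keys_;
  std::vector<std::string> names_;
  std::vector<double> values_;
  std::unordered_map<std::int64_t, std::size_t> index_;
};
```

A real system also has to handle persistence, concurrency, and deletes, all of which this sketch ignores.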

nfa_backward · 10y ago

Impala has an in-memory columnar format on its roadmap for 2016. Is that format being designed with Kudu in mind?

Edit: I understand that the formats, while both columnar, serve different purposes. I am more curious about overlap if any between the two.


vvladymyrov · 10y ago

Todd, any plans to add user-defined functions? Will it be only UDFs written in C(++)? I'm curious how you think UDF support could be designed for the native code implementation.
