DDSketch: A fast, fully-mergeable quantile sketch with relative-error guarantees (opens in new tab)

(arxiv.org)

107 pointsjbarciauskas6y ago22 comments

22 comments

20 comments · 4 top-level

homin6y ago· 14 in thread

Author here. We wanted to be able to graph p99, p99.9 metrics with arbitrary ranges, and found the existing solutions were not accurate enough for our needs. Happy to answer any questions.

Code here:

https://github.com/DataDog/sketches-go

https://github.com/DataDog/sketches-py

https://github.com/DataDog/sketches-java

heinrichhartman6y ago

Nice work! Averaging percentiles is well-known to give terrible results. Glad to see more people, taking this problem serious, and providing viable alternatives!

A note on Accuracy: At Circonus, we have been using a version of HDR-Histograms [1] for many years to aggregate latency distributions, and calculate accurate aggregated percentiles. Accuracy was never a problem (worst-case error <5%, usually _much_ better).

If I read your evaluation results correctly, you also found HRD-Histograms to be as-accurate or more-accurate, than DDSketches, correct?

The differentiator to HDR Histograms seems to be merging speed and size, where DDSketches seem to have an edge.

One thing that is not immediately clear to me from reading the paper is, how much of the distribution function can be reconstructed from the sketch? E.g. for SLO calculations one is often interested in latency bands: "How many requests were faster than 100ms?" [2].

Is it possible to approximate CDF values ("lower counts") from the sketch with low/bounded error?

[1] https://github.com/circonus-labs/libcircllhist

[2] http://heinrichhartmann.com/pdf/Heinrich%20Hartmann%20-%20La...

homin6y ago

HDR is great for the use-case when you can bound your range beforehand and merging is not a requirement, but those were also the reasons we needed to develop DDSketch.

1 more reply

homin6y ago

Yes, the sketch can quickly answer questions like "How many requests were faster than 100ms?" 100ms maps to a bucket index, and we would just sum up all the buckets with indices at most that.

oertl6y ago

FYI, already in 2015, I have proposed exactly the same idea as improvement to HdrHistogram. See https://github.com/HdrHistogram/HdrHistogram/issues/54 and corresponding code https://github.com/oertl/HdrHistogram/blob/memory_efficiency.... Unfortuantely, the author of HdrHistogram did not pick up my proposal as that would have lead to major changes and problems regarding compatibility. The mapping to histogram bins is the same as in your code https://github.com/DataDog/sketches-java/blob/master/src/mai.... I am curious, have you been inspired by that?

jamesblonde6y ago

This sucks, but a problem we have in academia is that you can't reference a closed PR request at a top-tier conference like VLDB. If you wrote up your proposal as a referable document, you should rightly claim that you deserve scholarly recognition. Morally, you deserve recognition (which you are getting here, at least). It's decent of the author to admit he was inspired by you - there are others who would think it is easier to deny it. But there is no recourse, if it's only a PR.

CharlesMasson6y ago

I did see that thread and that's what inspired the mapping with the quadratic interpolation.

To clarify though, DDSketch as defined in the paper uses the logarithmic mapping, as our main goal was to make the memory footprint as small as possible. See: https://github.com/DataDog/sketches-java/blob/1650d939f1485f...

The Java implementation abstracts out the index mapping. This let us add alternative ways to map values to indices and we added the method with the quadratic interpolation as it seems to be a good tradeoff between index computation speed and memory footprint.

spenczar56y ago

Thanks for publishing this. I admit I have only skimmed the paper and plan to look closer later today. My first critical thought was "I wonder why section 4 doesn't make comparisons to t-digest?" I think of t-digest as the most common mergeable streaming modern quantile algorithm in practice. Why didn't you include it in your comparisons?

Either way - looks very promising, I am excited to take a closer look and possibly use this. Thanks!

homin6y ago

T-digest is definitely the best rank-accuracy sketch for higher percentiles in terms of size. However, it was much slower than GK, and we found that by increasing the rank-accuracy of GK (taking the penalty of a larger sketch-size), the results were not too different. Both of course had the issues inherent in all rank-accurate sketches in the higher percentiles over long-tailed data.

zawerf6y ago

No background in this, my layman interpretation is: Track a histogram with log-scaled bins. These are easy to increment and only takes up size of the number of bins in terms of memory. Two histograms from different servers can easily be merged together since they use the same bins.

But then I got completely lost in the math to show why this guarantees anything. What's actually preventing the scenario where everything falls into the same or just a few bins? (as so happens in the real world where all your data is of a relatively similar order of magnitude) For example what would happen if my data is random from 1 to 10? And on a different server random 1000 to 1010?

heinrichhartman6y ago

> number of bins in terms of memory

Less than that, with the appropriate compression scheme (sparse encoding).

> But then I got completely lost in the math to show why this guarantees anything.

You get guarantees on _relative_ error:

- 100 to 110 is a 10% error

- 1000 to 1010 is a 1% error

Depending on your logarithmic base, you can detect errors up to a certain relative %-tage.

1 more reply

ryanworl6y ago

Is it fair to say that if you know the min and max values for your dataset, the DDSketch (fast) is strictly better than HDR Histogram for applications where lack of runtime memory allocations and add performance is critical?

homin6y ago

In theory yes, but none of our implementations specifically prevent reallocations as we were not targeting this use-case. It should be easy enough to add though.

morei6y ago

Minor nit from paper: The range for Moments is listed as bounded due to overflow risk using floats.

This may be true for the provided implementation, but it's not inherent to the algorithm. The calcuation of the higher order moments would normally done entirely in the log domain where there's little risk of overflow.

camel_gopher6y ago

I've relied on HDR histograms for that. It looks like your sketches are a bit more performant in the paper, but what was the issue with accuracy that histograms couldn't fulfill?

MrBuddyCasino6y ago· 1 in thread

How does this compare to t-digest?

https://github.com/tdunning/t-digest/blob/master/docs/t-dige...

MrBuddyCasino6y ago

Ah there is a short comparison in "related work":

The problems of having high relative errors on the larger quantiles has been addressed by a line of work that still uses rank error, but promises lower rank error on the quantiles further away from the median by biasing the data it keeps towards the higher (and lower) quantiles [7], [8], [17]. The latter, dubbed t-digest, is notable as it is one of the quantile sketch implementations used by Elasticsearch [18]. These sketches have much better accuracy (in rank) than uniform-rank-error sketches on percentiles like the p99.9, but they still have high relative error on heavy-tailed data sets. Like GK they are only one-way mergeable.

skyde6y ago· 1 in thread

great! so main difference is more accuracy on average or more the fact the maximum error possible is bounded?

homin6y ago

Most previous sketches used what they called "rank accuracy". So if your inputs were [2^1, 2^2, ...., 2^1000]. The actual p95 is 2^950, and a 0.01 rank-accurate sketch would be allowed to give you any value between 2^940 to 2^960 (which can be up to a factor of 1024 away).

A 0.01 relative-accurate sketch has to give you a value within a factor of 100.

The above example is contrived, but with real web request data, there is often a very long tail, and rank accurate sketches quickly start giving values far from the actual percentiles.

So to more directly answer you question, it's that the maximum error possible is bounded.

germanjoey6y ago

Hey, congratulations, this is a really cool algorithm! Thanks for sharing it.

I'm interested in this paper because I worked on a somewhat related problem some time ago, but got stuck on how to handle data that morphs into a mixed-modal distribution. Modes that are close together are no big deal, but modes that are spaced more exponentially apart are tricky to deal with. For an example of something that would be in DataDog's purview, it would be like trying to sketch the histogram of response times from an endpoint that sometimes took a "fast" path (e.g. a request for a query whose result was cached), sometimes took a "normal" path, and sometimes took a "slow" path. (e.g. a query with a flag that requested additional details to be computed) If the response times from the slow path is much bigger than the others, e.g. by an order of magnitude, their statistics might essentially drown-out the data from the other two paths since you're using them to calculate bin size.

I noticed you had some results from measuring DDSketch's performance on a mixed-modal distribution that looked pretty good (that "power" distribution on the last page). I was wondering if you had done any more investigation in this area? E.g. how messy/mixed can the data be before the sketch starts to break down?

j / k navigate · click thread line to collapse

22 comments

20 comments · 4 top-level

homin6y ago· 14 in thread

Author here. We wanted to be able to graph p99, p99.9 metrics with arbitrary ranges, and found the existing solutions were not accurate enough for our needs. Happy to answer any questions.

Code here:

https://github.com/DataDog/sketches-go

https://github.com/DataDog/sketches-py

https://github.com/DataDog/sketches-java

heinrichhartman6y ago

Nice work! Averaging percentiles is well-known to give terrible results. Glad to see more people, taking this problem serious, and providing viable alternatives!

If I read your evaluation results correctly, you also found HRD-Histograms to be as-accurate or more-accurate, than DDSketches, correct?

The differentiator to HDR Histograms seems to be merging speed and size, where DDSketches seem to have an edge.

Is it possible to approximate CDF values ("lower counts") from the sketch with low/bounded error?

[1] https://github.com/circonus-labs/libcircllhist

[2] http://heinrichhartmann.com/pdf/Heinrich%20Hartmann%20-%20La...

homin6y ago

HDR is great for the use-case when you can bound your range beforehand and merging is not a requirement, but those were also the reasons we needed to develop DDSketch.

1 more reply

homin6y ago

Yes, the sketch can quickly answer questions like "How many requests were faster than 100ms?" 100ms maps to a bucket index, and we would just sum up all the buckets with indices at most that.

oertl6y ago

jamesblonde6y ago

CharlesMasson6y ago

I did see that thread and that's what inspired the mapping with the quadratic interpolation.

spenczar56y ago

Either way - looks very promising, I am excited to take a closer look and possibly use this. Thanks!

homin6y ago

zawerf6y ago

heinrichhartman6y ago

> number of bins in terms of memory

Less than that, with the appropriate compression scheme (sparse encoding).

> But then I got completely lost in the math to show why this guarantees anything.

You get guarantees on _relative_ error:

- 100 to 110 is a 10% error

- 1000 to 1010 is a 1% error

Depending on your logarithmic base, you can detect errors up to a certain relative %-tage.

1 more reply

ryanworl6y ago

homin6y ago

In theory yes, but none of our implementations specifically prevent reallocations as we were not targeting this use-case. It should be easy enough to add though.

morei6y ago

Minor nit from paper: The range for Moments is listed as bounded due to overflow risk using floats.

camel_gopher6y ago

I've relied on HDR histograms for that. It looks like your sketches are a bit more performant in the paper, but what was the issue with accuracy that histograms couldn't fulfill?

MrBuddyCasino6y ago· 1 in thread

How does this compare to t-digest?

https://github.com/tdunning/t-digest/blob/master/docs/t-dige...

MrBuddyCasino6y ago

Ah there is a short comparison in "related work":

skyde6y ago· 1 in thread

great! so main difference is more accuracy on average or more the fact the maximum error possible is bounded?

homin6y ago

A 0.01 relative-accurate sketch has to give you a value within a factor of 100.

The above example is contrived, but with real web request data, there is often a very long tail, and rank accurate sketches quickly start giving values far from the actual percentiles.

So to more directly answer you question, it's that the maximum error possible is bounded.

germanjoey6y ago

Hey, congratulations, this is a really cool algorithm! Thanks for sharing it.

j / k navigate · click thread line to collapse