Show HN: New compression algorithm beats ZSTD by 14%

(github.com)

1 point · seccode · 1y ago · 3 comments

Hi all, I made a data compression algorithm that works as follows: replace the second word of each common bigram with a fixed value:

"of the" => "of e"
"and she" => "and e"
"when there" => "when e"

To decompress, look up the preceding word in the bigram dictionary to find the right value of e.
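A minimal sketch of the round trip described above, not the code from the linked repo. It assumes the dictionary keeps exactly one follower per first word, and it adds a hypothetical escape prefix (`ESC`) so that a literal "e" in the input is not mistaken for the placeholder. The names `build_dictionary`, `encode`, and `decode` are made up for illustration.

```python
from collections import Counter

PLACEHOLDER = "e"  # the fixed value used in the post
ESC = "\\"         # hypothetical escape prefix, not from the post

def build_dictionary(words, min_count=2):
    """Map each first word to its single most frequent follower."""
    pair_counts = Counter(zip(words, words[1:]))
    followers = {}
    for (first, second), count in pair_counts.most_common():
        if count >= min_count and first not in followers:
            followers[first] = second
    return followers

def encode(text, followers):
    """Replace the second word of each known bigram with PLACEHOLDER."""
    out, prev = [], None
    for word in text.split(" "):
        if prev is not None and followers.get(prev) == word:
            out.append(PLACEHOLDER)
        elif word == PLACEHOLDER or word.startswith(ESC):
            out.append(ESC + word)  # protect literal tokens that would be ambiguous
        else:
            out.append(word)
        prev = word  # track the original word, not the emitted token
    return " ".join(out)

def decode(text, followers):
    """Restore each PLACEHOLDER from the previously decoded word."""
    out, prev = [], None
    for token in text.split(" "):
        if token == PLACEHOLDER:
            word = followers[prev]
        elif token.startswith(ESC):
            word = token[len(ESC):]
        else:
            word = token
        out.append(word)
        prev = word
    return " ".join(out)

followers = {"of": "the", "and": "she", "when": "there"}
original = "the end of the day and she said of e"
packed = encode(original, followers)
restored = decode(packed, followers)
```

Without some form of escaping, an input that already contains "of e" decodes to "of the", which is one plausible source of round-trip bugs in a scheme like this. The dictionary also has to travel with the compressed data, so its size counts toward the total.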

I've had some weird bugs in the decompression process, so I'm looking forward to others looking at this.

3 comments · 1 top-level

metadat · 1y ago · 2 in thread

What kind of data is it 14% better on?

The algorithm code looks like it's actually using Zstd, is that right?

Please show some benchmarks. I'd like to repeat your test!

seccode (OP) · 1y ago

I tested on the dickens dataset (Charles Dickens novels). This algorithm works well on English text. I do not expect it to be any better than zstd at binary data. I haven't tried code.

It does use zstd. My algorithm is a preprocessing step for zstd.
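A rough sketch of how a preprocessing step like this could be measured against the plain compressor. It uses `zlib` from the Python standard library as a stand-in, because zstd bindings are a third-party install; the `compare` helper and its `preprocess` argument are hypothetical, and the numbers it produces say nothing about the 14% figure.

```python
import zlib

def compressed_size(data: bytes, level: int = 9) -> int:
    """Size in bytes after compression (zlib here, as a stand-in for zstd)."""
    return len(zlib.compress(data, level))

def compare(text: str, preprocess) -> dict:
    """Compress the text with and without the preprocessing step."""
    baseline = compressed_size(text.encode("utf-8"))
    preprocessed = compressed_size(preprocess(text).encode("utf-8"))
    return {
        "baseline": baseline,
        "preprocessed": preprocessed,
        # Negative means the preprocessed input compressed smaller.
        "change_pct": 100.0 * (preprocessed - baseline) / baseline,
    }

# Identity preprocessing as a sanity check: both sizes must match.
sample = "it was the best of times it was the worst of times " * 20
result = compare(sample, lambda t: t)
```

For a fair comparison, the size of the bigram dictionary would need to be added to the preprocessed total, and the decoded output checked byte for byte against the input.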

metadat · 1y ago

Commit that data + test code :)

Add the results to the readme ::))
