Hi all,
I made a data compression algorithm that works as follows:
Replace the second word of each common bigram with a fixed value:
"of the" => "of e"
"and she" => "and e"
"when there" => "when e"
To decompress, look up the word before each e in the bigram dictionary to recover the original second word.
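Here is a minimal Python sketch of the scheme described above. It is not the actual implementation. Assumptions: words are split on single spaces, the table maps each first word to a single second word, and a literal "e" word in the input is escaped with a backslash so it cannot be confused with the placeholder. The escape rule and the names build_table, compress and decompress are hypothetical, made up for illustration.

```python
from collections import Counter

PLACEHOLDER = "e"  # the fixed value from the examples above


def build_table(words, max_entries=256):
    """Map a first word to its most frequent following word (assumed table layout)."""
    table = {}
    for (first, second), _count in Counter(zip(words, words[1:])).most_common():
        if len(table) >= max_entries:
            break
        table.setdefault(first, second)  # keep only the most frequent follower
    return table


def compress(text, table):
    out, prev = [], None
    for word in text.split(" "):
        if table.get(prev) == word:
            out.append(PLACEHOLDER)  # second word of a known bigram
        elif word == PLACEHOLDER or word.startswith("\\"):
            out.append("\\" + word)  # hypothetical escape for literal "e" or backslash words
        else:
            out.append(word)
        prev = word  # always the ORIGINAL word, not the emitted token
    return " ".join(out)


def decompress(text, table):
    out, prev = [], None
    for token in text.split(" "):
        if token == PLACEHOLDER:
            word = table[prev]  # KeyError here means the input or table is inconsistent
        elif token.startswith("\\"):
            word = token[1:]  # undo the escape
        else:
            word = token
        out.append(word)
        prev = word  # the DECODED word, mirroring what compress saw
    return " ".join(out)


table = {"of": "the", "and": "she", "when": "there"}
packed = compress("king of the hill and she said e", table)
assert decompress(packed, table) == "king of the hill and she said e"
```

Both functions track the original (decoded) previous word rather than the emitted token. A mismatch there, or an unescaped literal "e", is one way a scheme like this can fail to round-trip.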
I've had some weird bugs in the decompression process, so I'd appreciate others taking a look at this.
I tested on the dickens dataset (Charles Dickens novels). This algorithm works well on English text. I do not expect it to be any better than zstd at binary data. I haven't tried code.
It does use zstd. My algorithm is a preprocessing step for zstd.