Fc, a lossless compressor for floating-point streams

(github.com)

108 points · enduku · 1mo ago · 33 comments

enduku (OP) · 1mo ago

I built "fc", a C library for compressing streams of 64-bit floating-point values without quantization.

It is not trying to replace zstd or lz4. The idea is narrower: take blocks of doubles, try a set of float-specific predictors/transforms/coders, and emit whichever representation is smallest for that block.

It is aimed at time-series, scientific, simulation, and analytics data where the numbers often have structure: smooth curves, repeated values, fixed increments, periodic signals, predictable deltas, or low-entropy mantissas.

The API is intentionally small: "fc_enc", "fc_dec", a config struct, and a few counters to inspect which modes won. Decode is parallel and meant to be fast; encode spends more CPU searching for a better representation.

Current caveats: x86-64 only for now, tuned for IEEE-754 doubles, research-grade rather than production-hardened.

Repo: https://github.com/xtellect/fc

unwind · 1mo ago

Can you elaborate on how it detects and signals if it runs out of output buffer space? I couldn't see how the amount of available space was even communicated to `fc_enc()`.

Also, there are some "C icks" (to me; I'm very picky and used to know the standard awfully well from answering many SO questions) that you might want to look into. The two I remember now are the casting of `void` pointers from allocation functions, and (worse) the assumption that "all bits zero" is how a NULL pointer is represented.

jiggawatts · 1mo ago

> rather than production-hardened.

Please run it through your preferred AI once or twice with instructions to look for bugs. The version of fc in the main branch has at least a few memory safety bugs that attacker-controlled inputs could exploit.

I'd link a chat history but the tool I used has that feature blocked for some weird reason, and the locals round these parts don't take kindly to copy-pasted AI content...

enduku (OP) · 1mo ago

Thank you. Fuzz safety is definitely on my list. The current focus is to broaden the benchmarks, predictors, and preprocessors and see what sticks.

gus_massa · 1mo ago

Does it assume the floats come from photos or sound or something?

enduku (OP) · 1mo ago

It is intended to be mainly source-agnostic (I will try to add custom source predictors too). The idea is to treat the input as an ordered stream of doubles and look for numeric structure like repeats, smooth deltas, fixed increments, or low-entropy bits. The target at present is scientific/time-series/simulation/analytics data, not photos or sound.

snissn · 1mo ago

What do you mean by decode is parallel?

magicalhippo · 1mo ago

It splits the input into blocks which are encoded separately, so the decoder can fire up multiple threads to decode multiple blocks in parallel.

https://github.com/xtellect/fc#how-it-works
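To make the block-parallel idea concrete, here is a minimal sketch in Python. It is not fc's format or API: `BLOCK_VALUES`, `encode_blocks`, `decode_block`, and `decode_parallel` are hypothetical names, and zlib stands in for whatever codec a block actually uses. The only point it illustrates is that blocks encoded independently can be decoded by separate workers and then stitched back together in order.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_VALUES = 4096  # hypothetical block size, not taken from fc


def encode_blocks(values):
    """Encode each block on its own so no block depends on another."""
    blocks = []
    for i in range(0, len(values), BLOCK_VALUES):
        chunk = values[i:i + BLOCK_VALUES]
        raw = struct.pack(f"<{len(chunk)}d", *chunk)
        blocks.append(zlib.compress(raw))  # zlib is a stand-in codec
    return blocks


def decode_block(blob):
    """Decode one block back into a list of doubles."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


def decode_parallel(blocks, workers=4):
    """Decode blocks concurrently; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(decode_block, blocks))
    return [v for part in parts for v in part]
```

Threads are enough for this sketch because `zlib.decompress` releases the GIL while it works; a C implementation would presumably hand blocks to native threads instead.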

userbinator · 1mo ago

It splits the input into adaptively-sized blocks (quanta), runs a competition between many specialized codecs on each block, and emits the smallest result.

This is, for lack of a better term, a "metacompressor", but it will be interesting to see which of the choices end up dominating; in my past experiences with metacompression, one algorithm is usually consistently ahead.

enduku (OP) · 1mo ago

That’s a fair description. One mode does not dominate in my current harness; the winning mode varies quite a bit by dataset/block. If real workloads show one or two modes dominate, I’d rather simplify the portfolio :) For now the extra encode CPU is intentional: spend time once, get smaller blocks and fast parallel decode.
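A rough sketch of the "competition per block" idea, under stated assumptions: the three modes below (raw, zlib, XOR-with-previous plus zlib) and the one-byte mode tag are made up for illustration and are not fc's modes or container format. Each block is encoded by every candidate, the smallest output wins, and the tag tells the decoder which inverse to apply.

```python
import struct
import zlib


def _xor_prev(raw):
    """XOR each 64-bit word with its predecessor (first word is kept as is)."""
    words = struct.unpack(f"<{len(raw) // 8}Q", raw)
    out, prev = [], 0
    for w in words:
        out.append(w ^ prev)
        prev = w
    return struct.pack(f"<{len(out)}Q", *out)


def _unxor_prev(raw):
    """Inverse of _xor_prev: a running XOR restores the original words."""
    words = struct.unpack(f"<{len(raw) // 8}Q", raw)
    out, prev = [], 0
    for w in words:
        prev ^= w
        out.append(prev)
    return struct.pack(f"<{len(out)}Q", *out)


# Hypothetical mode table: id -> (encoder, decoder), both bytes -> bytes.
MODES = {
    0: (lambda raw: raw, lambda blob: blob),          # raw fallback
    1: (zlib.compress, zlib.decompress),              # generic compressor
    2: (lambda raw: zlib.compress(_xor_prev(raw)),    # XOR predictor + zlib
        lambda blob: _unxor_prev(zlib.decompress(blob))),
}


def encode_block(values):
    """Try every mode and keep the smallest output, prefixed by its mode id."""
    raw = struct.pack(f"<{len(values)}d", *values)
    candidates = ((mode_id, enc(raw)) for mode_id, (enc, _) in MODES.items())
    best_id, best = min(candidates, key=lambda c: len(c[1]))
    return bytes([best_id]) + best


def decode_block(blob):
    """Read the mode id and apply the matching decoder."""
    _, dec = MODES[blob[0]]
    raw = dec(blob[1:])
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

Because the raw mode is always a candidate, a block never grows by more than the tag byte. The encode cost scales with the number of modes tried, which is the trade-off described above.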

apodik · 1mo ago

I’ve never heard of a metacompressor before, what others exist?

whizzter · 1mo ago

I think the idea is that the compressor is "meta" in the sense that it directs compressors, as GP mentions, by selecting whichever actually produces the best results, so it's not just one compressor but a series of supported ones plugged in to be used adaptively (controlled at a "meta" level).

Floating point data is a mess to compress, but I think the idea here is to apply different transforms (and perhaps back-end codecs) on data and see if one fits the data so perfectly that you magically get a lot of compression.

Say you have audio with a sawtooth: it's a linear gradient, but if the peaks are "random" values like 1.245 and PI, then the mantissa bits across the interpolation range will look fairly "random" to a classic compressor, whilst this compressor can test whether there are linear-gradient spans (or near-linear ones), store the gradient, and dump out the "difference" bits for a regular compressor.

Or 3d coordinates for 3d models (non-stripified), plenty of repeating 8-byte doubles that will be garbage and not help a classic compressor much, building a float aware dictionary and using that would easily bring down the data by quite a few %.

(I don't agree with GP, one method might win out for certain workloads, but the idea here seems to be a pluggable utility that can help a wide range of developers with something "for free").
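The linear-gradient case can be sketched as a tiny lossless predictor. This is an illustrative assumption, not fc's actual predictor: each value is predicted by extrapolating the previous two (`2 * prev1 - prev2`), and the stored residual is the XOR of the actual and predicted bit patterns. When the prediction is close, the residual has many leading zero bits, which a back-end coder can exploit.

```python
import struct


def bits(x):
    """Reinterpret a double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b):
    """Reinterpret a 64-bit unsigned integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def predict(prev2, prev1):
    """Linear extrapolation from the two previous values."""
    return 2.0 * prev1 - prev2


def encode(values):
    """Return XOR residuals between actual and predicted bit patterns."""
    residuals = []
    p2 = p1 = 0.0
    for v in values:
        residuals.append(bits(v) ^ bits(predict(p2, p1)))
        p2, p1 = p1, v
    return residuals


def decode(residuals):
    """Rebuild values by repeating the same predictions the encoder made."""
    values = []
    p2 = p1 = 0.0
    for r in residuals:
        v = from_bits(r ^ bits(predict(p2, p1)))
        values.append(v)
        p2, p1 = p1, v
    return values
```

The scheme is lossless because the decoder recomputes exactly the same prediction from already-decoded values, so the XOR undoes itself. On a ramp whose step is exactly representable in binary (for example 0.25), every residual after the first two is zero; with a step like 0.001 the residuals are only mostly small.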

CyberDildonics · 1mo ago

Even back when star wars episode 1 came out with quicktime files of the trailer there were multiple codecs used in the same video file to make it look as good as possible.

Making up a new term isn't necessary, this has been done and everyone just called it compression.

enduku (OP) · 1mo ago

I’m using that term loosely here; in this case it means 'try several representations/codecs for a block and store the winner.' Similar ideas show up in columnar formats choosing encodings per column/page, OpenZL selectors (as other commenters pointed out here), and shuffle/transpose + backend-compressor pipelines. fc’s version is much narrower: a tournament among f64-specific modes per block.

childintime · 1mo ago

A lossy compressor might also be useful for common floating point apps. The simplest compressor ever would just chop off a number of bits from the mantissa.
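For reference, "chopping off mantissa bits" can be sketched in a few lines. The function name `truncate_mantissa` is hypothetical. It clears the lowest `drop_bits` of the 52-bit mantissa, which truncates toward zero and bounds the relative error by 2^(drop_bits - 52) for normal values; the zeroed bits then compress well in a later lossless stage.

```python
import struct


def truncate_mantissa(x, drop_bits):
    """Zero the lowest `drop_bits` mantissa bits of a double (lossy)."""
    if not 0 <= drop_bits <= 52:
        raise ValueError("drop_bits must be between 0 and 52")
    (b,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << drop_bits) - 1) & 0xFFFFFFFFFFFFFFFF
    # Caveat: a NaN whose surviving mantissa bits are all zero becomes infinity.
    return struct.unpack("<d", struct.pack("<Q", b & mask))[0]
```

For example, `truncate_mantissa(x, 29)` keeps 23 mantissa bits, the same mantissa width as a single-precision float, while leaving the double's exponent range untouched.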

enduku (OP) · 1mo ago

Yeah, and also approximating a double (within range) to int32 :)

https://x.com/Densebit/status/1839705674378613043?s=20

dahart · 1mo ago

That code is absolutely terrible! Never do that. The range is awful, and the relative error is awful.

If you want a double in 32 bits, convert to single precision float. This will beat the relative error of the code you linked to by orders of magnitude, and allow the range of float (~1e38) rather than be limited to +- 1e9.
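The difference in error behaviour can be checked with a short sketch. The linked code is not reproduced here; `via_fixed_int32` is a hypothetical fixed-point stand-in (round `x * scale` into an int32), used only to contrast absolute-error rounding with the relative-error rounding of a float32 conversion.

```python
import struct


def via_float32(x):
    """Round-trip a double through IEEE-754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def via_fixed_int32(x, scale=1.0):
    """Hypothetical fixed-point scheme: store round(x * scale) in an int32."""
    q = round(x * scale)
    if not -2**31 <= q < 2**31:
        raise OverflowError("value does not fit in int32 at this scale")
    return q / scale


def rel_err(x, approx):
    """Relative error of an approximation to a nonzero value."""
    return abs(approx - x) / abs(x)
```

With these definitions, `rel_err(0.001234, via_float32(0.001234))` stays below 2^-24, whereas the fixed-point version at `scale=1.0` rounds the same input to zero, and `via_fixed_int32(1e12)` raises because the value is out of range. A larger scale improves small values but shrinks the representable range further, which is the trade-off a floating-point format avoids.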

pella · 1mo ago

> "fc is a lossless compressor for streams of IEEE-754 64-bit doubles."

The new OpenZL SDDL2 (Simple Data Description Language) supports several different floating-point types. It would be worthwhile to contribute some of the fc project's experience to OpenZL. The types OpenZL currently supports:

  | Type           | Size    | Endian |
  |----------------|---------|--------|
  | `Int8`         | 1 byte  | N/A |
  | `UInt8`        | 1 byte  | N/A |
  | `Int16LE/BE`   | 2 bytes | Yes |
  | `UInt16LE/BE`  | 2 bytes | Yes |
  | `Int32LE/BE`   | 4 bytes | Yes |
  | `UInt32LE/BE`  | 4 bytes | Yes |
  | `Int64LE/BE`   | 8 bytes | Yes |
  | `UInt64LE/BE`  | 8 bytes | Yes |
  | `Float16LE/BE` | 2 bytes | Yes |
  | `Float32LE/BE` | 4 bytes | Yes |
  | `Float64LE/BE` | 8 bytes | Yes |
  | `BFloat16LE/BE`| 2 bytes | Yes |
  | `Bytes(n)`     | n bytes | N/A |

Some links:

- https://github.com/facebook/openzl/releases/tag/v0.2.0

- https://openzl.org/getting-started/introduction/

- https://openzl.org/sddl/sddl2-announcement/

- https://openzl.org/sddl/core-concepts/

enduku (OP) · 1mo ago

Thanks, this looks super relevant. I think the transferable part is the per-block selector over predictors, strides, deltas, exponent/mantissa-ish structure, byte transpose, fallback raw/LZ, etc. SDDL2 looks like a natural place to try some of that.

peterabbitcook · 1mo ago

I’ve been skimming the source code and it looks promising for the stated use case. Wondering how to configure and set it up for a producer/consumer scenario where the producer puts compressed bytes on the wire and the consumer processes it; I can definitely see a use case where an edge sensor pumps compressed data to a cloud server with a GPU, though I don’t usually pipe doubles to a GPU.

Something worth thinking about, since you mentioned it’s geared towards “scientific” data streams: if we’re talking about precise measurements from instruments, your sensor is typically an analog signal which you digitize. Digitizers exist that can output floats, but DACs used in industry like a Rincon or Alazar (that sample at multiples of 100 MHz) prefer to output quantized shorts or ints that are rescaled to a float with a magic number (i.e. 32767/pi for a phase measurement, or gain/(16 mA) for industrial transducers) somewhere down the line. I bring this up because you pointed out your max throughput is about 120 MiB/s, which would make it a big bottleneck for scientific data coming out of a digitizer that can pump out 800-1600 MiB/s. 120 MiB/s throughput of doubles is not really that high for CPU-level computations or network Tx bandwidth on modern hardware.

enduku (OP) · 1mo ago

Fair points on both. Thanks for asking.

The 120 MiB/s encode ceiling is the cost of the mode competition; that's where the ratio comes from. At 800-1600 MiB/s off a digitizer, fc is the bottleneck no matter what transport sits behind it; for that regime zstd-3 or lz4 are the better fit, or fc further down the pipeline on aggregated/decimated data.

You're also right on int/short. fc's modes look for IEEE-754 bit patterns, so doubles that started life as rescaled ints lose the structure those modes exploit. A native int16/int32 path is on the list.

For the wiring itself: I have a sister single-header library, vibe (https://github.com/xtellect/vibe), built for this exact pattern: length-prefixed TCP/IPC framing on Linux, with a `telemetry_sink` example close to the edge-sensor --> cloud-ingest case. Producer compresses with fc, ships framed bytes through vibe, consumer decompresses. Doesn't solve the throughput ceiling, but handles the producer/consumer setup cleanly.

edit: I think the comment is flagged automatically because I used `vibe` (bad name, I know) :)

loeg · 1mo ago

The question is, how close can OpenZL come? (This is from the same people who develop zstd, but suitable for structured data in a generic way.)

enduku (OP) · 1mo ago

I need to add it to the benchmark. My expectation is that OpenZL should be strong when the enclosing format is known and SDDL can separate typed fields cleanly. Running both on the same f64 arrays will give some information.

Scaevolus · 1mo ago

I see you have ALP, but have you tried Chimp128 or Arrow's byte stream split?

enduku (OP) · 1mo ago

I have an XOR128-style mode and a byte-transpose/byte-split-like mode, but I should not claim that as a proper Chimp128 or Arrow/Parquet byte-stream-split comparison yet. I will add direct baselines for Chimp128 and Arrow/Parquet BSS+zstd to the harness.
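For readers unfamiliar with byte stream split, here is a minimal sketch of the transform itself. It is a generic illustration, not fc's byte-transpose mode and not the Arrow/Parquet implementation: byte k of every 8-byte value is gathered into its own contiguous stream, so that similar bytes (signs and exponents in particular) sit next to each other before a general-purpose compressor runs.

```python
def byte_stream_split(raw, width=8):
    """Group byte k of every `width`-byte value into one contiguous stream."""
    if len(raw) % width:
        raise ValueError("input length must be a multiple of width")
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_join(split, width=8):
    """Inverse transform: interleave the streams back into whole values."""
    if len(split) % width:
        raise ValueError("input length must be a multiple of width")
    n = len(split) // width
    out = bytearray(len(split))
    for k in range(width):
        out[k::width] = split[k * n:(k + 1) * n]
    return bytes(out)
```

The transform changes no sizes on its own; any gain comes from the compressor applied afterwards (zstd in the Parquet case), and whether it helps depends on the data.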

abcd_f · 1mo ago

The most interesting section - How It Works - could really elaborate on details a bit more.

enduku (OP) · 1mo ago

Agreed. Will work on that :)

KerrickStaley · 1mo ago

Another library in this space is pcodec; I'd appreciate a comparison of the two.

enduku (OP) · 1mo ago

Agreed; pcodec is probably one of the most relevant comparisons. I will add pcodec to the benchmark.

radford-neal · 1mo ago

Those interested in this might find my paper on "Representing numeric data in 32 bits while preserving 64-bit precision" to be of interest. It can be found at https://arxiv.org/abs/1504.02914 (note the code available as auxiliary files). In the context of this compressor, it could be one of the compressors competing to compress a block. It works well for data converted from a decimal representation with a small number of digits.

rincebrain · 1mo ago

I must say, for a library advertising handling of streams of data, the absence of a stream utility to [input] | fc | fc -d surprised me.

I understand this is more the primitive that you would build such a thing on top of, just that the first question I always have for novel compressors is "how do they do on these example streams of data".
