All recordings were successfully compressed. Original size (bytes): 146,800,526 Compressed size (bytes): 123,624 Compression ratio: 1187.47
The eval.sh script was downloaded, and the files were decoded and encoded without loss, as verified using the "diff" function.
What do you think? Is this true?
https://www.linkedin.com/pulse/neuralink-compression-challen... context: https://www.youtube.com/watch?v=X5hsQ6zbKIo
As a trivial example, if your dataset is one trillion binary digits of pi, it is essentially incompressible by any regular compressor, but you can fit a generator well under 1 kB.
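To make the "tiny generator" point concrete, here is a minimal sketch of a streaming pi generator. It uses Gibbons' unbounded spigot algorithm and emits decimal digits rather than binary ones, which is my choice for brevity; the names `pi_digits` and `first_digits` are just illustrative.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi one at a time (3, 1, 4, 1, 5, ...).

    Gibbons' unbounded spigot algorithm, using only integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: fold in the next term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )

def first_digits(count):
    """Return the first `count` decimal digits of pi as a list of ints."""
    return list(islice(pi_digits(), count))

print("".join(str(d) for d in first_digits(20)))
```

The source is a few hundred bytes, yet a general-purpose compressor fed the output stream would find almost nothing to exploit.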
I'm all for challenges, but it is fairly standard to have prizes.
Until this A/D linearity problem is fixed, there is no point pursuing compression schemes. The data is so badly mangled it makes it pretty near impossible to find patterns.
Why didn't every other company think of this?
Yup:
"Submit with source code and build script."
But hey, the reward is a job. Maybe.
I mean, not everyone can be privileged enough to experience Ultra Hardcore™ toxic work culture.
The sample data compresses poorly, getting down to 4.5 bits per sample easily with very simple first-order difference encoding and a decent Huffman coder.
However, let's assume there is massive cross-correlation between the 1024 channels. For example, in the extreme they are all the same, meaning if we encode 1 channel we get the other 1023. That means a lower limit of 4.5/1024 = about 0.0044 bits per sample, or a compression ratio of about 2275. Voilà!
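For anyone who wants to try the first-order difference idea, here is a rough sketch. It runs on a synthetic 10-bit random walk, not the Neuralink samples, and it reports empirical zeroth-order entropy as a stand-in for what a Huffman coder could approach (Huffman lands within about one bit per symbol of that figure). The signal model and step size are assumptions made up for illustration.

```python
import math
import random
from collections import Counter

def delta_encode(samples):
    """First-order differences; the first sample is stored unchanged."""
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

def entropy_bits(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Synthetic stand-in for one channel: a clamped 10-bit random walk (assumption).
rng = random.Random(0)
x = 512
signal = []
for _ in range(20000):
    x = min(1023, max(0, x + rng.randint(-8, 8)))
    signal.append(x)

deltas = delta_encode(signal)
raw_bits = entropy_bits(signal)
delta_bits = entropy_bits(deltas[1:])  # skip the stored first sample

print(f"raw:   {raw_bits:.2f} bits/sample")
print(f"delta: {delta_bits:.2f} bits/sample")
```

The transform is lossless (`delta_decode(delta_encode(s)) == s`), so all of the gain comes from the differences having a much narrower distribution than the raw samples.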
If data patterns exist and can be found, then more complicated coding algorithms could achieve better compression, or tolerate more variations (i.e. less cross-correlation) between channels.
We may never know unless Neuralink releases a full data set, i.e. 1024 channels at 20 kHz and 10 bits for 1 hour. That's a lot of data, but if they want serious analysis they should release serious data.
Finally, there is no apparent reason to require lossless compression. The end result -- correct data to control the cursor and so on -- is the key. Neuralink should allow challengers to submit DATA to a test engine that compares the cursor output for the original, uncompressed data with the output for the submitted data, and reports the match score, and maybe a graph or something. That sort of feedback might allow participants to create a satisfactory lossy compression scheme.
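For a sense of scale, a quick back-of-the-envelope calculation of that data set, assuming samples are packed at exactly 10 bits with no container overhead (the constant names are mine):

```python
CHANNELS = 1024
SAMPLE_RATE_HZ = 20_000
BITS_PER_SAMPLE = 10
DURATION_S = 3600  # one hour

bits_per_second = CHANNELS * SAMPLE_RATE_HZ * BITS_PER_SAMPLE
total_bytes = bits_per_second * DURATION_S // 8

print(f"raw rate:   {bits_per_second / 1e6:.1f} Mbit/s")
print(f"total size: {total_bytes / 1e9:.2f} GB")
```

That works out to 204.8 Mbit/s raw, or roughly 92 GB for the hour.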
It's 2275X
That's the compression ratio for complete cross correlation. It's (10 bits uncompressed / 4.5 bits compressed on 1 channel) * 1024 channels
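The same arithmetic spelled out, in case anyone wants to plug in their own numbers. The 4.5 bits figure is the single-channel estimate from upthread, and "all channels identical" is the deliberately extreme assumption:

```python
RAW_BITS = 10        # uncompressed bits per sample
CODED_BITS = 4.5     # bits per sample after delta + Huffman on one channel
CHANNELS = 1024      # assumed fully correlated (identical) channels

single_channel_ratio = RAW_BITS / CODED_BITS
best_case_ratio = single_channel_ratio * CHANNELS
bits_per_sample = CODED_BITS / CHANNELS

print(f"single channel: {single_channel_ratio:.2f}x")
print(f"best case:      {best_case_ratio:.0f}x")
print(f"bits/sample:    {bits_per_sample:.4f}")
```

This is an upper bound under perfect cross-correlation only; any independent noise per channel pulls the achievable ratio back toward the single-channel figure.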
Here’s why: https://x.com/raffi_hotter/status/1795910298936705098
https://x.com/JohnSmi48253239/status/1794328213923188949?t=_...
Does that mean the radio is using a portion of this 10 mW? If so, how much?