Anyway, so... the xz project has been compromised for a long time, at least since 5.4.5. I see that this JiaT75 guy has been the primary guy in charge of at least the GitHub releases for years. Should we view all releases after he got involved as probably compromised?
My TLDR is that I would regard all commits by JiaT75 as potentially compromised.
Given the ability to manipulate git history, I am not sure a simple time-based revert is enough.
It would be great to compare old copies of the repo with the current state. There is no guarantee that the history wasn't tampered with.
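If anyone still has an old clone or mirror around, one concrete way to do that comparison is to diff the sets of reachable commits between the old copy and a fresh clone. A rough sketch in Python (the paths are placeholders, and matching hashes only rule out history rewriting, not malicious content that was there all along):

    import subprocess

    def rev_set(repo_path, ref="HEAD"):
        # Return the set of commit hashes reachable from ref in the given repo.
        out = subprocess.run(
            ["git", "-C", repo_path, "rev-list", ref],
            check=True, capture_output=True, text=True,
        )
        return set(out.stdout.split())

    old = rev_set("/backups/xz-old-clone")   # an old, assumed-good copy (placeholder path)
    new = rev_set("/tmp/xz-current-clone")   # a fresh clone of today's repo (placeholder path)

    print("commits only in the old copy (possibly rewritten away):")
    print("\n".join(sorted(old - new)) or "(none)")
    print("commits only in the current copy (added since the backup):")
    print("\n".join(sorted(new - old)) or "(none)")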
Overall, the only safe action would IMHO be to establish a new upstream from an assumed-good state, then fully audit it. At that point we should probably just abandon it and use zstd instead.
Xz is an implant of 7zip's LZMA(2) compression into a traditional Unix archiver skeleton. It trades long compression times and giant dictionaries (that need lots of memory) for better (“much-better-than-deflate”) compression ratios. Therefore, zstd, no matter how fashionable that name might be in some circles, is not a replacement for xz.
It should also be noted that those LZMA-based archive formats might not be considered state-of-the-art today. If you worry about data density, there are options for both faster compression at the same size, and better compression in the same amount of time (provided that the data is generally compressible). 7zip and xz are widespread and well tested, though, and allow decompression to be fast, which might be important in some cases. Alternatives often decompress much more slowly. This is also a trade-off between total time spent on X nodes compressing data and Y nodes decompressing data. When X is 1 and Y is in the millions (say, software distribution), you can spend A LOT of time compressing even for relatively minuscule gains without tipping the scales.
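To put rough numbers on that last point (every value below is invented purely for illustration):

    # Toy arithmetic for the X-compressors / Y-decompressors trade-off above.
    X = 1               # machines compressing the artifact once
    Y = 1_000_000       # machines that will download and decompress it

    # Hypothetical scheme A: fast compression, slower decode
    compress_a, decompress_a = 60.0, 2.0        # seconds
    # Hypothetical scheme B: much slower compression, slightly faster decode
    compress_b, decompress_b = 3600.0, 1.5      # seconds

    total_a = X * compress_a + Y * decompress_a
    total_b = X * compress_b + Y * decompress_b
    print(f"scheme A total CPU time: {total_a:,.0f} s")   # 2,000,060 s
    print(f"scheme B total CPU time: {total_b:,.0f} s")   # 1,503,600 s
    # The extra hour spent compressing is dwarfed by the half second saved
    # on each of a million decompressions (~500,000 s across the fleet).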
It should also be noted that many (or most) decoders of top-compressing archivers are implemented as virtual machines executing chains of transform and unpack operations, defined in the archive file, over pieces of data also saved there. Or, looking from a different angle, complex state machines initializing their state from complex data in the archive. The compressor tries to find the most suitable combination of basic steps based on the input data and stores the result in the archive. (This is taken to its logical conclusion in neural-network compression tools, which learn what to do with the data from the data itself.) As some people may know, implementing all that byte juggling safely and effectively is a herculean task, and compression tools have had exploits in the past because of that. Switching to a better solution might introduce a lot more potentially exploitable bugs.
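As a toy illustration of that "decoder driven by the archive" idea (this is not any real container format; the filter names and header layout are made up for this sketch), note that the decoder below does whatever the chain stored in the blob tells it to do, which is exactly why getting the real thing right is so hard:

    import json, zlib

    # name -> (forward transform, inverse transform); both invented for this toy
    FILTERS = {
        "deflate": (zlib.compress, zlib.decompress),
        "xor42":   (lambda b: bytes(x ^ 42 for x in b),
                    lambda b: bytes(x ^ 42 for x in b)),
    }

    def pack(data, chain):
        # Apply the filter chain, then store the chain itself in a tiny header.
        for name in chain:
            data = FILTERS[name][0](data)
        header = json.dumps(chain).encode()
        return len(header).to_bytes(4, "big") + header + data

    def unpack(blob):
        # The decoder's behaviour is defined entirely by what it reads from the blob.
        hlen = int.from_bytes(blob[:4], "big")
        chain = json.loads(blob[4:4 + hlen])
        data = blob[4 + hlen:]
        for name in reversed(chain):      # undo the filters in reverse order
            data = FILTERS[name][1](data)
        return data

    payload = b"hello hello hello" * 10
    assert unpack(pack(payload, ["xor42", "deflate"])) == payload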
You should use the ultra settings and >=19 as the compression level. E.g. Arch used 20; higher compression levels do exist, but at that point the gains were already under 1%.
It does beat xz for these tasks. It's just not at the default settings, as those are indeed optimized for the lzo-to-gzip/bzip2 range.
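If you want to check that on your own data rather than take anyone's word for it, here is a minimal sketch (it assumes the third-party python-zstandard package is installed; lzma, i.e. the xz format, ships with the standard library, and the input filename is a placeholder):

    import lzma
    import zstandard

    def ratios(data):
        xz_out = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)   # roughly xz -9e
        zstd_out = zstandard.ZstdCompressor(level=19).compress(data)   # zstd -19
        print(f"original: {len(data):>10} bytes")
        print(f"xz -9e:   {len(xz_out):>10} bytes")
        print(f"zstd -19: {len(zstd_out):>10} bytes")

    with open("some-tarball.tar", "rb") as f:   # substitute your own input
        ratios(f.read())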
Found it mentioned in https://github.com/facebook/proxygen/blob/main/build/fbcode_..., looks like it's going to be a cousin of zstd, but maybe for the stronger-compression use cases.
Rewritten history is not a real concern because it would have been immediately noticed by anyone updating an existing checkout.
> Overall, the only safe action would IMHO be to establish a new upstream from an assumed-good state, then fully audit it. At that point we should probably just abandon it and use zstd instead.
This is absurd and also impossible without breaking backwards compatibility all over the place.