Git archive generation meets Hyrum's law

(lwn.net)

136 points · JamesCoyne · 3y ago · 76 comments

skywal_l · 3y ago · 7 in thread

Everybody who has had to maintain an API knows this.

1. You can't just rely on documentation ("we never said we would guarantee this or that") to push back on your users' claims that you introduced a breaking change. If you care more about your documentation than your users, they will turn their back on you.

2. However, if you start guaranteeing too much stability, innovation and change become too costly or even impossible. In this instance, if the git team has to guarantee their hashes (which seems impossible anyway, because the output depends on the external gzip program), then they can never improve compression.

Tough situation to be in.

ablob · 3y ago

Someone once stated that every observable behaviour will be depended upon by someone sooner or later.

I can only imagine someone going to great lengths to avoid the "a stable order of operations was never guaranteed" discussion by just randomizing the order of execution or something similar (I bet someone will then use that as a seed for a PRNG).

edit: skipping the first paragraph led me to repeat Hyrum's law.

Macha · 3y ago

I once got a bug report from a user's manager about the values in our application's private database.

It was an internal user interface, intended for employees of our company. Once upon a time, adding a new record meant adding it manually to multiple internal systems, so the internal UI had its own copy of the data. Then we built a single source of truth for this data, with an API that our application would query. The database table updates were abandoned, as the table had only ever been for internal use, but nobody ever bothered to remove the old table with its few hundred rows of stale data.

Two years later, we got the bug report. The user's manager was complaining that the dataset was incomplete, that it was impeding his work, and that it needed to be fixed ASAP.

It turned out that at some stage he had requested and been granted read-only access to that DB, and had been querying the records of user actions in it to track the volume and quality of work his subordinates did. At some later point he realised that he could join against this table to get readable labels rather than opaque identifiers for the types of data said reports were working on. Except, of course, the data was two years stale, so he was noticing an increasing number of "missing" labels in his report.

Said user escalated all the way to a VP of engineering before accepting that no, a private database is not a supported interface of our product.

avgcorrection · 3y ago

> Someone once stated that every observable behaviour will be depended upon by someone sooner or later.

......... Hyrum’s law?

CuriousCosmic · 3y ago

> I can only imagine someone going to great lengths to avoid such "a stable order of operations was never guaranteed" discussion by just randomizing the order of execution or something similar (I bet someone will then use that as a seed for prng).

Bingo. https://news.ycombinator.com/item?id=34631275#34636529

rwmj · 3y ago

The issue is that they didn't look closely at what their users / API consumers were actually doing. Even a cursory look at CI, packaging systems, etc. would have shown that those were expecting the hashes to be stable. If they'd done that early enough, they might have been able to plan a transition to unstable hashes, or at least to emphasise the problem in documentation.

BeFlatXIII · 3y ago

However, you can force through the change if you are Google, GitHub, or Cloudflare. The users will still complain, but where will they go?

efreak · 3y ago

Sourceforge?

DelightOne · 3y ago · 7 in thread

Can't Github just keep the old archive as it is for the already-existing releases and use the new format for new releases? Over time old releases phase out and the advantage of the new format is completely in effect. You can even use a time-based cut-off date if you somehow want to get it in sync.

cratermoon · 3y ago

The article explicitly says, "Internally, this archive is created at request time by the `git archive` subcommand". In other words, there is no pre-existing archive and apparently no cache of generated archives. Which means a request for an archive gets one generated with whatever format is in effect at that moment.

Why github doesn't cache archives instead of regenerating them on the fly is unclear, and maybe something the developers should address. Or maybe there was a cache and it got blown away by the change that caused the archive checksums to change.

friendzis · 3y ago

Yes, they can. The thing is that tarballs are not part of the release artifacts, even though they appear to be among them. If you look closely, even the endpoints that user-uploaded artifacts and generated tarballs point to are different.

Github could just generate the tarball once and store it in the same way as other release artifacts. But for some reason they chose not to.

usr1106 · 3y ago

Some reason? Storage is not free after all and many tarballs won't be fetched frequently I'd assume.

ARandomerDude · 3y ago

> Can't Github just…You can even…

In my experience this sort of simplistic proposal by an outsider is almost always born of ignorance about the complexities of the actual system.

brigandish · 3y ago

Isn't that why it's a question?

ilyt · 3y ago

I'd imagine they create it on demand and just cache for some time. That's way less storage needed than having every single commit be also a tarball stored somewhere

bobbylarrybobby · 3y ago

Mentioned in the article

pcj-github · 3y ago · 6 in thread

If it can't be made stable, `git archive` should specifically add random content (under a feature flag to be removed after a year or two) so as to make the generated checksum completely unreliable and force users to adopt different workflows.

astockwell · 3y ago

Golang does this with hashmaps: it deliberately randomizes the iteration order so that you can never depend on it. People hated it for a few months, but now it’s just another known idiom.

cpeterso · 3y ago

Hyrum’s Law applies, even when you take into account Hyrum’s Law. :)

A story about a Golang program that had assumed map iteration was uniformly random but it’s not, which caused a load balancer to assign work unevenly:

https://dev.to/wallyqs/gos-map-iteration-order-is-not-that-r...

A graph of the map iteration order’s distribution showing that it’s not uniformly random:

https://twitter.com/cafxx/status/1135190309514620928

ilyt · 3y ago

Perl did that too IIRC, although the motivation was to avoid predictable hash collisions that might put a program onto a slow path.

ahrzb · 3y ago

That’s mainly for security reasons, chiefly to prevent hash-collision DoS attacks, and is therefore a requirement.

dunham · 3y ago

Randomly ordering the files, without adding random content, would do the trick.

nikonyrh · 3y ago

Interesting idea, to make "undefined behavior" explicit.

avgcorrection · 3y ago · 5 in thread

Mob engineering: you don’t have to read the documentation if a million other people also do not.

ajross · 3y ago

I think that's uncharitable. Almost no one realized these things were being generated. We all assumed that links to github's "releases" were just links to files because they look like links to files! Here's one to Zephyr 3.2.0: https://github.com/zephyrproject-rtos/zephyr/archive/refs/ta...

You pull that and get a tarball that is presented to the world as an "official release". Looks like a file. Acts like a file. It's a file.

So now your package manager or reproducible build engine or whatever needs a reference to the "official source code release", and what do you point it to? That file, obviously. It's right there on the "release" page for the download. And of course you checksum it for security, because duh.

Then last week all of a sudden that file changed! Sure, it has the same contents. But the checksum that you computed in good faith based on the official release tarball doesn't match!

If there's a misunderstanding here, it's on github and not the users. They can't be providing official release tarballs if they won't guarantee consistency. "As documented", this feature was a huge footgun[1]. That's bad.

[1] Actually it's worse: as documented it's basically useless. If you can't externally validate the results of that archive file, then the only way to use it is to tell your users that they have to trust Microsoft not to do anything bad, because you can't make any promise about the file that they can verify!

dwattttt · 3y ago

The contents of the archive could be verified rather than the archive itself, since the contents are what must not change. If the compression level changed, the archives would also have a different checksum, but no one would say they require archives to always have a specified compression level.

The fact it looked like an immutable file is much more relevant though.
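
Verifying contents rather than the archive could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `content_digest` and `make_archive` are hypothetical helper names, not part of git or GitHub, and the digest covers only regular files' names and bytes (no modes, symlinks, or timestamps), which a real verifier would likely need to handle too.

```python
import gzip
import hashlib
import io
import tarfile


def content_digest(archive_bytes: bytes) -> str:
    """Hash the regular files inside a .tar.gz, ignoring how it was compressed."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # Sort by name so the order of members in the archive does not matter.
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            if not member.isfile():
                continue  # assumption: only regular files are covered here
            data = tar.extractfile(member).read()
            # Length-prefix each field so adjacent entries cannot be confused.
            digest.update(member.name.encode("utf-8") + b"\0")
            digest.update(str(len(data)).encode("ascii") + b"\0")
            digest.update(data)
    return digest.hexdigest()


def make_archive(files: dict, compresslevel: int) -> bytes:
    """Build a small .tar.gz in memory (demo helper only)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=compresslevel, mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two archives built from the same files at different compression levels should give the same `content_digest`, even when their raw bytes (and so their plain SHA-256) differ. The trade-off raised elsewhere in the thread still applies: the archive has to be decompressed and parsed before it has been verified.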

AlotOfReading · 3y ago

This is entirely right. The feature as it exists is insane if it doesn't guarantee consistent hashes. However, there are alternatives even in the face of an adversarial GitHub. Every software project could agree on a manifest format and some kind of PKI/WoT to distribute certificates.

Pigs would fly first, but it's possible!

avgcorrection · 3y ago

> We all assumed that

Uh huh.

colatkinson · 3y ago

In this case, it seems that GitHub was asked about it. From the thread linked in the article:

> After a fruitful exchange with GitHub support staff, I was able to confirm the following (quoting with their permission):

>> I checked with our team and they confirmed that we can expect the checksums for repository release archives, found at /archive/refs/tags/$tag, to be stable going forward. That cannot be said, however, for repository code download archives found at archive/v6.0.4.

>> It's totally understandable that users have come to expect a stable and consistent checksum value for these archives, which would be the case most of the time. However, it is not meant to be reliable or a way to distribute software releases and nothing in the software stack is made to try to produce consistent archives. This is no different from creating a tarball locally and trying verify it with the hash of the tarball someone created on their own machine.

>> If you had only a tag with no associated release, you should still expect to have a consistent checksum for the archives at /archive/refs/tags/$tag.

> In summary: It is safe to reference archives of any kind via the /refs/tags endpoint, everything else enjoys no guarantees.

(posted 4 Feb 2022)

https://github.com/bazel-contrib/SIG-rules-authors/issues/11...

There's even a million linked PRs and issues where people went around and specifically updated their code to point to the URLs that were, nominally, stable.

I suspect that the GH employee who made these comments just misunderstood how these archives were being generated, or the behavior was depending on some internal implementation detail that got wiped away at some point. But if an employee at a big-ass company publicly says "yeah that's supported" to employees at another big-ass company, people are gonna take it as somewhat official.

syntheticnature · 3y ago · 5 in thread

2018 Gentoo-dev called, wants to let you know this is old news: https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg...

bentley · 3y ago

Indeed. The proper thing (also read as: the friendliest way for distro packagers) is for software projects to generate and publish a tarball themselves as part of their tag+release process.

That provides multiple advantages. Unlike GitHub’s unreliable automatically generated files, a fixed file can be hashed or cryptographically signed by the project (with SSH signatures, Signify, PGP, etc.), and later verified without having to extract the files first or check out the underlying repo.

Another thing many projects aren’t aware of: if your project uses Git submodules, anyone using GitHub’s autogenerated tarballs will be unable to build your software, because those don’t contain submodules.

ajross · 3y ago

> Unlike GitHub’s unreliable automatically generated files, a fixed file can be hashed or cryptographically signed by the project (with SSH signatures, Signify, PGP, etc.), and later verified without having to extract the files first

Or how about this: Microsoft could provide that as a feature in their "official release" page for projects, which is exactly what we all thought that page was for in the first place.

Seriously: if archive links are unreliable they're basically useless anyway. Who wants tarballs in the modern world except for package management or build automation?

elesiuta · 3y ago

> Indeed. The proper thing (also read as: the friendliest way for distro packagers) is for software projects to generate and publish a tarball themselves as part of their tag+release process.

And this is easy enough to do automatically with GitHub Actions. I have a workflow [1] which runs on each release to create a stable archive of the source and attaches it to the release.

[1] https://github.com/elesiuta/picosnitch/blob/master/.github/w...

GauntletWizard · 3y ago

The proper thing is for the software build processes that rely on tarballs from GitHub to switch to using git directly; either by shallow clone or storing a full repo and checking out worktrees as appropriate. Tarballs at a tagged revision are fine as release artifacts if your upstream is publishing them as release artifacts, but the whole point of this is that they aren't.

jenadine · 3y ago

> And no, a git tag is not a release.

Why not?

jmclnx · 3y ago · 5 in thread

> more easily support compression across operating systems

I cannot help but wonder if this change was forced upon GitHub by Microsoft because gzip is GPLv3; maybe this other version is a clean-room clone. We all know corporations hate GPLv3, including the large corporation I work for.

https://www.gnu.org/software/gzip/

eklitzke · 3y ago

First of all, the change was made upstream in git, which is not controlled by GitHub (even though GitHub does have some developers who work on git). And the stated reason (not relying on third-party tools/libraries) is consistent with many other changes made to git over its history, e.g. the conversion of many git commands from Perl to C.

Furthermore, gzip isn't even necessarily the best tool to produce gzip data. If you want multi-core parallelism there's pigz, and if you're willing to trade higher CPU usage to get a better compression ratio you can use zopfli. I don't know the details of the implementation in git and whether it tries to leverage multi-threading or zopfli-like techniques, but the point stands that gzip isn't the final word on producing gzip data.

fukawi2 · 3y ago

It was git that implemented the change, then github upgraded to the affected version. AFAIK, MS has no influence over upstream git.

As much as I distrust Microsoft, I don't think there were any ulterior motives here.

MikusR · 3y ago

IIRC the change was made by a git-for-windows developer. And some of the top contributors to git are from GitHub/Microsoft.

vore · 3y ago

If this were true, this would have been a problem a long time ago. Why would Microsoft wait such a long time to change this when under your assumption it would have been a continuous legal liability?

int_19h · 3y ago

Why would that matter for an internal tool? Esp. there's plenty of much more visible GPLv3 bits in WSL images that ship via the Windows app store.

jancsika · 3y ago · 3 in thread

> Hyrum's law

Didn't Google beat Hyrum's law by using their weight to force middleboxes to accept some variation in some datum of an HTTP header or something?

Edit: hint: something about rotating a value for some number of decades. Either forcing the hand of middleboxes or CAs, I can't remember. In either case, it seemed like a real pain in the ass to keep the API observability concrete from hardening. :)

X-Cubed · 3y ago

You might be thinking of GREASE for TLS: https://chromestatus.com/feature/6475903378915328

jancsika · 3y ago

Yep, thanks! So not middleboxes or CAs, but old rusty servers.

The other example of evading Hyrum's Law that comes to mind was when early JavaScript users of JSON observed that they could intersperse comments.

Crockford said he noticed people were using the comments to stuff preprocessing directives into JSON.

He then devised the most ingenious hack: He told people they weren't allowed to put comments in JSON. Then people stopped putting comments in JSON.

I'm starting to wonder whether Hyrum's Law is really more of a suggestion. :)

BeFlatXIII · 3y ago

That's the advantage of being Google, GitHub, or Cloudflare.

cratermoon · 3y ago · 3 in thread

From the post: "it may well become necessary for anybody who wants consistent results to decompress archive files before checking checksums."

I'm certain there's some exploit waiting to subvert the decompress algorithm and substitute malicious content in place of the actual archive files.

MereInterest · 3y ago

Depending on the format, yes. Usually it is by exhausting some resource, such as a file that decompresses to an impossibly large dummy file. If the dummy file crashes an analyzer of the compressed archive, then other malicious files could be hidden.

https://en.wikipedia.org/wiki/Zip_bomb

hamandcheese · 3y ago

Even before the untar phase, couldn't there be a vulnerability in:

- your HTTPS stack
- gzip encoded HTTP
- your sha256 program

ilyt · 3y ago

You'd probably want to limit the size of the decompressed output, so that 10TB of zeroes compressed down to a small file can't hurt you. But that's not that hard.
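
A size cap of that kind might be sketched in Python as follows. This is a minimal illustration, not how any particular package manager does it: `bounded_gunzip` and `DecompressionTooLarge` are hypothetical names, and the limit values are up to the caller. It relies on the `max_length` argument of `zlib`'s `Decompress.decompress`, which caps how much output each call may produce.

```python
import zlib


class DecompressionTooLarge(Exception):
    """Raised when the decompressed output exceeds the caller's limit."""


def bounded_gunzip(data: bytes, max_output: int,
                   chunk_size: int = 64 * 1024) -> bytes:
    # wbits=31 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=31)
    out = bytearray()
    buf = data
    while not decomp.eof:
        # Produce at most chunk_size bytes per call, so memory use stays
        # near max_output + chunk_size however large the stream claims to be.
        chunk = decomp.decompress(buf, chunk_size)
        out += chunk
        if len(out) > max_output:
            raise DecompressionTooLarge(
                f"output exceeds {max_output} bytes")
        # Input that was not consumed because of the cap is fed back in.
        buf = decomp.unconsumed_tail
        if not chunk and not buf and not decomp.eof:
            raise ValueError("truncated or corrupt gzip stream")
    return bytes(out)
```

Called with a small limit on a gzip stream of many megabytes of zeroes, this raises `DecompressionTooLarge` after buffering only slightly more than the limit, instead of inflating the whole thing first.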

vlovich123 · 3y ago · 2 in thread

I wonder if transfer encoding the archive might be a better strategy. The client benefits from a stable format (tar), provided it’s generated in a stable order, which is generally easier for the server to guarantee. The network transfer is transparently compressed (the Transfer-Encoding header in HTTP parlance).

Checksums still work and protect against malicious tarballs, which are generally riskier to unpack than plain stream compression / decompression. The server and client get the smaller file transfers, and compression improvements can evolve transparently by negotiating the transfer encoding. The server can still cache the encoded form to avoid needing to compress the same file repeatedly.

Seems like a win win solution without requesting a drastic redesign of package managers everywhere and everyone walks away having won the properties of the system they value.

ilyt · 3y ago

Could just calculate checksum for decompressed archive at that point. Still want to store it on server/client as compressed file and there is no point making transfer side more complex.

Would probably want to store expected file size together with checksum to avoid the "compressed stream of endless zeroes" attack vector

vlovich123 · 3y ago

To be fair, the transfer side is already HTTP, which already has transfer encoding support (client and server), so it should be transparent as long as standards-compliant clients are in use (the default ones always are in terms of transparent decoding), and CDNs are perfectly capable of caching the compressed response afaik.

The main simplification is that there’s less work on the client side at scale - the file you download is the file you checksum. That’s different if package maintainers have to do it manually for each package.

Can you say more about the endless zeroes attack? Are you thinking about finding a sha256 collision? You have to keep computing the sha256 for every new byte which is expensive. And if that ever becomes practical, the ecosystem will switch to 384, 512 or 512/256. But sure, storing file size + hash is generally a good idea to make it that much harder (in practice no one bothers and this advice would apply regardless of compression or not because the expensive bit is the digest computation to find a collision)

AJRF · 3y ago · 1 in thread

HANG ON!

I think this just made me realise an issue I was having with Swift Package Manager a few months back. We have a bunch of ObjC frameworks in our app that we don't want people to update anymore so we can rewrite them, and we just threw them all into a big umbrella project, but for some reason we couldn't get the binary target URL from Github Enterprise to work on our self hosted Enterprise instance because the checksum would be different every time, but it worked perfectly for Github Cloud.

Is there anyone from Github here - Can you confirm that is the cause of issue for GH Enterprise?

jamesfinlayson · 3y ago

Might be related to https://github.blog/changelog/2023-01-30-git-archive-checksu... somehow, though that was only live for a couple of days last month I thought.

travisgriggs · 3y ago · 1 in thread

Had to follow the links to figure out what Hyrum’s Law was (I like laws). The best link from that law is the obligatory xkcd at the very bottom. Reshared here:

https://xkcd.com/1172/

ElijahLynn · 3y ago

I found a good video about this too: https://lwn.net/SubscriberLink/921787/949cf79f2599f734/ (Original Post) --> https://www.hyrumslaw.com/ --> https://twitter.com/hyrumwright --> https://twitter.com/dret/status/1573897062785032192 --> https://www.youtube.com/watch?v=5Wdgjw6IGDM (Hyrum's Law: Hyrum Wright on Programming over Time - Interview by Erik Wilde)

meling · 3y ago · 1 in thread

Couldn’t they include two checksums; one for compressed, and if that fails, decompress and check the uncompressed content?
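
The two-checksum idea could be sketched in Python roughly like this. It is a hedged illustration only: `verify` is a hypothetical function, the archive is assumed to be gzip-compressed, and both expected digests are assumed to have been recorded when the archive was first pinned.

```python
import gzip
import hashlib


def verify(archive: bytes, compressed_sha256: str,
           uncompressed_sha256: str) -> str:
    """Return which checksum matched, or raise if neither did."""
    # Fast path: the bytes are exactly what was pinned, nothing is unpacked.
    if hashlib.sha256(archive).hexdigest() == compressed_sha256:
        return "compressed"
    # Fallback: the compressor may have changed, so compare the tar stream.
    # Note: gzip.decompress is unbounded; untrusted input would want a
    # size limit on the decompressed output here.
    payload = gzip.decompress(archive)
    if hashlib.sha256(payload).hexdigest() == uncompressed_sha256:
        return "uncompressed"
    raise ValueError("archive matches neither checksum")
```

The fallback path only runs when the compressed checksum has already failed, which is exactly the case where the data is still unverified when it gets decompressed.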

viraptor · 3y ago

Typically when working with automated deployment/build systems you don't want to decompress unknown data for security reasons. Checking the checksum of the compressed content solves that issue.

mjw1007 · 3y ago

I don't think this is really an example of Hyrum's law. Hyrum's law claims that even if you carefully document your contract, someone will rely on the observable behaviour rather than the documentation anyway.

But this is an example of a much weaker proposition: if you don't document your contract, then people will guess what the contract is and some of them will guess wrong.

(In fact in this case it seems it's more like "if you don't document your contract and your support staff sometimes say the behaviour is A, people will rely on the behaviour being A".)
