I wonder how it scales with the number of blocks (e.g. with 10 TB of data)? I would guess that it would really help with the amount of data stored, but then slow way down as more data gets added due to the extra overhead of tracking and deduping.
Also, does it work across repositories?
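To make the scaling question concrete, here is a minimal sketch of block-level dedup with a content-addressed index. This is an illustration under assumed design choices (fixed 64 KiB blocks, SHA-256 ids, an in-memory dict), not XetHub's actual implementation; the point is that the per-block bookkeeping is a hash lookup, so tracking cost grows linearly with data rather than blowing up:

```python
# Hypothetical sketch (not XetHub's actual design): block-level dedup
# via a content-addressed index. Each block is identified by its hash;
# checking whether a block was already stored is an O(1) dict lookup.
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks; an arbitrary choice for illustration

def dedup_store(data: bytes, index: dict) -> list:
    """Split data into blocks, store only unseen blocks, return block ids."""
    block_ids = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in index:        # O(1) membership test
            index[digest] = block      # store each unique block once
        block_ids.append(digest)
    return block_ids

index = {}
file_a = b"A" * BLOCK_SIZE * 4
file_b = b"A" * BLOCK_SIZE * 2 + b"B" * BLOCK_SIZE * 2  # shares blocks with file_a
dedup_store(file_a, index)
dedup_store(file_b, index)
print(len(index))  # 2 unique blocks stored instead of 8
```

Under this model the overhead per added byte stays roughly constant; what grows with total data is the index itself, which at scale would presumably live on disk rather than in memory.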
What specifically is meant by "Modern Development Experience"? I see GitHub eliminating a lot of painful development friction with its tools for repository collaboration and integration, which is what I associate with the current trend in repositories. But this post focuses just on keeping data alongside the code, which is an interesting assumption...
One of the core motivations behind XetHub is to enable teams across industries to benefit from the workflow we've used in software for 15+ years. We've used this workflow for so long that it is easy to overlook its benefits.
Software teams have a clear picture of who is working on what, what is in flight, what is in review, and what remains. Anyone on the team can easily pick up work in progress from someone else, or start a new derivation of that work, without concern about interference. Teams can be distributed across time zones, and yet everyone feels connected to the project and is able to contribute without disruption.
The power of a GitHub-style workflow for team collaboration comes from being able to experiment freely (branches or forks), review easily (pull requests), and observe (passively learn) best practices from the team (issues, code review feedback).
Last year we benchmarked this set along with LakeFS. Should we add LakeFS back to this set?