Announcing GVFS: Git Virtual File System

(blogs.msdn.microsoft.com)

805 points · janwh · 9y ago · 274 comments

158 comments · 40 top-level

kentt9y ago· 21 in thread

It's disappointing that all the comments are so negative. This is a great idea and solves a real problem for a lot of use cases.

I remember years ago Facebook said it had this problem. A lot of the comments centered on the idea that you could change your codebase to fit what git can do. I'm glad there's another option now.

mox19y ago

Yes, they did. They chose to scale out Mercurial to solve their problem. I wonder if they still use Mercurial?

https://code.facebook.com/posts/218678814984400/scaling-merc...

kyrra9y ago

Both Facebook and Google are continuing to contribute to Mercurial, so they both have some vested interest in it. If you poke around the commits on the repo[0] you'll see commits from people with @fb.com and @google.com email addresses. The mailing lists also still have activity from both companies.

As well, the Mercurial team does quarterly sprints (I believe), and Google is hosting the next one[1].

[0] https://www.mercurial-scm.org/repo/hg

[1] https://www.mercurial-scm.org/wiki/4.2sprint

janwhOP9y ago

They do. Durham Goode (Tech Lead on Source Control at Facebook) just gave a talk at Git-Merge about how they scaled Mercurial at FB. They seem to be quite happy with it, albeit with quite a few restrictions on their internal users that don't really transfer to general (outside-corporate) VCS usage (for example, only rebases are allowed, everyone commits directly to master all the time, etc.).

jbergens9y ago

They did a couple of months ago so I assume they still do.

outcoldman9y ago

I also don't think this is a good idea. Git is a Distributed Version Control system (https://en.wikipedia.org/wiki/Distributed_version_control), the main benefit of which is that it "allows many software developers to work on a given project without requiring them to share a common network". Seems like with GVFS they are making DVC to be a CVS (https://en.wikipedia.org/wiki/Concurrent_Versions_System) again. What is the point? There are a lot of good CVS systems around. Just to give cool kids access to cool tools? I believe there are plenty bridges between CVS and git already implemented, which also allows you to checkout only part of the CVS tree.

At Splunk we had the same problem: our source code was stored in a CVS (Perforce), but we wanted to switch to git. Not only because we really wanted to use git, but to simplify our development process, mainly because of the much easier branching model (lightweight branching is also available in Perforce, but to get it we would still have needed to upgrade our servers). We also had the problem that, at the beginning, we had a very large working tree. I don't think it was 200-300GB; I believe it was 10x less, and git status still took 4-5 seconds. That was not acceptable for us, so we reworked our source code and release builds to split them into several git repos, making sure that git status takes no more than 0.x seconds.

My point is: use the right tools for the right jobs. 4-5 seconds for git status is still a huge problem; I would prefer to use CVS instead if it meant not waiting 5 seconds for each git status invocation.

klodolph9y ago

> I believe there are plenty bridges between CVS and git already implemented, which also allows you to checkout only part of the CVS tree.

How many of them have you used? I've used a couple, to interact with large code bases on the rough order of 300GB. In my experience they don't work very well, because you have to be hygienic about the commands you run or some part of your Git state gets out of sync with some part of your state for the other source control system. So I gave up on those, and I use something similar to Microsoft's solution at work on a daily basis. It's a real pleasure by comparison, and in spite of that I still call myself a Git fan (about 10 years of heavy Git use now). At work the code base is monolithic and everyone commits directly to trunk (at a ridiculous rate, too).

I've heard horror stories about back when people had to do partial checkouts of the source code, and I'm glad that the tooling I use is better.

The idea of breaking up a repository merely because it is too large reminds me of the story of the drunkard looking for his keys under the streetlight. Right tools for the right job: sometimes you change the job to match the tools, and sometimes you change the tools to match the job.

jonknee9y ago

> Seems like with GVFS they are making DVC to be a CVS again. What is the point?

It sounds like they answered that:

> In a repo that is this large, no developer builds the entire source tree. Instead, they typically download the build outputs from the most recent official build, and only build a small portion of the sources related to the area they are modifying. Therefore, even though there are over 3 million files in the repo, a typical developer will only need to download and use about 50-100K of those files.

Source will still be distributed among the developers that touch it. Seems like a decent compromise.

acqq9y ago

> Seems like with GVFS they are making DVC to be a CVS

> just to give cool kids access to cool tools

Yes. DVCS with huge code bases, large binary objects and large teams is hardly the optimal approach. But the "cool kids" are just used to using what they use. And now they can pretend to keep doing so even though they have to be always connected, because the files are virtual and remain on the server until actually used.

If Microsoft is giving the solution to the "cool kids," there's no reason to complain that Microsoft is willing to care for them.

And if you ask the "cool kids" why they need git at all for such scenarios, have fun with the number of arguments you'll get. Why does this one "need" vi and another "Emacs", etc.? The same reasons. You'll find those arguments in the comments here too, including mentions of Mercurial, the competition, just like "vi or Emacs". Because. Don't ask.

And no, as far as I understand, Google doesn't primarily "use Mercurial", they use something called Piper, and before they used a customized Perforce just like Microsoft did.

https://www.wired.com/2015/09/google-2-billion-lines-codeand...

"Piper spans about 85 terabytes of data" "and Google’s 25,000 engineers make about 45,000 commits (changes) to the repository each day. That’s some serious activity. While the Linux open source operating spans 15 million lines of code across 40,000 software files, Google engineers modify 15 million lines of code across 250,000 files each week."

espadrine9y ago

It is not clear from either the announcement or the code, but in principle I don't see a reason it can't be a DVCS.

Sure, GVFS downloads files only when first read; but maybe it keeps them cached? Maybe you can still work on them and commit changes after you get offline? At least in principle, nothing prevents that.

youdontknowtho9y ago

I was actually surprised there wasn't more negative sentiment. Microsoft could cure cancer and the post to HN would be mostly negative. It's tribal. It doesn't even matter what they do at this point.

That being said, you can see more and more people getting off the "Microsoft is evil" train. It's super slow, and every boneheaded thing Microsoft does resets the needle for lots of people.

I've always been surprised how much sympathy a company like IBM or Intel gets on HN. They both sue people over patents. They both contribute to non-free software. They were early backers of Linux, though, and that is what people care about superficially.

Piskvorrr9y ago

To be honest, I was pretty neutral about MS, for a long time now, carefully optimistic even: IE8 was fair enough (when it was new), Win8 was kinda okay, Azure is great...and just when you think they're a normal company, they take out the old guns and start shoving (first GWX and then) WinX down people's throats, never mind any consent.

So, I'm very, very, very sorry that I can't hear their words over the noise of their actions; and in the light of this, I eye each new gift-bearing Redmondian with suspicion.

jjjingleheimer9y ago

To be fair, I can't say that I care whether people are fair to a multinational corporation. Whether Linux fans are right or not, the company is still only doing what's best for its bottom line. Should a company get a trophy for doing what its customers want?

floopidydoopidy9y ago

This is just more of their embrace, extend, extinguish campaign. This is the extend part.

ajross9y ago

> This is a great idea and solves a real problem for a lot of use cases.

I don't know if "a lot" is the right qualifier. Solitary repos of millions of files have scalability problems even outside the source control system (I mean: how long does it take your workstation to build that 3.5 million-file Windows tree?)

A full Android system tree is roughly the same size and works fine with git via a small layer of indirection (the repo tool) to pull from multiple repositories. A complete linux distro is much larger still, and likewise didn't need to rework its tooling beyond putting a small layer of indirection between the upstream repository and the build system.
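The "small layer of indirection" can be pictured as a manifest listing many independent git repositories and where each one goes in the checkout. A minimal sketch, assuming a simplified manifest loosely modeled on the `repo` tool's XML format (only `remote`, `default` and `project` elements; the URLs and project names are made up). The printed clone commands are only illustrative; this is not how `repo` itself is implemented.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified manifest loosely modeled on the `repo` tool's format.
# Hypothetical URLs and project names; real manifests support much more.
MANIFEST = """
<manifest>
  <remote name="origin" fetch="https://git.example.com/" />
  <default remote="origin" revision="main" />
  <project name="platform/build" path="build" />
  <project name="platform/frameworks/base" path="frameworks/base" />
  <project name="kernel/common" path="kernel" revision="stable" />
</manifest>
"""


def checkout_plan(manifest_xml: str) -> List[Tuple[str, str, str]]:
    """Return (clone_url, local_path, revision) for every listed project."""
    root = ET.fromstring(manifest_xml.strip())
    remotes = {r.get("name"): r.get("fetch") for r in root.findall("remote")}
    default = root.find("default")
    default_remote = default.get("remote") if default is not None else None
    default_rev = default.get("revision") if default is not None else None

    plan = []
    for project in root.findall("project"):
        # Per-project attributes override the <default> element.
        remote = project.get("remote", default_remote)
        url = remotes[remote].rstrip("/") + "/" + project.get("name")
        path = project.get("path", project.get("name"))
        revision = project.get("revision", default_rev)
        plan.append((url, path, revision))
    return plan


for url, path, rev in checkout_plan(MANIFEST):
    print(f"git clone --branch {rev} {url} {path}")
```

Each repository stays small enough for stock git; only the manifest knows how they fit together.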

Honestly I'd say this GVFS gadget (which I'll fully admit is pretty cool) exists because Microsoft misapplied their source control regime.

anon9879y ago

It's because the 'problem' it solves is a corner case that's rarely encountered. I love their absurd examples of repos that take 12 hours to download. How many people have that problem, really?

All they did is create a caching layer.

ska9y ago

> How many people have that problem, really?

An easy lower bound is tens of thousands of engineers: developers at several large tech companies (e.g. MS, Facebook, Google, ?).

nine_k9y ago

If you deal with code, the case is marginal for you.

If you deal with graphics, audio assets, etc, the binary-blob type of data, the case is central.

rplnt9y ago

Well, it's a problem for thousands of Microsoft employees, isn't it? We had a much smaller repository (10GB IIRC) and it really was annoying how long everything took, even with various caches and whatnot enabled.

sebastos9y ago

"I don't have this problem, so nobody does."

Lacking support for large binary blobs is, like, THE #1 reason that an engineer might have to use an alternative.

daxelrod9y ago

Ok, but you'll encounter similar git limitations with repos several orders of magnitude smaller than that too.

All you need is several hundred engineers and your monorepo becomes unwieldy for git to handle.

aanm19889y ago

It's not a caching layer, it's lazy evaluation.
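To make the "lazy" distinction concrete: in the sketch below every path is visible immediately, but a file's bytes are requested only the first time it is read (matching the thread's description of GVFS downloading files on first read); after that the local copy is reused, which is where caching comes in. `LazyWorkingTree`, the `fetch` callback and the in-memory server are hypothetical stand-ins, not GVFS's API.

```python
from typing import Callable, Dict, Iterable, List


class LazyWorkingTree:
    """All paths are listed up front; file bytes are fetched on first read."""

    def __init__(self, paths: Iterable[str], fetch: Callable[[str], bytes]):
        self._paths = set(paths)            # what a directory listing shows
        self._fetch = fetch                 # stand-in for a server request
        self._hydrated: Dict[str, bytes] = {}

    def listdir(self) -> List[str]:
        return sorted(self._paths)          # no content is downloaded here

    def read(self, path: str) -> bytes:
        if path not in self._paths:
            raise FileNotFoundError(path)
        if path not in self._hydrated:      # first access: fetch it now
            self._hydrated[path] = self._fetch(path)
        return self._hydrated[path]         # later reads stay local

    def hydrated_count(self) -> int:
        return len(self._hydrated)


# Usage with an in-memory fake "server" of 1,000 files.
server = {f"src/file{i}.c": f"/* file {i} */".encode() for i in range(1000)}
requests: List[str] = []


def fake_fetch(path: str) -> bytes:
    requests.append(path)
    return server[path]


tree = LazyWorkingTree(server.keys(), fake_fetch)
print(len(tree.listdir()))                   # 1000
tree.read("src/file7.c")
tree.read("src/file7.c")
print(tree.hydrated_count(), len(requests))  # 1 1
```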

tambourine_man9y ago· 20 in thread

It's interesting how all the cool things seem to come from Microsoft these days.

I still think we need something better than Git, though. It brought some very cool ideas and the inner workings are reasonably understandable, but the UI is atrociously complicated. And yes, dealing with large files is a very sore point.

I'd love to see a second attempt at a distributed version control system.

But I applaud MS's initiative. Git's got a lot of traction and mind share already, and they'd probably be heavily criticized if they tried to invent their own thing, even if it was open sourced. It will take a long time to overcome their embrace, extend and extinguish history.

Analemma_9y ago

> I still think we need something better than Git, though. It brought some very cool ideas and the inner workings are reasonably understandable, but the UI is atrociously complicated. And yes, dealing with large files is a very sore point.

Note that Google and Facebook ran into the same problems Microsoft did, and their solution was to use Mercurial and build similar systems on top of it. Microsoft could've done that too, but instead decided to improve Git, which deserves some commendation. I'd rather Git and hg both got better rather than one "taking over".

aanm19889y ago

Google uses some variant of perforce, just like MS has been doing.

severino9y ago

> Microsoft could've done that too, but instead decided to improve Git,

They didn't improve git, they only made this for themselves and for their product users. Git doesn't restrict you to a single operating system.

verytrivial9y ago

> It's interesting how all the cool things seem to come from Microsoft these days.

I've assumed Microsoft have been making all this stuff all along, but keeping it internal then throwing it away on the probably false assumption that every bit of it is some sort of competitive advantage. I think they're coming around to the idea that at least appearing constructive and helpful to the developer community will help with trying to hire good developers.

sytse9y ago

Maybe something that has the data models of git but has a more consistent interface? Today on Git Merge there was a presentation about http://gitless.com/

For example one of the goals is to always allow you to switch branches. Stash and stash pop would happen automatically and it would even work if you're in the middle of a merge.

Ajedi329y ago

I'm still waiting for a decent GUI that takes full advantage of the simplicity of git's underlying data model. The CLI is okay and I've gotten really good with it, but fundamentally I think git's DAG is something that would be best represented and manipulated graphically.

[Reinventing the Git Interface][1] was written almost 3 years ago now and yet to my knowledge nobody's implemented anything quite like that yet.

[1]: http://tonsky.me/blog/reinventing-git-interface/

StavrosK9y ago

I quite love both the motivation and the implementation of gitless (and the choices they've made). I find it much more usable than git.

tambourine_man9y ago

I'd never heard of Gitless, I'll check it out, thanks.

coherentpony9y ago

> I'd love to see a second attempt at a distributed version control system.

Out of curiosity, why a whole new attempt? Personally, I'd prefer the approach of "making our current tools better."

gumby9y ago

"Let a thousand flowers bloom." Competition helps both sides. Clang became good enough that it spurred GCC to become a lot better.

Until 1997, forking a project was considered a tragedy. I think things have improved since then :-).

moosingin3space9y ago

What are your thoughts on Pijul? (https://pijul.org)

ska9y ago

> I'd love to see a second attempt at a distributed version control system.

Git wasn't the first, and even among the second-generation systems it had several contemporaries.

toyg9y ago

Yup. I remember when git came along the field was already pretty crowded (DVCS, Darcs, Bazaar, BitKeeper, Mercurial...). I've always suspected Linus wrote git in a panic simply to sidestep the months of flames that switching VCS again would have inevitably generated, once BitKeeper stopped being viable. I also remember people jumping at the occasion in a way they never would have to improve someone else's tool.

The story of git is a good case-study for people interested in group dynamics.

blunte9y ago

Indeed, but I also had to pause when I considered how heavily Microsoft depended on a system originally built by Linus Torvalds :)

ma2rten9y ago

Mercurial is very similar to git but more user friendly.

Florin_Andrei9y ago

> It's interesting how all the cool things seem to come from Microsoft these days.

It's like a whole'nother company after they got rid of Steve Ballmer.

Piskvorrr9y ago

Meh. If I'm watching the 3E cycle right, they're currently in the Embrace phase and heading to Extend. And it's been a cycle for a number of repetitions - it doesn't take a genius to see where it goes next.

greyman9y ago

> but the UI is atrociously complicated

Linus himself admitted that he isn't good at UI. Anyway, I think git just wasn't designed to be used directly, but via another UI. For example, I use it within Visual Studio Code, which covers about 90 percent of use cases, and Git Extensions can take care of almost everything else. Sometimes the CLI is needed, though.

sjellis9y ago

Git Extensions was a fantastic project back when the idea of Microsoft supporting Git would have seemed impossible. It made my life better every day when I was a C# developer. These days, I use the Git CLI on Windows, but VS integration with Git seems good, so I haven't felt the need to install Git Extensions.

I've still not yet seen a stand-alone GUI for Git that is better than the one that ships with Git Extensions, though.

aairey9y ago

What a change of cash-cow placement can do ...

dewyatt9y ago· 12 in thread

I think they could have picked a name that doesn't conflict with GNOME Virtual File System (GVfs).

melling9y ago

When they picked the "Windows" product name, they could have picked a name that didn't conflict with the use of windows. Picking on an obscure file system doesn't even register in comparison.

X Windows

NeWS - https://en.m.wikipedia.org/wiki/NeWS

Remember the mess on usenet?

comp.windows.x.motif

comp.windows.new - not news about Microsoft Windows

hobarrera9y ago

In this particular case, the name is THE SAME (GVFS / GVFS). And they're both virtual filesystems, so there's lots of room for confusion.

I can imagine people on a forum:

"Hey, GVFS isn't working for me. It crashes with error -504 when I try to mount /nfs/company_data."

Try guessing which GVFS that is.

wslh9y ago

They used to choose very bad names like .NET or COM [1] (this predates Internet), which makes searching for information very tricky. MSDN doesn't help.

[1] https://en.wikipedia.org/wiki/Component_Object_Model

FLGMwt9y ago

.NET is still rough naming-wise. We're porting a project to .NET Core Runtime, which requires porting over to EF Core and ASP.NET Core (neither of which requires being on Core Runtime).

Our internal libraries need to be compatible with the Core Runtime, so we have to have them target .NET Standard, which is compatible w/ the full .NET Framework or .NET Core. To target .NET Standard, you need the .NET Core SDK/CLI which includes the `dotnet` tool, which is almost never clarified as "the SDK/CLI" in documentation or in talks, but usually just ".NET Core".

Another minor annoyance: to build a .NET Standard-compatible library, you reference the "NETStandard.Library" NuGet package. Makes a fair amount of sense, but it's hard to talk about.

If you're running on Windows and want a smaller server footprint, you can use Windows Server Nano, which requires your apps to target .NET Core Runtime (not .NET Full Framework). Note that this requirement is not true for Windows Server Core. -_-

foxrob929y ago

A few years ago I had to do some interfacing between python and some modelling software. I went through a COM interface, and it was a bloody nightmare to find docs.

I later found out I could have looked for "ActiveX" and found similar results.

greggyb9y ago

Their relational database is called SQL Server, which might otherwise be a colloquial generic name for an RDBMS.

They have a product in Azure named simply DocumentDB. I don't think "used to" is necessarily the best tense here (:

xorcist9y ago

> NET or COM [1] (this predates Internet)

What could you possibly mean by that? The .com TLD was introduced in 1985, with microsoft.com registered already in 1991. Microsoft COM was created in 1993. (Of course, "the Internet" in any sense of the word predates all of this.)

sametmax9y ago

That was my first thought: "dick move". But they probably didn't know about it.

Longhanks9y ago

This is Microsoft. Before announcing a product they have more than enough lawyers to check the name for any clashes.

They just came to the conclusion that GNOME's product is no threat and that they can just claim the name. Smaller companies [1] have tried that before.

[1]: https://www.groupon.com/blog/cities/groupon-launches-gnome

grovegames9y ago

I've built a personalized Linux from the kernel up a couple of times, and it never even popped into my head. Let's be fair to Microsoft: the nix world is vast, and not exactly easy to navigate if you don't live there. I live in all three worlds, and there are only so many letters. It drives me nuts when I just want a code name for a project, because I can no longer find unique words that aren't used by some project somewhere.

I mean, git itself did this in the beginning.

Microsoft, under Nadella has made me not hate Microsoft again, and that's a tall order because I'm over 40. This is an impressive move, and if they effectively execute all the bits that are possible here, this is just some great work.

(Oh, and I can't even use the word nix now as a catch all for all the POSIX/ POSIX(like) OSs because of nixOS.)

I think going forward, we just have to accept name collision.

discreditable9y ago

> But they probably didn't know about it.

GNOME Virtual Filesystem is first search result for "gvfs". Even if you use bing!

hobarrera9y ago

I understand that sometimes two products have the same name, but they usually have very different scopes/usages.

In this case, they're both called GVFS AND three of the letters have the same meaning, and they both do relatively similar things.

Even the tooling and the output of `mount` are bound to be incredibly confusing.

greg7mdp9y ago· 9 in thread

This is similar to what Google uses internally. See http://cacm.acm.org/magazines/2016/7/204032-why-google-store...:

"Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer."

This is a very powerful model when dealing with large code bases, as it solves the issue of downloading all the code to each client. Kudos to Microsoft for open sourcing it, and under the MIT license no less.
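The overlay idea in the CitC description (only modified files live in the workspace; everything else is read from the shared snapshot) can be sketched as a toy in-memory model. `OverlayWorkspace` and the file names are hypothetical, and nothing here reflects how CitC or its FUSE layer are actually implemented.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


class OverlayWorkspace:
    """A workspace that stores only modified files on top of a shared base."""

    def __init__(self, base: Mapping[str, str]):
        # Read-only view of the base snapshot; a real system would not copy it.
        self._base = MappingProxyType(dict(base))
        self._changes: Dict[str, Optional[str]] = {}   # None marks a deletion

    def read(self, path: str) -> str:
        if path in self._changes:                  # local edits win
            content = self._changes[path]
            if content is None:
                raise FileNotFoundError(path)
            return content
        if path not in self._base:
            raise FileNotFoundError(path)
        return self._base[path]                    # fall through to snapshot

    def write(self, path: str, content: str) -> None:
        self._changes[path] = content

    def delete(self, path: str) -> None:
        self._changes[path] = None

    def modified_files(self) -> List[str]:
        return sorted(self._changes)


base = {"search/index.cc": "v1", "ads/serve.cc": "v1", "maps/tiles.cc": "v1"}
ws = OverlayWorkspace(base)
ws.write("ads/serve.cc", "v2 (my change)")
print(ws.read("search/index.cc"))   # v1, served from the base snapshot
print(ws.modified_files())          # ['ads/serve.cc']
```

The workspace's own storage grows with the number of edited files, not with the size of the snapshot.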

krupan9y ago

Holy cow, it sounds like they reinvented ClearCase!

sdesol9y ago

If they get this right, this can be MASSIVE for Microsoft in Enterprise. ClearCase was the reason why IBM was able to charge $1000+ per developer license fees.

ClearCase did what a lot of Enterprise companies needed at the time, and most importantly, it created hooks, that were mostly too difficult to remove. Once you create deep integration with ClearCase, you are very much committed to using it long term.

wyldfire9y ago

Agreed, same w/GVFS IMO.

I ruminated about ccase/git elsewhere in this thread: https://news.ycombinator.com/item?id=13560108

kentonv9y ago

Google's Piper is impressive (I used it), but it emulates Perforce. Having something Git-based is a lot more exciting. Hope someone ports it to platforms other than Windows...

general_ai9y ago

Google is far more advanced than this. They have one giant monorepo (Piper) that's backed by Bigtable (or at least it was, when I was there). Piper was mostly created in response to Perforce's inability to scale and be fault tolerant. Until Piper came along, they would have to periodically restart The Giant Perforce Server in Mountain View. Piper is 24x7x365 and doesn't need any restarts at all. But the key bit here is not Piper per se. Unlike Microsoft, Google also has a distributed, caching, incremental build system (Blaze), and a distributed test system (Forge), and they are integrated with Piper. The vast majority of the code you depend on never actually ends up on your machine. Thanks to this, what takes hours at Microsoft takes seconds at Google. This enables pretty staggering productivity gains. You don't think twice about kicking off a build, and in most cases no more than a minute or two later you have your binaries, irrespective of the size of your transitive closure. Some projects take longer than that to build, most take less time. Tests are heavily parallelized. Dependencies are tracked (so tests can be re-run when dependencies change), there are large scale refactoring tools that let you make changes that affect the entire monorepo with confidence and without breaking anyone.
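A core trick behind caching, incremental build systems like the one described is to key each build step by a hash of its command and inputs, so an unchanged step never runs twice and its output can be shared. A minimal in-memory sketch: `ActionCache` and `fake_compile` are hypothetical, a real system would keep the cache on shared servers rather than in a dict, and this is not how Blaze is implemented.

```python
import hashlib
from typing import Callable, Dict

Inputs = Dict[str, bytes]


class ActionCache:
    """Reuse a build step's output when its command and inputs are unchanged."""

    def __init__(self) -> None:
        self._outputs: Dict[str, bytes] = {}
        self.executions = 0

    @staticmethod
    def key(command: str, inputs: Inputs) -> str:
        h = hashlib.sha256(command.encode() + b"\0")
        for name in sorted(inputs):              # stable order -> stable key
            h.update(name.encode() + b"\0")
            h.update(hashlib.sha256(inputs[name]).digest())
        return h.hexdigest()

    def run(self, command: str, inputs: Inputs,
            action: Callable[[Inputs], bytes]) -> bytes:
        k = self.key(command, inputs)
        if k not in self._outputs:               # miss: actually do the work
            self.executions += 1
            self._outputs[k] = action(inputs)
        return self._outputs[k]                  # hit: reuse the stored result


def fake_compile(inputs: Inputs) -> bytes:
    return b"obj:" + b"".join(inputs[n] for n in sorted(inputs))


cache = ActionCache()
cache.run("cc -c", {"a.c": b"int x;"}, fake_compile)
cache.run("cc -c", {"a.c": b"int x;"}, fake_compile)   # cache hit
cache.run("cc -c", {"a.c": b"int y;"}, fake_compile)   # input changed
print(cache.executions)                                 # 2
```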

Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.

jeremyepling9y ago

I'm a Microsoft employee on the Git team. We do have a distributed, caching, incremental build system and a distributed test system. Right now, they're completely internal - like Google. They're called CloudBuild and CloudTest. They're very fast and no one thinks twice about kicking off a build.

kentonv9y ago

> Every single ex-Googler misses it quite a bit.

I dunno. I don't miss the 1-minute incremental builds. (Maybe they've improved since I left, though.)

BTW Forge is not just the test runner, but the thing that runs all build tasks, farmed out to all servers. Blaze interprets the build language and does dependency tree analysis but then hands off the tasks to Forge. Blaze has been (partially) open sourced: https://bazel.build/
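The split described here (analyze the dependency graph, then hand independent tasks to a farm of workers) can be illustrated with Python's standard `graphlib` (3.9+). Each "wave" below contains targets whose dependencies are already built, so a scheduler could send them to remote workers at once. The target names are made up, and this is not how Blaze or Forge work internally.

```python
from graphlib import TopologicalSorter   # Python 3.9+
from typing import Dict, List, Set

# Hypothetical build targets, each mapped to the targets it depends on.
DEPS: Dict[str, Set[str]] = {
    "//app:server": {"//lib:http", "//lib:log"},
    "//lib:http": {"//lib:net", "//lib:log"},
    "//lib:net": set(),
    "//lib:log": set(),
}


def schedule_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group targets into waves whose members can all run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()                        # raises CycleError on cycles
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already built
        waves.append(ready)
        sorter.done(*ready)                 # pretend the workers finished
    return waves


for i, wave in enumerate(schedule_waves(DEPS)):
    print(i, wave)
# 0 ['//lib:log', '//lib:net']
# 1 ['//lib:http']
# 2 ['//app:server']
```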

d0vs9y ago

> Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.

This may be naive but why not recreate it as an open source project?

kevincox9y ago

I think this would be better than Piper+CitC. While the virtual filesystem aspect is nice, the Perforce model is far inferior to git's (IMO). Of course Google has tools on top of it, but it's not fair to compare a VCS and its interface to a complete development infrastructure. Hell, git's content addressing would even make it easy to build something similar.
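On the content-addressing point: git names a file's contents by hashing the bytes plus a small header, so identical content always gets the same ID and any fetched object can be verified by rehashing it. The function below is a sketch of that naming scheme for repositories using the default SHA-1 object format (its result should match `git hash-object` on the same bytes); it says nothing about any virtualization layer built on top.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """ID of a blob in git's SHA-1 object format: hash of header + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# Same bytes -> same ID, whatever the file name or directory.
print(git_blob_id(b"hello\n") == git_blob_id(bytes("hello\n", "ascii")))  # True
```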

gvb9y ago· 9 in thread

Using git with large repos and large (binary blob) files has been a pain point for quite a while. There have been several attempts to solve the problem, none of which have really taken off. I think all the attempts have been (too) proprietary – without wide support, it doesn’t get adopted.

I'll be watching this to see if Microsoft can break the logjam. By open sourcing the client and protocol, there is potential...

Other attempts:

* https://github.com/blog/1986-announcing-git-large-file-stora...

* https://confluence.atlassian.com/bitbucketserver/git-large-f...

Article on GitHub’s implementation and issues (2015): https://medium.com/@megastep/github-s-large-file-storage-is-...

cies9y ago

I think Joey Hess' attempt at "solving the problem" deserves a mention.

It is open source (GPLv3) licensed. [not proprietary]

Written in Haskell. [cool aid]

Currently has 1200+ stars on Github and is part of at least Ubuntu (http://packages.ubuntu.com/search?keywords=git-annex) since 12.04. [shows something for support and adoption]

edit: Link to Github https://github.com/joeyh/git-annex -- thanks dgellow

Ajedi329y ago

For the problem of large files I think Git LFS has largely won out over git annex, mostly because it's natively supported by GitHub and GitLab and requires no workflow changes to use.

pjc509y ago

There's a small but important trap to people who might want to use git-annex as a backup tool, namely that you can't store a git repo in git-annex.

http://git-annex.branchable.com/forum/Storing_git_repos_in_g...

dgellow9y ago

Link to the github mirror https://github.com/joeyh/git-annex

EuAndreh9y ago

git-annex is, IMHO, by far the best solution.

Pros of git-annex:

- it is conceptually very simple: it uses symlinks, rather than ad-hoc pointer files, a virtual file system, etc., as the symbolic pointers to the actual blob files;

- you can add support for any backend storage you want. As long as it supports basic CRUD operations, git-annex can use it as a remote;

- you can quickly clone a huge repo by cloning just the repo's metadata (--no-content in git-annex) and downloading only the necessary files on demand;

And many other things that no other attempt even consider having, like client-side encryption, location tracking, etc.
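The symlink approach in the first bullet can be sketched in a few lines: the working tree holds a symlink (which is what git tracks) pointing to a file named by the hash of its content, stored outside git's normal objects. The directory name `.annex-objects` and plain SHA-256 naming are assumptions for illustration; git-annex's real key format and object layout differ, and this leaves out its location tracking and remotes entirely. Creating symlinks may need extra privileges on Windows.

```python
import hashlib
import os
import tempfile
from pathlib import Path

OBJECTS_DIR = ".annex-objects"   # hypothetical; git-annex uses its own layout


def annex_add(repo: Path, rel_path: str, content: bytes) -> Path:
    """Store content once under its hash and leave a symlink in the tree."""
    digest = hashlib.sha256(content).hexdigest()
    obj = repo / OBJECTS_DIR / digest[:2] / digest
    obj.parent.mkdir(parents=True, exist_ok=True)
    if not obj.exists():                       # identical content stored once
        obj.write_bytes(content)
    link = repo / rel_path
    link.parent.mkdir(parents=True, exist_ok=True)
    # Relative target, so the link stays valid if the repo directory moves.
    link.symlink_to(os.path.relpath(obj, link.parent))
    return link


with tempfile.TemporaryDirectory() as d:
    repo = Path(d)
    a = annex_add(repo, "assets/logo.png", b"fake image bytes")
    b = annex_add(repo, "docs/logo.png", b"fake image bytes")
    print(a.is_symlink(), a.read_bytes() == b"fake image bytes")  # True True
    print(a.resolve() == b.resolve())                              # True
```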

klodolph9y ago

Git Annex is only a partial solution, since it only solves issues with binary blobs. It doesn't solve problems with large repos.

vvanders9y ago

That still only solves half the problem with large binary blobs.

The other half is that almost none of the binary formats can be merged, so you need a mechanism to lock them to prevent people from wiping out other people's changes. Unfortunately, that runs pretty much counter to the idea of a DVCS.
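Why locking sits awkwardly with a DVCS becomes clearer in code: the lock table has to live in one place that every clone consults before pushing. A minimal sketch of a hypothetical in-memory lock server (`LockServer` and its methods are invented for illustration); a real one would sit behind the shared remote and persist its state.

```python
from typing import Dict, Iterable


class LockError(Exception):
    pass


class LockServer:
    """Single authority recording who may change each unmergeable file."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def lock(self, path: str, user: str) -> None:
        owner = self._owners.setdefault(path, user)   # take it if free
        if owner != user:
            raise LockError(f"{path} is locked by {owner}")

    def unlock(self, path: str, user: str) -> None:
        if self._owners.get(path) != user:
            raise LockError(f"{user} does not hold the lock on {path}")
        del self._owners[path]

    def check_push(self, user: str, changed: Iterable[str]) -> None:
        """Reject a push that touches a file someone else has locked."""
        for path in changed:
            owner = self._owners.get(path)
            if owner is not None and owner != user:
                raise LockError(f"{path} is locked by {owner}")


server = LockServer()
server.lock("art/hero.psd", "alice")
server.check_push("alice", ["art/hero.psd"])      # fine: alice holds the lock
try:
    server.check_push("bob", ["art/hero.psd"])
except LockError as err:
    print(err)                                    # art/hero.psd is locked by alice
```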

to3m9y ago

I always wonder why this never gets discussed much. We seem to have tons of solutions for storing large files outside the repo, but so what? OK, so I don't deny that storage still isn't cheap enough to just say "oh well" to the idea of a multi-TByte repo, so it's certainly solving a problem. But there's still another major problem left!

Don't their artists and designers use version control too? Maybe they just have one such person per team, or each person owns one file, or something like that. Hard to say.

Maybe it's like how I used to work on teams that never used branches - you have various problems that you figure there's probably a solution for, but there's never time to (a) figure out what the solution looks like, (b) shift the whole team over to a brand new workflow and set of tools, and (c) clean up the inevitable mess. So you just work around the problems the same way you always have - because at least that's a known quantity.

EuAndreh9y ago

When git-annex finds a conflict it can't resolve, it gives you back both versions of the file, each suffixed with the SHA of its original version.

This way you can look at both and resolve the conflict.

krishoog9y ago· 7 in thread

Does this article imply that Microsoft itself is also moving towards Git? Instead of e.g. using their own product like TFS?

daigoba669y ago

TFS has first-class support for Git repositories (in addition to the classic TFSVC repositories). So yes, they're moving more and more to Git. But no, they're not abandoning TFS.

Interestingly, however, most of their "open source" efforts (.NET, C#, and related) are all on GitHub rather than their own hosted offerings: CodePlex (which is basically dead) or "Visual Studio Team Services".

vtbassmatt9y ago

Not sure why Visual Studio Team Services is in scare quotes -- that's the product's name. And it's not an open source hosting service, which handily explains why Microsoft's open source isn't hosted there.

Disclosure: I'm a PM on VSTS/TFS, and I own part of version control.

jeremyepling9y ago

I'm a Microsoft employee on the Git team.

Microsoft is moving to Git and we use Team Services / TFS as our Git server for all private repositories. GitHub is only used for OSS since that's where the OSS community is.

hobarrera9y ago

Any comments on why you picked GitHub (proprietary) instead of GitLab (FLOSS) for FLOSS projects?

ohitsdom9y ago

Reading the TFS release notes, they seem to be putting much more effort into improving integration with Git than with TFVC. This may just be to achieve parity with TFVC in TFS, but it was enough for me to abandon TFVC for Git in all new projects.

warcode9y ago

Team Foundation Server supports both git and TFSVC.

justinlaster9y ago

You can use TFS with Git. Git will act as the underlying SCM.

ianopolous9y ago· 5 in thread

Couldn't they use git over IPFS?

kevincox9y ago

No. The problem isn't only the storage or fetching of the files (this is the easy bit :) ), it's the operations that detect changes in the working tree. If you have a large tree, scanning it becomes slow.

Using a vfs allows you to track which files have changed so that these operations no longer need to scan. Now they are O(changed files) which is generally small.

Now IPFS has a vfs, but it is just a simple read/write interface. This vfs needs slightly more logic to do things like change the base revision and track changes.
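To make the O(changed files) point concrete, here is a minimal Python sketch, assuming a toy in-memory working tree (the class names and the write hook are hypothetical, not GVFS's actual design). A plain `status` has to re-examine every file, while a filesystem that records each write only has to look at the touched paths. Deletions and renames are ignored to keep it short.

```python
import hashlib
from typing import Dict, Set


def blob_id(data: bytes) -> str:
    # Content hash used to detect modifications (a stand-in for git's blob IDs).
    return hashlib.sha1(data).hexdigest()


class ScanningWorktree:
    """status() re-hashes every file and compares it to the index: O(tree size)."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = dict(files)
        self.index = {path: blob_id(data) for path, data in files.items()}
        self.files_examined = 0

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def _changed(self, paths) -> Set[str]:
        changed = set()
        for path in paths:
            self.files_examined += 1
            if self.index.get(path) != blob_id(self.files[path]):
                changed.add(path)
        return changed

    def status(self) -> Set[str]:
        return self._changed(self.files)


class TrackedWorktree(ScanningWorktree):
    """Every write passes through a hook (standing in for the virtual filesystem)
    that records the path, so status() only examines touched paths."""

    def __init__(self, files: Dict[str, bytes]):
        super().__init__(files)
        self.touched: Set[str] = set()

    def write(self, path: str, data: bytes) -> None:
        super().write(path, data)
        self.touched.add(path)

    def status(self) -> Set[str]:
        return self._changed(self.touched)


files = {f"src/file{i}.c": b"int x;\n" for i in range(10_000)}
scan, tracked = ScanningWorktree(files), TrackedWorktree(files)
for wt in (scan, tracked):
    wt.write("src/file42.c", b"int y;\n")

print(scan.status(), scan.files_examined)        # {'src/file42.c'} 10000
print(tracked.status(), tracked.files_examined)  # {'src/file42.c'} 1
```

Both report the same change; the difference is that the tracked version's cost no longer grows with the size of the tree.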

ianopolous9y ago

IPFS clearly does a lot more than storing and fetching files. Seriously, go have a read. A single hash can represent an arbitrarily large subtree of data (Microsoft's entire repo). Using an IPLD selector (in its simplest form, a path beyond the hash) an arbitrary sub component can be addressed. This can be used to avoid scanning entire subtrees (maintaining your O(changed files)). To commit your modifications is O(changed files + tree depth to the root of your modifications) you never need to do anything with the rest of the repo.

For tracking changes (i.e. mutable data) you can use IPNS and create a signed commit history. This will be built on IPFS eventually so it's only a matter of time.
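A minimal Python sketch of the Merkle-tree property described above, assuming a toy in-memory tree rather than IPFS/IPLD's real data model (the `Blob`/`Dir`/`update` names, SHA-256, and the `blob:`/`dir:` prefixes are all made up for illustration). One root digest identifies the whole tree, and changing one file re-hashes only the nodes on the path to the root while every other subtree is shared.

```python
import hashlib
from typing import Dict, Union

hash_count = 0  # counts how many nodes had to be hashed


def _hash(payload: bytes) -> str:
    global hash_count
    hash_count += 1
    return hashlib.sha256(payload).hexdigest()


class Blob:
    def __init__(self, data: bytes):
        self.data = data
        self.digest = _hash(b"blob:" + data)


class Dir:
    def __init__(self, children: Dict[str, Union["Blob", "Dir"]]):
        self.children = children
        # A directory's digest covers its children's names and digests,
        # so one hash identifies the entire subtree below it.
        entries = b"".join(
            name.encode() + b"\0" + child.digest.encode()
            for name, child in sorted(children.items())
        )
        self.digest = _hash(b"dir:" + entries)


def lookup(root: Dir, path: str) -> Union[Blob, Dir]:
    node = root
    for part in path.split("/"):
        node = node.children[part]
    return node


def update(root: Dir, path: str, data: bytes) -> Dir:
    """Return a new root with `path` replaced. Untouched subtrees are reused,
    so only the nodes from the changed leaf up to the root are re-hashed."""
    head, _, rest = path.partition("/")
    children = dict(root.children)
    children[head] = Blob(data) if not rest else update(root.children[head], rest, data)
    return Dir(children)


# 100 directories x 100 files: 10,101 hashes to build the tree once.
root = Dir({f"d{i}": Dir({f"f{j}": Blob(b"x") for j in range(100)}) for i in range(100)})
before = hash_count
new_root = update(root, "d7/f3", b"changed")
print(hash_count - before)                              # 3: new blob, d7, root
print(new_root.children["d0"] is root.children["d0"])  # True: shared, not re-hashed
```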

janwhOP9y ago

It was explained in the talk at Git-Merge that their problem is not large files per se. The codebase is huge in the amount of source files alone. It was stated that the repo contains about 3.5 million files. Having IPFS here wouldn't help, would it?

ianopolous9y ago

Yes, IPFS is designed to host the entire internet. You can selectively mount sub-graphs arbitrarily, which means only downloading locally exactly what you need.

ianopolous9y ago

Not sure why I'm being downvoted. It was a serious question. IPFS solves the handling of large files (by chunking them), and works in a P2P way in which you can locally mount remote merkle trees (the core data structure of git). I believe this use case is also actually one of the original design goals of IPFS.

cafebabbe9y ago· 4 in thread

My sysadmin: "we won't switch to git because it can't handle binary files and our code base is too big"

Our whole codebase is 800MB.

angry_octet9y ago

I hope that was a conversation from 5 years ago.

Otherwise, I hope you replaced your sysadmin.

alkonaut9y ago

Our codebase (latest tree) is similar, but when switching to git it's the total history size that is the problem. Our history is well over 25GB, which git doesn't handle very gracefully.

kevincox9y ago

History shouldn't be a problem, you can do a shallow checkout. But you will have to store the working tree at least on your workstation.

This solves the next scaling problem of avoiding managing the whole working tree. (without requiring narrow clones which have significant downsides)

Spivak9y ago

Isn't that the case that LFS solves? I've got 30ish gigs of binary blobs stored in my repos.

OJFord9y ago· 4 in thread

> when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.

How on Earth can anybody work like that?

I'd have thought you may as well ditch git at that point, since nobody's going to be using it as a tool, surely?

    git commit -m 'Add today'\''s work - night all!' && git push; shutdown

aanm19889y ago

You can't, and Microsoft isn't. They built this so they can use git without those problems.

stinos9y ago

> How on Earth can anybody work like that?

Since it looks like they are still migrating, I don't think a lot of people actually worked like that. Maybe just a couple of times to figure out how long it would actually take. Or maybe those who really use it are doing shallow clones, which would probably take much less time. Shallow clones are nice but don't seem to be very well known. I use them often if I know I won't ever need the full history anyway. They're also great for shaving time off CI builds.

OJFord9y ago

Shallow clones are great, until they're not. I don't think I've ever (having tried a few times) cleanly cloned 'below' the graft point when I've needed to, or a different branch.

metamet9y ago

It's called Pomodoro++.

wyldfire9y ago· 3 in thread

I'm immediately reminded of MVFS and clearcase. Lots of companies still use clearcase, but IMO it's not the best tool for the job. git is superior in most dimensions. From what this article says, it's not quite the same as clearcase but there's certainly some hints of similarities.

The biggest PITA with clearcase was keeping their lousy MVFS kernel module in sync with ever-advancing linux distros.

I really liked Clearcase in 1999, it was an incredible advancement over other offerings then. MVFS was like "yeah! this is how I'd design a sweet revision control system. Transparent revision access according to a ranked set of rules, read-only files until checked out." But with global collaborators, multi-site was too complex IMO. And overall, clearcase was so different from other revision control systems that training people on it was a headache. Performance for dynamic views would suffer for elements whose vtrees took a lot of branches. Derived objects no longer made sense -- just too slow. Local disk was cheap now, it got bigger much faster than object files.

> However, we also have a handful of teams with repos of unusual size! ... You can see that in action when you run “git checkout” and it takes up to 3 hours, or even a simple “git status” takes almost 10 minutes to run. That’s assuming you can get past the “git clone”, which takes 12+ hours.

This seems like a way-out-there use case, but it's good to know that there's other solutions. I'd be tempted to partition the codebase by decades or something.

tcbawo9y ago

I used Clearcase (on Solaris) in 1999 and was not a fan. It slowed our build times by at least 10x. I'm sure it was probably set up wrong, but this was a Fortune 100 company with lots of dedicated resources.

foobiekr9y ago

Clearcase performance for many builds was specifically impacted by the very poor performance of stat(). You could make very real improvements on build times by reducing the number of calls to stat(). It was sort of amazing.

Clearcase also suffered, at least in my experience, from a clumsy and ugly merging process and deeply unintuitive command set which meant everyone who "used clearcase" actually tended to use some terrible homegrown wrapper scripts.

Still, considering it was the last remaining vestige of the Apollo Domain OS, not bad.
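A rough Python illustration of the stat() point, assuming a make-style staleness check (the helper names are hypothetical, and nothing here is ClearCase-specific). When every stat() is expensive, as on a network or virtual filesystem, caching results for the duration of one build cuts the number of calls from one per (target, dependency) pair down to one per distinct path.

```python
import os
import tempfile
from functools import lru_cache

stat_calls = 0


def counted_stat(path: str) -> os.stat_result:
    """Wraps os.stat() so we can count how often the filesystem is hit."""
    global stat_calls
    stat_calls += 1
    return os.stat(path)


def uncached_mtime(path: str) -> float:
    return counted_stat(path).st_mtime


@lru_cache(maxsize=None)
def cached_mtime(path: str) -> float:
    # Assumption: nothing changes on disk while the dependency check runs,
    # so each path needs to be stat()ed at most once.
    return counted_stat(path).st_mtime


def is_stale(target: str, deps, mtime) -> bool:
    """make-style rule: the target is stale if any dependency is newer."""
    target_time = mtime(target)
    return any(mtime(dep) > target_time for dep in deps)


with tempfile.TemporaryDirectory() as tmp:
    def touch(name: str, when: int) -> str:
        path = os.path.join(tmp, name)
        with open(path, "w"):
            pass
        os.utime(path, (when, when))  # fixed mtimes keep the example deterministic
        return path

    headers = [touch(f"h{i}.h", 1000) for i in range(20)]   # 20 shared headers
    targets = [touch(f"t{i}.o", 2000) for i in range(50)]   # 50 up-to-date objects

    stale_a = [is_stale(t, headers, uncached_mtime) for t in targets]
    calls_uncached, stat_calls = stat_calls, 0
    stale_b = [is_stale(t, headers, cached_mtime) for t in targets]
    calls_cached = stat_calls

print(calls_uncached, calls_cached)  # 1050 70
```

Same answers either way; the cached run just stops asking the filesystem about the same 20 headers 50 times over.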

david-given9y ago

I used Clearcase a while back in an office in China. Every few days there'd be an email going round pleading with people to keep their antivirus software up-to-date; because someone had a virus somewhere, and apparently the virus would try to infect executables on the Clearcase virtual drive, at which point Clearcase would obligingly check the infected file into the VCS and distribute it to all clients...

daigoba669y ago· 3 in thread

The article doesn't directly say it, but are they migrating the Windows source code repository to git? That seems like a big deal.

I seem to recall that Microsoft has previously used a custom Perforce "fork" for their larger code bases (Windows, Server, Office, etc.).

fsckin9y ago

Source Depot. Forked years ago with tons of added features. Various Halo titles also used it and had easy to use integrations with most of the art and design pipeline tools.

vtbassmatt9y ago

Yes, Windows is migrating to Git.

adrianN9y ago

Do you have a citation for that?

chokolad9y ago· 2 in thread

There is a discussion thread on r/programming where the MS folks who implemented this answer questions. A lot of questions like why not use multiple repos, why not git-lfs, why not git subtree, etc. are answered there.

https://www.reddit.com/r/programming/comments/5rtlk0/git_vir...

stinos9y ago

Thanks for bringing this up, it was actually a more interesting read than this thread. Less trolling, more facts, and also interesting to read stuff I didn't happen to know, like:

> One of the core differences between Windows and Linux is process creation. It's slower - relatively - on Windows. Since Git is largely implemented as many Bash scripts that run as separate processes, the performance is slower on Windows. We’re working with the git community to move more of these scripts to native cross-platform components written in C, like we did with interactive rebase. This will make Git faster for all systems, including a big boost to performance on Windows.

EdHominem9y ago

> "We’re working with the git community to move more of these scripts to native cross-platform components written in C"

Sad. Rather than fix the root problem they rewrite the product in a less-agile language and require everyone to run opaque binaries.

They probably even think they're doing a good thing.

Ericson23149y ago· 2 in thread

If I understand this correctly, unlike git-annex and git lfs, this is not about extending the git format with special large files, but about changing the algorithm for the current data format.

A custom filesystem is indeed the correct approach, and one that git itself should have probably supported long ago. In fact, there should really only be one "repo" per machine, name-spaced branches, and multiple mountpoints a la `git worktree`. In other words there should be a system daemon managing a single global object store.

I wonder/hope IPFS can benefit from this implementation on Windows, where FUSE isn't an option.
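A minimal Python sketch of what a "single global object store" could look like, assuming hypothetical `ObjectStore` and `Worktree` classes (this is not how git is structured today, although `git worktree` already lets several checkouts of one repository share its object database). Objects are keyed by git's blob ID, which is SHA-1 over `blob <size>\0` followed by the content, so identical files across checkouts are stored once.

```python
import hashlib
from typing import Dict


def git_blob_id(data: bytes) -> str:
    # Git names a blob by SHA-1 over the header "blob <size>\0" plus the content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """One machine-wide store: every object is kept once, keyed by its ID."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        oid = git_blob_id(data)
        self.objects.setdefault(oid, data)  # no-op if the object already exists
        return oid


class Worktree:
    """A checkout is just a mapping from paths to object IDs in the shared store."""

    def __init__(self, store: ObjectStore, branch: str) -> None:
        self.store, self.branch = store, branch
        self.paths: Dict[str, str] = {}

    def add(self, path: str, data: bytes) -> None:
        self.paths[path] = self.store.put(data)

    def read(self, path: str) -> bytes:
        return self.store.objects[self.paths[path]]


store = ObjectStore()
main, feature = Worktree(store, "main"), Worktree(store, "feature")
for wt in (main, feature):
    wt.add("README", b"hello\n")  # identical content in both checkouts
feature.add("new.txt", b"feature work\n")

print(len(store.objects))  # 2: the shared README blob is stored only once
print(git_blob_id(b""))    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's empty blob)
```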

manojlds9y ago

The blog post does mention that some changes have been made to git (in their fork)

sdesol9y ago

I did a quick comparison of Microsoft's fork and it appears they have done quite a bit with it.

Microsoft's fork contains 67,522 commits. The official Git repo contains 45,810. It appears the bulk of the work started in 2010, with significant ramp up of development in 2015.

https://gitsense.com/mgit-vs-git/history.png

Looks like Microsoft only really introduced about 100 more new files.

https://gitsense.com/mgit-vs-git/files.png

Microsoft's repo contains 1712 contributors. Git's repo contains 1685 contributors. So it looks like 20-30 employees worked on Microsoft's fork.

https://gitsense.com/mgit-vs-git/mgit-contributors.png https://gitsense.com/mgit-vs-git/git-contributors.png

rethab9y ago· 2 in thread

Assuming that the repo was this big in the beginning, I wonder why they ever migrated to git (I'm assuming they did, because they can tell how long it takes to check out). At least when somebody "tries" to do the migration, wouldn't they realize that maybe git is not the right tool for them? Or did they actually migrate and then work with a "git status" that takes 10 minutes for some time, until they realized they may need to change something?

Also, it would have been interesting if the article mentioned whether they tried other approaches taken by facebook (mercurial afaik) or google.

xearl9y ago

To me it sounds like these numbers are from a migration-in-progress. So they are trying, but instead of giving up and saying "not the right tool for us" they are trying to improve the tool.

becarefulyo9y ago

Because of the productivity benefits of using public tools instead of internal ones. Devs are more familiar with them, more documentation and examples, morale benefit because skills are transferable to other jobs, etc.

imron9y ago· 2 in thread

> repos of unusual size

Sounds like they've almost solved the secrets of the fire swamp!

krallja9y ago

Repos of Unusual Size? I don't think they exist.

olkid9y ago

They can live there quite happily for some time.

srott9y ago· 2 in thread

I remember that a few years ago Git under Windows was very slow. Is that still true?

WorldMaker9y ago

Git on Windows has gotten very fast and stable in the last few years. Microsoft employees themselves, among others of course, have directly contributed to a much better Git experience on Windows.

Groxx9y ago

The reddit thread has quite a few people with opposing opinions, fwiw. Mostly "stuff that's ~instant on unix takes many seconds on Windows" and the like. It's true that Microsoft has contributed a lot (to the benefit of all), but from what I'm seeing it sounds like it's still lagging quite a bit.

I haven't touched Windows in quite a while, so I can't really make a claim either way.

pjmlp9y ago· 2 in thread

Quite nice use of C# and C++/CX for a virtual system implementation.

contextfree9y ago

looks like C++/CLI (C++/CX reused its syntax and maybe parsing code, but they're still distinct)

pjmlp9y ago

Yes, which is why Microsoft had a lot of trouble convincing developers who don't read documentation that they are distinct, and that C++/CX compiles to pure native code, since those developers were spreading misinformation about it.

In any case, when C++/WinRT gets feature parity, I imagine it will eventually be deprecated, depending which one gets more developer love.

zahreeley9y ago· 2 in thread

Don't believe in modular development with smaller repos?

Groxx9y ago

Yeah, I see things like this, and I always wonder why they don't make a submodule tree.

It wasn't an option a couple years ago, but submodules work fine now. With a little bit of scripting to wrap common uses, they're practically pain-free.

shandor9y ago

Could you elaborate a little what has changed there? My understanding is that submodules are still considered a mess, but would be really nice if some actual improvements have happened.

testUser699y ago· 2 in thread

Why is that so hard to believe? America is run by Donald Trump.

The problems with these companies is that developers aren't making technical decisions, it's executives who know nothing about computer science. That's why Windows 10 is such a mess with spyware and adware.

Now they have some FOSS advocate who doesn't really know anything about software or VCS but saw that an internal problem they were trying to solve was making their code base work with git. So he decided it would be really cool for Microsoft's image to develop an open source extension of git, instead of actually solving the underlying problems (because he didn't recognize them). Now he's probably got a promotion at Microsoft for "fixing" their problem with git.

dang9y ago

We detached this subthread from https://news.ycombinator.com/item?id=13559893 and marked it off-topic.

klodolph9y ago

If there's something you need to talk about, by all means find a place to talk about it. But if what you want to talk about is how other people are stupid, maybe this is the wrong place for that. I hope that you find the healing that you need.

ksec9y ago· 2 in thread

Interesting that M$ is moving to Git and the rest of the world is pretty much Github & alternatives, while Facebook and Google are going with Mercurial. I actually liked Mercurial, apart from its name being a little hard to pronounce, but it doesn't seem to get used anywhere.

So are the DVCS converging to Git and Git only?

jgalt2129y ago

In the open source arena, it certainly seems like the game is over. But as others mentioned, both FB and GOOG are big Mercurial users and contributors.

Our shop uses Mercurial because of its Python basis, and the amount of time and effort it takes to master Git makes me draw strong and uncomfortable parallels to emacs.

stephenr9y ago

> the rest of the world is pretty much Github & alternatives

By "the world" you mean, the HN/SV/startup crowd of cool-kids who feel the need to use whatever is popular without regard for how appropriate it is?

yakk09y ago· 1 in thread

I appreciated the Princess Bride reference with "repos of unusual size"

wyldfire9y ago

I don't believe in them.

0X1A9y ago· 1 in thread

Just to make sure I have this right, this has to do with the _amount_ of files in their repo and not the _size_ of the files? So projects like git annex and LFS would not help the speed of the git repos?

WorldMaker9y ago

That's how I read it: this is about monorepos with file trees containing large numbers of files, where users don't necessarily need every single file in their local worktree to get work done.

I'd assume this GVFS would work hand in hand with Git LFS for the use case of large files.

mfontani9y ago· 1 in thread

So... what happens when one runs "git grep foo" on it?

kevincox9y ago

It will be slow. Small steps. But in practice companies with large repos have other search solutions so that each user doesn't have to do a raw search on the entire working tree.

hoov9y ago

This is pretty big news. I know that when I was at Adobe, the only reason that Perforce was used for things like Acrobat, is because it was simply the only source control solution that could handle the size of the repo. Smaller projects were starting to use Git, but the big projects all stuck with Perforce.

kevincox9y ago

I love this approach. From working at Google I appreciate the virtual filesystem; it makes a lot of things a lot easier. However, all my repos are small enough to fit on a single machine, so I wish there was a mode where it was backed by a local repository, with the filesystem still letting git avoid tree scans.

Basically, most operations in git are O(modified files), but there are a few that are O(working tree size); checkout and status, for example, were mentioned in the article. These operations can be made O(modified files) if git doesn't have to scan the working tree for changes.

So pretty much I would be all over this if:

- It worked locally.

- It worked on Linux.

Maybe I'll see how it's implemented and see if I could add the features required. I'm really excited for the future of this project.

rbanffy9y ago

Did they really need to make a name collision?

https://en.wikipedia.org/wiki/GVfs

Navarr9y ago

This sounds like a solid use case and a solid extension for that use case - but definitely not the end-all-be-all.

For one, it's not really distributed if you're only downloading when you need that specific file.

But that doesn't change the merits of this at all, I think.

mortdeus9y ago

Or how about we start compartmentalizing your codebase so that you can, like, you know, organize your code and restore sanity to the known universe.

I think when the powers that be said that whole thing about geniuses and clutter, they were specifically talking about their living spaces and not their work...

zwischenzug9y ago

Does anyone know how Microsoft's open source policy works internally? I'm thinking from a governance perspective, as I'm involved in a similar effort at $WORK.

scotty799y ago

I had a medium-sized Ruby on Rails project as a git repo inside a VM.

It was slow to do 'git status' and other common commands. Restarting the RoR app was also slow. I put the repo on a RAM disk, which made the whole experience at least a few times faster.

Since everything was in a VM that I rarely restarted, I didn't have to recreate the files on the RAM disk very often. I synced changes to the persistent disk with rsync running periodically.

myrandomcomment9y ago

"For example, the Windows codebase has over 3.5 million files and is over 270 GB in size."

Okay, so this is a networking issue. Or is it a stick everything in the same branch issue?

Whatever the reason here the issue is pure size vs. network pipe, pure and simple. Hum, when can I get a laptop with a 10GBaseT interface?

One of the issue with the way they are doing this (only grab files when needed) is you cannot really work offline anymore.
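A back-of-the-envelope check of the "size vs. network pipe" argument, using the 270 GB figure quoted from the article. This assumes decimal gigabytes and full line rate with no protocol, disk, or git overhead, so the numbers are lower bounds on a full download.

```python
def transfer_seconds(size_gb: float, link_gbit_per_s: float) -> float:
    # Idealized: decimal gigabytes, full line rate, no protocol or disk overhead.
    return size_gb * 8 / link_gbit_per_s


for name, rate in [("1GbE", 1), ("10GBaseT", 10)]:
    secs = transfer_seconds(270, rate)
    print(f"{name}: {secs:.0f} s (~{secs / 60:.1f} min)")
# 1GbE: 2160 s (~36.0 min)
# 10GBaseT: 216 s (~3.6 min)
```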

amingilani9y ago

I'm no expert, but if most single developers only use 5-10% of the codebase in their daily life, wouldn't it make sense to break the project into multiple codebases of about 5% each and use a build pipeline that combines them when needed?

I could definitely be wrong, but this sounds a lot like monolith vs microservices to me.

nojvek9y ago

Microsoft is moving away from Source Depot to git, it seems. I think it's fantastic that a company like Microsoft is adopting git for its big king and queen projects such as Office and Windows. Also, open sourcing the underlying magic tells a lot about the new Microsoft. They're really moving away from not-invented-here syndrome.

b1gtuna9y ago

MS has been doing really neat stuff lately. I never worked on a project that takes hours to clone. The largest repository I regularly clone is the Linux repo. It still takes only a few minutes. Yet I can see the GVFS being beneficial for me as I spend most of the time just reading the code (so no need to compile) on my laptop.

alkonaut9y ago

Could this also help a smaller repo but with long history, making the total repo size too large?

The whole repo is needed for every developer, i.e. it's not possible to do a sparse checkout, but there are many gigs of old versions of small binaries I would prefer to keep only on the server until I need them (which is never).

acqq9y ago

And for all those who still try to stick to anything older:

https://github.com/Microsoft/gvfs

"GVFS requires Windows 10 Anniversary Update or later."

dstaheli9y ago

Check out the GVFS back story and details here: https://news.ycombinator.com/item?id=13563439

lolikoisuru9y ago

Is it really that fucking hard to check if your package name is unique?

Here is another virtual filesystem with the exact same name: https://wiki.gnome.org/Projects/gvfs

Debian package for it: https://packages.debian.org/jessie/gvfs

igtztorrero9y ago

Does anybody know what Linus thinks about it?

cikey9y ago

Can we use this together with git LFS?

klodolph9y ago

> I believe there are plenty bridges between CVS and git already implemented, which also allows you to checkout only part of the CVS tree.

I've heard horror stories about back when people had to do partial checkouts of the source code, and I'm glad that the tooling I use is better.

jonknee9y ago

> Seems like with GVFS they are making DVC to be a CVS again. What is the point?

It sounds like they answered that:

Source will still be distributed among the developers that touch it. Seems like a decent compromise.

acqq9y ago

> Seems like with GVFS they are making DVC to be a CVS

> just to give cool kids access to cool tools

If Microsoft is giving the solution to the "cool kids," no reason to complain about the fact that Microsoft is willing to care for them.

And no, as far as I understand, Google doesn't primarily "use Mercurial", they use something called Piper, and before they used a customized Perforce just like Microsoft did.

https://www.wired.com/2015/09/google-2-billion-lines-codeand...

espadrine9y ago

It is not clear from the announcement nor the code, but in principle, I don't see a reason that it can't be a DVCS.

youdontknowtho9y ago

That being said, you can see more and more people getting off the "Microsoft is evil" train. It's super slow and every bone headed thing that Microsoft does resets the needle for lots of people.

Piskvorrr9y ago

So, I'm very, very, very sorry that I can't hear their words over the noise of their actions; and in the light of this, I eye each new gift-bearing Redmondian with suspicion.

jjjingleheimer9y ago

floopidydoopidy9y ago

This is just more of their embrace, extend, extinguish campaign. This is the extend part.

ajross9y ago

> This is a great idea and solves a real problem for a lot of use cases.

Honestly I'd say this GVFS gadget (which I'll fully admit is pretty cool) exists because Microsoft misapplied their source control regime.

anon9879y ago

It's because the 'problem' it solves is a corner case that's rarely encountered. I love their absurd examples of repos that take 12 hours to download. How many people have that problem, really?

All they did is create a caching layer.

ska9y ago

> How many people have that problem, really?

An easy lower bound is tens of thousands of engineers: developers at several large tech companies (e.g. MS, facebook, google, ?)

nine_k9y ago

If you deal with code, the case is marginal for you.

If you deal with graphics, audio assets, etc, the binary-blob type of data, the case is central.

rplnt9y ago

sebastos9y ago

"I don't have this problem, so nobody does."

Lacking support for large binary blobs is, like, THE #1 reason that an engineer might have to use an alternative.

daxelrod9y ago

Ok, but you'll encounter similar git limitations with repos several orders of magnitude smaller than that too.

All you need is several hundred engineers and your monorepo becomes unwieldy for git to handle.

aanm19889y ago

It's not a caching layer, it's lazy evaluation.
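A minimal Python sketch of that distinction, assuming a hypothetical `LazyFile` placeholder and a stand-in `fetch_from_server` function (this is not GVFS's actual protocol). Checkout creates placeholders without downloading anything; a file's content is fetched only on its first read and kept after that.

```python
from typing import Callable, Dict, List, Optional


class LazyFile:
    """Placeholder created at checkout; the content is fetched on first read."""

    def __init__(self, object_id: str, fetch: Callable[[str], bytes]):
        self.object_id = object_id
        self._fetch = fetch
        self._data: Optional[bytes] = None

    def read(self) -> bytes:
        if self._data is None:  # first access: download, then keep it locally
            self._data = self._fetch(self.object_id)
        return self._data


# Hypothetical stand-in for the remote object server.
SERVER: Dict[str, bytes] = {f"oid{i}": b"// file %d\n" % i for i in range(10_000)}
fetched: List[str] = []


def fetch_from_server(object_id: str) -> bytes:
    fetched.append(object_id)
    return SERVER[object_id]


# "Checkout" only creates placeholders; nothing is downloaded yet.
tree = {f"src/{i}.c": LazyFile(f"oid{i}", fetch_from_server) for i in range(10_000)}
print(len(fetched))  # 0

tree["src/42.c"].read()
tree["src/42.c"].read()
print(len(fetched), fetched)  # 1 ['oid42']
```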

tambourine_man9y ago· 20 in thread

It's interesting how all the cool things seem to come from Microsoft these days.

I'd love to see a second attempt at a distributed version control system.

Analemma_9y ago

aanm19889y ago

Google uses some variant of perforce, just like MS has been doing.

severino9y ago

> Microsoft could've done that too, but instead decided to improve Git,

They didn't improve git, they only made this for themselves and for their product users. Git doesn't restrict you to a single operating system.

verytrivial9y ago

> It's interesting how all the cool things seem to come from Microsoft these days.

sytse9y ago

Maybe something that has the data models of git but has a more consistent interface? Today on Git Merge there was a presentation about http://gitless.com/

For example one of the goals is to always allow you to switch branches. Stash and stash pop would happen automatically and it would even work if you're in the middle of a merge.

Ajedi329y ago

[Reinventing the Git Interface][1] was written almost 3 years ago now and yet to my knowledge nobody's implemented anything quite like that yet.

[1]: http://tonsky.me/blog/reinventing-git-interface/

StavrosK9y ago

I quite love both the motivation and the implementation of gitless (and the choices they've made). I find it much more usable than git.

tambourine_man9y ago

I'd never heard of Gitless, I'll check it out, thanks.

coherentpony9y ago

> I'd love to see a second attempt at a distributed version control system.

Out of curiosity, why a whole new attempt? Personally, I'd prefer the approach of "making our current tools better."

gumby9y ago

"Let a thousand flowers bloom." Competition helps both sides. Clang became good enough that it spurred GCC to become a lot better.

Until 1997, forking a project was considered a tragedy. I think things have improved since then :-).

moosingin3space9y ago

What are your thoughts on Pijul? (https://pijul.org)

ska9y ago

> I'd love to see a second attempt at a distributed version control system.

Git wasn't the first, and even then it had several second-generation contemporaries.

toyg9y ago

The story of git is a good case-study for people interested in group dynamics.

blunte9y ago

Indeed, but I also had a pause when I considered how heavily Microsoft depended on a system originally built by Linus Torvalds :)

ma2rten9y ago

Mercurial is very similar to git but more user friendly.

Florin_Andrei9y ago

> It's interesting how all the cool things seem to come from Microsoft these days.

It's like a whole'nother company after they got rid of Steve Ballmer.

Piskvorrr9y ago

greyman9y ago

> but the UI is atrociously complicated

sjellis9y ago

I've still not yet seen a stand-alone GUI for Git that is better than the one that ships with Git Extensions, though.

aairey9y ago

What a change of cash-cow placement can do ...

dewyatt9y ago· 12 in thread

I think they could have picked a name that doesn't conflict with GNOME Virtual File System (GVfs).

melling9y ago

When they picked the "Windows" product name, they could have picked a name that didn't conflict with the use of windows. Picking on an obscure file system doesn't even register in comparison.

X Windows

NeWS - https://en.m.wikipedia.org/wiki/NeWS

Remember the mess on usenet?

comp.windows.x.motif

comp.windows.new - not news about Microsoft Windows

hobarrera9y ago

In this particular case, the name is THE SAME (GVFS / GVFS). And they're both virtual filesystems, so there's lots of room for confusion.

I can imagine people at a forum:

"Hey, GVFS isn't working for me. It crashes with error -504 when I try to mount /nfs/company_data."

Try guessing which GVFS that is.

wslh9y ago

They used to choose very bad names like .NET or COM [1] (this predates the Internet), which makes searching for information very tricky. MSDN doesn't help.

[1] https://en.wikipedia.org/wiki/Component_Object_Model

FLGMwt9y ago

.NET is still rough naming-wise. We're porting a project to .NET Core Runtime which requires porting over to EF Core and ASP.NET Core (neither of which require being on Core Runtime).

Another minor annoyance: to build a .NET Standard-compatible library, you reference the "NETStandard.Library" NuGet package. Makes a fair amount of sense, but is hard to talk about.

foxrob929y ago

A few years ago I had to do some interfacing between python and some modelling software. I went through a COM interface, and it was a bloody nightmare to find docs.

I later found out I could have looked for "ActiveX" and found similar results.

greggyb9y ago

Their relational database is called SQL Server, which might otherwise be a colloquial generic name for an RDBMS.

They have a product in Azure named simply DocumentDB. I don't think "used to" is necessarily the best tense here (:

xorcist9y ago

> NET or COM [1] (this predates Internet)

sametmax9y ago

That was my first thought: "dick move". But they probably didn't know about it.

Longhanks9y ago

This is Microsoft. Before announcing a product they have more than enough lawyers to check the name for any clashes.

They just came to the conclusion that GNOME's product is no threat and that they can just claim the name. Smaller companies [1] tried that before.

[1]: https://www.groupon.com/blog/cities/groupon-launches-gnome

grovegames9y ago

I mean, git itself did this in the beginning.

(Oh, and I can't even use the word nix now as a catch all for all the POSIX/ POSIX(like) OSs because of nixOS.)

I think going forward, we just have to accept name collision.

discreditable9y ago

> But they probably didn't know about it.

GNOME Virtual Filesystem is first search result for "gvfs". Even if you use bing!

hobarrera9y ago

I understand that sometimes two products have the same name, but they usually have very different scopes/usages.

In this case, they're both called GVFS AND three of the letters have the same meaning, and they both do relatively similar things.

Even the tooling and the output of `mount` are bound to be incredibly confusing.

greg7mdp9y ago· 9 in thread

This is similar to what Google uses internally. See http://cacm.acm.org/magazines/2016/7/204032-why-google-store...:

krupan9y ago

Holy cow, it sounds like they reinvented Clearcase!

sdesol9y ago

If they get this right, this can be MASSIVE for Microsoft in Enterprise. ClearCase was the reason why IBM was able to charge license fees of $1000+ per developer.

wyldfire9y ago

Agreed, same w/GVFS IMO.

I ruminated about ccase/git elsewhere in this thread: https://news.ycombinator.com/item?id=13560108

kentonv9y ago

Google's Piper is impressive (I used it), but it emulates Perforce. Having something Git-based is a lot more exciting. Hope someone ports it to platforms other than Windows...

general_ai9y ago

Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.

jeremyepling9y ago

kentonv9y ago

> Every single ex-Googler misses it quite a bit.

I dunno. I don't miss the 1-minute incremental builds. (Maybe they've improved since I left, though.)

d0vs9y ago

> Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.

This may be naive but why not recreate it as an open source project?

kevincox9y ago

gvb9y ago· 9 in thread

I'll be watching this to see if Microsoft can break the logjam. By open sourcing the client and protocol, there is potential...

Other attempts:

* https://github.com/blog/1986-announcing-git-large-file-stora...

* https://confluence.atlassian.com/bitbucketserver/git-large-f...

Article on GitHub’s implementation and issues (2015): https://medium.com/@megastep/github-s-large-file-storage-is-...

cies9y ago

I think Joey Hess' attempt at "solving the problem" deserves a mention.

It is open source (GPLv3) licensed. [not proprietary]

Written in Haskell. [cool aid]

Currently has 1200+ stars on Github and is part of at least Ubuntu (http://packages.ubuntu.com/search?keywords=git-annex) since 12.04. [shows something for support and adoption]

edit: Link to Github https://github.com/joeyh/git-annex -- thanks dgellow

Ajedi329y ago

For the problem of large files I think Git LFS has largely won out over git annex, mostly because it's natively supported by GitHub and GitLab and requires no workflow changes to use.

pjc509y ago

There's a small but important trap for people who might want to use git-annex as a backup tool, namely that you can't store a git repo in git-annex.

http://git-annex.branchable.com/forum/Storing_git_repos_in_g...

dgellow9y ago

Link to the github mirror https://github.com/joeyh/git-annex

EuAndreh9y ago

git-annex is, IMHO, by far the best solution.

Pros of git-annex:

- it is conceptually very simple: it uses symlinks, rather than ad-hoc pointer files, a virtual file system, etc., as symbolic pointers to the actual blob files;

- you can add support for any backend storage you want. As long as it supports basic CRUD operations, git-annex can use it as a remote;

- you can quickly clone a huge repo by just cloning the metadata of the repo (--no-content in git-annex) and just download the necessary files on-demand;

And many other things that no other attempt even consider having, like client-side encryption, location tracking, etc.
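The symlink-pointer idea in the first point can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical layout (a `.annex/objects/<sha256>` store in the working directory and a helper named `annex_add`); real git-annex keeps its objects under `.git/annex/objects` with a richer key format, so treat this only as a model of the concept, not git-annex's implementation.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def annex_add(repo: Path, rel_path: str) -> Path:
    """Hypothetical helper: move a file's content into a content-addressed
    store and leave a symlink in its place (a simplified model of git-annex)."""
    work_file = repo / rel_path
    digest = hashlib.sha256(work_file.read_bytes()).hexdigest()
    store = repo / ".annex" / "objects"  # assumed layout, not git-annex's real one
    store.mkdir(parents=True, exist_ok=True)
    blob = store / digest
    if blob.exists():
        work_file.unlink()  # same content already stored: keep a single copy
    else:
        shutil.move(str(work_file), str(blob))
    # Only this small symlink is meant to be committed; the blob would have to be
    # kept out of git (real git-annex stores it inside .git, so it is never tracked).
    work_file.symlink_to(os.path.relpath(blob, work_file.parent))
    return blob


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "video.bin").write_bytes(b"large binary payload")
        annex_add(repo, "video.bin")
        link = repo / "video.bin"
        print(link.is_symlink(), os.readlink(link))
        print(link.read_bytes())
```

Because the pointer is named by the content hash, two files with identical content end up sharing one stored blob, and a clone can carry just the symlinks until the content is actually fetched.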

klodolph9y ago

Git Annex is only a partial solution, since it only solves issues with binary blobs. It doesn't solve problems with large repos.

vvanders9y ago

That still only solves half the problem with large binary blobs.

to3m9y ago

Don't their artists and designers use version control too? Maybe they just have one such person per team, or each person owns one file, or something like that. Hard to say.

EuAndreh9y ago

When git-annex finds a conflict it can't resolve, it gives you back both versions of the file, each suffixed with the SHA of its original version.

This way you can look at both and resolve the conflict.

krishoog9y ago· 7 in thread

Does this article imply that Microsoft itself is also moving towards Git? Instead of e.g. using their own product like TFS?

daigoba669y ago

TFS has first-class support for Git repositories (in addition to the classic TFVC repositories). So yes, they're moving more and more to Git. But no, they're not abandoning TFS.

vtbassmatt9y ago

Disclosure: I'm a PM on VSTS/TFS, and I own part of version control.

jeremyepling9y ago

I'm a Microsoft employee on the Git team.

Microsoft is moving to Git and we use Team Services / TFS as our Git server for all private repositories. GitHub is only used for OSS since that's where the OSS community is.

hobarrera9y ago

Any comments on why you picked GitHub (proprietary) instead of GitLab (FLOSS) for FLOSS projects?

ohitsdom9y ago

warcode9y ago

Team Foundation Server supports both Git and TFVC.

justinlaster9y ago

You can use TFS with Git. Git will act as the underlying SCM.

ianopolous9y ago· 5 in thread

Couldn't they use git over IPFS?

kevincox9y ago

Using a vfs allows you to track which files have changed so that these operations no longer need to scan. Now they are O(changed files) which is generally small.

Now IPFS has a vfs, but it is just a simple read/write interface. This vfs needs slightly more logic to do things like change the base revision and track changes.
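To make the O(changed files) point concrete, here is a minimal Python sketch of the bookkeeping such a filesystem layer could do. The class and method names (`ChangeTracker`, `on_write`, `commit`, `checkout`) are hypothetical, and the filesystem hook that would call `on_write` is assumed rather than shown; this is not how GVFS or IPFS actually implement it.

```python
class ChangeTracker:
    """Hypothetical sketch: a VFS layer sees every write, so it can remember
    which paths changed instead of re-scanning the whole working tree."""

    def __init__(self, base_revision: str) -> None:
        self.base_revision = base_revision
        self.dirty: set[str] = set()

    def on_write(self, path: str) -> None:
        # Assumed to be called by the filesystem hook on every write.
        self.dirty.add(path)

    def status(self) -> list[str]:
        # Cost grows with the number of changed files, not with the tree size.
        return sorted(self.dirty)

    def commit(self) -> list[str]:
        # Hand the changed paths to the commit and start tracking afresh.
        changed = self.status()
        self.dirty.clear()
        return changed

    def checkout(self, revision: str) -> None:
        # Changing the base revision is the extra logic a plain read/write VFS lacks;
        # this sketch simply refuses while there are uncommitted changes.
        if self.dirty:
            raise RuntimeError("uncommitted changes: " + ", ".join(self.status()))
        self.base_revision = revision


tracker = ChangeTracker("abc123")
tracker.on_write("src/main.c")
tracker.on_write("src/main.c")
tracker.on_write("docs/readme.md")
print(tracker.status())  # ['docs/readme.md', 'src/main.c']
```

A real implementation would also have to persist this set across reboots and reconcile it with the index, which is where most of the complexity would lie.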

ianopolous9y ago

For tracking changes (i.e. mutable data) you can use IPNS and create a signed commit history. This will be built on IPFS eventually so it's only a matter of time.

janwhOP9y ago

ianopolous9y ago

yes, IPFS is designed to host the entire internet. You can selectively mount sub-graphs arbitrarily, which means only downloading locally exactly what you need.

ianopolous9y ago

cafebabbe9y ago· 4 in thread

My sysadmin: "we won't switch to git because it can't handle binary files and our code base is too big"

Our whole codebase is 800MB.

angry_octet9y ago

I hope that was a conversation from 5 years ago.

Otherwise, I hope you replaced your sysadmin.

alkonaut9y ago

Our codebase (latest tree) is similar, but when switching to git, it's the total history size that is the problem. Our history is well over 25GB, which git doesn't handle very gracefully.

kevincox9y ago

History shouldn't be a problem, you can do a shallow checkout. But you will have to store the working tree at least on your workstation.

This solves the next scaling problem of avoiding managing the whole working tree. (without requiring narrow clones which have significant downsides)

Spivak9y ago

Isn't that the case that LFS solves? I've got 30ish gigs of binary blobs stored in my repos.

OJFord9y ago· 4 in thread

How on Earth can anybody work like that?

I'd have thought you may as well ditch git at that point, since nobody's going to be using it as a tool, surely?

    git commit -m 'Add today'\''s work - night all!' && git push; shutdown

aanm19889y ago

You can't, and Microsoft isn't. They built this so they can use git without those problems.

stinos9y ago

> How on Earth can anybody work like that?

OJFord9y ago

Shallow clones are great, until they're not. I don't think I've ever (having tried a few times) cleanly cloned 'below' the graft point when I've needed to, or a different branch.

metamet9y ago

It's called Pomodoro++.

wyldfire9y ago· 3 in thread

The biggest PITA with ClearCase was keeping their lousy MVFS kernel module in sync with ever-advancing Linux distros.

This seems like a way-out-there use case, but it's good to know that there's other solutions. I'd be tempted to partition the codebase by decades or something.

tcbawo9y ago

foobiekr9y ago

Still, considering it was the last remaining vestige of the Apollo Domain OS, not bad.

david-given9y ago

daigoba669y ago· 3 in thread

The article doesn't directly say it, but are they migrating the Windows source code repository to git? That seems like a big deal.

I seem to recall that Microsoft has previously used a custom Perforce "fork" for their larger code bases (Windows, Server, Office, etc.).

fsckin9y ago

Source Depot. Forked years ago with tons of added features. Various Halo titles also used it and had easy to use integrations with most of the art and design pipeline tools.

vtbassmatt9y ago

Yes, Windows is migrating to Git.

adrianN9y ago

Do you have a citation for that?

chokolad9y ago· 2 in thread

https://www.reddit.com/r/programming/comments/5rtlk0/git_vir...

stinos9y ago

Thanks for bringing this up; it was actually a more interesting read than this thread. Less trolling, more facts, and interesting stuff I didn't happen to know. Like

EdHominem9y ago

> "We’re working with the git community to move more of these scripts to native cross-platform components written in C"

Sad. Rather than fix the root problem they rewrite the product in a less-agile language and require everyone to run opaque binaries.

They probably even think they're doing a good thing.

Ericson23149y ago· 2 in thread

If I understand this correctly, unlike git-annex and Git LFS, this is not about extending the git format with special large files, but about changing the algorithm for the current data format.

I wonder/hope IPFS can benefit from this implementation on Windows, where FUSE isn't an option.

manojlds9y ago

The blog post does mention that some changes have been made to git (in their fork)

sdesol9y ago

I did a quick comparison of Microsoft's fork and it appears they have done quite a bit with it.

Microsoft's fork contains 67,522 commits. The official Git repo contains 45,810. It appears the bulk of the work started in 2010, with significant ramp up of development in 2015.

https://gitsense.com/mgit-vs-git/history.png

Looks like Microsoft only really introduced about 100 more new files.

https://gitsense.com/mgit-vs-git/files.png

Microsoft's repo contains 1712 contributors. Git's repo contains 1685 contributors. So it looks like 20-30 employees worked on Microsoft's fork.

https://gitsense.com/mgit-vs-git/mgit-contributors.png https://gitsense.com/mgit-vs-git/git-contributors.png
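The estimate above rests on simple subtraction of the quoted counts, and implicitly assumes that every contributor to the upstream repo also appears in the fork:

```latex
\begin{aligned}
67{,}522 - 45{,}810 &= 21{,}712 \quad \text{(extra commits in the fork)} \\
1712 - 1685 &= 27 \quad \text{(extra contributors)}
\end{aligned}
```

27 falls inside the 20-30 range; since it counts distinct author identities, it is only a rough proxy for headcount.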

rethab9y ago· 2 in thread

Also, it would have been interesting if the article had mentioned whether they tried other approaches, like those taken by Facebook (Mercurial, AFAIK) or Google.

xearl9y ago

To me it sounds like these numbers are from a migration-in-progress. So they are trying, but instead of giving up and saying "not the right tool for us" they are trying to improve the tool.

becarefulyo9y ago

imron9y ago· 2 in thread

> repos of unusual size

Sounds like they've almost solved the secrets of the fire swamp!

krallja9y ago

Repos of Unusual Size? I don't think they exist.

olkid9y ago

They can live there quite happily for some time.

srott9y ago· 2 in thread

I remember that a few years ago Git on Windows was very slow. Is that still true?

WorldMaker9y ago

Git on Windows has gotten very fast and stable in the last few years. Microsoft employees themselves, among others of course, have directly contributed to a much better Git experience on Windows.

Groxx9y ago

I haven't touched Windows in quite a while, so I can't really make a claim either way.

pjmlp9y ago· 2 in thread

Quite nice use of C# and C++/CX for a virtual system implementation.

contextfree9y ago

looks like C++/CLI (C++/CX reused its syntax and maybe parsing code, but they're still distinct)

pjmlp9y ago

In any case, when C++/WinRT gets feature parity, I imagine it will eventually be deprecated, depending which one gets more developer love.

zahreeley9y ago· 2 in thread

Don't believe in modular development with smaller repos?

Groxx9y ago

Yeah, I see things like this, and I always wonder why they don't make a submodule tree.

It wasn't an option a couple years ago, but submodules work fine now. With a little bit of scripting to wrap common uses, they're practically pain-free.

shandor9y ago

Could you elaborate a little what has changed there? My understanding is that submodules are still considered a mess, but would be really nice if some actual improvements have happened.

testUser699y ago· 2 in thread

Why is that so hard to believe? America is run by Donald Trump.

dang9y ago

We detached this subthread from https://news.ycombinator.com/item?id=13559893 and marked it off-topic.

klodolph9y ago

ksec9y ago· 2 in thread

So are the DVCS converging to Git and Git only?

jgalt2129y ago

In the open source arena, it certainly seems like the game is over. But as others mentioned, both FB and GOOG are big Mercurial users and contributors.

Our shop uses Mercurial because of its Python basis, and the amount of time and effort it takes to master Git makes me draw strong and uncomfortable parallels to emacs.

stephenr9y ago

> the rest of the world is pretty much Github & alternatives

By "the world" you mean, the HN/SV/startup crowd of cool-kids who feel the need to use whatever is popular without regard for how appropriate it is?

yakk09y ago· 1 in thread

I appreciated the Princess Bride reference with "repos of unusual size"

wyldfire9y ago

I don't believe in them.

0X1A9y ago· 1 in thread

WorldMaker9y ago

That's how I read it: this is about monorepos with file trees with large numbers of files, where users don't necessarily need every single file in their local worktree to get work done.

I'd assume this GVFS would work hand in hand with Git LFS for the use case of large files.

mfontani9y ago· 1 in thread

So... what happens when one runs "git grep foo" on it?

kevincox9y ago

It will be slow. Small steps. But in practice companies with large repos have other search solutions so that each user doesn't have to do a raw search on the entire working tree.

hoov9y ago

kevincox9y ago

So pretty much I would be all over this if:

- It worked locally.

- It worked on Linux.

Maybe I'll see how it's implemented and see if I could add the features required. I'm really excited for the future of this project.

rbanffy9y ago

Did they really need to make a name collision?

https://en.wikipedia.org/wiki/GVfs

Navarr9y ago

This sounds like a solid use case and a solid extension for that use case - but definitely not the end-all-be-all.

For one, it's not really distributed if you're only downloading when you need that specific file.

But that doesn't change the merits of this at all, I think.

mortdeus9y ago

Or how about we start compartmentalizing your codebase so that you can, like, you know, organize your code and restore sanity to the known universe.

I think when the powers that be said that whole thing about geniuses and clutter, they were specifically talking about their living spaces and not their work...

zwischenzug9y ago

Does anyone know how Microsoft's open source policy works internally? I'm thinking from a governance perspective, as I'm involved in a similar effort at $WORK.

scotty799y ago

I had a medium sized project in Ruby on Rails as git repo inside vm.

It was slow to do 'git status' and other common commands. Restarting the RoR app was also slow. I put the repo on a RAM disk, which made the whole experience at least a few times faster.

Since all was in vm that I rarely restarted I didn't have to recreate files on ram disk all that often. I was syncing changes with the persistent disk with rsync running periodically.

myrandomcomment9y ago

"For example, the Windows codebase has over 3.5 million files and is over 270 GB in size."

Okay, so this is a networking issue. Or is it a stick everything in the same branch issue?

Whatever the reason here the issue is pure size vs. network pipe, pure and simple. Hum, when can I get a laptop with a 10GBaseT interface?

One of the issues with the way they are doing this (only grabbing files when needed) is that you cannot really work offline anymore.

amingilani9y ago

I could definitely be wrong, but this sounds a lot like monolith vs. microservices to me.

nojvek9y ago

b1gtuna9y ago

alkonaut9y ago

Could this also help a smaller repo but with long history, making the total repo size too large?

acqq9y ago

And for all those who still try to stick to anything older:

https://github.com/Microsoft/gvfs

"GVFS requires Windows 10 Anniversary Update or later."

dstaheli9y ago

Check out the GVFS back story and details here: https://news.ycombinator.com/item?id=13563439

lolikoisuru9y ago

Is it really that fucking hard to check if your package name is unique?

Here is another virtual filesystem with the exact same name: https://wiki.gnome.org/Projects/gvfs

Debian package for it: https://packages.debian.org/jessie/gvfs

igtztorrero9y ago

Does anybody know what Linus thinks about it?

cikey9y ago

Can we use this together with git LFS?
