Submodules are difficult to use in practice for a wide variety of reasons. There are serious, complex proposals that have made it into git-contrib to build a "better" submodule, but for various reasons these have produced systems that merely make the tradeoffs in a different way that some people prefer.
This is not like any of those proposals. His problem is that "git add", "git diff", etc. don't "understand" submodules. It is as if ls, cd, etc. didn't "follow" symlinks, so that you had to navigate to the correct directory yourself before you could use standard Unix tools.
This is a serious problem, but his solution is essentially "we should use hardlinks instead of symlinks". That is, he wants to take the code that understands submodules out of the individual tools and put it in the filesystem, where it is "shared" among more of the tools and doesn't have to exist in any of them.
There are many objections to this proposal. The chief one seems to be that this does not seem to directly address any particular problem. I think Ramkumar perceives the reason git add/diff/rm don't support submodules as a metaproblem: "it is too hard to add submodule support to an arbitrary tool". Whereas the git maintainers are saying, "It is possible to add submodule support to an arbitrary tool." So that's the initial standoff.
Another problem is that this requires a filesystem change, and that is essentially the most stable part of git; changing it breaks compatibility with other versions. If you read Linus's rants, you know that he generally applies an enormous amount of scrutiny to anything that breaks compatibility. And so from his desk, you would need not just one clear benefit, but an overwhelming number of them, to break the contract like this.
But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation. To return to the POSIX analogy: we have both symlinks and hardlinks, and which one is better depends on what you are doing; there is no "one true link". If you replace all the symlinks with hardlinks, I think you will run into trouble with the hardlinks too.
Finally, it is unfortunate that the flamewar is about the monolithic patch rather than about some of the principles that led to the patch. I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool. These strike me as a concrete improvement over the existing system, and I wish that the energy that leads to huge unusable patches like this could be redirected into usable ones.
> The chief one seems to be that this does not seem to directly address any particular problem.
Except that you later say: I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool.
This is exactly the problem this solution solves. Instead of having a weird configuration file in the working tree for something that should be an integral part of the repository, there will be a generic system for adding links. With this generic system in place it is much easier to implement submodule support in "git add" and friends. He repeatedly makes this clear, but no one reacts to this point.
> But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation.
Implementing code in a different but not strictly better way that allows you to more easily understand and extend your library is called refactoring. This 'True Rejection' is essentially rejecting the merit of refactoring code.
I also don't think that the hardlinks/symlinks analogy holds very well. Hardlinks and symlinks are both features in their own right. Defining submodules as part of your repository's objects instead of in a weird file is a superficial change; he states this as well. Everything the current submodules do could be achieved using the proposed solution. (As he repeatedly has to make clear to Linus and Junio.)
> weird configuration file
One of the disputes here is that the maintainers are of the opinion that config files are actually good, on the face of them. They point to examples of well-settled uses like .gitignore to claim that config files are The Git Way.
It may very well be that configuration files are in fact weird, or are weird in this particular case, but since the convention is and has been for git's history that config-files-are-good it would require a well-reasoned essay to move the needle of discourse on this subject, not just to use "they are weird" as a claim to prove something else.
> This 'True Rejection' is essentially rejecting the merit of refactoring code.
I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring working code, for some definitions of "refactor", for some definitions of "working", and this has been the subject of many popular essays, most notably Spolsky et al. This is another place where moving the needle of discourse would require writing a well-reasoned essay that quotes the appropriate authorities, and it is not sufficient just to appeal to a particular view of the merits of refactoring as a claim to prove something else.
> Hardlinks and symlinks are both features in their own rights.. [this] is a superficial change.
This is another one of those thorny semantic problems that are preventing us from understanding each other. There is a sense in which it is superficial, and another sense in which it is a substantial change. If you are using "git add", or are implementing it, it is a superficial change. If you are writing subtree-merge or git-submodule or something that really needs to understand the storage of submodules, it is substantial.
And so they are both features in their own right, in the sense that: git-add-and-friends will want to access things with a certain pattern, and git-submodule-and-friends will want to access things in a very different pattern. This is why I suspect the solution here is to have two distinct APIs, that access the same underlying storage mechanism. And if it makes sense to continue to support something very much like the old API, it probably does not make sense to redesign the FS to look like the new API.
Of course, there is a lot of resistance in the git community to have two ways to do the same thing. So when I say "I suspect the solution is to have two APIs" I mean only that it would address most of the objections raised thus far, not that it would actually be implemented in mainline.
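To make "two APIs over the same underlying storage" a little more concrete, here is a rough sketch in Python rather than git's C. Every name in it (`SubmoduleStore`, `PorcelainView`, `PlumbingView`, the record fields) is hypothetical and corresponds to nothing in git's actual code base; it only illustrates the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmoduleRecord:
    """Hypothetical record describing one submodule."""
    path: str    # directory of the submodule inside the superproject
    url: str     # where the submodule is fetched from
    commit: str  # commit id the superproject pins


class SubmoduleStore:
    """The single underlying storage mechanism (here just a dict)."""

    def __init__(self):
        self._records = {}

    def put(self, record):
        self._records[record.path] = record

    def get(self, path):
        return self._records.get(path)

    def all(self):
        return sorted(self._records.values(), key=lambda r: r.path)


class PorcelainView:
    """Access pattern for add/diff/rm: which submodule owns this path?"""

    def __init__(self, store):
        self._store = store

    def owner_of(self, path):
        for record in self._store.all():
            if path == record.path or path.startswith(record.path + "/"):
                return record
        return None


class PlumbingView:
    """Access pattern for submodule tooling: read and rewrite full records."""

    def __init__(self, store):
        self._store = store

    def list_records(self):
        return self._store.all()

    def repoint(self, path, commit):
        old = self._store.get(path)
        if old is None:
            raise KeyError(path)
        new = SubmoduleRecord(old.path, old.url, commit)
        self._store.put(new)
        return new
```

Both views read and write the same store, so neither access pattern forces a redesign of how the records are kept.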
> Everything the current submodules do could be achieved using the proposed solution. (As he repeatedly has to make clear to Linus and Junio)
And as Linus and Junio have repeatedly made clear, merely doing everything the current implementation does is not within a few galaxies of meeting the burden for breaking FS compatibility. The compatibility-break burden is extremely high.
"I'm not interested in wasting any more of my time with this nonsense.
I give up."
http://thread.gmane.org/gmane.comp.version-control.git/22051...
I have utmost respect for Junio, Linus and the others, but realize that they have some negative attributes like all human beings do. Junio can be especially defensive when it comes to something new, although it's not completely without reason. After all, we do have an ultra-stable and well-maintained piece of software because of him.
- Linus is the original author of Git, and he wrote it in April 2005. He doesn't contribute anymore, and is rarely seen on the Git mailing list these days (except when something like this happens). In number of patches, he's #4, after Junio, Jeff, and Shawn.
- Junio is the maintainer of the Git project. He took over maintainership of Git a few months after it was originally built, in July 2005.
- Jonathan is a very big contributor at #6. He doesn't focus on any one part of the codebase, and contributes to a wide spectrum.
- Jens primarily contributes to submodule.c/git-submodule.sh, the current submodule implementation. Along with Heiko, he's one of the authorities on the current submodule system.
- Ram is a small contributor. He started out in Jan 2010 with two GSoC projects: one in 2010, and another in 2011 (neither was in submodules).
Submodules are for tying project parts together, where you have control over all of them. For example, the clang compiler frontend could submodule the LLVM backend. Both are under the LLVM project, so people usually work on both of them at the same time. They should not be in the same repo, since LLVM also has other users unrelated to clang.
Subtrees are for integrating external projects, which are not really under your control, but you probably want to follow upstream developments. Since a subtree includes all the repo data, you can cleanly check out, even if the external origin repository vanishes.
Submodules provide weaker coupling and make the most sense when the submodule has its own healthy upstream and you want to track those versions. It's awkward if all submodule development is happening from within the parent.
Too bad it's not enabled by default: http://engineeredweb.com/blog/how-to-install-git-subtree/
In fact, many people getting started with git get confused about whether subtree or submodule is appropriate, and end up wanting parts of both.
Big meh, and I'm normally interested in the evolution of Git.
Linus makes a rather unconvincing argument against the system, saying the current system allows submodules to be different for local sites. As if the proposed system would not support that, and as if the current 'dirty submodule' system is a better solution. He's being an absolute moron.
And Junio is just being very unproductive; he seems fully incapable of inducing anything from the design Ramkumar proposes and fails to see implications that anyone could see, even though he is a core git guy. And frankly he's being an ass too.
What I see is someone enthusiastically trying to fix a core problem of git in an ambitious but well-constructed way, and a bunch of old guys just bashing the life out of him.
I think he's better off just not asking Junio or Linus for advice and just keeping on hacking on his fork. I know I would use it.
Correct me if I'm wrong, but isn't the problem that Linus brought up this: If you introduce a new object type, you need to get it right. A new object type would create non-backwards-compatible repositories, so you'd have a new minimum Git version. If you were to use this fork, then everyone who checks out your code would have to use it. Also, it would preclude tooling support (e.g. GitHub). Once such important repository versioning decisions are made, they can't be unmade. Git, at its core, is basically just a well-designed repository model.
If he had been making his own VCS he wouldn't need this kind of review, but Git is an agreed-upon format and protocol; it is absolutely necessary to start by considering the downsides when core changes will affect a large user base.
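For anyone wondering why a new object type forces a new minimum version, here is a minimal sketch in Python. It relies on one fact about git's storage: a loose object is zlib-compressed data consisting of a header `<type> <size>\0` followed by the content, with blob, tree, commit and tag as the existing types. The type name `link` and both function names are assumptions made for illustration, not taken from the patch or from git's code.

```python
import zlib

# The object types existing git versions know about.
KNOWN_TYPES = frozenset({b"blob", b"tree", b"commit", b"tag"})


def encode_loose_object(obj_type, payload):
    """Build '<type> <size>\\0<payload>' and zlib-compress it."""
    header = obj_type + b" " + str(len(payload)).encode("ascii") + b"\0"
    return zlib.compress(header + payload)


def read_loose_object(raw, known_types=KNOWN_TYPES):
    """Parse a loose object, refusing any type this reader does not know."""
    data = zlib.decompress(raw)
    header, _, payload = data.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if obj_type not in known_types:
        raise ValueError("unknown object type %r" % obj_type.decode("ascii"))
    if int(size.decode("ascii")) != len(payload):
        raise ValueError("size in header does not match content length")
    return obj_type, payload


# A reader that predates the hypothetical 'link' type cannot read it,
# while a reader that was taught the new type can.
link_object = encode_loose_object(b"link", b"lib/foo")
try:
    read_loose_object(link_object)
except ValueError as err:
    print("old reader:", err)
print("new reader:", read_loose_object(link_object, KNOWN_TYPES | {b"link"}))
```

A reader that has never heard of the new type has no safe way to interpret the object, which is why everyone touching such a repository would need the newer implementation.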
Yes! And in the next breath, Linus flat-out admits he doesn't know if anyone uses/keeps a dirty local .gitmodules file. A great example of being out of touch with your users and still thinking you know best. Arguing to keep a (IMO minor & weird) feature around without knowing if anyone even uses it is folly.
> I think he's better off just not asking Junio or Linus for advice and just keeping on hacking on his fork. I know I would use it.
If only it were that easy. His changes would create a git implementation incompatible with everyone else's.
He also, in the thread, explicitly acknowledges that he's not sure about the best design and asks for help.
I'd be honored to have an idea shot down so mercifully by Torvalds.
Git's SubmittingPatches document says to use an imperative tone in commit messages. That's why it reads the way it does.
Am I missing something?