https://codewinsarguments.co/2016/05/01/git-submodules-vs-gi...
Or the non-standard git-subrepo?
Have you run into any "slow push speeds" with subtrees, as the person complains about in the first article I linked?
Is git-subrepo good enough that it's worth the risk of using something not built into git (and the hassle of installing something extra)?
The simple fact that switching branches doesn't auto-update the submodule is awful.
git config --global submodule.recurse true
https://stackoverflow.com/questions/1899792/why-is-git-submo...
They're fine if they are completely hidden from human interaction and used in a programmatic way, but the human interface with the concept seems to always be so awkward.
git <git-command> [args]
You would expect to do the same thing with a submodule: git submodule <git-command> [args]
But no, submodules have their own set of commands. It's like it's purposefully obtuse for no good reason.
At least it's consistent with the rest of git then...
Git is one of those tools I just can't muster the willpower to truly learn. I use SourceTree and hope for the best, and search the web frantically when something weird happens.
I used Mercurial exclusively from the command line for many years and never felt I needed a GUI. But git, there's just something about it.
git submodule update
and not git submodule pull
I understand pull in this context would be a submodule command, not a git command, but why use different terminology from git for what is the same idea?
For languages with proper package management (ruby, python, go, node, etc.), put in the extra effort to use your package manager to update your dependencies instead of bothering with submodules. If you're still set on using submodules, I'm willing to bet you're just "doing it wrong" (TM).
Subversion is way easier to use than git, and it takes minutes to set up a server. Pushing your project history is just a few steps with git-svn, though it may take a while depending on the size of your project.
TortoiseSVN is probably the most straightforward and easy to use version control GUI there is.
(Having used both I'd say I dislike both equally.)
The author also gives (IMO) 2 weak reasons against submodules.
(1) He says it's hard to know which repo you're editing (main repo or submodule). I agree, but in practice this hasn't been an issue for me. A simple `git status` or `pwd` is usually enough to know which repo I'm editing.
(2) The author also says that committing changes with submodules can be confusing since it involves multiple commits: one commit in the submodule, and another in the main repo to update the commit it points to. I agree this is a little confusing at first and definitely tedious, but conceptually I think it is pretty simple.
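For anyone who hasn't seen the two-commit dance, here is a minimal, runnable sketch of it in POSIX sh. It assumes a reasonably recent Git 2.x; the repo names (`lib`, `app`, `vendor/lib`) and the throwaway `HOME` are made up for the demo, and `protocol.file.allow` is only set because the submodule is cloned from a local path.

```sh
#!/bin/sh
# Sketch: the "two commits" workflow for a submodule change.
# All repos live in a throwaway temp dir; names are made up.
set -eu
WORK=$(mktemp -d)
export HOME="$WORK/home"       # isolated HOME: your real ~/.gitconfig is untouched
mkdir -p "$HOME"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # local-path submodules are blocked by default on Git >= 2.38.1
cd "$WORK"

# A tiny library repo and an app superproject that embeds it
git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial"
git -C app submodule add "$WORK/lib" vendor/lib
git -C app commit -q -m "add vendor/lib submodule"

# Commit 1 of 2: change and commit inside the submodule
echo v2 > app/vendor/lib/VERSION
git -C app/vendor/lib commit -q -am "lib v2"

# The superproject only sees that vendor/lib now points at a different commit
git -C app status --short

# Commit 2 of 2: record the submodule's new commit in the superproject
git -C app add vendor/lib
git -C app commit -q -m "bump vendor/lib to v2"
```

In real use the submodule commit also has to reach the submodule's own remote before anyone can use the superproject commit that references it; `git push --recurse-submodules=check` refuses to push the superproject when that hasn't happened.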
That said, I do agree that submodules are confusing -- just for different reasons.
My main gripe with submodules is that they don't work well with the rest of git. Why isn't adding a submodule just a `git add` to a directory with a git repo inside of it? If there are new commits in a submodule, why doesn't `git checkout .` reset it back to the commit the main repo points to? If I clone a repo with submodules, why do I need to run additional submodule commands to get an exact copy of the codebase? Basically, to me it feels like submodules were slapped onto git as an afterthought and little care was taken to think through the git CLI experience as a whole. I think submodules would be a lot less confusing if git had designed a better CLI for them.
>If I clone a repo with submodules, why do I need to run additional submodule commands to get an exact copy of the codebase?
`git config [--global] submodule.recurse true` resolves both questions.
As to why is that not the default? I _believe_ there are security concerns, but I'm not 100% sure.
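A minimal, runnable sketch of both the clone incantation and the config setting, using throwaway local repos named `lib` and `app` and a temporary `HOME` (all made up; Git 2.14 or newer assumed). One caveat: the git-config documentation says `submodule.recurse` applies to commands such as checkout, fetch, pull, reset and switch, but not to clone, so a fresh clone still needs `--recurse-submodules` or a follow-up `git submodule update --init --recursive`.

```sh
#!/bin/sh
# Sketch: cloning a repo with submodules, and submodule.recurse on checkout.
set -eu
WORK=$(mktemp -d)
export HOME="$WORK/home"       # isolated HOME so the --global settings below are temporary
mkdir -p "$HOME"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # local-path submodules are blocked by default on Git >= 2.38.1
cd "$WORK"

git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial"
git -C app submodule add "$WORK/lib" vendor/lib
git -C app commit -q -m "add vendor/lib submodule"

# Publish lib v2 and bump the app to it, so the app history has two submodule states
echo v2 > lib/VERSION
git -C lib commit -q -am "lib v2"
git -C app/vendor/lib pull -q --ff-only
git -C app add vendor/lib
git -C app commit -q -m "bump vendor/lib to v2"

# Plain clone: the submodule is registered, but nothing is checked out in it
git clone -q "$WORK/app" plain
test ! -e plain/vendor/lib/VERSION
git -C plain submodule update --init --recursive   # the extra incantation

# One-step clone (submodule.recurse does not apply to clone)
git clone -q --recurse-submodules "$WORK/app" full

# With submodule.recurse=true, checkout moves the submodule along with the superproject
git config --global submodule.recurse true
git -C full checkout -q HEAD~1
cat full/vendor/lib/VERSION     # prints v1
```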
For any project where I could choose not to use them, I choose not to use them.
In the first[1], it's to include the project's JS implementation in the project's website. Considering that this would be the only JS code running on the entire site, it seemed like overkill to throw in some newfangled asset manager like Bower just for a single NPM package.
In the second[2], it's to include the encoding/decoding test cases alongside the implementations. This way, instead of having to maintain a bunch of independent per-implementation unit tests, I can maintain all the tests in one place, and then have the per-implementation test suites snarf the test cases, and I then know with reasonable certainty that all my implementation libraries have equivalent behavior.
There are probably other, "better" ways to do both these things - I could bite the bullet and use Bower for the website, and I could have test suites download test cases on-the-fly - but submodules were the path of least resistance, and I've yet to encounter any significant downsides.
----
[0]: https://base32h.github.io
[1]: https://github.com/Base32H/base32h.github.io - specifically /assets/base32h/, which points to https://github.com/Base32H/base32h.js
[2]: https://github.com/Base32H/base32h.rb - specifically /spec/cases, which points to https://github.com/Base32H/base32h-tests
When you have a library that is useful to more than one project, but not popular enough to have its own package on multiple package managers, the easiest way to reference a specific version of it is to add it as a submodule.
First line of the article explains the title.
> Spoiler alert. I do not hate submodules.
Clickbait much? Make up your mind?
It's an architectural advantage to separate each module into a different repo as it encourages careful separation of concerns.
If you find that you often need to update many modules together every time you want to add a new feature to your project, this is often an indication that your modules do not have proper separation of concerns and your abstractions are leaking. It means your project exhibits low cohesion and/or tight coupling between modules.
The difficulty in maintaining separate module dependencies is actually a very useful signal to you as a developer that your code is too tightly coupled and needs to be refactored into modules which are more independent.
Monorepos are a band-aid solution that covers up the root problem. The real problem is incorrect separation of concerns, AKA low cohesion, which leads to tight coupling between your components.
It's not possible to design simple interfaces between components when these components have overlapping responsibilities.
- in 2 repos -> 2 PRs -> 2 test suites -> 2 code reviews
- in a monorepo -> 1 PR -> 1 test suite -> 1 code review
When your project grows in complexity, there are some concerns that cross the boundaries of your repositories (CI/CD pipelines, testing and QA being a few examples). Having a monorepo helps.
Consider having all your docker images and helm charts alongside the source code of the many parts of your big project. Is that really an anti-pattern?
EDIT: also, for a new dev joining the team, having to clone only one repository is easier. I also try to keep a simple docker-compose stack so they need only one command to spin up the whole dev environment.
The notion of 'cross-cutting concerns' is also an anti-pattern. It's a violation of the 'separation of concerns' principle. A violation of the 'cross-cutting' kind, to be exact.
There are almost always better alternative solutions that don't involve cross-cutting concerns but do require a slightly more carefully thought-out architecture.
When it comes to testing, I agree that (for example) integration tests are extremely valuable but I disagree that having the source code of your dependencies in the same repo yields any benefits for integration testing.
Ideally each module dependency should have its own set of tests which test its features based on the appropriate level of abstraction. Dependencies should be more 'general purpose' (suit more different use cases) while higher level logic should be more fitted to the specific business domain. Integration tests should not test the implementation of module dependencies; dependencies should have their own tests.
Higher-level tests can sometimes help uncover issues in dependencies, and thus help you design the tests of those dependencies, but keeping them separate is essential because the dependencies should represent a completely different level of abstraction.
You don't want to end up tightly coupling the tests of the main project with the implementation details of its dependencies. Separating the tests correctly helps you to ensure that the scope of your tests is limited to the correct level of abstraction.
My point is that while it's desirable to integration-test a project with its dependencies plugged into it, those tests should not reference any specific implementation details of those dependencies... Because, otherwise, unrelated changes in the implementation of the dependencies are likely to break your higher level tests (which should not be the case); code changes within dependencies should only break your higher level tests if those changes affect higher level behavior.
For example, changing method names and arguments of a dependency should not break your top level integration tests (assuming you've made the matching code changes in your main project source, you shouldn't need to change the top level integration tests at all, they should still pass), the top level tests shouldn't care what the method names of dependencies are and they especially shouldn't care about how those methods are implemented.
I dislike having to warn in the README file, "somewhere in this repo are submodules, and you need to use this incantation to clone them."
I think you can now have the submodule reference point to a branch but it can't point to a tag, which is what I want most of the time.
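For what it's worth, the superproject always records one specific commit for a submodule; the branch setting (`-b`, stored as `submodule.<name>.branch` in `.gitmodules`) only affects `git submodule update --remote`. So "pointing at a tag" amounts to checking the tag out inside the submodule and committing the result. A minimal, runnable sketch, assuming Git 2.x and made-up repo names and tag (`lib`, `app`, `vendor/lib`, `v1.0`):

```sh
#!/bin/sh
# Sketch: pin a submodule to a tag's commit, then optionally follow a branch.
set -eu
WORK=$(mktemp -d)
export HOME="$WORK/home"       # isolated HOME: your real ~/.gitconfig is untouched
mkdir -p "$HOME"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # local-path submodules are blocked by default on Git >= 2.38.1
cd "$WORK"

# Library with a v1.0 tag and a newer, untagged v2 commit
git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib v1"
git -C lib tag v1.0
echo v2 > lib/VERSION
git -C lib commit -q -am "lib v2"
BRANCH=$(git -C lib symbolic-ref --short HEAD)   # "main", or "master" on older Git

git init -q app
git -C app commit -q --allow-empty -m "app: initial"
# -b records the branch in .gitmodules; it is only consulted by "update --remote"
git -C app submodule add -b "$BRANCH" "$WORK/lib" vendor/lib
git -C app config -f .gitmodules submodule.vendor/lib.branch

# "Point at a tag": check the tag out inside the submodule, then commit the gitlink
git -C app/vendor/lib checkout -q v1.0
git -C app add vendor/lib
git -C app commit -q -m "pin vendor/lib at v1.0"

# Later, to move to the branch tip instead of the pinned tag
git -C app submodule update --remote vendor/lib
cat app/vendor/lib/VERSION      # prints v2
```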
And there's how `git status` just reports "modified content" without going into details.
The "correct" (albeit still annoying) way:
- git submodule deinit <path/to/submodule>
- git rm -f <path/to/submodule>
- git submodule add ... new/path
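A runnable sketch of those three steps in POSIX sh, moving a made-up submodule from `vendor/lib` to `new/lib` (the throwaway repos and `HOME` are assumptions for the demo). It uses `git rm -f` for the middle step, which removes the gitlink and the submodule's `.gitmodules` section in one go; the `.git/modules` cleanup is optional for a move, but it avoids a clash if the old path is ever re-added.

```sh
#!/bin/sh
# Sketch: move a submodule by deinit + git rm + re-add.
set -eu
WORK=$(mktemp -d)
export HOME="$WORK/home"       # isolated HOME: your real ~/.gitconfig is untouched
mkdir -p "$HOME"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # local-path submodules are blocked by default on Git >= 2.38.1
cd "$WORK"

git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial"
git -C app submodule add "$WORK/lib" vendor/lib
git -C app commit -q -m "add vendor/lib submodule"

# 1. Unregister the submodule and empty its working tree
git -C app submodule deinit vendor/lib
# 2. Remove the gitlink; git rm also drops and stages the .gitmodules section
git -C app rm -q -f vendor/lib
# Optional: forget the cached clone under .git/modules
rm -rf app/.git/modules/vendor/lib
# 3. Re-add at the new path and commit the move
git -C app submodule add "$WORK/lib" new/lib
git -C app commit -q -m "move lib from vendor/lib to new/lib"
```

Git 1.8.5 and later can also do the move in a single `git mv vendor/lib new/lib`, which updates the path in `.gitmodules`; the submodule keeps its original name (`vendor/lib`) in that case.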
In an emergency situation, you can almost always recover. I've not corrupted a repo in almost 10 years and I do some unspeakable things to them :)
To remove a module manually:
- git reset . (from root; NOT --hard)
- Remove the <path/to/submodule> from working directory.
- Remove entry from .gitmodules
- Remove entry from .git/config
- Remove (-rf) the .git/modules/<path/to/submodule> directory (it follows the same structure as the working directory)
- git add -A .gitmodules <path/to/submodule> (Tab completion might not work but the command will)
This can be used to forcefully remove a submodule.
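The manual steps above, scripted as a minimal sketch against a throwaway repo (Git 2.x and POSIX sh assumed). `vendor/lib` stands in for `<path/to/submodule>`; it is also the submodule's name, because `git submodule add` names a submodule after its path by default. The `git config --remove-section` calls are just one way to delete the entries from `.gitmodules` and `.git/config`; editing the files by hand works too.

```sh
#!/bin/sh
# Sketch: manually remove a submodule, following the listed steps.
set -eu
WORK=$(mktemp -d)
export HOME="$WORK/home"       # isolated HOME: your real ~/.gitconfig is untouched
mkdir -p "$HOME"
git config --global user.name "Demo"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch main
git config --global protocol.file.allow always   # local-path submodules are blocked by default on Git >= 2.38.1
cd "$WORK"

git init -q lib
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial"
git -C app submodule add "$WORK/lib" vendor/lib
git -C app commit -q -m "add vendor/lib submodule"

cd app
git reset -q .                                        # unstage only; NOT --hard
rm -rf vendor/lib                                     # working-tree copy
git config -f .gitmodules --remove-section submodule.vendor/lib   # .gitmodules entry
git config --remove-section submodule.vendor/lib || true          # .git/config entry, if initialized
rm -rf .git/modules/vendor/lib                        # cached repository under .git/modules
git add -A .gitmodules vendor/lib                     # stage the deletion and the edited .gitmodules
git commit -q -m "remove vendor/lib submodule"
```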
To remove all submodules without starting over (I've personally never needed this in the last X years):
- Remove all working directory paths for each submodule
- Remove .gitmodules
- Remove .git/modules/
- Remove any mention of submodules in .git/config
- git add -A
Usually the first manual one solves whatever problem you're facing.
EDIT: I have no idea how to format HN comments, sorry :| nothing I try works. Hope it's readable.
So there is a scale from quicker iteration to less coupled: same repo, submodule, different package. The question is in which cases the middle step "submodule" is worth it or if you should rather switch to one of the others always.
Therefore I like submodules, and I would argue that people might underestimate the increased round-trip time of publishing modules and then referencing them in code, compared to using submodules.
But they really do enforce very strict version control. If we want to be absolutely sure of a version, Git submodules will give us that.
But I only use them in one PHP backend project that I plan to barely ever change, because every change means that I have to crawl through a raft of repos, updating a submodule chain.
It's exactly the kind of operation that calls for a scripting solution, but it's also one of those projects I change so seldom that it isn't worth writing a script.
For my frontend (Swift) work, I use SPM (Swift Package Manager). Much easier on my nerves.
That being said, maybe submodules would have solved some problem here or there that we had. I'm open to arguments in favor of them in that case. But we've always been able to get the work done without them, so I don't personally think they're indispensable.
I used to have a bunch of hacky scripts for working around this, but lately I've just been giving up and avoiding submodules as much as possible.
Not sure how mature it is or if any other projects use it, but it seems to be working well for Zephyr.
The main thing I learned is if you mess up any part of your submodule during creation - do not try to fix it. Just delete it from the parent repo and start over.
Also do not bother deleting it using git commands. Delete it in the .gitmodules file, then search your .git folder for every reference of the repo you want to delete (including folders named after it) and delete everything.
Either that or start with a clean parent repo clone.
They came about because everyone hates submodules.