What happens when you ship bugged code and need to roll back?
Rolling back mechanically is sometimes safe but not always safe, because you have data.
Your production deploys continue monotonically into the future as usual, and you now have version-controlled documentation of the rollback, instead of needing to maintain a separate mapping of production <-> code state.
At my company rolling back is a matter of a minute or 2, while pushing a revert is more in the 5 to 10 minutes. Your mileage can definitely vary though.
However it requires all the shipped changes to be rollback compatible, which isn't that hard with a bit of experience.
It's definitely not ideal, but I'd bet it happens more often than we'd like to admit as professionals.
Another thought - at that same company there were I think 5 or 6 different maintained enterprise mobile apps. Jumping between repos and dealing with the various app stores/approval processes it was easy to get lost regarding what was in production.
Does anyone else encounter very large diffs as standard operating procedure in golang codebases? Here I'm considering "very large" to mean 500+ lines changed - however github computes lines changed.
I mean, I'm looking at this from the perspective of someone who contributes primarily to Ruby codebases. It's understandable that there would be a difference when comparing ruby/go, but the magnitude of the difference is what I'm getting at.
The difference I'm observing in the average size of a PR in a plain old Rails codebase compared to the average size of a PR in a golang codebase is dramatic - roughly an order of magnitude. In some cases it's even more extreme.
Is that anyone else's observation, or am I just doing it wrong?
If there's some secret to architecture or tooling (e.g. feature flags) that people use to keep PR size down in Go codebases, would love to hear it.
Haven't worked in an enterprise Java setting before but would be curious to compare diff size there.
Go is a concise language but it's not expressive - you'll write 4 lines to call a func and check an error.
This means that PRs are inevitably going to be larger than a language like Ruby or other languages with metaprogramming.
Adding whole new chunks of code, refactoring the public interface of widely used systems, etc. tend to result in large diffs - lots of 'churn'. Extending existing code, bugfixes, more localized refactoring and doc fixes - more 'stable' codebases - tend to result in smaller diffs.
Also some difference between e.g. git and perforce - branching is easier, so I can make smaller commits without worrying about breaking the dev branch...
That being said, the churn is typically much easier to review and understand, and thus have confidence in, especially compared to Ruby or Python.
Sorry, no. People have been shipping, and understanding how to ship web applications for... what, 10, 15, 20 years now?
There is like 10 million people around that shipped a web app.
What I do is -- look at what very big players are doing (Salesforce, Facebook, etc) and follow that.
Also, look at what all random minor players are doing, and absolutely ignore that.
If it's conceptually a single unit, keep it a single unit. Pushing dead code in small pieces that waits for a small master switch doesn't give you additional safety (though it can give you easier merges).
>Reviews for large diffs are closed with a single “lgtm,” or miss big-picture problems for the weeds.
That means you have bad reviews. If it can't be figured out, how was it written? Take the time and do it right. Yes, it's a large commitment - is it larger than the value being created? If no, why does it exist?
Each thing can interact with each other thing (in theory), so if you have double the diff, you might need 4 times the amount of work. And the increased workload increases the risk of a mistake.
But splitting up the diff in two doesn't change the result, right? The reality is that everything doesn't interact with everything else. But the reviewer doesn't know this, and needs to confirm non-interaction (hence a review).
If you, as an implementer, know the divisions, then shipping the pieces that are independent will do two things. First, it makes the review easier, letting it serve its main purpose: catching errors in implementation. Secondly, it will let _you_ confirm you know the divisions. It could be that your conception is wrong! Your small diff might not work because of an interaction you didn't realize.
Personally, I think that by the time you're sending diffs, the general architecture should have been decided. We tend to have at least one person (not the implementor) who knows how a feature is going to be implemented, so the architecture can be confirmed. You want someone to be looking at the forest! I think that doing this during code review is a bit harder though.
Taking the time to do it right is hard, and chopping things up makes it easier to do "it" (implementation review) right. Architecture review should probably be happening in a different space.
On a social level, you are asking for your code to be reviewed. Please have empathy for the reviewer.
Absolutely. Which is why I draw the line at "conceptual units". If there are rational sub-divisions that don't require context, go ahead, divide! If you're baking in cleanup + feature + bugfix + convert tabs to spaces, you're mixing things in the same way that you would usually avoid when coding. Same thing with too-large commits - we avoid 10,000 line files and functions, do the same with commits.
I caution only against artificially small diffs. Small is a fine target, but there's a lot of dogma around it everywhere I've been, which is why I think it's worth calling out. Breaking things up too much can remove context, which can be dangerous.
---
If you're introducing a large thing in 1 vs N units (say a full feature in one go, which is unusable until complete), the line is of course fuzzier. In some ways, sub-units may be more reviewable - that's a solid benefit, though they're not likely a dozen or two lines of code (if they are, and it's a large feature, you probably now have dozens of reviews). But you aren't getting any additional safety, because it's all either on or off - which is half of the post.
For the other half of the post, if you're not getting the same reviewer(s) for all the pieces, unless you've had a detailed architectural review[1], has anybody but you read and understood the whole system? When it's turned on and it breaks in a subtly-interacting way and you're on vacation, does someone else know where to look? Doing it all in one go, as it will be when turned on, forces an architectural review of the system implemented, not just the plans. Say it takes a day or two to thoroughly review - is that bad?
---
Always tradeoffs. For the situation described in the blog post, where large reviews get insufficient attention, I don't think small diffs will solve the problem. The problem is the lack of attention given to reviews - fix that. It's critical regardless of diff sizes.
[1]: No place I've worked for has truly done this, which I would claim is the (vast) majority, but I wholly believe some places do. I would probably even like it! But it's perceived as slowing things down, so it hasn't happened for me yet.
Architecture review distinct from deploys is key, as is a separate operability review. And it's also important to mentally decouple the feature release to end users from the code releases.
Avoiding review fatigue has to be done some other way than merging smaller diffs, for example splitting changesets among reviewers so that person A reviews the backend code and person B reviews the frontend code etc. A meaningful feature of 1000 lines can still be reviewed in terms of 20 50-line changes, and doesn't have to be reviewed in one sitting or by one person.
You are perfectly right that pushing dead code and half baked features (that won't be used, so aren't actually in production!) is useless and adds no safety. Adding a new feature is risky, and adding it as 99% dead code and then finally adding the 1% code with the button that actually enables the feature in the UI doesn't help - 100% of the new code will hit production with that final change.
Keeping patch sizes to a minimum is good, but it's a minimum of what's feasible based on other concerns. If "few dozen lines" works for you that's great, but don't overgeneralize from that.
A potential solution: Non-fast-forward merge commits. Best of both worlds: Small diffs and large diffs. (I'll also note that I haven't found this to be a problem in practice.)
> and can slow things down if all the pieces have to be pushed separately through a slow CI pipeline.
And can speed things up if the CI pipeline can pinpoint the exact change that broke the build. Fat commits just obscure the underlying problem - although sometimes that's the best you can hope for.
None of which are intrinsic. And maybe they've been fixed since my last attempt? But noticeably painful.
I do feel like this becomes almost an artform. Small but concise PRs aren't that easy in some situations and smaller PRs that address large refactoring in pieces seem to win over one large PR that is difficult to parse mentally. What that looks like varies by project or organization and I feel like one size definitely doesn't fit all.
Walking through a half dozen or so commits in a PR is still manageable.
Too many small things deployed over time can be a pain to roll back too.
On the flip side, huge change-the-world deploys are bad too.
It's a balancing act of managing the initial feature size, getting it out, layering on more in subsequent releases.
Where tiny deploys work really well is when deploying refactorings, little housekeeping chores like upgrading libs. If, while working on a feature or bug, I need to add a missing test, refactor, or upgrade a lib, I extract those commits out and deploy them ahead of time when possible. That way the resulting feature branch is very focused on the feature at hand.
Doing git fu to reverse that is not worth the time imho. I'd rather invest time in factoring (and testing) tiny deliverables.
At least then, you still have the diff as a travelogue.
A single line change can easily break an entire system. Applying this size fallacy, however is just as dangerous. Certain classes of changes should be grouped together so they can be reviewed with the context required.
I've had the unfortunate task of working in environments where I've been forced to artificially break up my PR's to suit some arbitrary rule, no doubt because an article like this got read at some point.
What ends up happening is instead of having everything related to a specific changeset or feature in a single PR/commit, you now have it strewn across multiple PR's/commits, non-nonsensically, and with artificial borders most of the time. This makes review more difficult and creates additional work to support the partial implementation working at each slice.
Process should not replace thinking. If a feature warrants it, who cares if the PR is 2,000 lines or 12. Not every task is small and not every task is large. As long as the scope is well defined, many times it does make sense to do tasks in whole rather than as a sum of parts.
Agree with this. I've worked in teams before where the order of tasks to be done was chosen by someone not on the development team and I found it seriously inefficient. When you're reviewing code, you have to think "what could this break?". When you group related changes together the surface area of things that can break is lower so you can more efficiently check for issues.
Individual commits don't necessarily make sense in the whole. Submit diffs which make up a logical, conceptual block of new changes. Don't mix formatting changes, whitespace cleanups, or rearranging changes into diffs with real meat. Ship new prerequisite tools and new changes to utility functions or libraries first, then the meat: i.e. split general codebase upgrades and distinct features into different shipments.
Make it so that "git log" will produce a list of sensible steps if someone else reads the listing. Remember, 90% of programming is writing good code and the other 90% of programming is communication to others.
If a developer is creating one bug per day. Is it better to release it every day or is it better to wait 1 month and release 30 bugs all at once?
Continuous improvements are also highly motivational. The difference is between saying "We can fix that at the next release" and losing the motivation to fix it later and saying "Just shipped the fix" and feeling great.
https://en.wikipedia.org/wiki/Release_early,_release_often
What's old is new again.
So, he just sends a gist link to his teammates? why not create a small pull request and then asking them to look over it?
haha