But I don't get how they went from that to "estimations are bad". It seems to me that they're just doing better estimations now by looking at historical data?
> I could tell confusion from the look of my team lead. How could I say it would take over five days for a change that we estimate to take 10 minutes!
Isn't this like comparing apples and oranges? The estimation of 10 minutes is clearly "the time it takes to flip the flag", but what they actually want to estimate is "the time it takes to make it into production in a safe manner".
Maybe the team leader is confused because they thought this was a man-hour estimation? And they're now afraid that one developer will work full-time five days on it?
> On Monday, we started with the stand-up and discussed picking up this task. Another engineer proposed fixing a bug on another component first before turning on the flag.
Doesn't this imply that their 10 minute estimation was done completely without asking the team?
> As humans, it is impossible to consider all the potential complexity. […] We did not consider a legacy system with bugs and scaling problems. […] We did not consider a combined release process with a Release Manager.
Really? You couldn't consider the complexity of a release process you designed yourself? You honestly thought this was a 10 minute task even though you knew you were dealing with a legacy system?
It’s just as reliable, since everything averages out over time, and queue size is a leading indicator while cycle time is a trailing indicator.
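The leading/trailing point can be made concrete with Little's Law (average cycle time = average WIP / average throughput): the queue is observable right now, while cycle time only shows up after items finish. A minimal sketch, with invented numbers:

```python
# Little's Law: average cycle time = average WIP / average throughput.
# Queue size (WIP) is visible immediately; cycle time only materializes
# once items finish -- which is why WIP leads and cycle time trails.

def expected_cycle_time(wip: float, throughput_per_day: float) -> float:
    """Average days an item will spend in the system."""
    return wip / throughput_per_day

# Hypothetical team finishing 2 items per day:
print(expected_cycle_time(wip=8, throughput_per_day=2))   # 4.0 days
print(expected_cycle_time(wip=20, throughput_per_day=2))  # 10.0 days
```

So a growing queue predicts the cycle-time numbers you will measure later.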
> But I don't get how they went from that to "estimations are bad". It seems to me that they're just doing better estimations now by looking at historical data?
For me personally, the takeaway was that estimations are just really poor even for the simplest things, and not worth the effort.
It is correct that it can still be seen as migrating to a better form of estimation. In his conference talk, Allen Holub[1] makes the distinction that prediction, unlike estimation, is purely data-driven.
> Maybe the team leader is confused because they thought this was a man-hour estimation? And they're now afraid that one developer will work full-time five days on it?
That could be the case. I never looked at it that way.
> Doesn't this imply that their 10 minute estimation was done completely without asking the team?
Yes, it was done with just the team lead and me. We had migrated away from Scrum at that point, so we did not do full-team refinement with estimation sessions pre-sprint. We would collaborate on the technical solution for any story when we picked it up, with anywhere from two engineers to the full team involved. The engineer who raised the bug would probably have brought it up then.
> Really? You couldn't consider the complexity of a release process you designed yourself? You honestly thought this was a 10 minute task even though you knew you were dealing with a legacy system?
Yes, I honestly did! It was just a simple change in our Terraform code to set the config from false to true and cycle the machines. I should clarify this more in the story.
> It is correct that is can still be seen as migrating to a better form of estimation.
Well, from my perspective there is still estimation going on here: a project manager came to you asking for a timeline for a feature and you replied with a concrete number of days. I don't think it makes any difference to the project manager whether you call it an "estimate" or a "prediction".
> For me personally, I went from estimations are just really poor even for the most simple things and not worth the effort.
The example you're presenting isn't showing this, because it doesn't seem like you made any actual effort estimating it. It would have been better if you had shown that you spent hours talking with various people for input and then, in the end, the Cycle Lead was more precise. In this example you did an estimate without even talking to the team! (Yes, you talked to the team leader, but apparently they didn't know that there were performance issues with enabling the flag, so some communication is lacking here.)
If you wanted to properly estimate it you would at least (1) ask in the stand-up if there were any concerns about flipping the flag and (2) check with the release manager whether a release would be possible. It looks like if you had actually done this you would have ended up with an estimate closer to the actual time. It might also have uncovered that there was another release process happening at the same time.
Considering that the standard deviation is two weeks (!) it seems more that you were lucky in this example that there wasn't more unplanned work. What if the performance issue required three days of work instead of one? What if the release manager told you that they were upgrading database servers and couldn't do a release for two days?
> Yes, it was done with just the team lead and me. We migrated away from Scrum at that point and we did not do full team refinement with estimations sessions pre-sprint.
There's a middle way between "full team refinement with estimation" and "only team leader and me estimates". You can ask around and double check before giving out estimates to project managers.
If "estimations" means the usual, i.e. asking "Ok everyone, how long do you think this task will take to complete?" then yes, that is largely a waste of everyone's time. Those estimations are bad.
> they're just doing better estimations now by looking at historical data
Yes, better estimates are coming from data. The people aren't "doing estimation" any more; it's being automatically calculated. Doing estimation takes time and gives wildly wrong results.
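The "automatically calculated" part is usually just reading percentiles off the historical cycle-time distribution. A sketch, with invented ticket data:

```python
import statistics

# Hypothetical cycle times (days) of the last 20 completed tickets,
# pulled straight from the ticket system -- no estimation meeting needed.
history = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9, 12, 14, 20, 30]

quartiles = statistics.quantiles(history, n=4)  # 25th / 50th / 75th percentiles
p85 = statistics.quantiles(history, n=20)[16]   # ~85th percentile

print(f"median: {quartiles[1]} days; 85% of tickets done within ~{p85:.1f} days")
```

The forecast ("a new ticket will very likely be done within ~14 days") falls out of the data rather than anyone's judgment.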
So that list was the spine of our annual planning, the stuff that we really prioritized. The rest of the stuff that people wanted to do would have to fall in between the gaps of that master set.
It worked surprisingly well for us, but we had to build this group experience first. After a few years with another dev team at another company, I brought it up as an alternative to over ambitious roadmaps and it worked well there too.
How much of this was just due to more team familiarity and experience versus this oversimplified process? I don't know. But it felt like the roadmap trade-offs were more thoughtful, devs felt more relaxed and had reserves if we had to redline a bit, and there were fewer missed commitments.
Measure velocity for sure. Keep track of what kinds of things harm or improve it. But the absolute worst thing you can do is find an abstraction like T-shirt size or rolling average and superimpose it on discrete work. You're going to be wrong. And probably disappointed, or disappoint someone else.
My strategy for estimation is:
- It’s tiny and I’m doing it now, I can give you an estimate within an hour margin.
- It’s small, it’ll be done in a reasonable enough amount of time that no further estimation is required.
- It’s not small and more research/design/planning is required. Any estimate you extrapolate from this is dishonest.
That’s it.
Edit: that wasn’t quite all. “It’s tiny” is reserved for urgent situations that require rapid response and not something I offer outside that situation. For daily work, it’s small or it’s too big to say. Tiny things are too easy to abuse (by managers or engineers) if they’re part of the normal flow.
The scrum people will tell you to call this unit "story points", but it doesn't matter what you call it, as long as it's based on something connected to task duration. Difficulty, complexity, risk, whatever.
Note that if you're tracking the conversion rate between developer estimated hours and real hours, you're already doing this. But you'll get an increased accuracy by using units that are explicitly not time-related, just because of quirks in how human brains think about time.
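That conversion-rate tracking can be sketched in a few lines; the task log here is invented:

```python
# Hypothetical log of (estimated hours, actual hours) per completed task.
log = [(2, 5), (1, 3), (4, 9), (8, 20), (3, 6)]

# The conversion rate: how many real hours one estimated hour
# has historically turned into for this team.
factor = sum(actual for _, actual in log) / sum(est for est, _ in log)

def calibrated(estimate_hours: float) -> float:
    """Scale a fresh developer estimate by the historical factor."""
    return estimate_hours * factor

print(f"conversion factor: {factor:.2f}")
print(f"a '4 hour' task historically takes ~{calibrated(4):.1f} hours")
```

Whether the raw unit is hours or abstract points, only the historical factor matters once you convert.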
It is all about feedback.
My analogy is an old analog-style joystick that positions a simulated robot arm on a screen. If the update rate is high enough I can track a rapidly moving dot, but if it drops below some threshold and I need to accurately position the arm at some time in the future, then I need to construct a huge model of the system: know the force, stiction, friction, mass, moment, and thermal expansion. (Edit: I have another analogy. Anyone can spray a moving target with a hose, but using a bow and arrow requires skill and practice.)
Feedback allows us to use unpredictable components to make predictable systems. Those systems are nearly always amplifiers. Systems that use direct feedback don't have to have the same reductionist model as something that needs better prediction (estimates).
This is why Lisp was a superpower in the 80s: it had a REPL. Same as Smalltalk, where the IDE, the REPL, and the universe were all the same thing. It makes total sense that agile came out of a culture built around REPLs and instant feedback. Arduino did it for embedded dev, HyperCard for programming, the spreadsheet before that.
High-bandwidth feedback allows us to be less skilled. Good estimators need to be highly skilled to make those estimates. Hose vs. arrow. That reminds me: have you seen a really skilled FPS player on a predictable but high-ping connection? They are almost timeless in how they predict the future, and to everyone else they dance between every 10th frame. Amazing predictors!
You can't agile a Martian probe (yet). As new ways are discovered to reduce cycle times (the time between cause and effect), each preceding structure of feedback is replaced with an even higher-bandwidth one. Robust DFU is a meta-REPL as hardware manufacturers race to ship products that are literally not finished and require a firmware update on boot to even function.
For a concrete example, let's imagine a team which has a continuous delivery pipeline which involves a code review step and manual acceptance tests. Let's say that the code review can sit in a queue for a couple of hours, or even slip into the next day, and that the manual acceptance tests require the feature to be deployed to a preprod stage after passing all unit and integration tests, and that this might take a day to run.
With this process alone, the ticket already takes at least 2 or 3 days between being assigned to someone and being marked as done.
Now, let's say that the coding bit of a random ticket might take 5 minutes or 3 days. This means that the overall time between the start and end of a ticket is about 4 days ± 2 days, which means that in the worst-case scenario it takes 6 days to close a ticket.
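The arithmetic above can be spelled out by summing per-stage bounds; the exact numbers are assumptions matching the example:

```python
# Per-stage bounds in days (assumed, matching the example above).
review_wait = (0.1, 1.0)   # review queue: a couple of hours up to next day
acceptance = (1.0, 2.0)    # deploy to preprod + manual acceptance run
coding = (0.01, 3.0)       # 5 minutes up to 3 days of actual coding

stages = (review_wait, acceptance, coding)
low = sum(lo for lo, _ in stages)
high = sum(hi for _, hi in stages)

print(f"ticket lead time: {low:.1f} to {high:.1f} days")  # roughly 1 to 6 days
```

Notice the process stages dominate the bounds; the coding time is almost noise.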
How is this sort of estimate not possible?
The problem of providing estimates is not one of predicting the amount of time it takes to close a ticket. The problem of providing estimates is a problem of processes, and how to adequately organize, structure, and classify work. If you don't know what you're doing then you don't know when you're done.
Yes, I try to make the distinction between prediction (looking at historical data) and estimation (asking the engineers how long it will take).
1. Split the project into tasks (each task taking at least 3-5 PTs).
2. Have a POC / prerelease phase
3. For complex tasks introduce a second polishing task, like "Basic User management 1/2" and later "Basic User management 2/2". If the second one isn't needed: Great!
4. For all tasks/areas, give a lower and an upper estimate in hours or PT.
5. Revisit your estimation afterwards using this process:
5.1. Imagine that someone gives you the following choice: you get $1000 if the true value lies within your estimate 90% of the time. Or you can roll a die, and 9 times out of 10 you get the $1000.
5.2. If you confidently pick your estimate ("I'm 100% sure the true value is within my bounds"), then it's too wide.
5.3. If you'd rather pick the die roll, then you don't trust your estimate: it's too narrow.
5.4. Adjust your estimate until both options seem equally attractive to you: the die roll, or your estimate being true 90% of the time.
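Once tasks complete, the 90% target from step 5 can be checked against reality. A sketch of that check, with invented data:

```python
# ((lower, upper) estimate in PT, actual effort in PT) per finished task.
tasks = [
    ((3, 8), 5), ((2, 6), 7), ((5, 15), 12), ((4, 10), 9),
    ((3, 9), 4), ((6, 20), 25), ((2, 5), 3), ((8, 30), 14),
    ((3, 10), 6), ((5, 12), 11),
]

hits = sum(lo <= actual <= hi for (lo, hi), actual in tasks)
hit_rate = hits / len(tasks)

# Target is 90%: much higher means the ranges are too wide,
# much lower means they are too narrow.
print(f"hit rate: {hit_rate:.0%} (target: 90%)")
```

Feeding this back each quarter is what actually calibrates the team over time.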
I'm also confused as to how you can look at "mean of one week, with a standard deviation of two weeks" and not wonder about the asymmetry of the distribution, and whether mean/sd are really the metrics of choice here. (I'd think 1st/3rd quartiles are a better choice, because they give some clue as to skewness.)
We also focus on these during retro. What made us remarkably slow? You want to drive down these issues.
This investigation can help you in future work. I worked with another team that was consistently slow whenever they needed to build new pipelines or get connections up and running to third parties (heavily regulated environment). This insight helped us in planning.
They’re clearly missing 4 days of standard requirements in their estimates.