> (a) there’s no difference between test-first and test-after but (b) interleaving short bursts of coding and testing is more effective than working in longer iterations
I'm glad you quoted this study because it's a perfect example of a conclusion that should have been the starting point of a more interesting experiment. Interleaving code and test is known to be the most impactful factor. TDD and TRL don't differ in any of the ways that account for the vast majority of defects. Therefore of course the difference should be small, and of course shorter iterations should be better.
[1] https://www.slideshare.net/AnnMarieNeufelder/the-top-ten-thi...
The key takeaway is to evaluate everything in terms of sensitivity, because that gives you design insight. The study quoted above identifies one insensitive factor (order) and one sensitive factor (iteration length) of interleaving test and code. Now you can choose, modify, or design any test method as long as you keep the interleaving short; the order is something you don't have to worry about. If the research doesn't reveal anything about the sensitivity of a factor, don't worry about it until someone figures it out; your beliefs about it are probably wrong and inconsequential anyway.
The default, intuitive approach is to write a whole bunch of code and then test it. When people start doing TDD, they struggle with making only enough changes to the code to pass the test. But it helps a lot that the test suite tells them when to switch from writing code (which they prefer) back to writing tests (which they don't prefer): the moment everything passes.
Then we discover that the main reason TDD works is that it gets you to interleave coding and testing in small batches, and it works just as well to reverse the order of the batches but keep them small.
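To make the small-batch mechanics concrete, here's a minimal sketch in Python (the names are hypothetical): TDD writes the assertion before the function exists and test-after writes it immediately afterwards, but either way the batch stays a few lines long.

    # One micro-iteration: a few lines of test, a few lines of code.
    def slugify(title):
        return title.lower().replace(" ", "-")

    def test_slugify():
        assert slugify("Hello World") == "hello-world"

    test_slugify()  # run it right away; green is the signal to start the next batch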
And then, somehow, knowing why TDD works (better than what people do by default) gets translated into "TDD doesn't work" (better than something that's carefully controlled to be exactly the same, except for the part that gets people to do the rest of it). And most of the people who hear that go back to writing a whole bunch of code and then testing it.
Because the real world doesn't control all the variables, we often have to think about factors the rigorous research is silent on; they can help or hinder us in achieving what the research says matters.
As a multi-decade practitioner and manager of software engineering teams, I would certainly be interested in what the best of the empirical research has to say. I think it's important to always remain open to good new ideas wherever they may come from—strong opinions, loosely held, as they say.
That said, I don't believe that statistically significant results can be found that will overturn my own instincts and judgement on any specific project to which I am dedicated. The reason for this is threefold: 1) the universe of software, and of the goals we pursue with it, is astronomically large; 2) competence in software engineering depends on personal aptitudes and mindsets combined with years of practice; and 3) measuring outcomes in software engineering across diverse projects is all but impossible. In other words, you can't equate tools, you can't equate projects, and most of all you can't equate people.
At the end of the day, success in software engineering comes from relentless focus on the specific goals at hand. One must be inherently curious and have a craftsperson's mentality about acquiring technical skill, but never become religious about methodology. This requires continuous first-principles thinking targeted at specifics. Two expert practitioners could propose unorthodox and diametrically opposed approaches to the same problem, and both would still dramatically outperform a less skilled journeyman who attempted to follow every best practice.
Empirical studies and the scientific method in general work fantastically well for uncovering the rules and inner workings of the natural world, but software is the creation of logical systems purely by human minds, which is an entirely different challenge; there's just not enough evidence to draw on. I suspect results will be at least a couple of orders of magnitude softer than sociology's, and that probably won't sit well with the type of personality attracted to software in the first place.
This was pretty much, word for word, the argument against using statistical approaches to price insurance on sea shipments back in the 1700s. Yet we all know how insurance premiums are calculated today, and there's a reason for it.
Are you saying merchants in the 1700s didn't believe insurance outcomes were quantifiable? Or are you saying that software engineering output is quantifiable? If the latter, perhaps you could shed some light on how you think that would work; I'm happy to be proven wrong.
Yes!
Methodology learns from the experience of the experts, and tries to teach the techniques they know to beginners. It's very different from statistics.
Has there ever been an empirical software study with a beautiful research plan, sound statistical analysis, and a large sample size, whose results have also been replicated? Even one?
As far as advocacy goes though - when someone is recommending what other people should do - I think it's very different if there is relevant evidence and it doesn't back up the advocated position. It's even more different if there is relevant evidence that positively undermines the advocated position. There are snake oil salesmen in this industry and some of them will call you names if you don't follow their pet process. But if what they're peddling isn't backed by the evidence or even contradicts the evidence then they should be called out and their audience should probably be sceptical about anything else those same salesmen are selling as well. The old joke about someone finding it hard to believe something when their continued employment depends on its falsehood is as relevant as ever.
I'm confident the answer to what follows from that is "nothing yet", based on various conference talks. Studying developers (or, in the worst case, students) writing software doesn't seem to be an effective way of working out how to write software better/faster/whatever.
Wikipedia: https://en.wikipedia.org/wiki/Empirical_software_engineering (ESE)
Popular: https://www.americanscientist.org/article/empirical-software...
Microsoft has an ESE group with some interesting publications. I read/downloaded a couple a while ago which, unfortunately, I can no longer remember nor locate. But this is a good starting point - https://www.microsoft.com/en-us/research/publication/belief-...
The example of TDD: I've done significant pieces of work both with and without TDD. In some scenarios it's a huge impediment; the actual complexity of the internal software is fairly low, but the testing complexity is high (many complex stateful dependencies that are hard to control). In that case, I spent 80% of my time writing the tests, and far more bugs surfaced in the tests than in the code itself.
Then in other scenarios there's high internal complexity and low external dependencies/complexity, and it's pretty much a no-brainer: TDD is almost the only tractable way to write the code, let alone an improvement.
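To illustrate the two extremes with a hypothetical sketch (neither snippet is from the actual projects):

    from unittest import mock

    # Scenario 1: trivial internal logic, complex stateful dependencies.
    # The test is mostly mock plumbing, which is where the bugs end up.
    def archive_user(user_id, db, mailer):
        db.mark_archived(user_id)
        mailer.send(user_id, "account archived")

    def test_archive_user():
        db, mailer = mock.Mock(), mock.Mock()
        archive_user(42, db, mailer)
        db.mark_archived.assert_called_once_with(42)
        mailer.send.assert_called_once_with(42, "account archived")

    # Scenario 2: high internal complexity, no external dependencies.
    # Each test pins down real behaviour, so writing it first is tractable.
    def to_roman(n):
        vals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
        out = []
        for value, symbol in vals:
            while n >= value:
                out.append(symbol)
                n -= value
        return "".join(out)

    def test_to_roman():
        assert to_roman(1994) == "MCMXCIV"

    test_archive_user()
    test_to_roman()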
Then it's very personal as well. One person will work well with TDD and another will struggle. Dumb things, like whether your preferred development environment is conducive to rapidly running and iterating on tests, are probably going to dominate.
End result is, I think these studies just can't possibly control all the variables, which is why they end up with invalid conclusions, overly specialised conclusions, or, as Jimmy says, the pattern where the more rigorous the study, the less significant the results.
However, TDD is pretty effective when working with ChatGPT. I always tell it to write the tests first.
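For concreteness, a hypothetical example of that exchange in Python: the test is what I write and paste into the chat as the spec, the implementation is what comes back, and I run the test locally before accepting it.

    # The spec I write first and paste into the chat:
    def test_parse_size():
        assert parse_size("10k") == 10 * 1024
        assert parse_size("2M") == 2 * 1024 ** 2

    # What the model hands back, to be reviewed and run against the test:
    def parse_size(text):
        units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
        if text[-1] in units:
            return int(text[:-1]) * units[text[-1]]
        return int(text)

    test_parse_size()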
(This would grow your example to match your argument, whereas my initial post would shrink your argument to match your example.)
Of course, if something has a huge impact on my productivity, I want to practice it, even if it's not fun. There's a lot of denial embedded in this article.