Yesterday, my program worked. Today, it does not. Why? (1999) [pdf]

(st.cs.uni-saarland.de)

100 points · maemre · 8y ago · 40 comments

warpech · 8y ago

Reminds me of one of the nice features of Git: git bisect. Find the breaking commit in the smallest number of steps.

Some random blog post about it: http://webchick.net/node/99

Anderkent · 8y ago

The algorithm described in the article, if I'm reading it correctly, is more powerful than git bisect, because it can identify the exact change within a commit that causes the bug. You could for example decide to make each file modification in a commit a different 'Change', and so the algorithm would identify not only which commit breaks something, but which file changes are relevant.
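
To make that concrete, here is a minimal sketch of a simplified ddmin-style reduction over a list of changes. It is an illustration under stated assumptions, not the paper's exact algorithm: `ddmin`, the `fails` callback, and the change names are hypothetical, the test is assumed to be deterministic, and any subset of changes is assumed to be applicable on its own.

```python
from typing import Callable, List, Sequence


def ddmin(changes: Sequence[str], fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `changes` to a smaller subset for which `fails` still returns True.

    Assumptions (hypothetical setup): change names are unique, `fails` is
    deterministic, and applying all changes reproduces the failure.
    """
    current = list(changes)
    assert fails(current), "the full change set must reproduce the failure"
    n = 2  # granularity: how many chunks to split the current set into
    while len(current) >= 2:
        chunk = max(1, len(current) // n)
        subsets = [current[i:i + chunk] for i in range(0, len(current), chunk)]
        reduced = False
        for subset in subsets:
            complement = [c for c in current if c not in subset]
            if fails(subset):
                # The failure is reproduced by this chunk alone.
                current, n, reduced = subset, 2, True
                break
            if complement and fails(complement):
                # The chunk is irrelevant: drop it and keep the rest.
                current, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(current):
                break  # already at single-change granularity: nothing to drop
            n = min(len(current), n * 2)  # try finer-grained chunks
    return current
```

For example, with eight hypothetical changes `c1` to `c8` and a failure that needs both `c3` and `c6`, calling `ddmin` with `lambda s: "c3" in s and "c6" in s` narrows the set down to those two changes.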

alttab · 8y ago

This would be difficult to determine, as running tests without some of the changes could affect testing, or the program's ability to run properly, in non-deterministic ways.

epmatsw · 8y ago

I think what blew my mind most was `git bisect run`. Write a test case, go get coffee, have git spit out the offending commit automatically. So cool.

hacker_9 · 8y ago

Used this the other day! Had to track down a visual bug that was introduced in the last 20 commits, but with binary search I could narrow down on the problem a commit at a time and so only had to look at 4 in total.
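
For a sense of why 20 commits need so few checks, here is a rough sketch of the kind of binary search involved on a linear history. It is not git's implementation: `first_bad`, `is_bad`, and the commit names are hypothetical stand-ins, and it assumes that once a commit is bad, every later commit is bad too.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of commits checked).

    Assumes a linear history, oldest first, whose last commit is known bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known bad
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo], checks
```

With 20 candidate commits the loop runs at most 5 times, since each check halves the remaining range.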

pmarreck · 8y ago

how am I not surprised that this solved a regression that occurred at some point in the past on a Drupal project

I worked on a Drupal project... Once.

sethrin · 8y ago

I believe Joomla is worse in every conceivable way. And then of course there is WordPress, which I sincerely hope is the last major procedurally-written codebase. PHP has come a long way as a language (it can almost be confused for something sensible these days) but lots of issues with large libraries.

azhenley · 8y ago

This is awesome. Didn't know about it, thanks!

My approach has been to step exponentially. I go back 1 commit, 2 commits, 4 commits... etc. I'll be using bisect from now on though.
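
That exponential stepping can be combined with a bisect once a good commit is found. A minimal sketch, assuming the steps are measured back from the newest commit and that badness is monotonic along a linear history; `gallop_then_bisect` and `is_bad` are hypothetical names:

```python
from typing import Callable, Sequence


def gallop_then_bisect(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit; commits are oldest first, the newest is bad."""
    head = len(commits) - 1
    hi = head  # oldest commit known to be bad so far
    step = 1
    while True:
        probe = max(head - step, 0)  # 1, 2, 4, ... commits back from head
        if not is_bad(commits[probe]):
            lo = probe  # newest commit known to be good
            break
        hi = probe
        if probe == 0:
            return commits[0]  # even the oldest commit is bad
        step *= 2
    # Invariant: commits[lo] is good and commits[hi] is bad.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

The stepping phase pays off when the regression is recent: it brackets the culprit after a handful of probes without needing a known-good commit up front.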

pishpash · 8y ago

You may be implementing an intuitive weighted bisect :)

maccard · 8y ago

Heh, I do a binary search. Find last known good build, go half way between, etc

devrandomguy · 8y ago

Git bisect is awesome. Before I discovered it, I was manually bisecting the codebase, and stubbing out the missing half.

adrianratnapala · 8y ago

Not to take away from git bisect (or hg bisect, which is no more broken than the rest of mercurial) -- I don't think it is an alternative to the thing you used to do.

git bisect is "temporal bisection", i.e. you delete half of a time interval. The manual bisect is kinda-sorta "spatial" in the sense that you delete half a region of code. The two things work by similar principles, and their problem domains overlap, but they are not the same.

atonse · 8y ago

We once had a bug that would cause a failure in our test suite, but very rarely. It was so rare that it would just "go away" and not show up for weeks. So we thought it was odd (maybe bad test data, etc) but moved on.

When we decided to look into it when we had some downtime, we found out that it was a date bug that only reared its head on the 31st day of the month, hence only happening every two months.

I'm not sure bisect would've helped but still was really funny when we did find the issue.

brainfire · 8y ago

Related bug- OpenOffice can't print on Tuesdays: https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161...

Previous HN thread: https://news.ycombinator.com/item?id=8171956

adrianratnapala · 8y ago

Thanks for the anecdote, because it gives me a new idea.

We should be monitoring the rate of test failures -- per test -- in a timeseries and looking for periodicities. Just graphing them and seeing regular spikes will help, but having real numerics looking for approximate periodicities will catch this sort of thing (after a few months!).
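
One crude way to put numbers on that, sketched with only the standard library: take a per-test series of daily failure counts and look for the lag with the highest autocorrelation. The function names and the data layout (one value per day) are assumptions for illustration, and a real monitor would need to cope with noise and irregular month lengths.

```python
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Autocorrelation of `series` at `lag`, normalised by the total variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no periodic structure to find
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def strongest_period(series: Sequence[float], max_lag: int) -> int:
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))
```

On a synthetic series that fails every 7th day over 70 days, `strongest_period(series, 20)` picks a lag of 7.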

bArray · 8y ago

Unless you are writing date specific code or have seen this bug arise, I probably wouldn't bother. Time bugs will likely creep in regardless, for example how would you find a bug that happens one month a year? How would you find a bug once every four years? Sometimes human knowledge/experience can't be tested out of a system. All test failures probably warrant investigation of some type.

But, as a general rule, I would try to brute force or fuzz your "inputs", a.k.a. data out of your control: random numbers, dates, user input, network packets, etc. Users and environments often act in unexpected ways, and you want to make sure your bug finding isn't reduced to accidentally working on the right day! :)

michaelcampbell · 8y ago

Was this the kind of test that only ran once a day?

atonse · 8y ago

It wasn't, but it was the sort of thing where we never modified that test or even a related area, so it just wouldn't make sense that the test would fail.

But the bigger issue was that since it would "fix itself" we never prioritized it and just left it alone.

But looking back, the system (automated tests) was functioning as designed. :) We just changed our "DateTime.now" to concrete dates.
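
A sketch of what passing the date in buys you, using a made-up bug rather than the actual one from this story: once the code under test takes `today` as a parameter instead of reading the clock, a test can sweep every day of a year. `naive_next_month` and `failing_days` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Callable, List


def naive_next_month(today: date) -> date:
    """Hypothetical buggy code: same day next month, ignoring month lengths."""
    year = today.year + (1 if today.month == 12 else 0)
    month = today.month % 12 + 1
    # Raises ValueError when next month has no such day, e.g. for March 31.
    return today.replace(year=year, month=month)


def failing_days(func: Callable[[date], date], year: int) -> List[date]:
    """Call `func` with every date in `year`; collect the dates that raise."""
    failures = []
    day = date(year, 1, 1)
    while day.year == year:
        try:
            func(day)
        except ValueError:
            failures.append(day)
        day += timedelta(days=1)
    return failures
```

Calling `failing_days(naive_next_month, 2023)` surfaces the end-of-month dates immediately, instead of waiting for the test suite to happen to run on one of them.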

scorcher · 8y ago

That's where property-based tests come in very useful.

candiodari · 8y ago

TLDR of the article: cross-application interfaces (like GDB's cli interface has effectively become) break with version changes. Users don't care about the reasons behind that breakage, and would like to see those treated as bugs. Developers see that attitude as unworkable, and would rather just not care about external interfaces at all.

I always wonder why we gave up on refactoring instead of switching it along so that cross-program changes would become not just possible, but easy and feasible.

What can be done using alt-shift-r in a distributed large scale java app with a GWT frontend and 50 libraries used on the backend, each of which can run distributed across machines ... is baffling.

It shouldn't be!

And now with microservices, it becomes more and more baffling that this worked at one point. That's just horrible. Most of the complaints about it strike me as being complaints that when a toolchain allows for complexity, people make things complex. And of course, that's fair. If you give a 5-year-old a laser CNC machine and he finds how to turn it on, things are going to go south quickly. But damn the things you can do with that!

infinity0 · 8y ago

That is not what the paper is about. Stop pushing your own agenda and actually read the thing.

dvfjsdhgfv · 8y ago

You definitely have a point (although what you say is only marginally related to the article). However, increasing complexity is inevitable in all aspects, therefore we should be developing tools to manage all this complexity instead of just deciding that "we shouldn't" (which is also an option, see e.g. the recent unikernels trend.)

kossmoboleat · 8y ago

I took several courses with Professor Zeller and with post-docs at his chair. We applied the described concept of "delta debugging" to finding programming errors in Python programs.

Surprisingly, it was possible to implement this in the course, although one had to fiddle quite a bit with Python's internals.

Zeller's chair homepage has a lot of additional information. Here's a page on how the idea evolved after the initial paper in 1999: https://www.st.cs.uni-saarland.de/dd/

sn9 · 8y ago

He actually has a course on debugging on Udacity.

ericfrederich · 8y ago

The GDB people have done it again. The new release 4.17 of the GNU debugger [6] brings several new features, languages, and platforms, but for some reason, it no longer integrates properly with my graphical front-end DDD [10]: the arguments specified within DDD are not passed to the debugged program. Something has changed within GDB such that it no longer works for me. Something? Between the 4.16 and 4.17 releases, no less than 178,000 lines have changed. How can I isolate the change that caused the failure and make GDB work again?

I guess this was before Git. I love git bisect

khedoros1 · 8y ago

I couldn't quickly find the 4.17 release, but gdb 4.17.0.4 was released at the end of June 1998. The release notes mentioned Linux kernels in the 2.0-2.1 range.

dsw108 · 8y ago

Here's an implementation of a modified version of his delta debugging algorithm we did for minimizing file inputs that induce a failure in a compiler or static analysis tool: http://delta.tigris.org/

Also works for other kinds of files of course. Our version is easily seen as a kind of simulated annealing, with the input providing sufficient randomness that we did not need to add any internally.

Fun story: this was my first open source project. I had always wanted to be an open source programmer, but the precipitating cause of me releasing delta was that Microsoft Research asked me to. So back when Microsoft was saying that open source is a "cancer" I became an open source programmer because Microsoft asked me to.

T_D_K · 8y ago

Why does the article say there are 2^N different possible configurations? Shouldn't it be N! ? Maybe I'm missing something, but I don't see how they can rule out so many different orderings of the "configurations", since he started out by saying that the ordering wasn't reliable.

Edited: I figured it out. It's because they essentially do a binary search of the changes to find the issue, I should have read a bit further before commenting. The tables / examples cleared it up pretty quickly.
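
One way to see the 2^N count, as a small sketch rather than anything taken from the paper: if a configuration is read as the set of changes that are applied, with no order attached, then each of the N changes is either in or out. The `configurations` helper is a hypothetical name.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def configurations(changes: Sequence[str]) -> List[FrozenSet[str]]:
    """Enumerate every subset of `changes`, from no changes to all of them."""
    return [
        frozenset(combo)
        for size in range(len(changes) + 1)
        for combo in combinations(changes, size)
    ]
```

For three changes this yields 8 subsets, i.e. 2**3, whereas counting orderings of the same three changes would give 3! = 6.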

known · 8y ago

Blue screen of death? https://en.wikipedia.org/wiki/Blue_Screen_of_Death

throwawaybbq1 · 8y ago

If it was iOS, I'd say the provisioning profile expired :-p
