19

I've gotten a bug report from one of my users about one section of the software. The scenario is basically data binding: the user enters info, and that info is printed to a PDF.

The problem is that this functionality:

  • Is used frequently (about 40 times a week)
  • Hasn't been updated or modified in months
  • Is relatively simple to walk through in code
  • Appears to validate correctly (i.e., if the information wasn't filled out in the app, validation fires and reports it in a message box before the PDF is generated)

But this one user claims that in the past 2 weeks it has happened about 3 times out of 50, and I just can't reproduce it.

So what do you do in the case of a heisenbug?

Steven Evers
  • 28,180

6 Answers

22

Add some logging to this user's code.
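As a rough illustration of what that logging might capture, here is a minimal Python sketch that records the bound field values right before the PDF is generated and flags any that are empty. The original app is not necessarily Python, and the names `make_field_logger`, `log_fields_before_export`, and the field names are hypothetical. If the fields can hold sensitive data, logging only which ones are empty may be the safer choice.

```python
import logging
import logging.handlers
import os
import tempfile


def make_field_logger(log_path):
    """Create a rotating file logger (hypothetical helper for this sketch)."""
    logger = logging.getLogger("pdf_export_debug")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    if not logger.handlers:
        # Rotate so the log cannot grow without bound on the user's machine.
        handler = logging.handlers.RotatingFileHandler(
            log_path, maxBytes=1_000_000, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def log_fields_before_export(logger, fields):
    """Log the values about to be printed and return the names of empty ones."""
    empty = sorted(
        name
        for name, value in fields.items()
        if value is None or str(value).strip() == ""
    )
    logger.debug("export requested, fields=%r", fields)
    if empty:
        # This is the case validation was supposed to prevent.
        logger.warning("empty fields reached export: %s", ", ".join(empty))
    return empty


# Example usage with made-up field names.
log_file = os.path.join(tempfile.mkdtemp(), "export.log")
export_logger = make_field_logger(log_file)
missing = log_fields_before_export(
    export_logger, {"name": "Ann", "address": "  ", "phone": None}
)
```

Calling this at the last point before the PDF is written shows whether the data was already missing at that stage or was lost later, during rendering.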

kasterma
  • 432
13

I've seen things like this on an embedded system take 6 months to find. Really frustrating.

However, in desktop land, it's amazing what happens if you go and actually watch what the user does. They may be doing things in some order or manner that had not been expected, and this in turn causes the trouble.

quickly_now
  • 15,060
12

Depending on your situation, you might have success with:

  • Monitor the user's machine (perfmon, the event log, etc.)
  • Monitor the user (sit with them until they have the issue again)
  • Replace the user's machine temporarily (get them on another desktop to see if it is a hardware/os thing)

kasterma's suggestion of logging is still good; give them a debug build, or use injectable logging if a full deploy is too troublesome.
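One way to read "injectable logging" is logging that wraps existing functions without editing their bodies. The Python sketch below uses a decorator for that; it is an assumption about the idea, not a specific tool, and `logged` and `build_pdf_text` are hypothetical names.

```python
import functools
import logging

log = logging.getLogger("injected")


def logged(func):
    """Wrap func so that every call, result, and exception is logged."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the traceback, then let the error propagate unchanged.
            log.exception("%s raised", func.__name__)
            raise
        log.debug("%s returned %r", func.__name__, result)
        return result

    return wrapper


@logged
def build_pdf_text(name, address):
    """Hypothetical stand-in for the code that prepares the PDF contents."""
    return f"{name}\n{address}"
```

The decorator can be applied only in the build given to the affected user, so the normal release stays untouched.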

Bill
  • 8,380
7

This is often caused by concurrent processes (not OS-level processes, just... general things that happen in your application: events, threads, input/output etc.) which both affect the rendering in some way. This leads to different behaviour depending on the order in which they happen, and debugging and breaking often interferes with that.

One good strategy is to replace stepping through the debugger with more logging. Logging takes much less time than breaking into the debugger, so it is much more likely to leave things as they are while still giving you more information.
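To make the ordering visible, each log entry can carry a timestamp and the name of the thread that produced it. The following is a minimal Python sketch of such a trace; `EventTrace`, `load_data`, and `render` are hypothetical names, and a real application would record its own events.

```python
import threading
import time


class EventTrace:
    """Thread-safe, in-memory record of what happened and on which thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = []

    def record(self, what):
        # A monotonic clock keeps the order meaningful if the wall clock changes.
        stamp = time.monotonic()
        with self._lock:
            self._events.append((stamp, threading.current_thread().name, what))

    def dump(self):
        with self._lock:
            events = list(self._events)
        return [f"{stamp:.6f} [{thread}] {what}" for stamp, thread, what in events]


trace = EventTrace()


def load_data():
    trace.record("data loaded")


def render():
    trace.record("render started")


# The two events can land in either order, which is exactly what the trace shows.
worker = threading.Thread(target=load_data, name="loader")
worker.start()
render()
worker.join()
```

Comparing the dump from a good run with the dump from a bad run shows whether the failure correlates with a particular ordering.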

Ultimately, though, nothing beats understanding what the system actually does. Is there one component, and one only, which should be responsible for maintaining the state of the UI? (Usually there should be.) If so, why is it getting inconsistent commands in the first place? Obviously, logging can often help answer these questions as well.

Kilian Foth
  • 110,899
5

The best thing to do is to add logging and try to catch it in the act. If that is impractical, then the only thing left is to do a very thorough code review. Going through the change log would be a reasonable way to begin.

Dima
  • 11,852
4

Check the hardware.

Run a memory test on the machine showing the problem. Run a heavy CPU load, with something like Prime95, and verify the results.

Hardware isn't perfect, and if the hardware is bad, a programmer can waste a lot of time looking for bugs that just don't exist.

Zan Lynx
  • 1,300