
I was wondering how you handle issues like that in your company.

Usually, when something goes wrong at a customer's site, the customer needs to:

  1. Submit the issue they encountered
  2. Attach logs and configuration data
  3. Discover that the logs were not captured at a high debug level, so they have to raise the debug level and reproduce the problem
  4. If we still can't find the issue, install a debug version we give them and reproduce the problem yet again
  5. etc...

Most of the time, the logs need to be collected from several machines.

This of course causes a lot of frustration on both sides. How do you handle this kind of thing? (Better bug reporting systems? Recommended actions in bug reports? etc.)

To add an example:

The application can be an enterprise one composed of several products running on several machines (let's say an application tier, a database tier and some kind of MQ tier: 3 machines).

Now, when there's a bug, logs need to be collected from those 3 machines and then the investigation starts.

ArielB
  • 191

1 Answer


One solution is to add an analytics package to your product.

This will report back all the bugs and usage to you so that you can see how people are using your service and what bugs they are experiencing.

With a B2B-style product you obviously have to be careful about what information you send back. You might prefer a "submit a bug report" option, where you prompt the user for permission before sending the logs back.
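As a rough illustration of the "submit a bug report" option, here is a minimal Python sketch. The function name `build_bug_report`, its parameters and the `ask_permission` callback are all hypothetical, and the actual upload to the vendor is left out; the point is only that nothing is collected unless the user agrees.

```python
import zipfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def build_bug_report(
    log_paths: Iterable[Path],
    archive_path: Path,
    ask_permission: Callable[[str], bool],
) -> Optional[Path]:
    """Bundle log files into a zip archive, but only with the user's consent.

    All names here are illustrative assumptions, not part of any real product.
    """
    # Skip paths that do not exist so a missing log does not abort the report.
    existing = [p for p in log_paths if p.is_file()]
    names = ", ".join(p.name for p in existing)

    # Show the user exactly which files would be sent before touching them.
    if not ask_permission(f"Send these files to support? {names}"):
        return None  # user declined: nothing is collected

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in existing:
            # Store only the file name so local directory layout is not leaked.
            archive.write(path, arcname=path.name)
    return archive_path
```

In a real deployment `ask_permission` would be a dialog or CLI prompt. Logs from different machines may share file names, so a fuller version would probably prefix each entry with the host name.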


Heading off into general advice here: I know the kind of systems you mean, but I'm not sure there is a proven method of dealing with those kinds of 'why is this thing in state x' errors, especially when the system is deployed at a customer's site!

  • Collate all the logs together with (e.g.) Logstash or Splunk.
  • Log performance metrics with statsd or similar.
  • Add audit trails to objects that get passed around, e.g. order accepted, sent to warehouse, item missing from pick list, emailed customer.
  • Keep the audit trail with the object.
  • Keep logs clean, i.e. if you are logging errors that aren't 'real', fix them asap. If you see an error, it should mean something is wrong.
  • Go wild with debug logging all over the place and leave it switched on.
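
The audit-trail bullets above could look something like the following. This is only a sketch: the `Order` class, its fields and the tier names are made up for illustration, and a real system would persist the trail along with the object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Order:
    """Hypothetical business object that carries its own audit trail."""

    order_id: str
    # Each entry is (UTC timestamp, component that handled the object, event).
    audit_trail: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def record(self, component: str, event: str) -> None:
        """Append one event to the trail that travels with the object."""
        self.audit_trail.append((datetime.now(timezone.utc), component, event))

    def history(self) -> List[str]:
        """Render the trail as readable lines, oldest first."""
        return [
            f"{ts.isoformat()} [{component}] {event}"
            for ts, component, event in self.audit_trail
        ]


# Usage: every tier that touches the order adds an entry.
order = Order("A-1001")
order.record("app-tier", "order accepted")
order.record("mq-tier", "sent to warehouse")
order.record("warehouse", "item missing from pick list")
order.record("app-tier", "emailed customer")
```

When a customer asks why an order is in some state, `order.history()` answers it from the object itself, without first gathering logs from every machine.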
Ewan
  • 83,178