For monitoring, I set up CloudWatch alerts based on CloudWatch anomaly detection. Overall, they work quite nicely, but they get confused when the clock shifts (summer time to winter time).

We now have recurring false positives, even though the most recent clock shifts in the US and in Europe were weeks ago. I assume CloudWatch will eventually adapt, but then again, clock shifts happen on a regular basis, which will keep creating noise.

Unfortunately, I did not find much in the documentation, only that CloudWatch lets you set the Time Zone Format. Previously I left it empty, but now I have set it to UTC.

Questions:

  • What exactly is the effect of setting the time zone? Does it have an impact on the predictions of the anomaly detector, or is it only used for visualization and for other configuration such as excluding specified time periods?
  • Our user base is all over the world, but mostly in the US and in Europe. As this scenario is not too uncommon, I wonder if asking for a single time zone is even practical. What should be used? UTC? Or the time zone with the most users?

I understand that it is difficult to avoid false positives completely, especially in the first or second week after a clock shift. Still, I wonder how to set up the anomaly detection model correctly so that its predictions are as accurate as possible.

Philipp Claßen

1 Answer

My current understanding is that it is not possible to avoid false positives, at least not if traffic comes from different time zones around the world (as in the context of this question).

AWS operates on UTC, but as far as I can tell this is irrelevant for the anomaly detection. The time zone does play a role for time-based settings (such as excluded time periods), but it is not taken into account for the predictions. In other words, switching to UTC will not make false positives less likely.
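For reference, here is a minimal sketch of where the time zone sits when the detector is configured through boto3 instead of the console. It only builds the arguments for `put_anomaly_detector`; the namespace `MyApp`, the metric `RequestCount`, and the excluded dates are hypothetical placeholders, not values from our setup. The API reference describes `MetricTimezone` as a hint that lets the model account for daylight saving changes, so a tz database name other than UTC may be worth trying; whether a single zone helps with traffic spread across the US and Europe is not something this sketch can settle.

```python
from datetime import datetime, timezone


def build_detector_request(namespace, metric_name, stat, tz_name, excluded_ranges=()):
    """Build keyword arguments for CloudWatch's put_anomaly_detector call."""
    return {
        "SingleMetricAnomalyDetector": {
            "Namespace": namespace,
            "MetricName": metric_name,
            "Stat": stat,
        },
        "Configuration": {
            # Time zone name from the tz database, e.g. "UTC" or "Europe/Berlin".
            "MetricTimezone": tz_name,
            # Periods the model should ignore when it is trained.
            "ExcludedTimeRanges": [
                {"StartTime": start, "EndTime": end}
                for start, end in excluded_ranges
            ],
        },
    }


def apply_detector(request):
    """Send the request to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the request needs no AWS setup

    boto3.client("cloudwatch").put_anomaly_detector(**request)


# Hypothetical example values; replace them with your own metric.
detector_request = build_detector_request(
    namespace="MyApp",
    metric_name="RequestCount",
    stat="Sum",
    tz_name="UTC",
    excluded_ranges=[
        # Placeholder week to exclude from training.
        (
            datetime(2024, 1, 1, tzinfo=timezone.utc),
            datetime(2024, 1, 8, tzinfo=timezone.utc),
        ),
    ],
)
print(detector_request)
```

Calling `apply_detector(detector_request)` would create or update the detector; it is left out of the sketch so that nothing is changed by accident.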

When thinking about it, it is hard to avoid false positives. If a clock shift happens in Europe or the US, you could take that into account and predict the new traffic. But that requires extra domain knowledge. The AWS predictor only sees a generic time series.
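To illustrate why a one-hour shift looks like an anomaly to a predictor that only sees the series in UTC, here is a toy simulation. Everything in it is an assumption made for illustration: the cosine-shaped daily traffic, the peak at 14:00 local time, and the fixed ±10% band. It is not how CloudWatch computes its band.

```python
import math


def hourly_requests(utc_hour, utc_offset_hours):
    """Synthetic daily traffic that peaks at 14:00 local time (assumed shape)."""
    local_hour = (utc_hour + utc_offset_hours) % 24
    return 1000 + 800 * math.cos(2 * math.pi * (local_hour - 14) / 24)


def flagged_hours(expected, observed, tolerance=0.10):
    """UTC hours where the observation leaves a +/-10% band around the expectation."""
    return [
        hour
        for hour in range(24)
        if abs(observed[hour] - expected[hour]) > tolerance * expected[hour]
    ]


# Baseline learned while the users were on UTC+2 (summer time).
expected = [hourly_requests(hour, 2) for hour in range(24)]

# Same user behavior, observed before and after the switch to UTC+1.
before_shift = [hourly_requests(hour, 2) for hour in range(24)]
after_shift = [hourly_requests(hour, 1) for hour in range(24)]

no_shift_flags = flagged_hours(expected, before_shift)
shift_flags = flagged_hours(expected, after_shift)

print("flagged hours without a clock shift:", no_shift_flags)
print("flagged hours after a clock shift:", shift_flags)
```

The users did nothing unusual, yet the hours where traffic rises or falls steeply leave the band, because the learned pattern is now off by one hour in UTC.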

There are other events, such as Christmas, that will impact the expected traffic. Again, with extra knowledge you could improve the predictions. But the problem is too difficult for the AWS predictor, as it does not have that domain knowledge.

In our concrete case, we kept the alerts, as they do have value even with the known false positives. Humans with knowledge of the underlying system can reject the false positives. It is not ideal, but it is a trade-off that we are willing to make.

Note that other types of metrics are not susceptible to these kinds of problems, for example error rates instead of the total number of errors. It is not always possible, but if you can use rates, the system will be more stable in my experience. AWS allows you to set alerts on math expressions such as errors / requests. The advantage is that normalized rates tend to be agnostic of external events.
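As a rough sketch of such a rate-based alert, the following builds the arguments for boto3's `put_metric_alarm` using a metric math expression. The namespace `MyApp`, the metric names `Errors` and `Requests`, the alarm name, and the 5% threshold are hypothetical; I use a static threshold here to keep the example short.

```python
def build_error_rate_alarm(namespace, errors_metric, requests_metric,
                           threshold, period_seconds=300):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""

    def metric_query(query_id, metric_name):
        # Input metric: feeds the expression, but is not alarmed on itself.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {"Namespace": namespace, "MetricName": metric_name},
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{namespace}-error-rate",  # hypothetical naming scheme
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold,
        "EvaluationPeriods": 3,
        "TreatMissingData": "notBreaching",
        "Metrics": [
            metric_query("errors", errors_metric),
            metric_query("requests", requests_metric),
            {
                # The alarm watches this normalized rate, not the raw counts.
                "Id": "error_rate",
                "Expression": "errors / requests",
                "Label": "Error rate",
                "ReturnData": True,
            },
        ],
    }


def apply_alarm(request):
    """Send the request to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the request needs no AWS setup

    boto3.client("cloudwatch").put_metric_alarm(**request)


# Hypothetical example values; replace them with your own metrics.
alarm_request = build_error_rate_alarm(
    namespace="MyApp",
    errors_metric="Errors",
    requests_metric="Requests",
    threshold=0.05,
)
print(alarm_request)
```

If traffic moves by an hour, errors and requests move together, so the ratio stays roughly where it was and the alert does not need to know about the clock shift.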

Philipp Claßen