[[ Based on the tags and the wording this sounds like a PagerDuty-specific question, but in my mind it could apply to any tech or monitoring stack.... PagerDuty is just so dominant in the SF-based jobs I've seen that it is like tech Kleenex.... ]]
I've been searching in my spare time for a few months for an answer to: what do you tell a new recruit to pager duty (or PagerDuty or pagerduty)??? And I don't mainly mean mere "install the app and pray" training, though those are key parts of it. I'm more looking for generic, general, globalizable advice like:
- ask for help as early as possible
- don't go sleep deprived "too" many nights in a row (~3)
- know who is likely to answer with low latency and who will notice work again at Monday 9am (DOH!)
- it is easier to find an answer on StackExchange than on random forums at 3am (free plug)
- triage: how big of a fire is it? how many fire trucks do you call in? Etc.
- when to merge incidents
- when to snooze incidents... but that is so scary!! "You don't get fired for snoozing here?" hahaha.
- good notification settings
- good phone/volume settings for getting woken up, even if you sleep like a log.
- scheduling and calendar layers
- rules of thumb and recommended tools
- know who to call for what subsystem (varies based on locality and org), though a wiki or Confluence is often part of that
Maybe I'm searching wrong. I'm open to meta-suggestions for better ways to find this needle in the haystack. Searching for "starting pagerduty" makes sense to me, but it returns results on how to set these things up programmatically or technically, not how to instruct the humans in dealing with these sorts of things. PagerDuty does try to address this, but what it offers doesn't cover the breadth of responsibilities that the job entails to me.
I also admit that I'm asking for everything and the kitchen sink and the city water supply for a bonus round. Getting little pieces of this as links and pointing folks at them would be a huge improvement over what I've found so far. I doubt anybody has done all of this in one doc or would ever want to do so.
I've actually already turned this question into an internal class. I'm happy to get permission from management to share my fun version of this after I pull out the proprietary bits. I'm still holding out hope that it has been done at some point before and I'm just too lazy to go that far into the search results. :)