Avoiding Outrage Over Outages

When outages - also known as service disruptions - happen to IT services, the response from the organization providing the service is the most important factor in determining how much outrage consumers of the service feel. Responses can be grouped into three main categories inspired by ITIL:

  1. Incident Communication: The information provided to consumers during the outage. Outrage is reduced by apologizing to consumers, acknowledging that the issue exists, confirming that staff are working on a resolution as a top priority, and communicating what steps have been and will be taken to mitigate or resolve the outage. For a very blunt discussion of how to word communications about outages, see this article by 37signals.
  2. Problem Communication: The information provided to consumers, after the outage is resolved, about the underlying problems or root causes of the outage. This communication should repeat the apology to consumers, explain why the outage occurred, and identify the actions underway to prevent such outages in the future. This reconfirms to consumers that the issue has been taken seriously, reassures them that it is unlikely to recur, and helps bring a sense of closure to the incident. For an excellent example of this type of communication, see Amazon's communication regarding a data center outage.
  3. Problem Resolution: The actual resolution of the underlying problems and causes of outages. This helps avoid what I call future outrage by improving the availability and reliability of services so that outages, and thus outrage, are less common. Skipping this step eventually leads to disillusionment and then rejection by consumers, since in the long term they judge organizations by what they do rather than what they say.

This may sound easy to do, but it is amazing how many organizations fall dramatically short in one or more of these areas. I plan to write a follow-up post dissecting some case studies. In the meantime, feel free to post comments providing examples - both good and bad - of organizations dealing with outages.

