
Applying Medical Near Misses to I.T.

I forget where I first heard the term “near miss” in health care, but I found the concept intriguing. A near miss is a problem with the safe delivery of care that did not actually affect the patient. Here is an example: a pharmacist in a hospital misreads the doctor’s prescription and provides the incorrect medication, but the administering nurse notices the mistake before the patient has taken the drugs.

So if the patient is still safe, why are near misses a big deal? I read about one study that compared the rate of reported near misses across hospitals with their mortality rates (patient deaths). If you were told one hospital had a much higher rate of near misses than a second, which would you pick? The surprising answer from the study was that hospitals with higher levels of reported near misses were much safer, with a significantly lower rate of mortality. How could this be? The answer highlights why near misses are important. The hospitals reporting higher levels of near misses took them seriously and encouraged staff to report them so that everyone could learn from the mistakes. The goal was to make safety improvement proactive rather than reactive. By contrast, management at hospitals with low levels of reported near misses applied punitive measures, either official or unofficial, against those reporting them. So the low levels actually reflected under-reporting. Pushing near misses out of sight hampered ongoing learning and the development of more effective practices.

The good news for health care is that the medical establishment has recognized the importance of near misses. I read with interest a recent post (http://geekdoctor.blogspot.ca/2012/05/patient-safety-organization-common.html) from John D. Halamka, CIO of Beth Israel Deaconess Medical Center, that discusses standardized terminology for reporting patient safety events to patient safety organizations:

  • Unsafe Condition: Any circumstance that increases the probability of a patient safety event.
  • Near Miss: A patient safety event that did not reach the patient.
  • Incident: A patient safety event that reached the patient, whether or not the patient was harmed.

These terms establish a clear taxonomy of issues ordered by criticality that provides a basis for reporting and dealing with such issues.

So how is this relevant to software development and I.T.? The principles behind the concept of near misses apply to any knowledge-based organization. For software development, the equivalent of a patient safety event is a defect in the software. We can define a taxonomy for defects similar to the patient-safety-event taxonomy above:

  • Unsafe Condition: Any circumstance that increases the probability of a defect.
  • Near Miss: A defect that did not reach production.
  • Incident: A production defect, whether or not end users were impacted.

A similar taxonomy applies to I.T. operations with regard to unplanned outages and other operational issues:

  • Unsafe Condition: Any circumstance that increases the probability of an outage.
  • Near Miss: An outage in an environment other than production.
  • Incident: A production outage, whether or not end users were impacted.
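As a rough illustration, the two taxonomies above can be collapsed into one small classification routine. This is only a sketch: the names `Severity` and `classify`, and the two boolean inputs, are hypothetical choices made for this example and do not come from any reporting standard.

```python
from enum import Enum


class Severity(Enum):
    """The three categories shared by the defect and outage taxonomies."""
    UNSAFE_CONDITION = 1  # a circumstance that raises the probability of a problem
    NEAR_MISS = 2         # a defect or outage that did not reach production
    INCIDENT = 3          # a defect or outage in production


def classify(occurred: bool, reached_production: bool) -> Severity:
    """Map an event onto the taxonomy (hypothetical helper).

    occurred: True if a defect or outage actually happened, False if
        only a risky circumstance was observed.
    reached_production: True if the defect or outage was in production.
        User impact is deliberately ignored, as in the definitions above.
    """
    if not occurred:
        return Severity.UNSAFE_CONDITION
    if reached_production:
        return Severity.INCIDENT
    return Severity.NEAR_MISS
```

For example, a bug caught in a staging environment would be recorded with `classify(occurred=True, reached_production=False)`, which yields `Severity.NEAR_MISS`. Keeping the classification this mechanical means a report never requires a judgment about blame, only about where the problem was caught.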

Addressing the issues classified by these taxonomies, especially the near misses and unsafe conditions, should reduce the number of incidents. Near misses essentially function as an early-warning / early-detection system for problems that could potentially happen in production. The key is to take action to address them, rather than ignore or hide them.

