
Filter by Failure Mode Matrix: A Method for Planning Quality

For any software development effort, a core part of planning for high quality is selecting the quality-enhancing activities and practices that will be performed to assess the software. This selection depends on several factors, including the capabilities of the team; the characteristics, complexity, and criticality of the software; the level of quality desired; and the nature of the development effort. Because these factors vary, the selection cannot be a one-time event for the entire enterprise. An enterprise can define a default selection as a starting point, but that default should be re-evaluated and customized for each application and each development effort.

One approach to this selection is a method I call the Filter by Failure Mode Matrix. The basic idea is to identify the activities and practices that will act as filters to either prevent or mitigate quality-related failure modes in the software development process. The method is inspired by failure mode and effects analysis, except that it is applied to the development process itself rather than to a piece of software. Failure modes should be categorized as broadly as possible to keep the matrix small, while remaining specific enough to identify the activities and practices that will prevent or mitigate each one. Many of the failure modes correspond to defect root cause categories.

The matrix defines two sets of filters for each failure mode: primary filters and secondary filters. Primary filters are the activities or practices that the team relies on as its first line of defense in preventing or mitigating the corresponding failure mode. Secondary filters act as a second line of defense for the issues that make it through the primary filters. Compared to the primary filters, the activities and practices assigned as secondary filters are less effective at preventing or mitigating that particular failure mode, cost more in budget or schedule, or are performed later in the overall development process. When the primary filters consist solely of activities performed by developers, the secondary filters should include activities performed by non-developers, so that they serve as an independent assessment of quality with respect to that failure mode. This is especially important for mission-critical and life-critical systems.
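The independence rule described above lends itself to a mechanical check. The following is a minimal Python sketch rather than part of the method itself: the `Activity` and `FailureModeRow` types, the `performed_by_developers` flag, and the way each activity is classified are assumptions made for illustration, and each team would classify its own activities.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Activity:
    name: str
    # Assumption: the team decides who performs each activity.
    performed_by_developers: bool


@dataclass
class FailureModeRow:
    failure_mode: str
    primary: List[Activity] = field(default_factory=list)
    secondary: List[Activity] = field(default_factory=list)


def lacks_independent_check(row: FailureModeRow) -> bool:
    """True when every primary filter is a developer activity and
    no secondary filter is performed by non-developers."""
    if not row.primary:
        return False  # an empty row is a different problem
    primary_all_dev = all(a.performed_by_developers for a in row.primary)
    has_independent = any(not a.performed_by_developers for a in row.secondary)
    return primary_all_dev and not has_independent


# Hypothetical classification, for illustration only.
code_review = Activity("Code review", performed_by_developers=True)
unit_testing = Activity("Automated unit testing", performed_by_developers=True)
functional_testing = Activity("Functional testing", performed_by_developers=False)

logic_errors = FailureModeRow(
    "Coding logic errors", [code_review, unit_testing], [functional_testing]
)
concurrency = FailureModeRow("Concurrency defects", [code_review], [])

for row in (logic_errors, concurrency):
    if lacks_independent_check(row):
        print(f"{row.failure_mode}: no independent secondary filter")
```

Under these assumed classifications, only the row with developer-only primary filters and no non-developer secondary filter is flagged.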

Multiple activities can be listed as primary filters or as secondary filters. The intent is not to require only a single activity per filter category. In fact, research tells us that multiple quality-enhancing activities are necessary to obtain higher quality.

While most failure modes are generally applicable, some are more context-dependent, so you should ensure that the list of failure modes you use is meaningful to the particular software development effort you are undertaking.

Both the relevant failure modes and the effectiveness or applicability of quality-enhancing activities and practices can vary significantly with the nature of the application functionality. For example, web-based user interface screens are prone to usability defects, and automated functional testing is more difficult to apply to them. In contrast, a scheduled batch process is not subject to usability defects, and automated functional testing is usually much easier to apply. When a system combines components that differ in this way, there are two ways to handle it. The first is to use a single matrix with additional failure modes representing the different components. In the example above, you could have one failure mode for web-based functionality defects and a second for batch process functionality defects, which allows the latter to specify automated functional tests as a primary filter. The second is to define separate matrices, which should be done only when the matrices would differ significantly.

While a filter by failure mode matrix can be defined up front, you should expect to evolve it over time. Examples of how a matrix can evolve are:

  • A particular activity is found to be ineffective and is replaced with another activity in the matrix.
  • A new failure mode is identified based on defects that occur in user acceptance testing or production. A new row is added for this failure mode, with corresponding filters identified.
  • The overall quality level is discovered to be too low, so additional activities and practices are added throughout the matrix.
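The three kinds of evolution listed above can be sketched as small operations on a matrix held as data. This is only an illustrative sketch: the dictionary layout and the function names `replace_activity`, `add_failure_mode`, and `add_activity` are assumptions, and the specific changes shown are made up for the example (the activity names are borrowed from the article's sample matrix).

```python
from copy import deepcopy

# Assumed layout: failure mode -> {"primary": [...], "secondary": [...]}
matrix = {
    "Coding logic errors": {
        "primary": ["Code review", "Automated unit testing"],
        "secondary": ["Functional testing"],
    },
}


def replace_activity(matrix, old, new):
    """Return a copy with `old` swapped for `new` in every filter list."""
    updated = deepcopy(matrix)
    for filters in updated.values():
        for tier in ("primary", "secondary"):
            filters[tier] = [new if a == old else a for a in filters[tier]]
    return updated


def add_failure_mode(matrix, failure_mode, primary, secondary=()):
    """Return a copy with a new row for a newly identified failure mode."""
    if failure_mode in matrix:
        raise ValueError(f"{failure_mode!r} is already in the matrix")
    updated = deepcopy(matrix)
    updated[failure_mode] = {
        "primary": list(primary),
        "secondary": list(secondary),
    }
    return updated


def add_activity(matrix, failure_mode, tier, activity):
    """Return a copy with an extra activity in one row's filter list."""
    updated = deepcopy(matrix)
    if activity not in updated[failure_mode][tier]:
        updated[failure_mode][tier].append(activity)
    return updated


# 1. An activity judged ineffective is replaced with another.
v2 = replace_activity(
    matrix, "Functional testing", "Manual functional and scenario testing"
)
# 2. A new failure mode is identified and gets its own row.
v3 = add_failure_mode(v2, "Concurrency defects", ["Code review"])
# 3. Quality is too low, so an activity is added to an existing row.
v4 = add_activity(v3, "Coding logic errors", "secondary", "Domain testing")
```

Each function returns a new matrix instead of modifying the old one, so earlier versions stay available for comparison as the matrix evolves.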

The following table provides a sample filter by failure mode matrix.

| Failure Mode | Primary Filters | Secondary Filters |
|---|---|---|
| Coding logic errors | Code review, Automated unit testing | Functional testing, Domain testing (for boundary errors) |
| Integration / interface errors | Clearly defined interfaces, Code review, Automated integration testing | System testing |
| Usability problems | Prototyping, Usability testing | Frequent delivery with feedback, Manual functional and scenario testing |
| Regressions | Pre-existing automated test suite | Code review, Manual functional and scenario testing |
| Performance / capacity / scalability problems | Performance, load, and stress testing | Code review |
| Concurrency defects | Code review | - |
| Algorithm (design) errors | Design review, Code review | Automated unit testing, Functional testing |
| Misunderstood requirements | Acceptance criteria per requirement, Frequent communication with client | Frequent delivery with feedback, Prototyping |
| Failure to perform quality-enhancing activities | Culture of quality, Feature done checklist | Pair programming, Test-driven development |
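Once a matrix like the sample above is written down, it can be encoded as data and queried for gaps. The sketch below is one possible encoding, not a prescribed format: the tuple layout and the helper names `rows_without_secondary` and `activity_usage` are assumptions, and the "(for boundary errors)" qualifier on domain testing is dropped for brevity.

```python
from collections import Counter

# (failure mode, primary filters, secondary filters), copied from the sample.
SAMPLE_MATRIX = [
    ("Coding logic errors",
     ["Code review", "Automated unit testing"],
     ["Functional testing", "Domain testing"]),
    ("Integration / interface errors",
     ["Clearly defined interfaces", "Code review",
      "Automated integration testing"],
     ["System testing"]),
    ("Usability problems",
     ["Prototyping", "Usability testing"],
     ["Frequent delivery with feedback",
      "Manual functional and scenario testing"]),
    ("Regressions",
     ["Pre-existing automated test suite"],
     ["Code review", "Manual functional and scenario testing"]),
    ("Performance / capacity / scalability problems",
     ["Performance, load, and stress testing"],
     ["Code review"]),
    ("Concurrency defects", ["Code review"], []),
    ("Algorithm (design) errors",
     ["Design review", "Code review"],
     ["Automated unit testing", "Functional testing"]),
    ("Misunderstood requirements",
     ["Acceptance criteria per requirement",
      "Frequent communication with client"],
     ["Frequent delivery with feedback", "Prototyping"]),
    ("Failure to perform quality-enhancing activities",
     ["Culture of quality", "Feature done checklist"],
     ["Pair programming", "Test-driven development"]),
]


def rows_without_secondary(matrix):
    """Failure modes that have no second line of defense."""
    return [mode for mode, _primary, secondary in matrix if not secondary]


def activity_usage(matrix):
    """Count how many filter slots each activity occupies."""
    counts = Counter()
    for _mode, primary, secondary in matrix:
        counts.update(primary)
        counts.update(secondary)
    return counts


print(rows_without_secondary(SAMPLE_MATRIX))
print(activity_usage(SAMPLE_MATRIX).most_common(3))
```

A query like this makes it easy to spot rows that rely on a single filter, and activities (code review in the sample) that carry a large share of the matrix.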


One Comment on “Filter by Failure Mode Matrix: A Method for Planning Quality”

  1. Wow. When I was at UofT, we used to play a game of translating textbooks from academic into intelligent English. Usually, we got one paragraph per page. I don’t know what your audience is, but your site made me want to do the fun again. Here’s the translation of your first two paragraphs for 2010-12-08.

    1. If you want to get quality software out of a development team, you’re going to have to put in some quality checking procedures. How much quality will depend on how much you want to spend. How much to spend, of course, depends on the product or project.

    2. Failures don’t get in at the end of a project; they creep in day by day and manifest later. They end up at the root of a spiral of quality problems that become tough to find. Having a broad list (follows) of failure modes will enable a quality assurance team to intercept bugs at conception.

