Analyzing System Dependencies

Analyzing dependencies might seem like a simple concept, but recently I have come to appreciate how many layers there are to this topic. To start, why should we as I.T. professionals care about dependencies between and within systems? The core reason is that understanding dependencies allows us to determine the impacts of change. This helps us make the current change more effectively, and it also helps us minimize or manage dependencies so that future changes are easier.

Uses of Dependency Analysis

The act of analyzing system dependencies is relevant for most I.T. roles:

  • Architects need to guide the development and evolution of systems and components based on how they are used from both a technical and business standpoint within the organization. Retirement of legacy assets is in particular driven by analyzing the remaining dependencies on these assets.
  • Business analysts need to know how a change to business policy or rules affecting one part of a system might affect other, seemingly unrelated parts of the system.
  • Developers need to create, change, and reuse classes and components, all of which require an understanding of how the code in question is used by the rest of the system. High coupling - code with lots of dependencies - contributes to a system that is harder to change.
  • Testers need to understand the impacts of a change to a system in order to validate that the system remains fully functional.
  • Database administrators need to understand dependencies in order to successfully make revisions to a database schema such as changing tables or columns without negatively impacting existing database objects or existing data.
  • Operations staff need to be able to stop and restart processes in the correct sequence for application deployments, middleware upgrades, and recovery from failures.

Modelling Dependencies

Mathematically, dependencies can be modeled as a directed graph: each node represents an element of the system such as a process, component, or class, and each directed edge represents a dependency that the first node has on the second node.

In the diagram, node A depends directly on nodes B and C and, through C, has an indirect (transitive) dependence on node D. Nodes B and D have no dependencies.
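The graph just described can be sketched in a few lines of Python. This is only an illustration: the adjacency mapping and the function name transitive_dependencies are my own assumptions, and the edge from C to D is inferred from the description (B has no dependencies, so A can only reach D through C).

```python
# Each key depends directly on the nodes in its set (edge: key -> member).
# The C -> D edge is an assumption inferred from the description.
graph = {
    "A": {"B", "C"},
    "B": set(),
    "C": {"D"},
    "D": set(),
}


def transitive_dependencies(graph, node):
    """Return every node reachable from `node`, directly or indirectly."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


direct = graph["A"]
indirect = transitive_dependencies(graph, "A") - direct
print(sorted(direct), sorted(indirect))  # prints ['B', 'C'] ['D']
```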

A change to one node has a potential ripple effect throughout the subset of the graph linked to this node. Depending on the type of change, this impact might go in both directions. For example, a business change to one module might require new information to be provided to it, which then means revising the upstream modules that it depends on in addition to potentially altering the downstream modules. Cyclical dependencies are visualized as cycles in the graph and are typically considered problematic.
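Two questions from this paragraph lend themselves to a small sketch: "what could be affected if this node changes?" and "does the graph contain a cycle?". The example below assumes a graph stored as a mapping from each node to the set of nodes it depends on. The function names dependents_of and has_cycle are illustrative, not taken from any particular tool.

```python
from collections import defaultdict


def dependents_of(graph, node):
    """Return all nodes that depend on `node`, directly or transitively."""
    # Build the reversed graph: target -> nodes that depend on it.
    reverse = defaultdict(set)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].add(source)
    seen, stack = set(), [node]
    while stack:
        for parent in reverse[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_cycle(graph):
    """Depth-first search; meeting a node that is still in progress means a cycle."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNVISITED for name in graph}

    def visit(name):
        state[name] = IN_PROGRESS
        for target in graph.get(name, ()):
            target_state = state.get(target, UNVISITED)
            if target_state == IN_PROGRESS:
                return True
            if target_state == UNVISITED and visit(target):
                return True
        state[name] = DONE
        return False

    return any(state[name] == UNVISITED and visit(name) for name in graph)


# Hypothetical example graphs.
acyclic = {"A": {"B", "C"}, "B": set(), "C": {"D"}, "D": set()}
cyclic = {"A": {"B"}, "B": {"A"}}

print(sorted(dependents_of(acyclic, "D")))  # prints ['A', 'C']
print(has_cycle(acyclic), has_cycle(cyclic))  # prints False True
```

The recursive search is fine for small graphs; a very deep dependency chain would need an iterative version to stay within Python's recursion limit.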

Using directed graphs to model dependencies is a simple concept, but putting it into practice is surprisingly difficult because of the wide variety of options for what the nodes and edges of the graph represent. So when you start your analysis, it is important to be crystal clear regarding the purpose or objective behind analyzing dependencies and then produce clear definitions for the nodes and edges of your dependency graph. Sometimes you will find that you need to model multiple types of dependencies for the same set of nodes.

To help illustrate these concepts I will move out of the realm of the theoretical and discuss common instances of dependencies within I.T. systems for code, libraries, and processes.

Code Dependencies

When it comes to code, dependencies are typically examined between packages or classes within an application as part of evaluating the quality of the application's architecture and design. (I have written previously about visualizing Java package dependencies.) Applicable to the package level is the dependency inversion principle (DIP), one of the design principles espoused by Robert C. Martin in his book Agile Software Development: Principles, Patterns, and Practices. At the class level we can calculate an instability metric based on afferent (incoming) versus efferent (outgoing) coupling which, when graphed over the code base, provides an indication of how easily the application can be changed.
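As a rough sketch of the instability metric: Martin defines it as I = Ce / (Ca + Ce), where Ce is the efferent (outgoing) coupling and Ca is the afferent (incoming) coupling. A value of 0 is maximally stable and 1 is maximally unstable. The class names below are hypothetical, and returning 0.0 for an element with no couplings at all is my own convention rather than part of the definition.

```python
def instability(graph, node):
    """Compute I = Ce / (Ca + Ce) for one node of a dependency graph."""
    efferent = len(graph[node])  # Ce: what this node depends on
    afferent = sum(              # Ca: what depends on this node
        1 for source, targets in graph.items()
        if source != node and node in targets
    )
    total = afferent + efferent
    # Assumption: an isolated node is reported as 0.0 (the formula is undefined).
    return efferent / total if total else 0.0


# Hypothetical classes; each key depends on the classes in its set.
classes = {
    "OrderController": {"OrderService"},
    "OrderService": {"OrderRepository", "PaymentGateway"},
    "OrderRepository": set(),
    "PaymentGateway": set(),
}

for name in classes:
    print(f"{name}: {instability(classes, name):.2f}")
```

In this example OrderRepository scores 0.00 (nothing it depends on can force it to change), while OrderController scores 1.00 (it depends on others and nothing depends on it).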

Both these types of analysis look at static dependencies explicitly listed in the code. For example, if class A uses class B as a method argument, field, super-class, or local variable, then A has a static dependency on class B.

But there is another type of code dependency: dynamic or runtime dependencies. Object factories and dependency injection frameworks help reduce coupling and apply the dependency inversion principle by replacing compile-time dependencies on concrete classes with runtime dependencies. So returning to our example, class A can have a static dependency on interface C, which at runtime is populated with an instance of class B that class A then uses. So while A still ends up using class B, the code for class A does not explicitly know about class B. This allows class B to be changed independently of class A, as long as it complies with the syntax and semantics expected of interface C.
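The A, B, and C example can be sketched as follows. Python is dynamically typed, so the "static" dependency on interface C shows up only in the type hint. The method name greet and the wiring at the bottom are assumptions made for illustration.

```python
from typing import Protocol


class C(Protocol):
    """Interface C: the only collaborator type that class A knows about."""

    def greet(self, name: str) -> str: ...


class B:
    """Concrete implementation of C; never named inside class A."""

    def greet(self, name: str) -> str:
        return f"Hello, {name}"


class A:
    def __init__(self, collaborator: C) -> None:
        # A depends on the interface; the concrete instance is supplied at runtime.
        self._collaborator = collaborator

    def run(self) -> str:
        return self._collaborator.greet("world")


# The wiring lives outside A. An object factory or dependency injection
# framework would normally take over this step.
a = A(B())
print(a.run())  # prints Hello, world
```

Swapping in a different implementation of C requires changing only the wiring line, not class A.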

Library Dependencies

Reuse of code via third-party libraries has become an essential aspect of software development. It is not unusual for a typical web application to depend on a dozen or more libraries covering functionality like logging, database access, dependency injection, and web frameworks. These libraries, in turn, often have dependencies of their own on third-party libraries. To manage these complexities, dependency management tools and artifact repositories have arisen. Together, these allow an application to specify only its direct dependencies. The dependency tool then uses the artifact repository to recursively determine what each dependent library in turn depends on (transitive dependencies) and assembles the complete list of all libraries required by the application. In the Java space, Maven is one of the most popular of these tools and the first to really popularize the concept.
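The recursive resolution step can be illustrated with a toy repository. This is a simplified sketch and not how Maven itself is implemented: the artifact names and the REPOSITORY mapping are hypothetical, and real tools also deal with versions, conflicts, and exclusions, which are ignored here.

```python
# Hypothetical repository metadata: artifact -> its direct dependencies.
REPOSITORY = {
    "my-app": ["web-framework", "db-access"],
    "web-framework": ["logging-api", "json-parser"],
    "db-access": ["logging-api", "connection-pool"],
    "connection-pool": ["logging-api"],
    "logging-api": [],
    "json-parser": [],
}


def resolve(artifact, repository):
    """Return every library `artifact` needs, each listed exactly once."""
    resolved = []

    def visit(name):
        for dependency in repository[name]:
            if dependency not in resolved:
                resolved.append(dependency)
                visit(dependency)  # follow transitive dependencies

    visit(artifact)
    return resolved


for library in resolve("my-app", REPOSITORY):
    print(library)
```

The application declares only two direct dependencies, yet the resolved list contains five libraries, with the shared logging-api appearing only once.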

As with code dependencies, there are multiple types of library dependencies (which Maven refers to as scopes). The three most common types are:

  • Compile: The application requires the library in order to compile production code.
  • Runtime: The application requires the library when it is executing, but it is not required to compile.
  • Test: The application requires the library for compiling or running automated tests - it is not required to be packaged with the application installation.
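One way to picture these three scopes is as a rule for which classpath each library lands on. The sketch below is a simplification that covers only the scopes listed above; the library names and the function name classpath are hypothetical.

```python
# Which classpaths a dependency of a given scope appears on.
SCOPE_CLASSPATHS = {
    "compile": {"compile", "runtime", "test"},
    "runtime": {"runtime", "test"},
    "test": {"test"},
}

# Hypothetical declared dependencies: (library, scope).
dependencies = [
    ("web-framework", "compile"),
    ("jdbc-driver", "runtime"),
    ("unit-test-lib", "test"),
]


def classpath(dependencies, phase):
    """List the libraries visible on the classpath for `phase`."""
    return [name for name, scope in dependencies
            if phase in SCOPE_CLASSPATHS[scope]]


print(classpath(dependencies, "compile"))  # prints ['web-framework']
print(classpath(dependencies, "runtime"))  # prints ['web-framework', 'jdbc-driver']
print(classpath(dependencies, "test"))
```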

Process Dependencies

When servers or processes fail or have to be restarted, understanding the dependencies between processes becomes critical. However, in a complex enterprise I.T. environment, it is surprising just how difficult an undertaking this is, largely because of the different types of dependencies that exist operationally. Process dependencies can be categorized along more than one dimension.

For the first dimension, dependencies can be categorized as business-meaningful services such as a payment processor or customer identity manager, middleware services such as a message queue or database, and infrastructure such as network and storage. Not only does this categorization affect the types and symptoms of issues that arise from failure, it also tends to influence the areas of knowledge held by different people. The infrastructure staff who know the details about networking and storage are unlikely to know anything about business-meaningful services, and vice-versa.

For the second dimension, dependencies can be categorized based on whether they are required to be available when the process in question is started. There can be either technical or business reasons why a dependency must be available. For example, most applications are designed to require their core database to be available at startup - this is a technical limitation. An application might make use of a payment service that is not technically required at startup, but if for business reasons the application is essentially useless without payments (e.g. no customers can complete orders), then this should still be considered a startup dependency.
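Once startup dependencies are written down, a valid start sequence can be derived with a topological sort. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the process names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical startup dependencies: process -> processes that must be up first.
startup_dependencies = {
    "order-app": {"database", "payment-service"},
    "payment-service": {"database", "message-queue"},
    "database": {"storage"},
    "message-queue": {"storage"},
    "storage": set(),
}

# static_order() yields each process only after everything it depends on.
start_order = list(TopologicalSorter(startup_dependencies).static_order())
# Shutting down in the reverse order stops dependents before their dependencies.
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```

If the startup dependencies contain a cycle, static_order() raises graphlib.CycleError, which is a useful early warning that no valid start sequence exists.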

As a third dimension, dependencies can be categorized based on their impact when they fail, which can be different in nature than the prior dimension. For example, a reference data cache service that must be available at startup for an application to load an in-memory cache might cause no impact if it goes down once the application is running. Impacts from the failure of dependencies can range from no impact, to partial impact, to in the worst case causing a complete outage of the dependent application.
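Because startup requirement and failure impact are separate dimensions, it can help to record both on every dependency edge. The sketch below is one possible shape for such a record. The process names, the field names, and the impact levels assigned to each edge are all assumptions for illustration, apart from the reference data cache case, which follows the example above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    NONE = 0
    PARTIAL = 1
    OUTAGE = 2


@dataclass(frozen=True)
class Dependency:
    source: str                # the dependent process
    target: str                # the process it depends on
    required_at_startup: bool  # second dimension
    impact_on_failure: Impact  # third dimension (while source is running)


# Hypothetical edges for a single application.
DEPENDENCIES = [
    Dependency("order-app", "database", True, Impact.OUTAGE),
    Dependency("order-app", "reference-data-cache", True, Impact.NONE),
    Dependency("order-app", "recommendation-service", False, Impact.PARTIAL),
]


def impact_of_failure(failed, dependencies):
    """Map each direct dependent of `failed` to the impact it suffers."""
    return {d.source: d.impact_on_failure
            for d in dependencies if d.target == failed}


for target in ("database", "reference-data-cache", "recommendation-service"):
    print(target, impact_of_failure(target, DEPENDENCIES))
```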
