<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Basil Vandegriend: Professional Software Development &#187; software development</title>
	<atom:link href="http://www.basilv.com/psd/blog/tag/software-development/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.basilv.com/psd</link>
	<description></description>
	<lastBuildDate>Wed, 25 Jan 2012 13:23:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>The Shocking Truth about Agile and Waterfall</title>
		<link>http://www.basilv.com/psd/blog/2011/the-shocking-truth-about-agile-and-waterfall</link>
		<comments>http://www.basilv.com/psd/blog/2011/the-shocking-truth-about-agile-and-waterfall#comments</comments>
		<pubDate>Mon, 14 Nov 2011 13:02:28 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[agile]]></category>
		<category><![CDATA[process]]></category>
		<category><![CDATA[software development]]></category>
		<category><![CDATA[waterfall]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=741</guid>
		<description><![CDATA[There is a common perception within I.T. that Agile methods are recent innovations - the new kids on the block - and they are contrasted with the traditional waterfall approach - the old-timer that has been around for ages. This perception is propagated by events such as the widely-discussed 10-year anniversary of the agile manifesto [...]]]></description>
			<content:encoded><![CDATA[<p>There is a common perception within I.T. that Agile methods are recent innovations - the new kids on the block - and they are contrasted with the traditional waterfall approach - the old-timer that has been around for ages. This perception is propagated by events such as the widely-discussed 10-year anniversary of the <a href="http://agilemanifesto.org/">agile manifesto</a> and by the ongoing challenge of the "old guard" - either publicly or within organizations - of the effectiveness of agile versus waterfall. I recently read two articles, however, that convincingly shatter these misperceptions and lay bare the shocking truth.</p>
<p>While the term agile itself is indeed ten years old, the philosophy and approach of iterative and incremental development (IID) have a surprisingly rich and extensive history. The article <a href="http://www.craiglarman.com/wiki/downloads/misc/history-of-iterative-larman-and-basili-ieee-computer.pdf">Iterative and Incremental Development: A Brief History (PDF)</a> by Craig Larman and Victor R. Basili, published in IEEE Computer, discusses how the ideas of IID actually predate the existence of software, and have been used and promoted in <em>every</em> decade since for over 50 years. This article also discusses a classic 1970 article by Winston Royce well-known for supposedly promoting the use of waterfall, but which actually recommended the development of an initial pilot or preliminary version prior to creating the final version intended for delivery to the client. Larman and Basili's article also covers the ongoing evolution of the standard used by the U.S. Department of Defense for software development. The initial standard was document-heavy, gated, single-pass waterfall, but the high rate of project failures led first to the allowance, and eventually to the full adoption, of IID approaches. </p>
<p>So a limited set of organizations (often governments due to their byzantine contracting restrictions) did experience an evolution from waterfall to agile over time, but notably they started with a misunderstood version of waterfall. The use of IID approaches has always been part of the I.T. industry. </p>
<p>What about the effectiveness of agile? The second article <a href="http://searchsoftwarequality.techtarget.com/news/2240106479/Quality-metrics-The-economics-of-software-quality-Part-One">Quality metrics: Software quality attributes and their rankings</a> is an interview with Capers Jones and Olivier Bonsignour regarding their new book <a href="http://www.amazon.ca/gp/product/0132582201/ref=as_li_tf_tl?ie=UTF8&#038;tag=basilvandegri-20&#038;linkCode=as2&#038;camp=15121&#038;creative=330641&#038;creativeASIN=0132582201">The Economics of Software Quality</a><img src="http://www.assoc-amazon.ca/e/ir?t=basilvandegri-20&#038;l=as2&#038;o=15&#038;a=0132582201" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />. In this book the authors discuss the effectiveness of over 100 quality factors using research based on over 10,000 software projects. On a scale from -10 (extremely harmful) to +10 (extremely valuable), Agile methods rated a 9 - highly valuable - while waterfall only rated a 1 - barely useful. </p>
<p>These two articles highlight the fact that agile-like methods have been in use since the start of the software field, and are on average far more effective than the waterfall approach. This leads me to conclude that there is really no defensible reason for organizations to mandate or promote the use of waterfall.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/the-shocking-truth-about-agile-and-waterfall/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Getting Started with WebSphere Configuration Scripting</title>
		<link>http://www.basilv.com/psd/blog/2011/getting-started-with-websphere-configuration-scripting</link>
		<comments>http://www.basilv.com/psd/blog/2011/getting-started-with-websphere-configuration-scripting#comments</comments>
		<pubDate>Mon, 24 Oct 2011 12:54:58 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[deploy]]></category>
		<category><![CDATA[Jython]]></category>
		<category><![CDATA[software development]]></category>
		<category><![CDATA[WebSphere]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=723</guid>
		<description><![CDATA[Deploying Java EE applications into a WebSphere application server typically requires configuration within WebSphere of settings such as data sources, thread pool sizes, and maximum heap size. The WebSphere Administration Console provides a graphical user interface for easily doing this setup, but the fatal flaw of this approach is that it is manual - repeating [...]]]></description>
			<content:encoded><![CDATA[<p>Deploying Java EE applications into a <a href="http://www.ibm.com/software/webservers/appserv/was/">WebSphere application server</a> typically requires configuration within WebSphere of settings such as data sources, thread pool sizes, and <a href="http://www.basilv.com/psd/blog/2011/how-to-determine-maximum-heap-size">maximum heap size</a>. The WebSphere Administration Console provides a graphical user interface for easily doing this setup, but the fatal flaw of this approach is that it is manual - repeating the same setup in other environments takes extra manual effort and is error-prone. A better approach is to automate this configuration through scripting.</p>
<h3>Tooling</h3>
<p>WebSphere provides the wsadmin command-line interface for interfacing with WebSphere servers. Two scripting languages can be used: JACL (the default) and Jython, a Java-based version of Python. Invoke wsadmin with the argument "-lang jython" to use Jython. I recommend the use of Jython for two reasons. First, it is a more mainstream language. Second, the WebSphere Admin Console shows scripting commands in Jython equivalent to the GUI operations you last performed. See the screenshot below for an example:</p>
<p><a href="http://www.basilv.com/psd/wp-content/uploads/2011/10/WAS-Admin-Console.png"><img src="http://www.basilv.com/psd/wp-content/uploads/2011/10/WAS-Admin-Console.png" alt="WAS Admin Console showing Jython command equivalent to last action" width="550" height="355" class="alignnone size-full wp-image-725"/></a></p>
<p>Since the Admin Console provides scripting commands equivalent to your GUI actions, you may think that you could just copy and paste these commands into a script and be set. Unfortunately, if you want the script to be re-runnable (on a given server) or reusable (on other servers), you will almost always need to make modifications. Certain operations, like creating a data source, will fail if there is an existing data source with the same name. So to have a re-runnable script you will need to add logic to your script to first detect if the data source you want to create already exists. Many commands use a hard-coded <em>configuration id</em> to refer to a particular item to be configured. This id generally consists of the item's name, directory path to the configuration file, name of the XML configuration file, and XML ID of the item within the file. While the first three are constant, the XML ID appears to be a numerically generated ID that will vary between servers. So you need to change the script to look up the configuration id.</p>
<p>The changes required to adapt the script for broader use are not trivial and require use of WebSphere APIs - in particular the <code>AdminConfig</code> and <code>AdminTask</code> objects provided to Jython scripts that are executed within wsadmin. The best tooling I have found for writing these scripts is Rational Application Developer, whose Jython editor includes the wsadmin objects in its content assist. See the screenshot below for an example:<br />
<a href="http://www.basilv.com/psd/wp-content/uploads/2011/10/RAD-Jython-Editor.png"><img src="http://www.basilv.com/psd/wp-content/uploads/2011/10/RAD-Jython-Editor.png" alt="RAD Jython editor showing content assist" title="RAD Jython Editor" width="550" height="253" class="alignnone size-full wp-image-726" /></a></p>
<h3>Examples</h3>
<p>With the tooling in place let us look at some examples. The following scripts all assume a simple WebSphere topology of a single node with a single server that we are setting up - no clusters or multiple servers to worry about.<br />
A number of settings require the configuration id or name of the server or node, so here's how to look them up:</p>
<pre class="prettyprint">
nodeConfigId = AdminConfig.getid('/Node:/')
nodeName = AdminConfig.showAttribute(nodeConfigId, 'name')
serverConfigId = AdminConfig.getid('/Server:/')
serverName = AdminConfig.showAttribute(serverConfigId, 'name')
</pre>
<p><code>AdminConfig.getid()</code> is one of the primary methods to use to look up configuration ids. But what is that <code>"/Server:/"</code> syntax used as an argument? That is called a <em>containment path</em>, which is an XPath-like syntax for looking up configuration ids. For more important details on containment paths and configuration ids read this <a href="http://blog.xebia.com/2009/11/23/websphere-scripting-with-wsadmin-containment-paths-configuration-ids-and-object-names/">excellent article by Vincent Partington</a>.<br />
As mentioned above, this only works when there is a single server defined on the node - with multiple servers, the call <code>AdminConfig.getid('/Server:/')</code> would return multiple server config ids separated by newlines. In this case if you want to configure a specific server you can look up the server configuration id by name as follows:</p>
<pre class="prettyprint">
serverName = "testserver1"
serverConfigId = AdminConfig.getid('/Server:' + serverName + '/')
</pre>
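<p>When you do want to work with every server, the newline-separated result described above has to be split first. Here is a minimal sketch in plain Python (it should also run under Jython). The helper name <code>split_config_ids</code> and the sample ids are hypothetical and made up for illustration; real configuration ids will differ.</p>

```python
def split_config_ids(raw):
    """Split a newline-separated string of configuration ids into a list.

    Blank lines and surrounding whitespace are dropped, so an empty
    result (no matching objects) yields an empty list.
    """
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Hypothetical sample of what AdminConfig.getid('/Server:/') might return
# on a node with two servers; the ids are invented for this example.
sample = (
    "server1(cells/Node1Cell/nodes/Node1/servers/server1|server.xml#Server_1)\n"
    "server2(cells/Node1Cell/nodes/Node1/servers/server2|server.xml#Server_2)\n"
)
server_ids = split_config_ids(sample)
print(len(server_ids))
```

<p>Inside wsadmin the same helper could be applied directly to the lookup, as in <code>split_config_ids(AdminConfig.getid('/Server:/'))</code>.</p>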
<p>Having a configuration id is a start, but we still have not changed any settings. So here's a basic example of setting the maximum heap size to 1 GB:</p>
<pre class="prettyprint">
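# Maximum heap size in megabytes (1024 MB = 1 GB)
maxHeapMb = 1024
jvmConfigId = AdminConfig.getid('/JavaVirtualMachine:/')
# str() is required: concatenating an int to a string raises a TypeError
AdminConfig.modify(jvmConfigId, '[ [maximumHeapSize "' + str(maxHeapMb) + '"]  ]')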
</pre>
<p>This follows the standard pattern of first obtaining the configuration id and then changing the setting. I obtained the code for the <code>AdminConfig.modify</code> call from the Admin Console. But how did I figure out the <code>AdminConfig.getid</code> call, in particular which containment path to use? This was painful initially to figure out. The hard-coded configuration id that the Admin Console provided within the <code>AdminConfig.modify</code> call was "(cells/Node1Cell/nodes/Node1/servers/Server1|server.xml#JavaVirtualMachine_1183121908656)". The prefix "JavaVirtualMachine" on the XML id is the type to use in the containment path.</p>
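<p>The observation about the XML id prefix can be sketched in plain Python. This is only an illustration: the function name <code>config_type_from_id</code> is hypothetical, and the regular expression assumes the XML id always has the form of a type name followed by an underscore and digits, which may not hold for every configuration object.</p>

```python
import re

# Matches the XML id after '#', e.g. "JavaVirtualMachine_1183121908656",
# and captures the type name that precedes the underscore and digits.
_XML_ID_TYPE = re.compile(r"#([A-Za-z][A-Za-z0-9]*)_\d+\)?$")


def config_type_from_id(config_id):
    """Return the type prefix of the XML id in a configuration id, or None."""
    match = _XML_ID_TYPE.search(config_id)
    if match is None:
        return None
    return match.group(1)


config_id = "(cells/Node1Cell/nodes/Node1/servers/Server1|server.xml#JavaVirtualMachine_1183121908656)"
type_name = config_type_from_id(config_id)
# Build the containment path that looks up all objects of that type.
containment_path = "/" + type_name + ":/"
print(containment_path)  # prints /JavaVirtualMachine:/
```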
<p>Not all configuration commands require a configuration id. The <code>AdminTask</code> object typically takes arguments specifying the server and node name. Here's an example that prevents applications from accessing internal WebSphere classes:</p>
<pre class="prettyprint">
AdminTask.setJVMProperties('[-serverName ' + serverName +
' -nodeName ' + nodeName + ' -internalClassAccessMode RESTRICT]')
</pre>
<p>The last step in any configuration script is to save the configuration changes. This is straightforward:</p>
<pre class="prettyprint">
AdminConfig.save()
</pre>
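<p>The re-runnability concern raised in the Tooling section (creation commands fail when the item already exists) can be handled with a small guard. The sketch below is an assumption-laden illustration, not a definitive implementation: <code>ensure_exists</code>, <code>FakeAdminConfig</code>, and the data source name <code>ExampleDS</code> are all hypothetical, and it assumes that <code>AdminConfig.getid()</code> returns an empty string when nothing matches the containment path. The fake object exists only so the logic can run outside wsadmin.</p>

```python
def ensure_exists(admin_config, containment_path, create):
    """Create a configuration object only if the containment path finds nothing.

    admin_config: object with a getid(path) method (AdminConfig inside wsadmin)
    create: zero-argument callable that performs the actual creation
    Returns True if create() was called, False if the object already existed.
    """
    existing = admin_config.getid(containment_path)
    if existing.strip():
        return False
    create()
    return True


class FakeAdminConfig:
    """Stand-in for AdminConfig so the sketch can run outside wsadmin."""

    def __init__(self):
        self.ids = {}

    def getid(self, path):
        # Assumed behaviour: an empty string means nothing matched.
        return self.ids.get(path, "")


fake = FakeAdminConfig()
path = "/DataSource:ExampleDS/"  # hypothetical data source name


def create_example():
    # In a real script, the creation command copied from the Admin Console
    # would go here; this fake just records a placeholder id.
    fake.ids[path] = "ExampleDS(hypothetical-config-id)"


first_run = ensure_exists(fake, path, create_example)
second_run = ensure_exists(fake, path, create_example)
print(first_run, second_run)
```

<p>The first call performs the creation and the second call skips it, which is the behaviour a re-runnable script needs. In wsadmin you would pass the real <code>AdminConfig</code> object and still finish with <code>AdminConfig.save()</code>.</p>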
<h3>Resources</h3>
<p>Here are some useful resources for this topic:</p>
<ul>
<li><a href="http://blog.xebia.com/2009/11/23/websphere-scripting-with-wsadmin-containment-paths-configuration-ids-and-object-names/">Excellent article on the differences between configuration ids, containment paths and object names for WebSphere scripting.</a></li>
<li><a href="http://www.ibm.com/developerworks/websphere/library/techarticles/1004_gibson/1004_gibson.html">IBM article introducing scripting WebSphere using Jython</a></li>
<li><a href="http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=%2Fcom.ibm.websphere.express.doc%2Finfo%2Fexp%2Fae%2Frxml_7libserver.html">Documentation on the WebSphere 7 Express sample scripts library</a></li>
<li><a href="http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rxml_adminconfig1.html">List of commands available on the AdminConfig object</a></li>
<li><a href="http://jythonpodcast.hostjava.net/jythonbook/en/1.0/LangSyntax.html">Free online Jython language book</a></li>
<li><a href="http://www.amazon.ca/gp/product/0137009526/ref=as_li_tf_tl?ie=UTF8&#038;tag=basilvandegri-20&#038;linkCode=as2&#038;camp=15121&#038;creative=330641&#038;creativeASIN=0137009526">Book published by IBM called "WebSphere Application Server Administration Using Jython"</a><img src="http://www.assoc-amazon.ca/e/ir?t=basilvandegri-20&#038;l=as2&#038;o=15&#038;a=0137009526" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /></li>
</ul>
<p>Happy scripting!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/getting-started-with-websphere-configuration-scripting/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Alternatives to Formal Traceability</title>
		<link>http://www.basilv.com/psd/blog/2011/alternatives-to-formal-traceability</link>
		<comments>http://www.basilv.com/psd/blog/2011/alternatives-to-formal-traceability#comments</comments>
		<pubDate>Mon, 17 Oct 2011 13:23:05 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[quality]]></category>
		<category><![CDATA[agile]]></category>
		<category><![CDATA[project management]]></category>
		<category><![CDATA[requirements]]></category>
		<category><![CDATA[software development]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[traceability]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=713</guid>
		<description><![CDATA[In my prior post The Trouble with Traceability I discussed the problems with doing requirements traceability, especially formal traceability using approaches like a requirements traceability matrix (RTM). Despite the flaws with traceability the underlying objective is sound: ensure that everything the customer or user requires is correctly delivered. So how can we achieve this objective? [...]]]></description>
			<content:encoded><![CDATA[<p>In my prior post <a href="http://www.basilv.com/psd/blog/2011/the-trouble-with-traceability">The Trouble with Traceability</a> I discussed the problems with doing requirements traceability, especially formal traceability using approaches like a requirements traceability matrix (RTM). Despite the flaws with traceability the underlying objective is sound: ensure that everything the customer or user requires is correctly delivered. So how can we achieve this objective?</p>
<p>There are a number of pragmatic practices that address this objective while avoiding the pitfalls afflicting formal traceability. I will start with a simpler form of traceability and then move on to practices seemingly unrelated to traceability.</p>
<h3>Specification Highlighting to Track Test Coverage</h3>
<p>A tester given a written specification to verify the software against must have some way of keeping track of their progress. One simple method is to highlight each statement in the specification as they test it. This is essentially tracking test coverage of the specification. I credit <a href="http://www.satisfice.com/blog/">James Bach</a> for this idea.</p>
<h3>Testing Dashboard</h3>
<p>A testing dashboard is great at illustrating overall testing progress and can be used as part of the summary in the final test report. To produce the dashboard, first divide the system under test into test areas that usually correspond to features, components, or significant non-functional requirements. Then create a grid with the test areas as rows and with columns that report testing progress for each area. Key information to report per area is an assessment of quality (is this area ready to ship?), and the thoroughness of testing (test coverage and test effort). For an example and fuller explanation see <a href="http://www.satisfice.com/presentations/dashboard.pdf">James Bach's presentation on testing dashboards</a>.</p>
<p>The testing dashboard is closest in form to a requirements traceability matrix - there's still a grid and something like requirements are listed down the grid. The big difference is that the dashboard shows a summary of testing for each area rather than listing individual tests and showing how they map.</p>
<h3>Agile User Stories with Acceptance Tests</h3>
<p>A common approach used by Agile methods is to divide requirements into fine-grained increments of business value called user stories and then define with examples or automated tests the criteria by which to accept that the story was correctly implemented. This takes a two-pronged approach to address traceability's objective. First, using fine-grained increments of functionality that are quickly developed minimizes the amount of work-in-progress that needs to be tracked and thus reduces the chance of mistakes. Second, the process produces tests for each user story early, sometimes before coding starts, so missing tests for a requirement is much less likely. Agile changes how development is done sufficiently to negate virtually all the sources of problems that would otherwise justify a requirements traceability matrix.</p>
<h3>Task Tracking via Task Board</h3>
<p>Popularized by <a href="http://en.wikipedia.org/wiki/Kanban_(development)">Kanban</a> but also used by many Agile teams, a task board is a practice for tracking the status of tasks. The board has multiple columns representing different statuses: the minimum set is usually "Not Started", "In Progress", and "Done". Tasks are placed on the board according to their status, and the team meets daily to discuss the tasks and update the board accordingly. I vastly prefer a physical task board on a wall, but for teams not co-located many online versions exist. Tasks are often quite fine-grained - one user story is often decomposed into multiple tasks. </p>
<p>Many teams include a status of "Testing" on the board, or use a symbol on the card to signify the completion of testing. Other quality control procedures such as code reviews can also be tracked. See the image below for an example of a task board with such states. </p>
<p><a href="http://www.basilv.com/psd/wp-content/uploads/2011/10/TaskBoard.jpg"><img src="http://www.basilv.com/psd/wp-content/uploads/2011/10/TaskBoard.jpg" alt="Example Task Board" title="Example Task Board" width="500" height="307" class="alignnone size-full wp-image-714" /></a></p>
<p>The combination of fine-grained tasks, daily updates, and tracking quality ensures that each requirement that is tackled by the team is properly developed. This achieves traceability's objective.</p>
<h3>Conclusion</h3>
<p>Although formal traceability takes too much effort for too little value, the underlying objective should not be ignored. I have described a variety of alternative practices that help ensure that requirements are properly developed and tested with a more modest level of effort.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/alternatives-to-formal-traceability/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Trouble with Traceability</title>
		<link>http://www.basilv.com/psd/blog/2011/the-trouble-with-traceability</link>
		<comments>http://www.basilv.com/psd/blog/2011/the-trouble-with-traceability#comments</comments>
		<pubDate>Mon, 10 Oct 2011 12:23:47 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[quality]]></category>
		<category><![CDATA[requirements]]></category>
		<category><![CDATA[software development]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[traceability]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=710</guid>
		<description><![CDATA[In software development traceability is the linkage of requirements to the software and/or development artifacts like design or test cases. The underlying objective is to ensure that everything the customer or user requires has been correctly delivered. I have no quibbles with this goal, but in practice the applications of traceability I have seen leave [...]]]></description>
			<content:encoded><![CDATA[<p>In software development traceability is the linkage of requirements to the software and/or development artifacts like design or test cases. The underlying objective is to ensure that everything the customer or user requires has been correctly delivered. I have no quibbles with this goal, but in practice the applications of traceability I have seen leave me feeling troubled. I often wonder whether there are sufficient benefits achieved to justify the efforts spent on traceability, especially formal traceability with defined artifacts.</p>
<h3>Formal Traceability</h3>
<p>The most frequent incarnation of formal traceability I see is the requirements traceability matrix (RTM). This is a two-dimensional table, often represented by a spreadsheet, in which one dimension lists requirements and the other dimension lists the artifact being traced to - business objectives, test cases, or design documentation. Cells within the table are filled in with a mark when the corresponding artifact and requirement are related.</p>
<p>The example traceability matrix below lists requirements across as columns and test cases down as rows. Notice that requirement 2.2 has no related test case, which is considered a traceability gap indicating that this requirement has not been tested.</p>
<table class="fancy" cellspacing="0">
<tr>
<th></th>
<th>Req 1.1</th>
<th>Req 1.2</th>
<th>Req 1.3</th>
<th>Req 2.1</th>
<th>Req 2.2</th>
</tr>
<tr>
<th>TC 1</th>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>TC 2</th>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>TC 3</th>
<td></td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<th>TC 4</th>
<td></td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
<tr>
<th>TC 5</th>
<td></td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
<tr>
<th>TC 6</th>
<td></td>
<td></td>
<td>X</td>
<td></td>
<td></td>
</tr>
<tr>
<th>TC 7</th>
<td>X</td>
<td>X</td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
<tr>
<th>TC 8</th>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
</table>
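<p>The gap check that the matrix above supports can be expressed in a few lines of Python. This is only an illustrative sketch: the dictionary layout and the function name <code>find_gaps</code> are my own choices for the example, not part of any traceability tool.</p>

```python
# The example matrix above: each test case maps to the requirements it covers.
rtm = {
    "TC 1": ["Req 1.1"],
    "TC 2": ["Req 1.1", "Req 1.2"],
    "TC 3": ["Req 1.2"],
    "TC 4": ["Req 1.3"],
    "TC 5": ["Req 1.3"],
    "TC 6": ["Req 1.3"],
    "TC 7": ["Req 1.1", "Req 1.2", "Req 1.3", "Req 2.1"],
    "TC 8": ["Req 1.3", "Req 2.1"],
}
requirements = ["Req 1.1", "Req 1.2", "Req 1.3", "Req 2.1", "Req 2.2"]


def find_gaps(requirements, rtm):
    """Return the requirements that no test case traces to."""
    covered = set()
    for traced in rtm.values():
        covered.update(traced)
    return [req for req in requirements if req not in covered]


gaps = find_gaps(requirements, rtm)
print(gaps)  # prints ['Req 2.2']
```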
<h3>Traceability Troubles</h3>
<p>Below are listed the aspects of traceability that trouble me.</p>
<ul>
<li>I have successfully developed and enhanced high quality software applications that have pleased clients and been used in production without needing formal traceability. So does it really provide value? Traceability seems like it addresses the problem of having unimplemented or untested requirements. This has rarely been a problem on the development projects I have been on. We are much more likely to struggle with ambiguous or unclear requirements, missed or new requirements, and incorrect logic in obscure special conditions.
</li>
<li>The tracing of requirements to test cases assumes they can be identified and listed which in turn limits the testing approaches that can be used. The <a href="http://www.context-driven-testing.com/">context-driven school of testing</a> would challenge these limitations. Exploratory testing in particular does not fit the confines imposed by a formal requirements traceability matrix.</li>
<li>While the value of producing the initial RTM is unclear to me at best, I am even more dubious about the value of maintaining an RTM over time. The typical claim is that for future requirement changes (e.g. enhancements) the RTM can be used to do impact analysis and determine what tests need to be executed and/or changed. Regarding impact analysis, the RTM might get you started, but the analysis must always consider code-level dependencies not reflected in the RTM (through, for example, reuse of common functionality), and must consider other requirements and functionality that are not directly related (and thus not reflected in the RTM) but make assumptions that are no longer true under the proposed change. The idea that the RTM can be used to determine what tests to execute makes two poor assumptions. First, that the tests are manual and not automated. If you have an automated test suite, you can just rerun the whole suite to identify failing tests - you never need to worry about running a subset. Second, that reusing manual tests is a good idea. If a test is worth re-executing often, it is frequently worth automating. For those tests that are manual, much of the benefit of having human testers is lost if they simply re-run the same scripts over and over, since there is a low probability of such re-executions failing and testers are less likely to find defects when they do occur. A better approach is to let the tester come up with their own variations each time to try to creatively break the system.</li>
<li>Tracing requirements to design is conceptually flawed. At best it addresses the objective of traceability only indirectly - ensuring the customer gets the functionality they require. Doing this mapping is in reality quite hard because the mapping from design to code (and from requirements to design) is not strictly one-to-one and is often a complex many-to-many relationship, especially for holistic non-functional requirements like performance or security. Furthermore, the idea of tracing to design assumes that there are fine-grained pieces of design that can be identified and referenced. This presumes a certain approach to design and its documentation that is at odds with many modern methods of software development.</li>
<li>For small-scale software applications, the system is simple enough that producing an RTM is a reasonably small effort. The number of requirements and test cases is relatively small. But the chance of errors occurring that would be caught by traceability is likewise low because the system's complexity is low. For very large systems with hundreds or thousands of requirements and tests, it is far more conceivable that requirements may be forgotten to be implemented or tested. But while the benefits of formal traceability are higher, the efforts are much, much higher. For an RTM, the number of cells to complete scales non-linearly with the number of requirements in the system: if the number of test cases grows in proportion to the number of requirements, the number of cells grows roughly with the square of the number of requirements.</li>
<li>I do not understand the obsession with testing in the context of traceability and especially RTMs. Reviews or inspections are much more effective at finding defects than testing. So why do I never hear of formal traceability from requirements to reviews?
</li>
</ul>
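<p>To make the scaling point from the list above concrete, here is a small worked calculation in Python. The ratio of two test cases per requirement is purely an assumption for illustration, and <code>rtm_cells</code> is a hypothetical helper name; the point is only that the cell count is the product of the two dimensions.</p>

```python
def rtm_cells(num_requirements, tests_per_requirement):
    """Number of cells in an RTM of requirements versus test cases."""
    num_tests = num_requirements * tests_per_requirement
    return num_requirements * num_tests


# Assumed ratio of 2 test cases per requirement, for illustration only.
for n in (10, 100, 1000):
    print(n, rtm_cells(n, 2))
```

<p>Under that assumption, 10 requirements give 200 cells, 100 requirements give 20,000 cells, and 1,000 requirements give 2,000,000 cells: a hundredfold increase in requirements produces a ten-thousandfold increase in cells to review.</p>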
<h3>Conclusion</h3>
<p>Software engineering researchers and industry leaders have reached the same conclusions about traceability. Andrew Kannenberg and Dr. Hossein Saiedian in <a href="http://www.crosstalkonline.org/storage/issue-archives/2009/200907/200907-Kannenberg.pdf">Why Software Requirements Traceability Remains a Challenge</a> (PDF) are clearly big proponents of requirements traceability, but look at their conclusions:</p>
<ul>
<li>"Unfortunately, manual traceability methods are not suitable for the needs of the software engineering industry."</li>
<li>"Currently existing COTS traceability tools are not adequate for the needs of the software engineering industry."</li>
</ul>
<p>Robert Glass in his book <a href="http://www.amazon.ca/gp/product/0321117425?ie=UTF8&#038;tag=basilvandegri-20&#038;linkCode=as2&#038;camp=15121&#038;creative=330641&#038;creativeASIN=0321117425">Facts and Fallacies of Software Engineering</a><img src="http://www.assoc-amazon.ca/e/ir?t=basilvandegri-20&#038;l=as2&#038;o=15&#038;a=0321117425" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> points out the underlying cause of troubles implementing traceability: "when a requirement commonly links to 50 or more design requirements and each of those links to some very much larger number of coding elements and coding elements may be reused to satisfy more than one requirement, we get a burgeoning complexity problem that has resisted manual solution and even tended to thwart most automated solutions" (page 78)</p>
<p>I hope the above discussion has convinced you to tread cautiously when it comes to traceability. In a future article I will look at alternatives to formal traceability that achieve the same underlying objectives.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/the-trouble-with-traceability/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Defects &#8211; To Fix or Not to Fix</title>
		<link>http://www.basilv.com/psd/blog/2011/defects-to-fix-or-not-to-fix</link>
		<comments>http://www.basilv.com/psd/blog/2011/defects-to-fix-or-not-to-fix#comments</comments>
		<pubDate>Tue, 04 Oct 2011 13:41:11 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[quality]]></category>
		<category><![CDATA[agile]]></category>
		<category><![CDATA[defects]]></category>
		<category><![CDATA[lean]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=703</guid>
		<description><![CDATA[To fix defects or not fix defects, that is the question: whether it is better to suffer the complaints of outraged users, or to divert effort to investigate and eliminate them. Shakespeare quotes aside, every software development project has to make decisions on how many defects to fix and which ones to leave alone prior [...]]]></description>
			<content:encoded><![CDATA[<p>To fix defects or not fix defects, that is the question: whether it is better to suffer the complaints of outraged users, or to divert effort to investigate and eliminate them. </p>
<p>Shakespeare quotes aside, every software development project has to make decisions on how many defects to fix and which ones to leave alone prior to shipping. While I have seldom seen this question debated within projects, the advice from industry thought leaders varies considerably. The Agile and Lean methods of software development in particular have somewhat opposing perspectives. </p>
<p>I believe that considering both sides of this question provides a fuller understanding of the issues and better equips us to answer appropriately. Therefore in the two sections below I explore the reasons behind both sides of the debate. </p>
<h3>To Fix</h3>
<ul>
<li>Shipping poor quality, defect-ridden code can upset users, turn away customers, and lead to a hard-to-shake bad reputation.</li>
<li>The decision that a feature is worth developing is made with the expectation that it will work correctly. So any defects found in a feature mean that the feature is still incomplete until these issues are fixed.</li>
<li>Defects provide feedback regarding the development process. Each defect represents an opportunity to do a root cause analysis of what led to the defect and put countermeasures in place to prevent re-occurrence. The Lean mindset of "Stop the line" demands that new development be put on hold to fix newly discovered defects.</li>
<li>Defects introduce the risk of compounding quality problems. The impact of a defect can be more significant than initially realized. Defects can be inadvertently replicated in other parts of the system. Enhancing components with too many defects can slow progress to a halt, as the system becomes essentially a shifting quicksand that is too unstable to work on. Constantly fixing defects helps maintain a high velocity of development over time.</li>
<li>To mitigate risks in not fixing defects, each defect needs to be analyzed to understand its impact, cause, and required changes to fix. But after performing this analysis most of the work is usually done - the fix is relatively straightforward. Waiting to decide later to fix the defect (e.g. in a subsequent release) causes all the knowledge gained in the analysis to decay over time which is wasteful (in the Lean sense).</li>
</ul>
<h3>Not To Fix</h3>
<ul>
<li>Significantly delaying the release of software to fix all defects leads to a loss of immediate revenue and potentially loss of market share due to competitors beating you to market. So you cannot afford to wait to fix all defects.</li>
<li>The entrepreneurial mindset, especially for startups, is to ship early to get feedback from paying customers. Perfection is the enemy of the good.</li>
<li>Under at least some versions of Scrum, defects are considered new tasks that are added to the product backlog to be prioritized by the product owner. This prioritization is based on the defect's impact (severity and likelihood of occurrence) and the effort required to fix it. Many minor defects will therefore likely never be fixed as new functionality will typically be of higher value.</li>
<li>Stopping to analyze and fix defects disrupts developers who are in the middle of working on other functionality and is wasteful.</li>
<li>Fixing defects in functionality that has otherwise already finished development and testing will require additional regression testing. Not fixing now and waiting until enhancements to this functionality are needed minimizes the extra effort required.</li>
</ul>
<h3>Conclusion</h3>
<p>Shakespeare was wrong. There is actually a third perspective regarding whether or not to fix defects: avoid the question as much as possible by focusing on defect prevention. The Lean mindset of building quality in avoids all the waste associated with finding, analyzing, and fixing defects and should be our preferred approach. Only when it fails and the occasional defect is introduced do we then have to answer the question.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/defects-to-fix-or-not-to-fix/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Software Documentation Templates</title>
		<link>http://www.basilv.com/psd/blog/2011/software-documentation-templates</link>
		<comments>http://www.basilv.com/psd/blog/2011/software-documentation-templates#comments</comments>
		<pubDate>Mon, 18 Jul 2011 13:03:36 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[tools]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=686</guid>
		<description><![CDATA[I am a believer in minimizing software documentation that lives outside the code. This does not, however, mean no documentation. There are a number of reasons why documentation can be useful, especially for larger organizations: Documentation is more effective than code at communicating high-level or cross-cutting design and operational concerns. Larger organizations or distributed organizations [...]]]></description>
			<content:encoded><![CDATA[<p>I am a believer in minimizing software documentation that lives outside the code. This does not, however, mean no documentation. There are a number of reasons why documentation can be useful, especially for larger organizations:</p>
<ul>
<li>Documentation is more effective than code at communicating high-level or cross-cutting design and operational concerns.</li>
<li>Larger organizations or distributed organizations cannot rely on face-to-face communications or having everyone co-located in one room so documentation has a role to play in knowledge transfer and communication.</li>
<li>Documentation can target a non-developer audience, such as business representatives.</li>
<li>Documentation helps protect against team turnover, which, while a bad practice, is not uncommon when using a vendor for development or maintenance or using separate development and maintenance teams.</li>
<li>The act of creating documentation helps clarify thinking and identify gaps, thus functioning as a form of quality control.</li>
</ul>
<p>This last reason is actually quite significant as it is often under-appreciated. In my experience, while working on design documentation for a body of code I quite often identify things that are sub-optimal, such as a badly named class or an unwanted dependency between components. This form of quality control is most valuable, however, when it uncovers gaps such as missing functionality or flawed design. Such gaps are very difficult for most other forms of quality control like testing or reviews to find. </p>
<p>The use of documentation templates makes it much easier to find these gaps by acting essentially as checklists of items to consider. I recently came across a great <a href="http://7d6a11fowa9p0ndghc221ih49r.hop.clickbank.net/">set of comprehensive templates covering all aspects of software development by Klariti</a>. For a team or organization the price of these templates is ridiculously low - the full set of software development templates costs far less than a day's salary - and there is an assortment of free templates as well. </p>
<p>There are some potential drawbacks to avoid when using documentation templates. Some people have a tendency to want to fill in every section of a template or to use all available templates. This can waste a lot of time and effort. Focusing on identifying what is relevant and useful to document and doing only that is much more effective. Another drawback is that while the templates are in Office format (Word or Excel), you might be better served using a different medium such as a Wiki or a <a href="http://www.basilv.com/psd/blog/2007/development-tools-should-use-text-files">text-based format</a> that is more friendly to version control. In these cases I would convert the templates to the desired medium. </p>
<p>Even if your team or organization has some templates, I think it would be beneficial to check out Klariti's templates and use them as a checklist to see if there is anything missing or needing revision within your own. That link again is: <a href="http://7d6a11fowa9p0ndghc221ih49r.hop.clickbank.net/" target="_top">Klariti's software development templates</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/software-documentation-templates/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Most Disturbing Code</title>
		<link>http://www.basilv.com/psd/blog/2011/most-disturbing-code</link>
		<comments>http://www.basilv.com/psd/blog/2011/most-disturbing-code#comments</comments>
		<pubDate>Mon, 27 Jun 2011 13:32:07 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[professional]]></category>
		<category><![CDATA[code review]]></category>
		<category><![CDATA[mission]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=662</guid>
		<description><![CDATA[One question I often ask when giving job interviews is "What do you find most disturbing when reviewing code?" The answers I receive are especially interesting when compared to the interviewee's results doing an actual code review: it is rare for them to identify the problems they consider the most disturbing. This lack of congruence [...]]]></description>
			<content:encoded><![CDATA[<p>One question I often ask when giving job interviews is "What do you find most disturbing when reviewing code?" The answers I receive are especially interesting when compared to the interviewee's results doing an actual code review: it is rare for them to identify the problems they consider the most disturbing. This lack of congruence between what people say they do and what they actually do is not unusual - it is a common problem, for example, when using market focus groups. </p>
<p>This chain of thought then prompted me to ask myself this same question. What do <em>I</em> find most disturbing when reviewing code? My instinctive reaction was to answer "defects", but upon a little reflection I realized this was not true - I expect to find defects as a <a href="http://www.basilv.com/psd/blog/2011/top-seven-quality-principles-in-software-development">fundamental principle of quality</a>. So it usually does not bother me to find a few during a review. </p>
<p>There are times when I am very disappointed when reviewing code - what am I finding at those times? Here are some specific occurrences:</p>
<ul>
<li>Code riddled with defects reflecting a fundamental lack of understanding about the requirements.</li>
<li>Code very difficult to understand due to poor names and overly complicated logic that seemed repetitive or unnecessary.</li>
<li>A large amount of non-GUI code written without any supporting tests.</li>
<li>Code with inconsistent formatting and style.</li>
</ul>
<p>What is the common theme here? After further reflection, I realized that the common element underlying these situations that I find most disturbing is a lack of professionalism / craftsmanship. This can manifest in a number of ways as indicated by the above list. The key criterion is whether a developer, through the code they produce, is helping achieve or is hindering <a href="http://www.basilv.com/psd/blog/2008/our-mission-as-software-developers">our mission as software developers</a>. My evaluation of what is most disturbing is at its essence based on my core values and beliefs concerning software development.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/most-disturbing-code/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Top Seven Quality Principles in Software Development</title>
		<link>http://www.basilv.com/psd/blog/2011/top-seven-quality-principles-in-software-development</link>
		<comments>http://www.basilv.com/psd/blog/2011/top-seven-quality-principles-in-software-development#comments</comments>
		<pubDate>Thu, 19 May 2011 13:10:34 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[quality]]></category>
		<category><![CDATA[lean]]></category>
		<category><![CDATA[process]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=656</guid>
		<description><![CDATA[How do you ensure high quality when developing software? The processes that are used, the decisions that are made, and the actions that are taken must be aligned with proven quality principles. In this context I define a principle to be a fundamental truth that is the foundation for a system of behavior. Too often [...]]]></description>
			<content:encoded><![CDATA[<p>How do you ensure high quality when developing software? The processes that are used, the decisions that are made, and the actions that are taken must be aligned with proven quality principles. In this context I define a <em>principle</em> to be a fundamental truth that is the foundation for a system of behavior. </p>
<p>Too often I see projects and even entire organizations misaligned with one or more of these principles and in the worst cases taking actions in complete violation of these principles. The result is predictable: poor quality or significant extra cost to achieve adequate quality, with the complete absence of high quality. </p>
<p>Achieving high quality is difficult. It suffers from what I call the weakest link problem: no matter how many things you do well, it is the area you are poorest at that generally dictates the overall quality. Another way of putting it is this: there are many ways to fail at developing high quality software and only a few ways to succeed.</p>
<p>So to help you achieve high quality here is my list of the top seven quality principles in software development. For each principle think about the implications - how might your organization be acting in ways that contradict it? How might you change to better align with the principle?</p>
<ol>
<h3>
<li>People as individuals and teams are the most significant factor affecting quality</li>
</h3>
<p><a href="http://forums.construx.com/blogs/stevemcc/archive/2008/03/27/productivity-variations-among-software-developers-and-teams-the-origin-of-quot-10x-quot.aspx">Software engineering research</a> has consistently found that variations between people are by a large margin the single largest contributing factor to quality and productivity. While good process is important, good people have a much greater impact.</p>
<p>Achieving high quality requires people capable of doing high quality work, both individually and as teams. While individual excellence is important, most software is produced by a team of people and it is the team's overall quality of workmanship that ultimately determines the quality of the software being produced. </p>
<p>Within organizations, throughout the blogosphere, and even in many software development books too much attention is placed on process and methodology and too little on the quality of the people doing the work.</p>
<h3>
<li>All software has defects</li>
</h3>
<p>Any significant piece of software has defects, no matter how carefully it has been produced. Zero defects is a Utopian ideal, but not a realistic goal. With extreme discipline and effort you can get very close to zero defects, as evidenced by the <a href="http://www.fastcompany.com/magazine/06/writestuff.html">group working on the space shuttle control software</a>, but under more normal conditions defects are unfortunately a fact of life. </p>
<h3>
<li>It is impossible to fully test software</li>
</h3>
<p>For even the most trivial of functionality like adding two numbers there is a nearly infinite set of tests that could be performed. So completely testing a significant piece of software is impossible. Testers therefore must appropriately choose and perform an extremely small subset of all possible tests. </p>
<h3>
<li>Defects are more expensive to fix the later they are found</li>
</h3>
<p>As time elapses from the point at which a defect is introduced, the effort required to fix the defect grows. There are several reasons for this. The most significant is that as the software moves through subsequent phases of the software development life-cycle - from development into test and then finally into production use - the effort involved to get to that phase increases. Work previously done, like testing and deployments, has to be repeated to some degree. Another reason is that over time, the developer's memory of the problematic functionality in question fades, thus requiring more effort to recover the context. </p>
<h3>
<li>Build quality in</li>
</h3>
<p>This principle comes from the lean thinking literature and addresses the issue highlighted by the prior principle. Finding and fixing defects is considered waste - non-value-add work - and thus according to lean should be minimized. This is accomplished in a twofold manner: first, use practices that help prevent the introduction of defects, such as <a href="http://www.basilv.com/psd/blog/2010/example-based-requirements">specifications with examples</a>, and second use practices such as <a href="http://www.basilv.com/psd/blog/tag/test-driven-development">test driven development</a> and <a href="http://www.basilv.com/psd/blog/tag/code-review">code reviews</a> that find defects as quickly and cheaply as possible after they are added.</p>
<h3>
<li>Adapt your approach to quality based on the level of criticality and complexity</li>
</h3>
<p>Different contexts require different actions. Criticality and complexity are the two most significant factors to consider in order to determine the level of quality you require and the approach you will take to achieve it. Criticality is the measure of how important the application is to the business and/or the users and is generally assessed using categories such as life critical, mission critical, business important, and casual use. Complexity is the measure of how difficult it is to understand and work with the code. A number of factors increase complexity such as complicated algorithms, sheer volume of code, and poor architecture resulting in high coupling.</p>
<h3>
<li>Higher quality requires more quality-focused activities and better execution</li>
</h3>
<p>Achieving a higher level of quality, meaning a lower volume of production defects, requires that you introduce fewer defects and/or find and fix more defects throughout your software development process. This process can be modeled as a series of activities acting as feedback loops or filters that each prevent or remove some percentage of defects. </p>
<p>Using this model it becomes obvious that the only way to reduce the defects remaining at the end of the process is to either make existing activities more effective - what I call <em>better execution</em> - or add more activities to filter out additional defects. The choice of which quality-focused activities to use and the effort to put into each requires deliberate planning which I have written more about in my article titled <a href="http://www.basilv.com/psd/blog/2010/filter-by-failure-mode-matrix-a-method-for-planning-quality">Filter by Failure Mode Matrix: A Method for Planning Quality</a>.</p>
</ol>
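<p>The filter model from the last principle can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: each activity is assumed to remove a fixed fraction of the defects that reach it, independently of the other activities. The class name, method name, and all figures are hypothetical and are not measurements from any real project.</p>

```java
class DefectFilterModel {

  /**
   * Returns the defects left after every activity has removed its
   * fraction of the defects that reached it.
   */
  static double remainingDefects(double introduced, double... removalRates) {
    double remaining = introduced;
    for (double rate : removalRates) {
      if (rate < 0.0 || rate > 1.0) {
        throw new IllegalArgumentException("Removal rate must be between 0 and 1: " + rate);
      }
      remaining *= (1.0 - rate);
    }
    return remaining;
  }

  public static void main(String[] args) {
    // Hypothetical baseline: 100 defects introduced, code review removes
    // half of them and testing removes 60% of what is left (roughly 20 remain).
    System.out.println(remainingDefects(100, 0.5, 0.6));

    // Better execution: testing now removes 80% (roughly 10 remain).
    System.out.println(remainingDefects(100, 0.5, 0.8));

    // More activities: add a third filter that removes half of what reaches it
    // (again roughly 10 remain).
    System.out.println(remainingDefects(100, 0.5, 0.6, 0.5));
  }
}
```

<p>In this sketch the two levers, better execution and additional activities, both shrink the same product of pass-through fractions, which is why they are the only ways to lower the final count for a fixed number of introduced defects.</p>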
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/top-seven-quality-principles-in-software-development/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Growth through Operating Under Constraints</title>
		<link>http://www.basilv.com/psd/blog/2011/growth-through-operating-under-constraints</link>
		<comments>http://www.basilv.com/psd/blog/2011/growth-through-operating-under-constraints#comments</comments>
		<pubDate>Mon, 16 May 2011 13:00:26 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[learning]]></category>
		<category><![CDATA[personal development]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=652</guid>
		<description><![CDATA[The other day I was composing a tweet and it struck me that the difficulties I faced in crafting my message to fit within 140 characters without using abbreviations was a good exercise for making me a better writer. After further reflection I generalized this specific case to a broader principle about personal development: performing [...]]]></description>
			<content:encoded><![CDATA[<p>The other day I was composing a tweet and it struck me that the difficulty I faced in crafting my message to fit within 140 characters without using abbreviations was a good exercise for making me a better writer. After further reflection I generalized this specific case to a broader principle about personal development: performing an activity under uncomfortably tight constraints stimulates growth. </p>
<p>The constraints need to push us out of our comfort zone. We need to bump up against the constraints, repeatedly, in order to generate those learning opportunities. The constraints cannot be so harsh, however, that they prevent us from finding solutions that fit within them. </p>
<p>The nature of the constraint dictates the area in which the growth will occur. Twitter's constraint encourages conciseness and the ability to focus on the core message. Writing assessments in classes tend to set the opposite constraint of a minimum number of pages of content in order to encourage deeper exploration of a topic and more hours of practice in writing.</p>
<p>Thinking about this principle in the context of software development made me realize that the many different types of software are often differentiated by the major constraints they face:</p>
<table class="fancy" cellspacing="0">
<tr>
<th>Type of Software</th>
<th>Major Constraints</th>
</tr>
<tr>
<td>Mobile apps</td>
<td>Minimize power usage, which implies minimizing CPU and memory usage.</td>
</tr>
<tr>
<td>Real-time systems</td>
<td>Fast response time - no lags.</td>
</tr>
<tr>
<td>Large public websites</td>
<td>Scalability to handle extremely high volumes of traffic</td>
</tr>
<tr>
<td>Mission critical enterprise systems</td>
<td>Reliability and availability</td>
</tr>
</table>
<p>Constraints can also be applied to processes or methodologies to provide learning opportunities. I have heard more than one ScrumMaster say they prefer one-week iterations for teams new to Scrum as a way of learning how to decompose functionality into the smallest possible increments. The concept of stretch goals is similar: set targets beyond the current capabilities of the team in order to push team members out of their comfort zone and seek innovative ways to meet those targets.</p>
<p>How can you use constraints to improve and grow? I encourage you to reflect on this and leave a comment sharing your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/growth-through-operating-under-constraints/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Streaming Data to Reduce Memory Usage</title>
		<link>http://www.basilv.com/psd/blog/2011/streaming-data-to-reduce-memory-usage</link>
		<comments>http://www.basilv.com/psd/blog/2011/streaming-data-to-reduce-memory-usage#comments</comments>
		<pubDate>Thu, 05 May 2011 13:01:52 +0000</pubDate>
		<dc:creator>Basil Vandegriend</dc:creator>
				<category><![CDATA[design]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[Hibernate]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.basilv.com/psd/?p=641</guid>
		<description><![CDATA[I recently performed a series of optimizations to reduce an application's memory usage. After completing several of these I noticed that there was a common theme to many of my optimizations that I could explicitly apply to help identify further opportunities for improvement. As a recurring solution, this qualifies as a design pattern which I [...]]]></description>
			<content:encoded><![CDATA[<p>I recently performed a series of optimizations to reduce an application's memory usage. After completing several of these I noticed that there was a common theme to many of my optimizations that I could explicitly apply to help identify further opportunities for improvement. As a recurring solution, this qualifies as a design pattern which I refer to as <em>Streaming Data</em>.</p>
<h3>Context</h3>
<p>This pattern applies when you need to process a significant volume of data but the processing can be done incrementally on small subsets of the data. A typical example is loading a list of entities and then iterating through the list to process each one. While the results (output) of processing can be combined across all the entities, it is important that the input to the processing only requires a small subset of all the data, and not the entire list of entities. A code example illustrating this problem context is shown below:</p>
<pre class=" prettyprint">
List&lt;Entity&gt; entities = loadEntities();
List&lt;ProcessingResult&gt; results = new ArrayList&lt;ProcessingResult&gt;();
for (Entity entity : entities) {
  ProcessingResult result = processEntity(entity);
  results.add(result);
}
</pre>
<h3>Solution</h3>
<p>Reducing the memory usage in the above example is based on the observation that loading the entire list of objects to process can consume a large amount of memory and is not necessary since we only use one object at a time. So the solution is to stream - incrementally retrieve - these objects instead of loading them all at once. For the consumer of this data the only change required is to first obtain a reference to the stream such as an <code>Iterable</code> that incrementally fetches data. Updating our prior code example results in the following (changed lines shown in green background):</p>
<pre class=" prettyprint">
<span style="background:#97FF77;">Iterable&lt;Entity&gt; entities = streamEntities();</span>
List&lt;ProcessingResult&gt; results = new ArrayList&lt;ProcessingResult&gt;();
for (Entity entity : entities) {
  ProcessingResult result = processEntity(entity);
  results.add(result);
}
</pre>
<h3>Examples</h3>
<p>The mechanism to use for streaming objects will depend on the source of the data and may require significant changes compared to a bulk load. Here are some specific examples.</p>
<h4>Parsing XML</h4>
<p><a href="http://www.basilv.com/psd/blog/2008/simple-xml-parsing-using-jaxb">Parsing XML files using JAXB</a> is a convenient approach for converting the entire file into a tree of Java objects, but it populates the entire tree at once. To instead stream such data use the SAX parser provided as part of the JAXP API. The SAX parser is event-based, which means that it iterates over the elements (and attributes) of your XML and for each item invokes callbacks you define.</p>
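<p>As a rough sketch of what the SAX approach can look like, the example below uses the standard JAXP <code>SAXParserFactory</code> and a <code>DefaultHandler</code> to hand each item to a callback as soon as its start tag is read. The <code>entity</code> element name, its <code>id</code> attribute, and the class and method names are assumptions made up for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxStreamingSketch {

  /**
   * Invokes the callback once per "entity" element, passing the value of its
   * "id" attribute. No tree of objects is built; each element is handled as
   * the parser encounters it.
   */
  static void streamEntityIds(InputStream xml, Consumer<String> callback) {
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes attributes) {
        // The default factory is not namespace aware, so match on qName.
        if ("entity".equals(qName)) {
          callback.accept(attributes.getValue("id"));
        }
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
    } catch (Exception e) {
      throw new IllegalStateException("Failed to parse XML", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<entities><entity id=\"1\"/><entity id=\"2\"/></entities>";
    InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    streamEntityIds(in, id -> System.out.println("Processing entity " + id));
  }
}
```

<p>A real handler would usually also override <code>characters</code> and <code>endElement</code> to assemble element content, but the shape stays the same: the consumer sees one item at a time.</p>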
<h4>Querying Databases using Hibernate</h4>
<p>When using Hibernate to query for a collection of entities it is convenient to simply ask Hibernate for the entire collection. A typical example of doing this using the query by criteria API within Hibernate is below:</p>
<pre class=" prettyprint">
public List&lt;Entity&gt; queryData() {
  Criteria criteria = session.createCriteria(Entity.class);
  // Add appropriate restrictions
  // ...
  List&lt;Entity&gt; result = criteria.list();
  return result;
}
</pre>
<p>When the criteria returns a large volume of data, however, this approach will consume a large amount of memory. Instead use the <code>scroll</code> method on <code>Criteria</code> to return a <code>ScrollableResults</code> instance that can be used to iterate through the results. If you prefer to not expose the rest of the application to Hibernate classes, you can wrap the <code>ScrollableResults</code> in a special implementation of <code>Iterator</code> (which I leave as an exercise for the reader). The revision of the above example using streaming looks like the following (changed lines shown in green background):</p>
<pre class=" prettyprint">
<span style="background:#97FF77;">public Iterator&lt;Entity&gt; queryData() {</span>
  Criteria criteria = session.createCriteria(Entity.class);
  // Add appropriate restrictions
  // ...
<span style="background:#97FF77;">  ScrollableResults scrollableResults = criteria.scroll();</span>
<span style="background:#97FF77;">  Iterator&lt;Entity&gt; result = new ScrollableResultsIterator(scrollableResults);</span>
  return result;
}
</pre>
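<p>One possible shape for such an <code>Iterator</code> wrapper is sketched below. To keep the example self-contained it does not depend on Hibernate: the <code>RowCursor</code> interface is a hypothetical stand-in for the few <code>ScrollableResults</code> operations the wrapper relies on (advance, read a column, close), and a real implementation would delegate to the Hibernate class instead. It also assumes each row holds the entity in its first column.</p>

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the cursor operations used by the wrapper. */
interface RowCursor {
  boolean next();          // advances to the next row; false when none remain
  Object get(int column);  // reads a column of the current row
  void close();            // releases the underlying resources
}

class ScrollableResultsIterator<T> implements Iterator<T> {
  private final RowCursor cursor;
  private T current;
  private boolean fetched;    // true when current holds a row not yet returned
  private boolean exhausted;  // true once the cursor has reported no more rows

  ScrollableResultsIterator(RowCursor cursor) {
    this.cursor = cursor;
  }

  @Override
  public boolean hasNext() {
    if (!fetched && !exhausted) {
      if (cursor.next()) {
        @SuppressWarnings("unchecked")
        T row = (T) cursor.get(0);
        current = row;
        fetched = true;
      } else {
        exhausted = true;
        cursor.close();  // close eagerly once the data runs out
      }
    }
    return fetched;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    fetched = false;
    T row = current;
    current = null;  // do not keep the returned row reachable
    return row;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("first", "second", "third");
    RowCursor cursor = new RowCursor() {
      private int position = -1;
      public boolean next() { return ++position < rows.size(); }
      public Object get(int column) { return rows.get(position); }
      public void close() { }
    };
    Iterator<String> it = new ScrollableResultsIterator<>(cursor);
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```

<p>The look-ahead in <code>hasNext</code> is needed because a cursor only learns whether another row exists by moving to it, whereas <code>Iterator</code> callers expect to ask first and fetch second.</p>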
<p>This scroll approach only works when all the data can be processed within the same database transaction since the Hibernate session must remain open for the <code>ScrollableResults</code> to be able to continue fetching data. If this is not suitable then another option is to load the data using multiple queries that each return a subset of the data. One common example of this is when displaying search results to a user. Rather than showing all the results (which may number in the hundreds or thousands) show one page at a time and let the user step through the various pages of results. Due to the frequency with which this occurs I refer to this solution as <em>paging</em>. Implementing this in Hibernate using the query by criteria API is fairly simple:</p>
<ol>
<li>Start by creating your criteria object and defining its restrictions as you normally would.</li>
<li>Apply an ordering to the criteria. It is best if this ordering is consistent, by which I mean that database updates or inserts between queries will not result in invalid or unexpected results being returned. This assumes each query for a page executes in a separate database transaction which provides no guarantees of transactional isolation for the group of queries as a whole. In some contexts, consistency is not required. If it is then I prefer to use an auto-incrementing surrogate primary key as the field to sort by in order to achieve the highest level of consistency.</li>
<li>Apply restrictions to retrieve only the specific page. This is done using the methods <code>setFirstResult</code> and <code>setMaxResults</code> on the <code>Criteria</code> object.</li>
</ol>
<h3>Consequences</h3>
<p>One potential consequence of streaming data is a reduction in performance because data is loaded piece by piece rather than in bulk. To mitigate this, the solution is to use what I call <em>loading sets</em>: define subsets of the total data volume that are small enough to not impact memory usage but large enough to minimize performance impacts. Then load the data one set at a time. The consuming API does not need to change: it can still iterate or stream over each loaded set, and then fetch the next set once the current one is exhausted.</p>
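<p>A minimal sketch of a loading-set iterator follows. The loader function is an assumption: it takes a first-result offset and a maximum set size and returns at most that many items, which is the same contract a paged query built with <code>setFirstResult</code> and <code>setMaxResults</code> would offer. The class name is hypothetical, and the sketch treats a set shorter than the requested size as the signal that the data has run out.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

class LoadingSetIterator<T> implements Iterator<T> {
  // (firstResult, maxResults) -> the next set of at most maxResults items
  private final BiFunction<Integer, Integer, List<T>> loader;
  private final int setSize;
  private Iterator<T> currentSet = Collections.emptyIterator();
  private int nextOffset = 0;
  private boolean lastSetLoaded = false;

  LoadingSetIterator(BiFunction<Integer, Integer, List<T>> loader, int setSize) {
    if (setSize <= 0) {
      throw new IllegalArgumentException("setSize must be positive");
    }
    this.loader = loader;
    this.setSize = setSize;
  }

  @Override
  public boolean hasNext() {
    // Fetch the next set only once the current one is exhausted.
    while (!currentSet.hasNext() && !lastSetLoaded) {
      List<T> set = loader.apply(nextOffset, setSize);
      nextOffset += set.size();
      lastSetLoaded = set.size() < setSize;  // a short set means no more data
      currentSet = set.iterator();
    }
    return currentSet.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return currentSet.next();
  }

  public static void main(String[] args) {
    // In-memory stand-in for a data source holding ten items.
    List<Integer> source = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      source.add(i);
    }
    Iterator<Integer> it = new LoadingSetIterator<>(
        (first, max) -> source.subList(
            Math.min(first, source.size()),
            Math.min(first + max, source.size())),
        4);
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```

<p>Only one set is held at a time, so memory use is bounded by the set size, while the number of round trips drops from one per item to one per set. Choosing the set size is the tuning knob between those two costs.</p>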
]]></content:encoded>
			<wfw:commentRss>http://www.basilv.com/psd/blog/2011/streaming-data-to-reduce-memory-usage/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

