
Development Tools Should Use Text Files

Here is a request to all vendors of development tools: please persist code to text files. Over the last few years, I have encountered an alarming number of tools that do not use text files. Some examples are:

  • A reporting tool that stores report definitions as binary files.
  • A data modeling tool that persists models as binary files.
  • An extract-transform-load (ETL) tool that writes its transformations to an object-oriented database.

I have witnessed the problems and struggles that result when teams try to use these tools.

The first problem area is version control. Version control software such as Subversion lets several people modify the same file concurrently, without locking, by diffing and merging their changes. For text files the version control software does this automatically, but each binary format you want to store requires its own custom diff and merge tool. Development tools with binary file formats usually provide no diff or merge capability, and when they do, it is built into the tool with no easy way to plug it into the version control software. A frequent workaround for such tools is to use locks within the version control system to ensure that only one person modifies each file at a time. Locks are generally painful to use: they add an extra step to every edit, and they do not scale as the team grows in size or spreads across multiple locations.
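To make the contrast concrete, here is a minimal sketch of the kind of line-based diff that works on any text file without knowing anything about the tool that wrote it. It uses Python's standard `difflib` module. The report definition, its layout, and the file name are hypothetical, invented purely for illustration.

```python
import difflib

# Hypothetical report definition, as a tool might persist it in text form.
BASE = """\
report: monthly_sales
source: sales_fact
columns:
  - order_date
  - region
  - amount
"""

# The same definition after a colleague renames one column.
EDITED = BASE.replace("  - region\n", "  - sales_region\n")


def text_diff(old, new, name="monthly_sales.report"):
    """Return a unified diff between two versions of a text file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="a/" + name,
            tofile="b/" + name,
        )
    )


if __name__ == "__main__":
    print(text_diff(BASE, EDITED), end="")
```

The diff shows exactly one removed line and one added line, which is the information a reviewer or a merge algorithm needs. A binary file gives generic tooling nothing comparable to work with.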

The ETL tool I encountered that writes to a database is even more difficult to use with version control. While you can take a backup of the entire database, that is not an effective way to do versioning because it is too coarse-grained. Fortunately, the tool can export the transformations to individual text files and re-import them, so proper version control is possible, though awkward because of the extra steps.

Another problem with tools that store code in proprietary binary formats is that sometimes you need to do something the tool does not support. A common scenario is searching for uses of a particular artifact in order to perform impact analysis. Consider a real-life example: a column is being removed from a database table. You first need to verify that the column is not used in any reports, extracts, transformations or loads. If the reporting tool or ETL tool does not provide an appropriate search function, how can this impact analysis be done? If the tools write to text files, you can run a text search on the file system, outside the tool, to find uses of the column to be deleted. If the tools write to binary files, this is not an option. Similarly, if you need to rename the column and the tool does not provide a convenient way to do so across all reports or transformations, a global search and replace across the text files will likely be much easier. With binary files, you are stuck making the modifications one by one within the tool.
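The impact analysis and rename described above can be sketched in a few lines of Python when the artifacts are text. This is only an illustration under stated assumptions: the function names `find_uses` and `rename_identifier`, the list of file suffixes (including `.rpt`), and the whole-word matching rule are my own choices, not features of any particular tool.

```python
import re
from pathlib import Path

# Assumed suffixes for text artifacts; adjust to whatever your tools write.
TEXT_SUFFIXES = {".sql", ".xml", ".rpt", ".txt"}


def _word_pattern(identifier):
    # Whole-word match, so "region" does not match "region_code".
    return re.compile(r"\b" + re.escape(identifier) + r"\b")


def _text_files(root):
    """Yield every file under root whose suffix marks it as a text artifact."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            yield path


def find_uses(root, identifier):
    """Return (path, line_number, line) for each line that uses identifier."""
    pattern = _word_pattern(identifier)
    hits = []
    for path in _text_files(root):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((path, number, line.strip()))
    return hits


def rename_identifier(root, old, new):
    """Replace whole-word uses of old with new; return the files changed."""
    pattern = _word_pattern(old)
    changed = []
    for path in _text_files(root):
        text = path.read_text(encoding="utf-8")
        # A function replacement avoids backslash interpretation in `new`.
        updated, count = pattern.subn(lambda match: new, text)
        if count:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

A plain textual match can report false positives, for example the same column name in an unrelated table, so the results still need a human review. The point is that the search is possible at all, and that the rename lands in version control as an ordinary, reviewable change.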

A third problem with binary formats is that they restrict interoperability between tools and increase the risk of vendor lock-in. If you need or want to change tools, and neither tool can convert to the other's format, then a binary format on either side leaves you stuck.

Tools that manipulate graphics or diagrams often write to binary formats. In the case of bitmap images, a binary format makes a lot of sense, and I doubt you would ever want to merge changes to a single image. Diagrams or models that contain words as well as image information are candidates for merging or impact analysis, and are therefore better stored as text. This is possible with certain vector graphics formats such as SVG, which is XML-based.
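As a small illustration of why a text-based diagram format helps, the sketch below pulls the labels out of an SVG document with Python's standard XML parser, which is the first step of searching diagrams for a table or column name. The function name `diagram_labels` is hypothetical, and the sketch assumes the diagram tool writes labels as standard SVG `text` elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard SVG documents.
SVG_NS = "{http://www.w3.org/2000/svg}"


def diagram_labels(svg_source):
    """Return the text of every <text> element in an SVG document string."""
    root = ET.fromstring(svg_source)
    labels = []
    for element in root.iter(SVG_NS + "text"):
        # itertext() also collects text nested in child elements like <tspan>.
        labels.append("".join(element.itertext()).strip())
    return labels
```

The same labels stored inside a proprietary binary diagram file would be reachable only through the tool that wrote it.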

One general characteristic of the tools I have seen that do not write to text files is that they are marketed to non-developers. In some cases, the vendors of these tools claim that developer expertise is not necessary because you can create artifacts through graphical manipulation or wizards rather than traditional entry of code into an editor. It seems like this marketing mindset negatively impacts how well the tool accommodates typical development practices such as version control, automated builds, automated testing, and refactoring. Vendors seem to think that since developers are not necessary to use the tool, development practices are not necessary for the artifacts produced by the tool. I strongly disagree on both points.

Vendors may succeed in building tools that do not require the textual entry of code, but no matter what the user interface is, fundamentally the tool provides a means for a person to create a computer-readable specification for how to perform some operation. That is coding. Development practices such as version control, builds, deployments, and testing are applicable to all types of coding. The mechanism or user interface by which the code was created is irrelevant.
