
Running My Website: 3 Month Retrospective

My website has been operating now for three months, and it has been an incredible learning experience. I thought I'd take the opportunity to share my observations and lessons learned - especially since some of them are applicable to operating and maintaining software applications of any type. If you've read my articles on learning (starting with Perpetual Learning), you could probably have guessed that I am a big fan of project retrospectives (also called postmortems) where at the end of a project (or at a suitable milestone) you bring the team together to discuss how things went, what worked well, and what can be improved. As one co-worker once said in a postmortem, there is always something to improve. Given how beneficial they can be, I am disappointed that retrospectives do not appear to be commonly employed in the industry. When they are, the results are often not distributed to anyone outside the project team, especially for failed projects which I'd expect one could learn the most from. So I hope that by publishing this retrospective I will encourage you to do likewise.

Before launching my website, I had created a local development environment to run my website using WAMP (a package combining Apache web server, MySQL database and PHP for Windows), installed WordPress (the software running this website), and ensured everything was running properly. Satisfied I would not have any problems, I uploaded the website files to my ISP, did the necessary setup (i.e. created the database) and tried to view the home page. Naturally, I promptly ran into all sorts of problems. After a few frantic days, I had resolved most of the issues and the website was working to my satisfaction. The root cause of the problems was that my ISP used Microsoft's IIS web server instead of Apache. I had known that my ISP used both IIS and Apache web servers, so I had hoped Apache would be used instead of IIS, but such was not the case. There are two lessons to learn here: the first is that the more closely your development environment resembles your production environment, the better. The second lesson is to not make hopeful assumptions, and instead verify and test everything.

Immediately after announcing my website I was interested in obtaining feedback on the number of viewers. My ISP provided web server statistics analysis, so I took a look. I quickly realized there were problems with the statistics: every hit on the site was being recorded, including my own hits to test the site, administer it, or write articles through the web interface. The statistics I was interested in were being obscured or distorted by my own hits. So I ended up downloading the raw web server log file to my local machine and analyzing it with AWStats, which is open source software for analyzing web server logs. I set up the software to exclude administration URLs and to exclude hits from my IP address, and the results better matched what I was looking for. I was also happier with the statistics produced by AWStats compared to the software used by my ISP because AWStats automatically filters out hits by robots (automated programs navigating the site) rather than combining these hits with those by human readers. I think measuring usage of an application or website by users is a great idea - I certainly found it useful, and will have more to say about what I learned below. But the lesson here is to not blindly trust the numbers provided by an analysis program and to be careful of what you are measuring.
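
To make the filtering idea concrete, here is a minimal Python sketch of the three exclusions described above (own IP address, administration URLs, and robots). It is not how AWStats works internally. It assumes log lines in the Apache-style "combined" format, whereas IIS logs usually use the W3C extended format and would need different parsing. The IP address, URL prefixes and robot keywords are hypothetical placeholders.

```python
import re

# Assumed "combined" log format:
# ip - user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical values, for illustration only.
MY_IPS = {"203.0.113.7"}
ADMIN_PREFIXES = ("/psd/wp-admin/", "/psd/wp-login.php")
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def is_reader_hit(line):
    """Return True if the log line looks like a hit from a human reader."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # lines that cannot be parsed are not counted
    if match["ip"] in MY_IPS:
        return False  # the site owner's own visits
    if match["path"].startswith(ADMIN_PREFIXES):
        return False  # administration pages
    agent = match["agent"].lower()
    # Crude robot detection based on keywords in the user agent string.
    return not any(hint in agent for hint in ROBOT_HINTS)


def count_reader_hits(lines):
    """Count the lines that pass all three exclusion rules."""
    return sum(1 for line in lines if is_reader_hit(line))
```

Calling `count_reader_hits(open("access.log"))` on a downloaded log (the file name is an assumption) gives a hit count with the three kinds of noise removed. Keyword matching on the user agent is only a rough stand-in for the robot detection a real analysis tool performs.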

I quickly grew tired of manually downloading the web server log (since I wanted to look at the latest day's or week's statistics) and automated the whole process using an Ant build script. Now with one click I can download the latest server log and run the analysis software. One lesson I learned is that I should have automated the process much sooner than I did - I did it manually for about five days longer than I should have, in part because I thought it would take longer to automate than it actually did, and in part because the amount of manual work was fairly small. But once I fully automated the process, I immediately regretted not having done it sooner.

My initial analysis of the web server logs provided some interesting statistics, one of which was the large number of page-not-found responses (HTTP status code 404). Further investigation revealed that almost all of these were due to the absence of a robots.txt file in my website (used by web crawlers to determine which portions of the site to avoid). I immediately added an empty robots.txt file in order to prevent more 404s from occurring. I'm sure the absence of a robots.txt file didn't cause any problems for the crawlers, but I considered the 404 errors in the log a warning mechanism, and didn't want it unnecessarily cluttered, just as having too many warnings displayed by your build or IDE causes you to ignore all of them.
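
A quick way to see which URLs are producing 404s is to tally them per path. The following is a minimal Python sketch, again assuming Apache-style log lines in which the quoted request is followed by the status code. The function name and the sample usage are illustrative, not part of any tool mentioned here.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /robots.txt HTTP/1.0" 404
REQUEST = re.compile(r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def top_missing_paths(lines, limit=5):
    """Return the most frequently requested paths that produced a 404."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match["status"] == "404":
            counts[match["path"]] += 1
    return counts.most_common(limit)
```

If `/robots.txt` dominates the result, as it did here, an empty robots.txt file is enough to silence those entries: crawlers treat an empty file as permission to visit the whole site.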

When a new version of WordPress came out, I decided to upgrade. Because of the differences between my local development environment (running Apache) and the production environment (running IIS), I realized that I needed a staging / testing environment running on my ISP to experiment with and test out changes before going to production. I could have tried running IIS in my local development environment, but I wasn't sure if a free version existed, and had heard of many people having problems configuring IIS, so I figured I was better off using my ISP's configuration for both the staging and production environments. As I set up the staging environment and began the upgrade process, I realized that it was somewhat complicated and involved, in part because I wanted to minimize downtime for the website. To address this I created an upgrade process document to describe the steps necessary to perform the upgrade. Not only was this useful in ensuring I didn't forget any necessary steps, but I can use the document as a basis for performing future upgrades. The lesson I learned is that a public website is no different from any other software application in production: in particular have a separate staging / testing environment and have a documented migration / upgrade process. If you have an application in production without these things in place (and I do know of at least one case where this is true), you can expect to have problems. Since I set up the staging environment I have found it quite useful, especially for trying out IIS-related changes that I cannot try in my local development environment. Since my website is in a sub-directory of my domain (basilv.com/psd/), it was easy to put the staging environment in a parallel directory. If you are setting up a website, I'd recommend using this structure instead of putting your website directly in your domain's root. If you want to receive traffic going to your domain root, you can always add redirects (i.e. the URL basilv.com is redirected to basilv.com/psd/).

It was never my goal to have lots of search engine traffic, yet when setting up my site I tried to follow basic recommendations for search engine optimization (SEO) such as Google's webmaster guidelines. Not only did these guidelines make sense for virtually all websites in the first place, but it was an opportunity for learning about SEO that I didn't want to pass up. Combined with a small amount of 'marketing' such as providing information on the WordPress forums pointing to my post on Running WordPress under IIS, I achieved a Google PageRank of 4 across my site, and a PageRank of 5 for the WordPress article. Over the three months I have received a decent amount of search engine traffic, most of it going to that WordPress article. I found the statistics on search engine usage interesting: visitors to my site from a search engine results page used Google 80% of the time, Yahoo 15% and MSN 5%. The results follow a typical distribution found in many industries: there is a market leader with the majority of the customers (Google), a secondary player with a significant minority (Yahoo), and the remaining players which barely show up (MSN). I should point out that these statistics are for an extremely small sample of searches on a very limited set of topics, so I wouldn't generalize from them.

Web server logs usually include the user agent from which a user's browser and operating system can be determined. Statistics for operating systems matched my expectations: the majority was Windows (82%, of which 68% was Windows XP and 10% Windows 2000). The various flavors of Linux and Unix, including Mac OS X, accounted for the remainder (various distributions of Linux added up to 6% and Mac OS X had 5%). For 7% of the hits the operating system was unknown. I found the low numbers for Linux interesting. Despite all the hype surrounding Linux on the desktop, Linux usage was quite low. Considering the computer-savvy audience for my website, I would have expected this number to be higher.
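
For readers curious how an operating system is derived from a user agent, here is a minimal Python sketch. It relies on the fact that Windows XP reports itself as "Windows NT 5.1" and Windows 2000 as "Windows NT 5.0" in the user agent string. The marker list and function names are illustrative assumptions, and real analysis tools use far more extensive tables.

```python
from collections import Counter

# Ordered (substring, label) pairs; the first matching marker wins,
# so specific Windows versions must come before the generic "Windows".
OS_MARKERS = [
    ("Windows NT 5.1", "Windows XP"),
    ("Windows NT 5.0", "Windows 2000"),
    ("Windows", "Windows (other)"),
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]


def classify_os(agent):
    """Map a user agent string to a coarse operating system label."""
    for marker, label in OS_MARKERS:
        if marker in agent:
            return label
    return "Unknown"


def os_percentages(agents):
    """Return the percentage of hits per operating system label."""
    counts = Counter(classify_os(agent) for agent in agents)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}
```

User agents that match none of the markers fall into "Unknown", which is how a share of hits with no identifiable operating system arises in reports like the one above.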

Statistics for browsers were more surprising. Initially Internet Explorer had the lead as I had expected, but as traffic to my site picked up, Firefox became the leader with 50% of the hits (40% from Firefox version 1.5 or later and 10% from earlier versions). Internet Explorer received 34% of the hits, almost all of which were MSIE 6. Other browsers that had more than 1% of the hits included Mozilla at 6%, Safari at 3% and Opera at 3%. For 2% of the hits the browser was unknown. I found it interesting that despite the majority of users being on Windows, fewer than half of them are using Internet Explorer. At least among the technically-minded users coming to my website, more than half of Windows users have switched to Firefox. These results are quite different from more general website statistics, where I've heard that Internet Explorer is still used by roughly 80% of visitors.

From the statistics I was able to determine which articles are the most popular. (Actually, for technical reasons these statistics indicate the number of times articles are viewed after they no longer appear on the home page, which given my publishing schedule is one week after they are posted.) The most popular articles (excluding the WordPress one) are listed below. If you've missed one or more of them, I suggest giving them a read.

  1. Working Smarter, Not Harder
  2. Overtime Considered Harmful
  3. Local Variable Declarations
  4. What is Professional Software Development

There are several different reasons why these articles are the most popular. The last article is linked from my About page, so newcomers to my site who look at the About page may navigate to the article. These articles were all published in January, and the general trend I've observed is that the longer the article has been available, the more hits it receives. For the first two articles, I'd like to think that it is their intriguing titles that have attracted more viewers. If so, then this confirms the advice I've read multiple times: the title is one of the most important elements to an article. (The introduction being the other important element.)

The process of writing articles for my website taught me a few other lessons. Originally, when I established my website and set a goal of publishing one article per week, I was uncertain whether I would be able to achieve this goal, both in terms of finding the time to write the articles, and in terms of finding a topic to write about. In order to help ensure I achieved my goal, I followed the recommendations for goal-setting taught in personal management courses. I wrote the goal down (making it concrete). I published the goal as one of a list of goals for my website and told my friends about it (thus introducing the risk of public shame if I fail to achieve the goal). After a few weeks of running the site, I also set a specific day & time for publishing each article (Thursday morning), so that it is always clear each week whether I have met my goal or not. I believe these steps have helped me achieve my goal so far.

What about my concern about being able to come up with ideas for articles? I no longer worry about this. In fact, I have the opposite problem: too many ideas. I've had to set up a document to capture all my ideas, and sometimes have trouble deciding which one to use. Scarcely a week goes by that I don't generate two or three ideas for every one I turn into an article. In fact, the act of writing an article often spawns new ideas. And there is nothing special about me in this regard. I have recently read books such as The Joy of Writing by Pierre Berton and articles that indicate the same experience happens to professional writers and entrepreneurs. To quote Pierre Berton, "... ideas are a dime a dozen in this business. It is not ideas that count, it is the execution of these ideas." (page 313). I'd suggest the same is true for software development, whether one has ideas for new features of an existing application, or for a brand new application: the idea is easy to have, but to turn it into a working, usable feature or application that users will actually use is much harder, and is the essence of our craft.

On that note, I'll bring this article to a close. I hope you have enjoyed the website so far and continue to read the articles I post. I'd appreciate any feedback you have: feel free to leave a comment and let me know what you think.

