At $work we have what we all refer to as "The Helpdesk". By this point, the helpdesk is just one piece of the whole mess ('web application' has always seemed like a goofy term to me, but I suppose it applies here). It was on its second or third iteration when I was hired, and I've ended up being solely responsible for it. While in some ways I've improved the process of working on the system (I've set up a separate development environment, source control via darcs, and a separate ticketing system for dealing with bugs and new feature requests), it still suffers from being a one-person system. To make matters worse, being the small shop that we are, and not being in any way a software development shop, I've got plenty of other things to do that often distract me from playing with code.
The 'one-man-show' problem bit me again recently. Last summer, it was decided that we should record the temperature and humidity of our AC units twice a day (and eventually we decided to include information from our PDUs and UPSes). This sounded like a good use for RRDTool; while we were using it elsewhere already, those uses had been set up before my time, so I was looking forward to actually learning the ins and outs while setting this new system up. The form to enter data is about as ugly and simple as could be, but since clients will never see it I haven't been that worried about prettifying it. The helpdesk is all mod_perl, using HTML::Mason as the templating system. I went with RRDTool::OO as it didn't seem any more difficult than RRD::Simple, and could better handle complex cases if I ended up needing them.
A ticket opens at 10 am and 10 pm every day, reminding the admins on duty to record all of the required data. Of course, it's not always possible to do so immediately, and if the admin uses pen and paper instead of lugging a laptop around, there may be even more delay before the numbers can be entered. For this reason, I let the user set the month, day, and time (to the hour) when entering the data, with it defaulting to 'now'. However, since January 1st, every attempt to add new data has failed, with an error stating that the last update was later than the date of the data being entered. I couldn't think of any obvious reason for that off-hand, and unfortunately didn't get a chance to look into it until yesterday.
To save another drop-down box on the ugly form, I assumed the year would be the current year, as calculated just before stuffing the data into the rrd. This is almost always true, and since my tests for this code were far from thorough, I never discovered when it wouldn't be true until too late. When the task was to be performed on New Year's Eve, whoever did so must not have attempted to enter the data until after midnight. Of course, he set the date and time to whenever he recorded the data on December 31st ... but as the current year was calculated and set just before updating the rrd, the update was considered to be December 31st, 2008. The rrd is set up to keep a year's worth of data points, recorded at a fixed interval (with some leeway). Suddenly jumping ahead a year means that it has to go back and fill in the whole year's worth of data points in between. Since none of them were truly entered, they all become undefined (or, really, NaN), and bam! all of the data recorded since the summer is lost. It also explains the error: with the last update now sitting at the end of 2008, every real entry since has been dated earlier than it, and so gets refused. One silly mistake, which anyone (including myself) going back and reviewing the code for a few minutes probably would have caught, or which decent tests should have caught, managed to wipe out a good chunk of data (in this case, all of the data that existed).
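For what it's worth, the year can still be inferred without adding a drop-down, as long as the inference refuses to produce a date in the future. The sketch below is in Python for brevity rather than the helpdesk's actual mod_perl code; `resolve_timestamp`, `looks_plausible`, and the seven-day `MAX_GAP` are made-up names and values for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: readings are taken twice a day, so a jump of more
# than a week past the last update is suspicious enough to ask for confirmation.
MAX_GAP = timedelta(days=7)


def resolve_timestamp(month, day, hour, now=None):
    """Return the most recent datetime matching month/day/hour that is not in the future."""
    now = now or datetime.now()
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime(year, month, day, hour)
        except ValueError:
            continue  # e.g. Feb 29 in a non-leap year; try the previous year
        if candidate <= now:
            return candidate
    raise ValueError("no valid past date for that month/day/hour")


def looks_plausible(candidate, last_update):
    """Second line of defense: flag entries that leap far beyond the last update."""
    return candidate - last_update <= MAX_GAP


# The New Year's Eve case: a reading taken at 10 pm on Dec 31st, entered at 12:30 am.
entered_at = datetime(2008, 1, 1, 0, 30)
print(resolve_timestamp(12, 31, 22, now=entered_at))  # 2007-12-31 22:00:00
```

Either check alone would have stopped the bad update; the second one is cheap insurance against whatever the first one misses.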
Of course, I'm also responsible for the backup process of everything on the machine. In this case I use a script written by someone else (though tweaked a bit by me) that uses rsync. As it turns out, with the options sent to rsync for the backup process, rsync doesn't detect that the file has changed after new data is added. Fortunately I had changed the archive size at some point in November, so the data from August until early November was in the backups. Since we probably only really care about the numbers in the warmer months, it's not really a great loss in this case, but it is a great example of why 'development teams' (even of just two people) and code reviews are rather useful.
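I haven't listed the rsync options here, so take the following as an assumption rather than a diagnosis: an rrd file is preallocated and never grows when data is added, so any comparison based on file size alone (which is what rsync's `--size-only` does) will consider an updated rrd unchanged. That would also fit with the November copy having made it into the backups, since changing the archive size changes the file size. A small Python sketch of the difference between comparing by size and comparing by content, using throwaway files in place of a real rrd:

```python
import hashlib
import os
import tempfile


def same_by_size(a, b):
    """Size-only comparison, the kind that a fixed-size file will always pass."""
    return os.path.getsize(a) == os.path.getsize(b)


def same_by_digest(a, b):
    """Content comparison via SHA-256, similar in spirit to rsync's --checksum."""
    def digest(path):
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    return digest(a) == digest(b)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "live.rrd")      # stand-in for the real rrd
        backup = os.path.join(tmp, "backup.rrd")  # stand-in for the backed-up copy
        for path in (live, backup):
            with open(path, "wb") as fh:
                fh.write(b"\x00" * 1024)          # preallocated, fixed-size file
        with open(live, "r+b") as fh:             # "update" one slot in place
            fh.seek(100)
            fh.write(b"\x2a")
        return same_by_size(live, backup), same_by_digest(live, backup)


print(demo())  # (True, False): same size, different content
```

Checksumming every file is slower than comparing sizes, so it's a trade-off, but for a handful of small rrd files it would hardly be noticeable.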