Monday, August 10, 2009

OpenID-Savvy Users Can Now Comment.

I have held off for a while on letting people other than those I know comment on this blog, for the simple reason that I don't blog much... until I realised that, well, this thought-dumping thing is a one-way street.

Until now. I have decided to relax the restrictions a bit and allow OpenID-registered users to post comments on tonzaThought. Now you can vent your spleen all you like... at me (but please do it nicely!).

I am interested in hearing your comments on any articles I write. I don't want to be antisocial or anything, but hopefully some feedback will help entice me to blog more.

I look forward to hearing from you... now that I've let (some of) you in!



Sunday, August 09, 2009

When a S.M.A.R.T. Status Means Nothing.

You'd think a hard drive with S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) would be able to reliably tell you whether or not it is about to die. Well, for a Hitachi IC25N040ATMR04-0 40 GB hard drive in a PowerBook G4, this is not the case. Disk Utility (the Mac OS X tool for managing hard disks) reports this drive's S.M.A.R.T. status as "verified", yet the drive itself suffers badly from I/O errors. In short, the drive is unreliable, and S.M.A.R.T. has proven itself too dumb to be useful.

And what does "verified" mean, exactly? Does it mean "the hard drive has been verified to be good"? Or does it mean "the status of the hard drive has been verified with some other component of the system", without actually telling you what that status is? Because in the case of this poor little laptop, "verified" doesn't mean what I think it is intended to mean.

I am still pondering whether to replace the hard disk drive in this six-year-old machine. The machine is too slow for casual Internet use, thanks to the demands of Adobe Flash and the countless Web sites that use this antiquated, grossly inefficient technology, so perhaps replacing the laptop itself is the better option. And since this laptop can't run Mac OS 9 without the help of Classic, I can't even keep it around for nostalgic reasons.

So, getting back to the topic, how useful is this S.M.A.R.T. indicator? Well... here's something for me to ponder: what if it is actually correct, and nothing is wrong with the hard disk drive? What if the fault lies in some other component of the laptop? That would make replacing the hard drive pointless, because if the problem continued with a new hard drive, I would have blown some $300.00 on something that doesn't solve the problem. The counterargument is that it must be the hard disk, because with an external hard disk drive attached (which chains the laptop permanently to a desk), the laptop runs fine without any I/O problems to speak of. That suggests I should try replacing the machine's internal hard disk drive, since it has problems and the rest of the machine doesn't.

None of which makes me trust S.M.A.R.T. indicators, since this one just lied to me.


Time Machine Event Store UUID Issue.

I've been looking long and hard at an issue that was causing me some grief when backing up my computer system with Time Machine on Mac OS X 10.5 Leopard, and I think I have nailed it!

To state the specifics of my predicament: I am not backing up to a Time Capsule; rather, I am using locally attached external hard drives. Also, I am backing up an external hard drive onto one of the external backup drives. This second point is important, because it is the key factor that allows the UUID issue to appear in the first place: if all you're doing is backing up your computer's internal drive(s), this issue should never occur.

So what's the "UUID issue"? It is when backupd reports "Event store UUIDs don't match for volume: <volume>". This rather terse error message is accompanied by Time Machine starting a new backup cycle on the affected volume, backing up the entire volume again and wasting space on the backup drives. This undermines Time Machine's ability to keep a decent backup history: in the best-case scenario, Time Machine discards older backups to make room for the new, redundant backup; in the worst-case scenario, Time Machine will complain that your backup disk is full and ask you to attach yet another disk drive for it to use! Meanwhile, backups may fail due to lack of space on the backup drives Time Machine has assigned itself.

What I have found is that the issue arises when Time Machine starts a backup session whilst an external volume you have previously backed up is not mounted: this causes Time Machine to report the event store UUID mismatch. That in turn causes Time Machine to distrust the contents of the volume to be backed up and start a new backup cycle for it.

I would have thought that Time Machine would scour the contents of the hard drive to determine whether it is one it has backed up before. But consider that more than one HFS Plus volume with the same name can be mounted on the Finder desktop at the same time; that makes the event store UUID the only identifier that uniquely describes each and every volume Time Machine backs up. A missed scheduled backup is an opportunity for a volume to be swapped for another with the same name, and if Time Machine backed up that other volume as if it were the original, it could produce a corrupted volume snapshot (i.e., two volumes appear to have been destructively merged on the backup disks). That could make the affected volume impossible to restore reliably, since data could have been lost during the backup. This is why event store UUIDs are used to track volumes that may share the same name, so that backups don't get clobbered like this!
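To make the behaviour described above a bit more concrete, here is a rough Python model of it. It is only a sketch under my own assumptions, not Time Machine's actual logic: volumes are tracked by name (as in the backup folder layout), each name remembers the event store UUID seen at its last backup, and a session in which a previously backed-up volume is missing breaks that volume's chain, so the next session does a full backup. The function `run_backup_session` and its dictionary arguments are hypothetical.

```python
# Hypothetical model of the behaviour observed above; this is NOT
# Time Machine's real implementation. Volumes are tracked by name, and each
# name remembers the event store UUID recorded at its last backup (or None
# once its backup chain has been broken).


def run_backup_session(mounted, history):
    """Plan one backup session.

    mounted: dict mapping each currently mounted volume name to its
             event store UUID.
    history: dict mapping volume name -> UUID recorded at the last backup,
             or None if that volume's chain is broken. Updated in place.
    Returns a dict mapping each mounted volume name to "full" or
    "incremental".
    """
    # A previously backed-up volume that is absent from this session
    # breaks its chain: the recorded UUID can no longer be trusted.
    for name in history:
        if name not in mounted:
            history[name] = None

    plan = {}
    for name, uuid in mounted.items():
        if history.get(name) == uuid:
            plan[name] = "incremental"
        else:
            # Never backed up, chain broken, or a different volume with
            # the same name ("Event store UUIDs don't match ...").
            plan[name] = "full"
        history[name] = uuid
    return plan
```

Starting from an empty `history`, two sessions with `{"Data": "A1"}` give a full backup and then an incremental one; a session with `{}` (the volume unmounted) followed by another with `{"Data": "A1"}` gives a full backup again, as does swapping in a different volume called "Data" with a different UUID.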

So... why doesn't Time Machine trust its own event store UUIDs?! And why aren't backups named by event store UUID, rather than by volume name?! If you look at a Time Machine backup disk's contents, the backup repository organises its backup sessions as nested folders, from the top level down:

  • volume name of the backup volume
  • Backups.backupdb
  • hostname of the machine doing the backup
  • backup date and time
  • volume names of the volumes backed up
  • contents of the backup

Now, you'd think that the names of the backed-up volumes ought to be event store UUIDs! Well, they aren't, and that's probably why Time Machine stops trusting them when the backup series for any one volume breaks.
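The folder hierarchy listed above can be expressed as a small path-building sketch in Python. It assumes, as is usual on Mac OS X, that the backup disk is mounted under /Volumes, and that each session folder is named with a timestamp of the form YYYY-MM-DD-HHMMSS; the function `snapshot_path` and the host name "MyMac" are hypothetical. The point it illustrates is that no event store UUID appears anywhere in the path, so two different volumes that share a name land in the same folder.

```python
from datetime import datetime
from pathlib import PurePosixPath


def snapshot_path(backup_volume, hostname, when, source_volume):
    """Path of one source volume's copy within one backup session,
    following the folder hierarchy listed above. Assumes the backup disk
    is mounted under /Volumes and session folders are timestamped
    YYYY-MM-DD-HHMMSS."""
    session = when.strftime("%Y-%m-%d-%H%M%S")
    return (PurePosixPath("/Volumes") / backup_volume / "Backups.backupdb"
            / hostname / session / source_volume)


if __name__ == "__main__":
    # "MyMac" is a hypothetical host name. Nothing in the resulting path
    # identifies WHICH volume named "External" was actually backed up.
    print(snapshot_path("Backup", "MyMac", datetime(2009, 8, 9, 21, 30), "External"))
```

Run as a script, this prints `/Volumes/Backup/Backups.backupdb/MyMac/2009-08-09-213000/External`, whichever volume called "External" happened to be mounted at the time.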

The solution to this problem is a bit tedious, but as far as I can tell, the only reliable one thus far:

  • ensure that all of the locally mountable volumes Time Machine backs up are mounted whenever you use your computer, so that they are present when Time Machine runs a scheduled backup; otherwise,
  • if you cannot keep a backed-up volume mounted, turn Time Machine off in System Preferences, and turn it back on only when you can mount that volume again.
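The first workaround can be partly automated with a small pre-flight check. This is a hypothetical Python sketch, not part of Time Machine: you list the external volumes you have backed up before (the names below are placeholders), and it reports any that are not mounted under /Volumes, at which point you would mount them or turn Time Machine off in System Preferences by hand. It deliberately does nothing to Time Machine itself.

```python
import os

# Placeholder names: list the EXTERNAL volumes Time Machine has backed up
# before. Leave out the startup disk: its /Volumes entry is normally a
# symlink to "/", which os.path.ismount() does not treat as a mount point.
EXPECTED_VOLUMES = ["External"]


def missing_volumes(expected_names, volumes_root="/Volumes"):
    """Return the expected volume names that are not currently mounted
    under volumes_root, preserving the order they were given in."""
    return [
        name
        for name in expected_names
        if not os.path.ismount(os.path.join(volumes_root, name))
    ]


if __name__ == "__main__":
    missing = missing_volumes(EXPECTED_VOLUMES)
    if missing:
        print("Not mounted: " + ", ".join(missing))
        print("Mount these, or turn Time Machine off until you can.")
    else:
        print("All expected volumes are mounted.")
```

Checking for a real mount point, rather than just an existing folder, matters here: a leftover empty folder under /Volumes with the right name would otherwise look like a mounted volume.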

