
Why check the digest of files you copy and what to do when they don’t match

June 25, 2008

I’m always copying data from home to work and, less often, from work to home. Mostly these are disk images. I always check the MD5 sum just out of paranoia. It turns out you can’t be paranoid enough! The thing to remember if the checksums don’t match is not to copy the file again but to use rsync. It will bring over just the blocks that are corrupt.

: FSS 43 $; scp .
diskimage.fat.bz2    100% |*****************************|  1825 MB 11:10:31
: FSS 44 $; digest -a md5 diskimage.fat.bz2
674f69eec065da2b4d3da4bf45c7ae5f
: FSS 45 $; ssh digest -a md5 /tank/tmp/diskimage.fat.bz2
191f26762d5b48e0010a575b54746e80
: FSS 46 $; ls -l diskimage.fat.bz2
-rw-r-----   1 cg13442  staff    1913779931 Jun 25 08:56 diskimage.fat.bz2
: FSS 47 $; rsync diskimage.fat.bz2
: FSS 48 $; digest -a md5 diskimage.fat.bz2
191f26762d5b48e0010a575b54746e80
: FSS 49 $;
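The check-then-repair routine can be wrapped in a couple of small shell functions. This is only a sketch: `md5_of` and `same_digest` are names made up for illustration, the fallback to `md5sum` assumes a system without the Solaris `digest` command, and `homeserver` in the commented example is a placeholder host rather than the one used in the session above.

```shell
# Print only the MD5 hex digest of a file, using whichever tool is installed.
md5_of() {
    if command -v digest >/dev/null 2>&1; then
        digest -a md5 "$1"                  # Solaris, as used in the post
    else
        md5sum "$1" | awk '{ print $1 }'    # assumption: GNU coreutils is available
    fi
}

# Succeed (exit status 0) only when both files have the same digest.
same_digest() {
    [ "$(md5_of "$1")" = "$(md5_of "$2")" ]
}

# Example: repair a bad local copy from the remote original.
# "homeserver" is a placeholder host name.
#
#   remote=$(ssh homeserver digest -a md5 /tank/tmp/diskimage.fat.bz2)
#   [ "$(md5_of diskimage.fat.bz2)" = "$remote" ] ||
#       rsync --ignore-times homeserver:/tank/tmp/diskimage.fat.bz2 .
```

The `--ignore-times` option is there because rsync normally skips a file whose size and modification time already match, which is exactly the case where silent corruption could otherwise go unrepaired.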

Since my home directory is now on ZFS, and I take a snapshot every time my card gets inserted into the Sun Ray, I can now take a look at what went wrong. Using my zfs_versions script I can get a list of the different versions of the file from all the snapshots:

: FSS 56 $; digest -a md5 $( zfs_versions diskimage.fat.bz2 | nawk '{ print $NF }')
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-05:51:57/diskimage.fat.bz2) = 0a193e0e80dbf83beabca12de09702a0
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-05:54:44/diskimage.fat.bz2) = 7aa78dba6a7556fe10115aa5fc345bad
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-07:05:34/diskimage.fat.bz2) = c6a77429920f258dfca1dbbd5018a69c
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2) = 674f69eec065da2b4d3da4bf45c7ae5f
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2) = 191f26762d5b48e0010a575b54746e80
: FSS 57 $;
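The zfs_versions script itself is not shown here. For readers without it, a rough stand-in is to glob the `.zfs/snapshot` directory, whose layout is visible in the paths above. The function name `snapshot_versions` and its two arguments are made up for this sketch, and unlike zfs_versions it presumably lists the file once per snapshot even when nothing changed in between.

```shell
# Hypothetical stand-in for the post's zfs_versions script (which is not shown).
# Usage: snapshot_versions <filesystem-root> <path-relative-to-root>
# Prints one line per snapshot that contains the file, in glob (name) order.
snapshot_versions() {
    root=$1
    rel=$2
    for f in "$root"/.zfs/snapshot/*/"$rel"; do
        # An unmatched glob stays literal, so check that the file really exists.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example, using the paths from the session above:
#
#   digest -a md5 $(snapshot_versions /home/cg13442 diskimage.fat.bz2)
```

Because the snapshot names embed a timestamp, name order is also chronological order, so the last line is the most recent copy.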

So the last two files in the list are the corrupted file and the good file, respectively:

: FSS 57 $; cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2 /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2 | head -10
84262913   0 360
84262914   0  14
84262915   0 237
84262916   0  25
84262917   0 342
84262918   0 304
84262919   0  41
84262920   0  12
84262921   0 372
84262922   0  20
: FSS 58 $;

The corrupted copy appears to contain blocks of zeros where the good copy has data.

: FSS 58 $; cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2 /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2 | nawk '$2 != 0 { print $0 } $2 == 0 { count++ } END { printf("%x\n", count ) }'
23d8c
: FSS 58 $;

At least 0x23d8c (146,828) bytes were zero that should not have been. I need to see if I can reproduce this.
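To check whether those zeros really sit in contiguous blocks, the offsets printed by `cmp -l` can be grouped into runs. This is a sketch rather than anything run against the original files, and `diff_runs` is a name made up for the example. It prints one line per run, giving the 1-based starting offset and the length in bytes.

```shell
# Group the byte offsets reported by `cmp -l` into contiguous runs.
# Usage: diff_runs <file1> <file2>
# Output: one "<start offset> <length>" line per run of differing bytes.
diff_runs() {
    # cmp exits with status 1 when the files differ, which is expected here.
    { cmp -l "$1" "$2" || true; } | awk '
        # Each input line is: <1-based offset> <octal byte, file 1> <octal byte, file 2>
        NR > 1 && $1 != prev + 1 { printf "%d %d\n", start, prev - start + 1 }
        NR == 1 || $1 != prev + 1 { start = $1 }
        { prev = $1 }
        END { if (NR > 0) printf "%d %d\n", start, prev - start + 1 }
    '
}

# Example: two 10-byte files that differ at offsets 3-4 and 8.
#
#   printf 'abcdefghij'      > good
#   printf 'ab\0\0efg\0ij'   > bad
#   diff_runs good bad
```

Run lengths that are multiples of a disk or network block size, or starts that fall on aligned offsets, would hint at which layer dropped the data.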

Anyway, the moral is: always check the MD5 digest, and if it is wrong, use rsync to correct it.



  1. Mike Smith

    Any idea what caused the corruption?
    Is it likely to be at application rather than ZFS level?
    I take it there were no ZFS checksum errors reported.

  2. I suspect my home server. When I bought the system, ECC memory was expensive, so it is entirely possible that the memory could silently fail, although I have run some tests and none of them has thrown anything up.
    Indeed, there are no ZFS errors at either end, which implies that the server was given bad data to put on the disk. Given that the data was copied over an ssh pipe, it is hard to see how the transfer could have introduced errors without them being noticed.
    So I may never know, but that home server memory is very concerning. How much is 8G of ECC memory? More than I have in my piggy bank.
