Archive for January, 2008

Using tar and netcat to backup filesystems: A pitfall resulting in broken links and how to avoid it

January 16, 2008

As shown in yesterday’s blog entry, I failed to recover my Feisty laptop because of missing symbolic links. Remembering the old saying “Every fool may do a backup”, I have now found my mistake.

The error can be reproduced on localhost, so here we go:

First I started the listening netcat like this:

cd /tmp
netcat -l -p 2342 | tar xvf -

Then I backed up /sbin using:

cd /
tar cf - sbin | netcat localhost 2342

Neither process stops until you hit CTRL-C. And at this point I made a mistake: I hit CTRL-C on the receiving side (I saw no further files coming in). DO NOT DO THIS: IT WILL LEAVE YOU WITH BROKEN LINKS. E.g. instead of

root@lulu:/tmp# ls -l sbin/ip
lrwxrwxrwx 1 root root 7 2008-01-16 11:33 sbin/ip -> /bin/ip

you will have

root@lulu:/tmp# ls -l sbin/ip
---------- 1 root root 0 2008-01-16 11:28 sbin/ip
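The broken entries have a recognisable shape: regular files of size 0 with no permission bits set. A minimal sketch for listing such leftovers in a restored tree, assuming a POSIX-style find; the function name find_placeholders is made up for this example, and a file that legitimately is empty with mode 0000 would be reported too:

```shell
#!/bin/sh
# Hypothetical helper: list regular files that are empty and have
# mode 0000, i.e. entries that look like "---------- ... 0 ... sbin/ip".
# $1 is the directory to scan (e.g. the restored sbin).
find_placeholders() {
    # -type f    : regular files only (symlinks are not followed)
    # -size 0    : zero-length files
    # -perm 0000 : exactly no permission bits set
    find "$1" -type f -size 0 -perm 0000
}
```

Calling find_placeholders /tmp/sbin after a restore should print nothing if all links were created properly.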

Let us dig a little deeper using strace. Start the listening side again and then the sending side. Do not hit CTRL-C. In the process list you will see that the sending tar has finished while the receiving tar is still there. Attach strace to this process and then hit CTRL-C on the sending side. Check sbin/ip and you will find a link correctly pointing to /bin/ip.

This is the strace output:

root@lulu:/tmp# cat receiving-tar.strace
8394 read(0, "", 10240) = 0
8394 clock_gettime(CLOCK_REALTIME, {1200479622, 923188961}) = 0
8394 clock_gettime(CLOCK_REALTIME, {1200479622, 923263552}) = 0
8394 close(0) = 0
8394 SYS_299(0xffffff9c, 0x808e6a5, 0xbfd650fc, 0x808e6a5, 0xb7f6eff4) = 0
8394 chmod("sbin", 0755) = 0
8394 chown32("sbin", 0, 0) = 0
8394 lstat64("sbin/lsmod", {st_mode=S_IFREG, st_size=0, ...}) = 0
8394 unlink("sbin/lsmod") = 0
8394 symlink("/bin/lsmod", "sbin/lsmod") = 0
8394 lchown32("sbin/lsmod", 0, 0) = 0
8394 lstat64("sbin/ip", {st_mode=S_IFREG, st_size=0, ...}) = 0
8394 unlink("sbin/ip") = 0
8394 symlink("/bin/ip", "sbin/ip") = 0
8394 lchown32("sbin/ip", 0, 0) = 0
8394 close(1) = 0
8394 munmap(0xb7cff000, 4096) = 0
8394 exit_group(0) = ?
root@lulu:/tmp#

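Obviously tar does not create these symbolic links at the moment they appear in the archive but in a final step: the lstat64 calls above show zero-length regular files being used as placeholders, which are unlinked and replaced by the real symlinks only after tar has read end-of-file on its input. When I hit CTRL-C on the receiving side I prevented tar from running that final step, as this strace clearly shows: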

root@lulu:/tmp# cat receiving-tar-ctrl-c
8495 read(0, "", 10240) = 0
8495 --- SIGINT (Interrupt) @ 0 (0) ---
root@lulu:/tmp#

So you really must hit CTRL-C only on the sending side, interrupting netcat, which is safe because the sending tar has already exited.

Even better: use netcat’s -q option for the sending netcat, e.g.

Receiving process:

root@lulu:/tmp# netcat -l -p 2342 | tar xvf -

Sending side:

root@lulu:/# tar cf - sbin | netcat -q 2 localhost 2342
root@lulu:/#

With -q 2, netcat exits 2 seconds after detecting EOF on stdin. The receiving netcat will then exit too.

So my conclusion:

Always use tar + netcat this way:

Receiving side:

netcat -l -p 2342 | tar xvf -

Sending side:

tar cf - whatever-you-want-to-backup | netcat -q 2 localhost 2342

Never omit -q!
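Whichever way the transfer is ended, it does no harm to check the symlinks afterwards. The following is only a sketch: the function names list_symlinks and compare_symlinks are invented here, readlink is assumed to be installed, and both trees are assumed to be reachable locally (for a real network backup the listing would have to be produced on each host and the two outputs compared).

```shell
#!/bin/sh
# Hypothetical helper: print every symlink below directory $1 as
# "relative/path -> target", sorted, so two trees can be compared.
list_symlinks() {
    (
        cd "$1" &&
        find . -type l -exec sh -c '
            for link; do
                printf "%s -> %s\n" "$link" "$(readlink "$link")"
            done
        ' sh {} + | sort
    )
}

# Hypothetical helper: succeed (status 0) only if both trees contain the
# same symlinks with the same targets; status 2 if a tree is unreadable.
compare_symlinks() {
    cs_src=$(list_symlinks "$1") || return 2
    cs_dst=$(list_symlinks "$2") || return 2
    [ "$cs_src" = "$cs_dst" ]
}
```

For the localhost test above, compare_symlinks /sbin /tmp/sbin should succeed after a clean run and fail after CTRL-C on the receiving side.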

Update:

Just a hint for those using e.g. SuSE: although the netcat version is 1.10 (the same as in Feisty), the -q option is not available (others are missing too). So on SuSE you must hit CTRL-C on the right side …
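Since -q is not available everywhere, a script could check for it before relying on it. This is a rough sketch only: it assumes that the local netcat prints its option list when called with -h and that -q is listed at the start of a line, which may not hold for every build. The function names help_lists_q and netcat_has_q are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read netcat help text on stdin and succeed
# if a line starts (after optional whitespace) with the option -q.
help_lists_q() {
    grep -Eq '^[[:space:]]*-q([[:space:]]|$)'
}

# Hypothetical helper: ask the netcat binary ($1, default "netcat")
# for its help text. Usage output often goes to stderr, so both
# streams are merged before searching.
netcat_has_q() {
    "${1:-netcat}" -h 2>&1 | help_lists_q
}

# Possible use on the sending side (not executed here):
#   if netcat_has_q; then
#       tar cf - sbin | netcat -q 2 localhost 2342
#   else
#       echo "no -q: end the transfer with CTRL-C on the sending side" >&2
#       tar cf - sbin | netcat localhost 2342
#   fi
```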


Hard disk failure: Additional sense: Unrecovered read error – auto reallocate failed

January 15, 2008

Last week my laptop suddenly refused to boot. Although I had not changed any configuration as root for quite some time, I read:

grub: Error 29: Disk write error

After restarting I got this error again. So I booted my Feisty CD-ROM and started to inspect the system. At first there was no obvious error. Being in a networked environment, I decided to save my $HOME using netcat. This worked fine. So in the next step I backed up the root file system. This worked fine too. Finally I tried to save another partition containing data of a virtual machine I use with vmplayer.

This failed, spitting out a lot of driver error messages, e.g.

Additional sense: Unrecovered read error - auto reallocate failed

But tar finally returned, saying:

root@ubuntu:/# tar cf - mnt | netcat cheetah 2342
tar: mnt/now2/now2-s004.vmdk: File shrank by 944635904 bytes; padding with zeros

To make a long story short:

After replacing the disk, restoring the filesystems, and fixing grub and /etc/fstab, my system came up, but with a lot of error messages. I could log in, and after a while I found that all links in /sbin like this one were broken:

lrwxrwxrwx 1 root root 7 2008-01-15 14:30 ip -> /bin/ip

Instead I found:

---------- 1 root root 0 2008-01-15 14:50 ip

I fixed a lot of missing links in /sbin, /bin and all of the run level scripts manually. But in the end I gave up because I could not find a really reliable way to fix the links (apt and dpkg do not seem to help here; IMHO there is no way to rerun the postinstall scripts).

So finally I reinstalled my laptop. Fortunately all data in $HOME was fine, so I was up and running again rather soon.

Nevertheless: why did my backup procedure fail?

Solved: See next blog entry