Tuesday, May 22, 2012

Data Recovery using ddrescue

Disclaimer: although the following post describes a successful recovery process, I don't accept any responsibility for any data loss that occurs from trying to follow these instructions!

A colleague of mine came to me with an external hard disk recently, complaining about missing photos (as well as some other files). The file system in question had a "photos1" directory in the root of the drive, where all the pictures had been meticulously organised into sub-directories. However, it was no longer possible to navigate into the directory; each attempt resulted in a long wait followed by an error message hinting at I/O errors.

Immediately after connecting the drive to my laptop, I was greeted with Fedora's/GNOME's "imminent disk failure" warning (like the title image). This, together with the following messages in the system log, confirmed my suspicion that there was a physical problem with the disk:

May 18 09:43:45 alpha kernel: [233314.587708] sd 47:0:0:0: [sdd] Unhandled sense code
May 18 09:43:45 alpha kernel: [233314.587713] sd 47:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 18 09:43:45 alpha kernel: [233314.587717] sd 47:0:0:0: [sdd] Sense Key : Medium Error [current]
May 18 09:43:45 alpha kernel: [233314.587721] sd 47:0:0:0: [sdd] Add. Sense: Unrecovered read error
May 18 09:43:45 alpha kernel: [233314.587726] sd 47:0:0:0: [sdd] CDB: Read(10): 28 00 0d 02 f1 80 00 00 08 00
May 18 09:43:45 alpha kernel: [233314.587734] end_request: I/O error, dev sdd, sector 218296704
May 18 09:43:45 alpha kernel: [233314.587741] Buffer I/O error on device sdd1, logical block 27286832
May 18 09:43:47 alpha kernel: [233317.241426] sd 47:0:0:0: [sdd] Unhandled sense code
May 18 09:43:47 alpha kernel: [233317.241434] sd 47:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 18 09:43:47 alpha kernel: [233317.241438] sd 47:0:0:0: [sdd] Sense Key : Medium Error [current]
May 18 09:43:47 alpha kernel: [233317.241442] sd 47:0:0:0: [sdd] Add. Sense: Unrecovered read error
May 18 09:43:47 alpha kernel: [233317.241446] sd 47:0:0:0: [sdd] CDB: Read(10): 28 00 0d 02 f1 80 00 00 08 00
May 18 09:43:47 alpha kernel: [233317.241454] end_request: I/O error, dev sdd, sector 218296704
May 18 09:43:47 alpha kernel: [233317.241459] Buffer I/O error on device sdd1, logical block 27286832
May 18 09:43:47 alpha ntfs-3g[16779]: ntfs_attr_pread_i: ntfs_pread failed: Input/output error
May 18 09:43:47 alpha ntfs-3g[16779]: Failed to read index block: Input/output error
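
As a cross-check on the log above, the Read(10) command bytes can be decoded by hand. In a Read(10) CDB, bytes 2 to 5 (counting from 0) hold the big-endian logical block address and bytes 7 to 8 hold the number of blocks requested. The following is only a minimal sketch; the function name decode_read10 is my own and not part of any tool.

```shell
# Hypothetical helper: decode the LBA and transfer length of a Read(10) CDB.
# Takes the ten CDB bytes as one space-separated string of hex pairs.
decode_read10() {
    # Word-split the string into positional parameters $1..$10.
    set -- $1
    # $3..$6 are CDB bytes 2-5: the big-endian logical block address.
    lba=$(( 0x$3$4$5$6 ))
    # $8 and $9 are CDB bytes 7-8: the transfer length in blocks.
    count=$(( 0x$8$9 ))
    echo "$lba $count"
}

# CDB copied from the kernel messages above.
decode_read10 "28 00 0d 02 f1 80 00 00 08 00"
```

This prints "218296704 8": the drive was asked for 8 sectors starting at sector 218296704, which is exactly the sector named in the end_request line. Assuming 512-byte sectors and 4096-byte blocks, the logical block 27286832 reported for sdd1 would place the start of the partition at sector 2048, but that is my inference rather than something the log states.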

I took the time to explain to my colleague that the data was most likely retrievable; just because there was a problem accessing the top-level "photos1" directory didn't necessarily mean there would be issues with the directories and files contained within. I mentioned that I myself had faced a similar problem a couple of years back and that I ended up using a professional data recovery firm, Data Recovery Direct. Time had not been on my side in that case; I ended up paying a premium for the company to expedite the recovery process, but it was totally worth it; by the next day I had access to the files I needed off the drive. Incidentally, I think Data Recovery Direct are one of the few companies that I would recommend to others requiring such services, while simultaneously hoping I never have to use them myself again!

Once I had explained the potential cost of recovery to my colleague, he enquired as to whether there was anything I would be able to do instead. I informed him that I could attempt to recover as much data from the disk as possible, but there was a chance I could render the drive completely useless! He accepted the risks and with his consent, I proceeded.

I opted to use an open source recovery tool: ddrescue. This tool attempts to make an image of the disk, partition or file in question by making multiple passes over the source data. It records its progress in a log file, so you can perform further passes and/or continue if the process is interrupted. Once I had the damaged drive connected to a machine (without mounting the file system) I followed this process:

  1. Performed an initial scan across the surface of the disk, ignoring bad, unreadable sectors:
    ddrescue -f -n /dev/sdd1 /mnt/external_storage/tmp/hdd.img /mnt/external_storage/tmp/hdd.log

    This took a long time to complete (a day or two) and resulted in a file the size of the partition.

  2. Began a second run that concentrated on the bad sectors, using direct disk access (-d) and retrying each bad sector up to 3 times (-r3):
    ddrescue -d -f -r3 /dev/sdd1 /mnt/external_storage/tmp/hdd.img /mnt/external_storage/tmp/hdd.log

    Because of the existing log file, ddrescue skips the sectors it has already retrieved data from and targets only the damaged areas of the partition. Even so, this command ended up running for several weeks. Towards the end, the "rescued data" count was only increasing by a byte or two a day(!); this was when we decided to kill the process and continue with the next steps in the recovery.

  3. Created a new NTFS partition on another (larger capacity) HDD and wrote the recovered image out to it:
    dd if=/mnt/external_storage/tmp/hdd.img of=/dev/sde1

  4. Attached the drive to a machine running Windows and fixed any outstanding file system errors:
    chkdsk /F G:
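
The log file that steps 1 and 2 share is plain text, which makes it easy to check on progress without interrupting ddrescue. Lines starting with "#" are comments, one line records the current position, and every remaining line describes a block as "position size status", where a status of "+" means the block was read successfully. The sketch below adds up the "+" blocks; rescued_bytes is a name I made up, and the layout described is the one I understand ddrescue to write, so check it against your own log first.

```shell
# Hypothetical helper: total the bytes marked as rescued ("+") in a ddrescue log.
rescued_bytes() {
    total=0
    while read -r pos size status; do
        # Skip comment lines and blank lines.
        case "$pos" in
            '#'*|'') continue ;;
        esac
        # Block lines have three fields; the current-position line never
        # carries "+" in its third field, so it is ignored here.
        if [ "$status" = "+" ]; then
            # Sizes are written in hex (0x...), which shell arithmetic accepts.
            total=$(( total + $size ))
        fi
    done < "$1"
    echo "$total"
}
```

Running rescued_bytes /mnt/external_storage/tmp/hdd.log from time to time and comparing the result is one way to decide when further retries have stopped paying off.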

After completing the above steps, I was able to navigate into the previously inaccessible "photos1" directory, much to the joy of my colleague!
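
A footnote on step 3: dd will happily write past what the target can hold until it hits the end of the device, so it is worth confirming that the image fits before starting. This is a minimal sketch that assumes GNU stat and the blockdev utility from util-linux are available and that it is run as root; image_fits and write_image are names I made up for illustration.

```shell
# Succeeds only if the first byte count fits within the second.
image_fits() {
    [ "$1" -le "$2" ]
}

# Hypothetical wrapper around the dd command from step 3, with a size guard.
write_image() {
    img=$1
    target=$2
    img_bytes=$(stat -c %s "$img")                  # size of the image file in bytes
    target_bytes=$(blockdev --getsize64 "$target")  # size of the partition in bytes
    if image_fits "$img_bytes" "$target_bytes"; then
        dd if="$img" of="$target" bs=4M
    else
        echo "target too small: $target_bytes < $img_bytes bytes" >&2
        return 1
    fi
}
```

With the paths used above, the call would be write_image /mnt/external_storage/tmp/hdd.img /dev/sde1. The bs=4M option is my addition and only affects how much dd copies per read and write.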