Diagnosing external HD errors under Linux
I've got a 3TB Seagate USB 3.0 (connected to 2.0) hard drive at work that's giving me issues. The OS is CentOS 6.3, and the drive is formatted ext4.
Last week, I discovered the GUI unresponsive because of hundreds of notification popups and Nautilus windows. Best I can tell, the drive was spontaneously disconnecting and reconnecting. Every disconnect was a warning popup, and every reconnect was a Nautilus window for the drive.
After plugging in the drive, it'll pop up with this after a minute or two:
About 30-60 seconds after that, it mounts. While it is mounted, any time I try to copy from the drive, the transfer fails with an I/O error after about 100 MB. I swapped out the USB cable, but the same behavior continues.
I'm guessing the drive is dying, but since I'm not familiar with the meaning of the error logs, I wanted to check with you guys to be sure. The dmesg log is below. It covers everything from mount to the errors when trying to copy a file.
$ dmesg | tail -n 61
usb 2-6: new high speed USB device number 45 using ehci_hcd
usb 2-6: New USB device found, idVendor=0bc2, idProduct=a0a4
usb 2-6: New USB device strings: Mfr=2, Product=3, SerialNumber=1
usb 2-6: Product: Backup+ Desk
usb 2-6: Manufacturer: Seagate
usb 2-6: SerialNumber: NA5K4VZZ
usb 2-6: configuration #1 chosen from 1 choice
scsi686 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 45
usb-storage: waiting for device to settle before scanning
usb-storage: device scan complete
scsi 686:0:0:0: Direct-Access Seagate Backup+ Desk 0508 PQ: 0 ANSI: 6
sd 686:0:0:0: Attached scsi generic sg4 type 0
sd 686:0:0:0: [sdc] Spinning up disk...................................ready
sd 686:0:0:0: [sdc] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB)
sd 686:0:0:0: [sdc] Write Protect is off
sd 686:0:0:0: [sdc] Mode Sense: 4f 00 00 00
sd 686:0:0:0: [sdc] Assuming drive cache: write through
sd 686:0:0:0: [sdc] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB)
sd 686:0:0:0: [sdc] Assuming drive cache: write through
sdc: unknown partition table
sd 686:0:0:0: [sdc] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB)
sd 686:0:0:0: [sdc] Assuming drive cache: write through
sd 686:0:0:0: [sdc] Attached SCSI disk
usb 2-6: reset high speed USB device number 45 using ehci_hcd
EXT4-fs (sdc): warning: maximal mount count reached, running e2fsck is recommended
EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts:
SELinux: initialized (dev sdc, type ext4), uses xattr
usb 2-6: reset high speed USB device number 45 using ehci_hcd
lo: Disabled Privacy Extensions
SELinux: initialized (dev proc, type proc), uses genfs_contexts
sd 686:0:0:0: [sdc] Unhandled sense code
sd 686:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 686:0:0:0: [sdc] Sense Key : Medium Error [current]
sd 686:0:0:0: [sdc] Add. Sense: Unrecovered read error
sd 686:0:0:0: [sdc] CDB: Read(10): 28 00 06 5e e0 60 00 00 1e 00
sd 686:0:0:0: [sdc] Unhandled sense code
sd 686:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 686:0:0:0: [sdc] Sense Key : Medium Error [current]
sd 686:0:0:0: [sdc] Add. Sense: Unrecovered read error
sd 686:0:0:0: [sdc] CDB: Read(10): 28 00 06 5e e0 7e 00 00 02 00
sd 686:0:0:0: [sdc] Unhandled sense code
sd 686:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 686:0:0:0: [sdc] Sense Key : Medium Error [current]
sd 686:0:0:0: [sdc] Add. Sense: Unrecovered read error
sd 686:0:0:0: [sdc] CDB: Read(10): 28 00 06 5e e0 80 00 00 1e 00
sd 686:0:0:0: [sdc] Unhandled sense code
sd 686:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 686:0:0:0: [sdc] Sense Key : Medium Error [current]
sd 686:0:0:0: [sdc] Add. Sense: Unrecovered read error
sd 686:0:0:0: [sdc] CDB: Read(10): 28 00 06 5e e0 9e 00 00 02 00
sd 686:0:0:0: [sdc] Unhandled sense code
sd 686:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 686:0:0:0: [sdc] Sense Key : Medium Error [current]
sd 686:0:0:0: [sdc] Add. Sense: Unrecovered read error
sd 686:0:0:0: [sdc] CDB: Read(10): 28 00 06 5e e0 62 00 00 01 00
sd 686:0:0:0: [sdc] Unhandled sense code
sd 686:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 686:0:0:0: [sdc] Sense Key : Medium Error [current]
sd 686:0:0:0: [sdc] Add. Sense: Unrecovered read error
sd 686:0:0:0: [sdc] CDB: Read(10): 28 00 06 5e e0 62 00 00 01 00
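As a reading aid for the CDB lines in the log: in a SCSI Read(10) command, bytes 3-6 are the big-endian logical block address and bytes 8-9 are the number of blocks requested. A minimal sketch of decoding them follows; the function name `decode_read10` is made up for illustration.

```shell
#!/bin/sh
# Decode a SCSI Read(10) CDB as printed by the kernel.
# Layout: opcode, flags, 4-byte big-endian LBA, group, 2-byte big-endian length, control.
decode_read10() {
    # Arguments are the 10 hex bytes, e.g. 28 00 06 5e e0 60 00 00 1e 00
    [ "$#" -eq 10 ] || { echo "need 10 bytes" >&2; return 1; }
    [ "$1" = "28" ] || { echo "not a Read(10) CDB" >&2; return 1; }
    lba=$(( 0x$3$4$5$6 ))    # bytes 3-6: logical block address
    count=$(( 0x$8$9 ))      # bytes 8-9: transfer length in blocks
    echo "lba=$lba blocks=$count"
}

# First failing read from the log above
decode_read10 28 00 06 5e e0 60 00 00 1e 00
```

Decoded this way, every failing read in the log falls in the same short run of blocks starting at LBA 106881120, so the errors cluster in one small region of the disk rather than being scattered across it.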
I ran e2fsck and then chickened out of allowing rewrites, since I didn't know whether that would help or hurt if this is a hardware problem.
$ sudo e2fsck /dev/sdc
e2fsck 1.41.12 (17-May-2010)
extBU has been mounted 240 times without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Error reading block 99614726 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps. Ignore error [y]? yes
Force rewrite [y]?
extBU: e2fsck canceled.
Comments
@Lincoln: what's up with that second block that's keeping it from being formatted as code?
From some quick research, those errors look like a sign of failing hardware. You can try to force-mount the dirty filesystem with the commands below (make sure to unmount the device first). This assumes that /media/external doesn't already exist and that the external HDD has only one partition. It is also possible that the controller in the external enclosure is failing rather than the actual drive (unlikely, but possible), so if you are unable to get results using this method, you can try removing the drive from its casing.
sudo umount /dev/sdc1
sudo mkdir /media/external
sudo mount -t ext4 /dev/sdc1 /media/external -o force
You may then be able to browse the drive by navigating to /media/external. This won't fix the filesystem issues if that is all that is occurring though. Hopefully you don't have stuff stored in only one place!
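A more conservative variant, as a sketch only: mount the filesystem read-only and skip journal replay, so nothing gets written to a disk that may be dying. `ro` and `noload` are standard ext4 mount options. The device and mount point are assumptions: the dmesg output in the question reports "sdc: unknown partition table" and mounts the filesystem on sdc itself, so the default here is /dev/sdc rather than /dev/sdc1. The script only prints the command for review.

```shell
#!/bin/sh
# Sketch: build a read-only mount command for a possibly damaged ext4 filesystem.
# DEVICE and MOUNTPOINT are assumptions; adjust them to your system.
DEVICE=${DEVICE:-/dev/sdc}
MOUNTPOINT=${MOUNTPOINT:-/media/external}

ro_mount_cmd() {
    # ro     = never write to the filesystem
    # noload = do not load (replay) the journal, which would write to the disk
    echo "mount -t ext4 -o ro,noload $1 $2"
}

# Print the command; run it by hand with sudo once it looks right.
ro_mount_cmd "$DEVICE" "$MOUNTPOINT"
```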
edit:added code tags and missed the most important part of the second command cause I'm a bad.
@Gargoyle the double line break in the middle overrode the code tags with paragraph tags.
Short reads are typically a sign of bad hardware. If you have enough drive space, I would suggest using ddrescue to make a disk image and then trying to recover your data from the image.
http://www.technibble.com/guide-using-ddrescue-recover-data/
The 10000ft overview of recovering the data:
Yep yep yep
I think I thought about that and then failed to actually mention it.
< isbad.jpg
Thanks guys! I'd like to try using ddrescue, but the external drive is bigger than any internal drive partitions. Do you know if there's a way to get it to write a file smaller than the drive size?
From my searches so far, it looks like no, since it doesn't look at the file system and doesn't know if a block is used or not. I think I can resize the partition first and then run ddrescue off the smaller partition, but I'm afraid the resize will fail and toast the drive.
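To put a number on it: since the image covers every block regardless of use, its size is just the block count times the block size that dmesg reports for sdc. A small sketch of that arithmetic (the function name `image_bytes` is made up for illustration):

```shell
#!/bin/sh
# Size of a full-disk image = logical block count x logical block size.
# Both numbers come from the "[sdc] ... logical blocks" line in dmesg.
image_bytes() {
    echo $(( $1 * $2 ))
}

bytes=$(image_bytes 732566645 4096)
echo "full image: $bytes bytes"
# Compare against free space on the target, e.g.: df -B1 /path/to/target
```

That works out to 3000592977920 bytes, matching the 3.00 TB in the log, so the target needs at least that much free space.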
So before I try that, I'm wondering if there's any other fault-tolerant way of trying to copy files off the disk. All it has on it is two backup zips made every week (although they are 11 GB and 3 GB in size). If I could get just one pair of backup files off of it before attempting something riskier or RMAing the drive, that would be ideal.
If anyone wants to answer the above, feel free, but I've got an alternative figured out now. I'm going to borrow another external drive from IT to make a fresh backup, bid farewell to the old backups, and try to RMA this thing.
Nope, ddrescue is a very low-level tool. It reads the disk sector by sector and tries to make an exact bitwise copy of the drive. It doesn't attempt to interpret the data, it just extracts it to a file (or another drive). When it encounters a bad spot on the drive, it does its best to read it, marks it as bad if it cannot read it cleanly, and then moves on. Once it has been through the whole drive, it goes back to the bad spots and retries them until it reaches its maximum number of tries (or the drive fails totally, or the user stops it). It is very good for saving data from a failing drive, but it's not meant to be smart: it does not differentiate between used and unused blocks.
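The behavior described above is often split into two runs by hand. Below is a minimal sketch, printed as a dry run so nothing touches the disk. The paths are assumptions, and the image must go on a different, healthy disk with enough free space. `-n`, `-d`, and `-r` are GNU ddrescue options, but their long names have changed between versions, so check `ddrescue --help` on your system.

```shell
#!/bin/sh
# Sketch of a two-stage GNU ddrescue run, printed as a dry run.
# SRC, IMG and MAP are assumptions; adjust them before running anything.
SRC=${SRC:-/dev/sdc}
IMG=${IMG:-/mnt/rescue/sdc.img}
MAP=${MAP:-/mnt/rescue/sdc.map}

rescue_plan() {
    # Stage 1: copy everything that reads cleanly and skip the slow
    # work on bad areas (-n).
    echo "ddrescue -n $1 $2 $3"
    # Stage 2: return to the bad areas using direct disk access (-d)
    # and retry them up to 3 times (-r3).
    echo "ddrescue -d -r3 $1 $2 $3"
}

rescue_plan "$SRC" "$IMG" "$MAP"
```

The third argument is the map file (called a logfile in older versions). It records which areas have been read so far, which lets the second run, or an interrupted one, pick up where the previous run stopped.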
Thanks Ardi!
Making backups to a new HD now. I think I'll start keeping a copy out on the network, in case the new one dies, too.