Ugh...another dying 120

edited February 2004 in Folding@Home
I'm going to be taking a hit on my folding over the next few days. My primary drive (the remaining half of my array of two 120gig drives) is dying, so I'll have to ghost it to my 40gig Maxtor, reorganize all my files, then yank the 120 out for RMA.
I credit the discovery of the trouble to SpeedFan and its built-in S.M.A.R.T. monitoring, which alerted me to the problem that WD Diag later verified.
I called WD, gave them a rundown of the WD Diag error codes, and got an RMA set up. As they had sent a 160gig in replacement for the first 120, they are going to send another to replace this one, so I'll have a good basis for an array when the new one arrives...very cool on their part, I must say.
While I've got my system apart for the hdd removal, I'm also going to add a 4-dial rheobus and the C-Box for my Chaintech, and pull out my CrystalFontz 634-usb display, which I'll build an enclosure for so it can sit on the desktop.
I'm beginning to wonder what the issue is with the 120gig WD SATA drives, as I've heard of more than a few dying.

Comments

  • Straight_Man Geeky, in my own way Naples, FL Icrontian
    edited February 2004
    If they are all SMART codes, it is probably a media coating problem on a batch of them, or the read/write arms were not up to par. At a guess, media coating. Controller circuits could have contributed, but more likely the media coatings on the platters had impurities when made. The platters that hold data are typically metal core, and the data is magnetically encoded in a magnetically sensitive chemical within a sandwich of chemical layers. If the mag-sensitive layer is at all uneven or too thin, it cannot hold the written values well (the field charge is weak because there is not enough material to be charged right), and if the chemical for the mag layer lacks thickness or concentration, you get bad platters.

    SMART mostly tracks bad clusters, and lets the drive compensate by itself for a certain percentage of the media area being bad. Weak fields progressively show up as more and more clusters being badmapped when SMART has to rewrite data elsewhere to get a write-verify. You could think of SMART as a Soft Media Auto Read Test process. I do not know what acronym meaning they use, but SMART works by confirming a good write with a read. If the value written does not cross-check, the drive writes the data to another area reserved for that purpose. When the drive control circuitry runs out of spare reserve space, it generates errors that the diags pick up.
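
    The write, read-back, and remap cycle described above can be sketched as a toy model. This is only an illustration of the idea as described in this post, not how any real drive firmware works; the class name, the `bad_sectors` set, and the return strings are all made up for the example.

```python
class ToyDrive:
    """Toy model of verify-and-remap. Hypothetical, not real firmware."""

    def __init__(self, spare_count, bad_sectors):
        self.spare_left = spare_count   # reserve locations still unused
        self.bad = set(bad_sectors)     # assumed weak spots on the platter
        self.remap = {}                 # logical sector -> spare location
        self.store = {}                 # physical location -> stored value

    def _write_and_read_back(self, location, value):
        # A weak spot fails to hold the value, so the read-back differs.
        if location in self.bad:
            return None
        self.store[location] = value
        return self.store[location]

    def write(self, sector, value):
        """Write a value, verify it by reading back, remap on mismatch."""
        location = self.remap.get(sector, sector)
        if self._write_and_read_back(location, value) == value:
            return "ok"
        if self.spare_left == 0:
            return "error"              # reserve exhausted: diags would see this
        self.spare_left -= 1
        spare = ("spare", len(self.remap))
        self.remap[sector] = spare
        self.store[spare] = value
        return "remapped"
```

    With one spare and two weak sectors, the first bad write is quietly remapped and the second one surfaces as an error, which matches the "runs out of reserve" behaviour described above.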

    Stepper motors that are not stepping right add to the possibility, and rotational motors that are undervolted can cause this also. Given that the 160's appear to be working, I suspect that the media was not applied right on the platters of some 120's when a batch of platters was made.

    Note that the drives do not necessarily need to check every byte or bit (doing so would slow the drive down a bunch, as it would be making a double pass over every newly written area). They can confirm one out of every 64 or 128 bytes, and on an error the whole cluster gets rewritten. In other words, SMART sample-checks writes with reads. Typical reserve is 1-2% of the platter. Over that, WD thinks the drive is unreliable, and they would rather replace the drive than have it in the field and take the reputational/goodwill hit on their brand name. Basically, the 160's are more debugged; they learned from the 120's what the common problem was. But, to keep costs reasonable, they only want to replace proven bad drives, so they also use SMART to log problems by tracking how much of the reserve actually gets used. When the amount increases a lot or reaches a certain percentage used, the drive starts sending SMART errors.
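
    As a rough illustration of that last point, the reserve bookkeeping might look like the sketch below. The 1% reserve comes from the 1-2% figure above; the warning thresholds (`max_fraction`, `max_jump`) and both function names are assumptions chosen only for the example, not values WD or the SMART spec publishes.

```python
def reserve_sectors(total_sectors, reserve_percent=1):
    """Size of the spare area, using integer math to avoid rounding surprises."""
    return total_sectors * reserve_percent // 100


def should_warn(used_now, used_before, reserve,
                max_fraction=0.8, max_jump=0.1):
    """Warn when the reserve is mostly used up or usage is growing fast.

    max_fraction and max_jump are hypothetical thresholds.
    """
    fraction_used = used_now / reserve
    jump = (used_now - used_before) / reserve
    return fraction_used >= max_fraction or jump >= max_jump
```

    For instance, with a reserve of 10 sectors, going from 2 used to 4 used between checks would trip the "increases a lot" condition under these assumed thresholds, even though most of the reserve is still free.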

    Basically, there is no such thing as a perfect drive. The older utils used a standard of 100 consecutive bad clusters to mark a drive as bad. You might also note that WD typically does not rate a drive at full platter capacity; they rate it at what the platters can hold less the reserve. The drives actually have MORE capacity than they are rated for, to allow for the reserve. Thus the real capacity is a bit higher than shown, but the user cannot access the extra area, only SMART can.

    One good thing about WD, they do handle properly reported RMAs right.

    John D.