Giles Habibula wrote: Wow Neal, that was beautiful!
Seriously, where else would I be privileged enough to get such a detailed explanation for my mundane problem? My girlfriend is sitting here, very impressed, telling me that this goes way beyond normal customer service, and you're not even getting paid for this. Yet another reason I'm proud to call OO my home.
And where else can I share a tiny smidge of my useless knowledge and find someone not only appreciative of it, but actually interested to boot!?! I love this place. And responses like yours make me more likely to share answers (like the one above) again in the future. Thanks!
I imagine the five-year life expectancy you mentioned has more to do with average use than with actual time? I used those drives pretty heavily for 6 or 7 years, but now they get used only several times a year. Or maybe it's because they have a larger margin for error, since the tolerances might have been more lenient back in the day, when not so much was crammed into the same area? I hope that made sense...? I'm just purely guessing at all this, but I'd be curious as to what you think.
A bit of both, actually, but mostly based on use. Even if the drive sits unpowered in your PC, however, it still goes through thermal cycles, and those are a huge stressor on the mechanics. And if your drive is plugged in to your power supply, you're still spinning it up (at least initially, whenever you power up your machine). We test some insane number of CSS cycles on a zillion drives before they go out the door. (CSS stands for Contact Start-Stop: basically the act of powering up the type of drive typically built in that time frame, whose heads rest on the disk surface while the drive is off. CSS drives aren't quite all the rage any more, as they've got competition from "ramp-load" drives, which spin up the disks first and then load the heads onto them, meaning the heads aren't resting on the disk surface when it first starts to spin.) I think that if you had a CSS event five times a day, every day of the year, for five years, that works out to about 9,000 start/stop events, so close to 10,000. I believe we test 100,000 CSSs (without a failure) before we're satisfied that a drive is good to go in that department. And much like with your automobile, starting up a drive is the worst thing you normally do to it (i.e., the biggest mechanical stressor).
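The start/stop arithmetic above is easy to check. This is a minimal sketch: the five-starts-a-day usage pattern is the hypothetical from the post, the 100,000-cycle figure is the one quoted above (from memory, not a published spec), and `lifetime_css_events` is just a name made up for this example.

```python
# Back-of-the-envelope check of the contact start-stop (CSS) figures above.
# Usage pattern (5 starts/day for 5 years) is the post's hypothetical.

STARTS_PER_DAY = 5
DAYS_PER_YEAR = 365
YEARS = 5
QUALIFICATION_CSS = 100_000  # test figure quoted in the post ("I believe...")


def lifetime_css_events(starts_per_day: int, years: int,
                        days_per_year: int = DAYS_PER_YEAR) -> int:
    """Total start/stop events for a constant daily usage pattern."""
    return starts_per_day * days_per_year * years


events = lifetime_css_events(STARTS_PER_DAY, YEARS)
margin = QUALIFICATION_CSS / events  # how many "lifetimes" the test covers

print(f"{events} start/stop events over {YEARS} years")
print(f"qualification test covers about {margin:.1f}x that")
```

That gives 9,125 events, so a 100,000-cycle test covers roughly 11 times that usage pattern.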
You make a great point about drives of six years ago having more lenient tolerances. Moore's law, as it's applied to hard drives, has shown areal density growth rates of about 90% per year over that span. (I googled up a wacky IBM white paper with this graph showing areal storage density values in megabits per square inch.) So running the numbers (90% annual growth compounded over six years is 1.9^6, or about 47), you see that we're packing the same amount of data into an area roughly 47 times smaller than when your drive was built. To be able to position our magnetic readers/writers that much more precisely, we do have to tighten our tolerances within the drive by some amount. It's significantly less than that math would lead you to believe, though: a whole lot of the improvements we make from generation to generation are non-mechanical innovations that allow more precise positioning. But clearly, I wouldn't have a job at all if there weren't significant work being done on the mechanics themselves.
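Here is the compounding behind that "47 times" figure as a quick sketch. It assumes a constant ~90% annual growth rate over exactly six years, which is a simplification of what the IBM graph actually shows; `density_multiplier` is a hypothetical helper name for this example.

```python
# Compound the ~90%/year areal-density growth over six years.
ANNUAL_GROWTH = 0.90  # approximate rate quoted in the post
YEARS = 6


def density_multiplier(annual_growth: float, years: int) -> float:
    """How many times denser storage gets after `years` of compound growth."""
    return (1.0 + annual_growth) ** years


factor = density_multiplier(ANNUAL_GROWTH, YEARS)
print(f"~{factor:.0f}x denser, so the same data fits in ~1/{factor:.0f} of the area")
```

Compounding is what makes the number so large: 90% a year doesn't mean 6 × 0.9 ≈ 5.4 times denser, it means 1.9 multiplied by itself six times.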
So it's very true to say that we're putting more precise mechanics into drives nowadays, but does more precision necessarily mean less robustness to environmental hazards? Well, yes and no. Drives built today use fluid-bearing motors (as opposed to the ball-bearing motor in your drive), since the vibration signature due to rotating imbalances is much easier to deal with on a fluid-bearing drive (not to mention that ball-bearing defects disappear from the vibration profile entirely). And FDB (Fluid-Dynamic Bearing) motors right now *are* a bit more finicky than their BB (Ball-Bearing) counterparts, but that's more a function of the newness of the technology than of anything inherent to the technology itself. Hard drive designers had decades of experience with BB motors and had basically fine-tuned all the "gotchas" out of the system. We've got all of about four years of experience with FDB motors, and while the headaches of that first generation of implementation are not something I'd ever want to work my way through again, I'm pretty confident that we've now got a motor system that's about as viable as the old BB system, reliability-wise.
Just about every hard drive product has at least one piece of new mechanical technology implemented in it, and every first-generation technology has its own range of "gotchas" that need to be found by folks like me. So in that sense, meeting the tighter requirements of each product (the demand for higher and higher capacity drives, which means more and more data squished into a smaller and smaller area) carries some inherent risk. If you opened up the image I linked above, you can see that the areal density growth rate has an "elbow" around 1993, when MR (magnetoresistive) heads were released. Up to that point, drives did *not* need as much new technology to achieve workable performance, so they were a bit more inherently stable.
The introduction of new mechanics doesn't usually lead to a battle between "can we make it work" and "reliability"; the trade-off is usually "can we make it work" vs. "price." Few of our technologies, once they've matured, make the drive inherently less reliable (although I'm working on a notable exception right now: Dual-Stage Actuation). They do generally add some cost to each drive, and if a company builds 60,000,000 hard drives in a year, adding even a nickel's worth of cost per drive adds up quickly. Technology introduction meetings are hilarious (or would be to an outside observer), since people are frequently screaming at each other over figures like "Three freaking cents!! Do you really think we can JUSTIFY adding three cents to the BOM?!?! What, are you nuts?" (The BOM is the Bill of Materials, or the cost of the components that get stuck together in manufacturing the drive.)
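To put the "three freaking cents" in perspective, here is a quick sketch that multiplies a per-drive cost increase by the 60-million-drive annual volume mentioned above. The volume is the post's illustrative figure, not any particular company's numbers, and `annual_bom_impact_dollars` is a hypothetical name for this example. It works in integer cents to avoid floating-point rounding on money.

```python
# Yearly cost of a per-drive BOM increase at the volume mentioned in the post.
DRIVES_PER_YEAR = 60_000_000  # "60,000,000 hard drives in a year"


def annual_bom_impact_dollars(cents_per_drive: int,
                              volume: int = DRIVES_PER_YEAR) -> int:
    """Extra yearly spend, in whole dollars, if every drive costs this much more."""
    total_cents = cents_per_drive * volume
    return total_cents // 100


for label, cents in [("a nickel", 5), ("three cents", 3)]:
    print(f"{label} per drive: ${annual_bom_impact_dollars(cents):,} per year")
```

At that volume a nickel per drive comes to $3,000,000 a year, and three cents to $1,800,000.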
So while certain technology improvements do add some complexity to the inner workings of hard drives (and complexity generally means bad juju for reliability), our mechanical improvements are typically more like side-steps in the complexity realm, and many of the generational improvements we make aren't mechanical in nature to begin with. Because there's generally some new mechanical technology in every product release, overall reliability on any given product is more a matter of "rolling the dice." Of course, as an engineer, I'm doing everything in my power to determine the outcome of that dice-roll before we start shipping drives to customers, so the smarter I am about setting up relevant tests, the better we understand that outcome ahead of time. But if we test the wrong things, don't properly scale our tests to customer environments, do a bad job with statistics, don't have representative parts to test (say our suppliers send us "the good material" up front, then start to backslide once we're building production drives), or get any number of other factors wrong, a product can be released to the public with a failure rate very different from what the manufacturer predicts.
Search up the IBM 75 GXP "DeathStar" for a very real-world example.
~Neal