Hadoop is the Southwest Airlines of storage technology. Like Southwest Airlines with its one type of aircraft, no-frills approach, offering a scalable low cost option, Hadoop has a basic store, sort and group capability, scales through standardization on commodity hardware, and offers a low-cost, scalable alternative to highly specialized data appliances.
Hadoop has multiple levels of redundancy (duplication of data) – so, everything written to a Hadoop cluster is copied to three different machines. If one machine goes down, the master node knows where else to go to get that data. Infrastructure guys can replace the failed machine and when it comes back up, Hadoop will copy the stuff over to the new machine and get it ready for future use.
Hadoop was designed for commodity hardware. If you end up investing in very costly hardware, you are defeating the purpose. You are paying more to have redundancy and then still more to make sure redundancy is not needed (due to higher hardware cost plus better hardware, which fails less).
Here’s another way to look at it. Let’s assume I was offered tires for my car, which were likely to fail 100 times less than my current ones, but cost five times more. Should I opt for them as compared to my current tires? For practical purposes: No. The reason being: I am still likely to carry a spare. Even with 100 times less failure, it’s not zero. So, if you are carrying a spare anyways, then why pay five times more if the likelihood that you will blow a second tire before you can get the first one fixed is very, very low? If I were to carry two spares, the probability goes down even further.
Given that a large number of nodes (machines) is important for Hadoop’s success for better parallelism, redundancy is designed into the architecture much more easily. If I have 20 machines to deal with, I just need to make sure that the same data is copied to at least three machines at any given time, so if one of those three fails, an end-user would not even know it, and I can replace that hardware in 12-24 hours.
Also, the purpose is very important to understand: Sticking with the tire analogy, if I’m buying tires for a mission critical function, say for a space shuttle, I have very different considerations then for my commuter car. We don’t need to run financial operations on Hadoop, just because traditional databases do a good job of this and are typically more effective in doing things like transaction processing. We also won’t run day-to-day operations on what comes out of Hadoop. While Hadoop is used for more critical decision-making, it doesn’t need the same level of system availability requirement as a core transactional system.
At a macro level, it is always about balance. Redundancy also has a cost. Going one extreme where mean time to failure (MTTF) is very low (e.g. very cheap hardware) there may be more work involved each time components are replaced or you may have to stick to higher redundancy. If you go to a very high cost and MTTF then redundancy is the cost you are carrying without any benefit.