Archive for the ‘Backup’ Category

It is no surprise that SSD and flash drives are now mainstream. The rotating hard drive is still around, but only for specific use cases, mostly "cheap and deep" storage, video streaming and archiving. Even with 10TB densities, these drives are destined for the junkyard at some point.

Flash storage is approaching 32TB+ later this year and the cost is coming down fast. Do you remember when a 200GB flash drive was about $30k? It wasn't that long ago. But with flash storage growing so quickly, what does that mean for performance? Durability? Manageability?

These are real-world challenges. Flash storage vendors are mostly concerned with making drives bigger, so we as consumers cannot assume they are all the same. They are not. As the drives get bigger, the performance of flash drives starts to level out. The good thing for us is that when we design storage solutions with flash, the bottleneck is no longer in the SAN, so we too can take the emphasis off of performance. The performance conversation has become the "uncool" conversation; nobody wants to have it anymore. The bottleneck has shifted to the application, the people and the process. That's right! The bottleneck now is the business. With networking at 10Gb/40Gb and servers so dense and powerful, the business can finally focus on the things that matter to it. This is why we see such a big shift into the cloud, application development and IoT. Flash is the enabler for businesses to FINALLY focus on the business and not the infrastructure.

So, back to the technical discussion here…

Durability is less of an issue with large flash drives because of the abundance of cells available for writes and re-writes. The predictability of flash failures also reduces the constant babysitting that unstable legacy storage used to require.

Manageability is easier with SDS (software-defined storage) and hyper-converged systems. These systems handle faults much better through their distributed design and the software's elasticity, achieving uptime that exceeds five nines.
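
To put "five nines" in perspective, here is a quick back-of-the-envelope calculation. This is a minimal Python sketch of the arithmetic only, not any vendor's SLA numbers:

```python
# Downtime allowed per year for a given number of "nines" of availability.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def allowed_downtime_minutes(nines: int) -> float:
    """Return the maximum yearly downtime (in minutes) for N nines of uptime."""
    availability = 1 - 10 ** (-nines)      # e.g. 5 nines -> 0.99999
    return MINUTES_PER_YEAR * (1 - availability)

for n in range(3, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes of downtime per year")
# 3 nines: ~526 min, 4 nines: ~53 min, 5 nines: ~5.3 min
```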

So as flash storage grows, it becomes less exciting. Flash is paving the way to a new kind of storage: NVMe.

"Penny Wise and Pound Foolish" is one of my favorite lines. My top personal motto in life is, "It's the cheap man that pays the most." Time and time again, I have seen people opt to save a few mere "pennies" while paying a monumental price in the long term. Whatever it is and whatever the reason, doing due diligence and going the extra mile for good, objective research helps in making wise choices, particularly when it comes to technology. Technology changes so fast, sometimes nullifying current technologies, so it is important to understand the "how" and the "why," not just the "here and now." Today, I am going to talk about a topic that has been talked about over and over again: data management in the current state of technology.

The New Data Management Landscape
Although I have been writing about this topic for many years, this blog addresses some of the new challenges most IT managers face. With the proliferation of data from IoT, big data, data warehousing and the like, combined with data security and the shifting implications of governance and compliance, there are far too many variables in play to effectively and efficiently "manage" data. Implementing a business mandate can have far-reaching implications on data management. Questions arise such as: What is the balance between storing data and securing data? What is the cost of going too far in either direction? How can I communicate these implications and cost factors to management?

False Security: The Bits and Bytes vs. ROIs and TCOs

One of the biggest challenges in IT is communicating to the people who "sign the checks" the need to spend money (in most cases, more money) on technology: the money needed to effectively and successfully implement the mandates put forth by management. Unfortunately, this is not an easy task, and it is mastered by only a few in the industry, often highly regarded professionals living in the consulting field. People who are good with the "bits and bytes" are usually illiterate on the business side of things, while the business side speaks the language of "return on investment" (ROI) and "total cost of ownership" (TCO) and couldn't care less what a bit or byte is. The end result: many systems out there are poorly managed, and their management has no idea. The disconnect is real, and companies do business as usual every day until a crisis arises.
IT managers, directors and CIOs/CTOs need to be acutely aware of their current systems and technologies while remaining on the cutting edge of new ones, both to run day-to-day operations and to support all of the new business initiatives. The companies that mitigate this gap well are the ones with good IT management and good communication with the business side of the company. This is also directly related to how much is spent on IT. It is a costly infrastructure, but these are the systems that can meet the demands of management and compliance.

The IT Tightrope
Understanding current and new technologies is key to an effective data management strategy. Money spent on technology today may be rendered useless or, worse, may hinder the adoption of new technology needed to meet the demands of the business. It is a constant balancing act, because today's solution can be tomorrow's problem.

Data Deduplication
Data deduplication is a mature technology and has been an effective way to tame the data beast. In a nutshell, it is an algorithm that scans data for duplication. When it sees duplicate data, it does not rewrite that data; it puts metadata in its place. In other words, the metadata is basically saying, "I already have this data over there, so don't rewrite it." This happens across the entire volume (or volumes) and is a great way to save storage capacity. But with data security at the top of most companies' minds, data encryption is the weapon of choice today, and even where it is not, compliance mandates from governing agencies are forcing the hand to implement it. So how does encryption impact data management? Encryption takes data and randomizes it with an encryption key, which makes deduplication far less effective. This is a high-level generalization and there are solutions out there, but considerations must be made when making encryption decisions. Encryption also adds complexity to data management: without proper management of encryption keys, data can be rendered unusable.
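
To make the "don't rewrite it, point at it" idea concrete, here is a minimal Python sketch of fixed-size, hash-based deduplication. Real products use variable-size chunking, on-disk indexes and stronger collision handling; the chunk size and function names here are mine, purely for illustration:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; real dedup engines often use variable-size chunking

def dedup_store(data: bytes):
    """Split data into chunks, keep one copy of each unique chunk,
    and record a list of chunk hashes (the 'metadata') in its place."""
    store = {}      # hash -> unique chunk payload
    recipe = []     # ordered list of hashes needed to rebuild the data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # new data: store it once
            store[digest] = chunk
        recipe.append(digest)            # duplicate data: just point at it
    return store, recipe

def rehydrate(store, recipe) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(store[d] for d in recipe)

data = b"ABCD" * 10000                   # highly redundant sample data
store, recipe = dedup_store(data)
assert rehydrate(store, recipe) == data
print(f"logical size: {len(data)} bytes, unique chunks stored: {len(store)}")
```

Run the same sketch on encrypted input and the chunks stop repeating, which is exactly why encryption and deduplication fight each other.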

Data Compression
Back in the days of DOS, Norton Utilities had a great toolbox of utilities. One of them compressed data. I personally did not use it, as it was risky at best; it wasn't a chance I wanted to take. Besides, my data was copied to either 5.25" or 3.5" floppies. Later, Windows came along with compression on volumes. I was not one to venture into that territory; I had enough challenges just running Windows normally. I have heard and seen horror stories with compressed volumes, from unrecoverable data to sluggish and unpredictable performance. The word on the street was that it just wasn't a fully baked feature, a "use at your own risk" kind of tool. Backup software also offered compression for backups, but backups were hard enough to do without compression; adding compression to backups just wasn't done… period.
Aside from the PKZip application, compression had a bad rap until hardware compression came along. Hardware compression offloads the compression process to a dedicated embedded chipset. This was magic because there was no resource cost to the host CPU, similar to high-end gaming video cards, whose GPUs (graphics processing units) offload high-definition, extreme texture rendering at high refresh rates. Hardware compression became mainstream, and compression technology then went mostly unnoticed until recently. Compression is cool again, made popular as a feature for data on SSDs. Some IT directors I have talked to drank the "Kool-Aid" on compression for SSDs. It only made sense when SSDs were small in capacity and expensive. Now that SSDs are breaking the 10TB-per-drive mark and getting cheaper per GB than spinning disk, compression on SSDs is not so cool anymore. It's going the way of the "mullet" hairstyle, and we all know where that went… nowhere. Compression on SSDs is another layer of complexity that can be removed. Better yet, don't buy into the gimmick of compression on SSDs; look at the overall merits of the system and the support of the company offering the storage. What good is a storage system if the company is not going to be around?
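
Before paying for compression as a feature, it is worth sanity-checking how compressible your own data actually is. Here is a small sketch using Python's built-in zlib (a software codec, not a hardware offload); the sample data sets are made up to show the two extremes:

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compression ratio = original size / compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))

text_like = b"2016-01-01 INFO user login ok\n" * 5000   # repetitive log-style data
random_like = os.urandom(150_000)                        # stands in for already-compressed media

print(f"text-like data:   {compression_ratio(text_like):.1f}:1")
print(f"random-like data: {compression_ratio(random_like):.2f}:1")  # ~1:1, not worth the overhead
```

If your volumes are full of already-compressed photos, video and encrypted files, the ratio sits near 1:1 and the feature buys you little.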

What is your data worth?
With so many security breaches happening, seemingly every other week, it is alarming to me that we still want to use computers for anything. I am a customer of some of the businesses hit by hackers; I have about three different complimentary subscriptions to fraud prevention services because of these breaches. I just read an article about a medical facility in California that was hit with ransomware. With demands for payment in the millions via Bitcoin, the business went back to pen and paper to operate. What is your data worth to you? With all of these advances in data storage management and the ever-changing requirements from legal and governing agencies, an intimate knowledge of the business and the data infrastructure is required to properly manage data, not just to keep it and protect it from hackers but also from natural disasters.

Americans are fascinated by brands. Brand loyalty is big, especially when "status" is tied to a brand. When I was in high school back in the 80s, my friends and I would work diligently to save our paychecks to buy the "Guess" jeans, "Zodiac" shoes and "Ton Sur Ton" shirts, because that was the "cool" look. I put in many hours working the stockroom at the supermarket and delivering legal documents as a messenger. In 1989, Toyota and Nissan entered luxury branding as well with Lexus and Infiniti respectively, after the success of Honda's upscale luxury performance brand, Acura, which started in 1986. Aside from the marketing of brands, how much value (beyond the status) does a premium brand bring? Would I buy a $60,000 Korean Hyundai Genesis over the comparable BMW 5 Series?

For most consumers in the enterprise computing space, brand loyalty was a big thing. IBM and EMC led the way in the datacenter for many years. The motto "You'll never get fired for buying IBM" captured the perception, and as the saying goes, "perception is reality" rang true for many CTOs and CIOs. But with the economy ever tightening and IT treated as an "expense" line item for businesses, brand loyalty had to take a back seat. Technology startups with innovative and disruptive products paved the way to looking beyond the brand.

I recently read an article about hard drive reliability published by a cloud storage company called Backblaze. The company is a major player in safeguarding user data and touts over 100 petabytes of data across more than 34,880 disk drives. That's a lot of drives. With that many drives in production, it is quite easy to track reliability by brand, and that's exactly what they did. The article can be found at the link below.

https://www.backblaze.com/blog/hard-drive-reliability-update-september-2014/

Backblaze had done an earlier study in January 2014, and this article contains updated information on the brand reliability trends. Not surprisingly, the reliability data remained relatively the same. What the article did point out was that the failure rate of the Seagate 3TB drives rose from 9% to 15%, and the Western Digital 3TB drives jumped from 4% to 7%.
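
Backblaze publishes its raw numbers in terms of drive counts, drive-days and failures. A common way to turn those into an annualized failure rate looks roughly like the sketch below; this is my own illustration, not their code, and the sample inputs are made up:

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Annualized failure rate (%) = failures per drive-year of operation."""
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

# Hypothetical numbers for illustration only, not Backblaze's actual data:
# 120 failures observed across 365,000 cumulative drive-days.
print(f"{annualized_failure_rate(failures=120, drive_days=365_000):.1f}% AFR")  # 12.0%
```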

Hard Drive Failure Rates by Model

Company or "branding" plays a role as well, at least with hard drives. Popular brands like Seagate and Western Digital pave the way; they own the low-end hard drive space and sell lots of drives. Hitachi is more expensive and sells relatively fewer drives than Seagate. While Seagate and Western Digital may be more popular, how a drive is manufactured or assembled and where its parts are sourced are important parts of the process. Some hard drive manufacturers market their products to the masses, while others market to a niche. Manufacturing costs and processes vary from vendor to vendor: some vendors may cut costs by assembling drives where labor is cheapest, and some may manufacture drives in unfavorable climate conditions. These are just some of the factors that can reduce the MTBF (mean time between failures) rating of a drive. While brand loyalty with hard drives may lean towards Seagate and Western Digital, popularity does not always translate into reliability. I personally like Hitachi drives, as I have had better longevity with them than with Seagate, Western Digital, Maxtor, IBM and Micropolis.

I remember using Seagate RLL hard drives in the 90s, and yes, I had failed drives then too. But to be fair, Seagate has been around for many years and I have had many success stories as well. Kudos to Seagate for weathering all these years of economic hardship and manufacturing challenges, from typhoons to parts shortages, while providing affordable storage. Even with higher failure rates, failures today are easily mitigated by RAID technology and solid backups. So it really depends on what you are looking for in a drive.

Brand loyalty is a personal thing but make sure you know what you are buying besides just a name.

Thanks to Backblaze for the interesting and insightful study.

I have been in the IT industry for over 20 years and have worn many hats in my day. It isn't very often that people actually know what I do; they just know I do something with computers. So by default, I have become my family's (extended family included) support person for anything that runs on batteries or plugs into an outlet. In case you don't know, I am a data protection expert, and these days I am rarely troubleshooting or setting up servers. In fact, I spend most of my days visiting people and making blueprints with Microsoft Visio. I have consulted on, validated and designed data protection strategies and disaster recovery plans for international companies, major banks, government, military and private-sector entities.

Those who ARE familiar with my occupation often ask me, "So what does a data protection expert do to protect his personal data?" Since I help companies protect petabytes of data, I should have my own data protected as well. I am probably one of the few professionals who actually protect their data to the extreme. It is sometimes a challenge, because I have to find a balance between cost and realistic goals; it is always easier to spend other people's money protecting their data. There's an old saying that "the shoemaker's son has no shoes," and there is some truth in that. I know people in my field who have lost their own data while being paid to protect others'.

Now welcome to my world. Here is what I do to protect my data.

1. Backup, Backup and Backup – Make sure you back up, and often. Doing daily backups is too tedious, even for a paranoid guy like me, and it is unrealistic as well. Weekly or bi-weekly is perfectly sufficient. But there are other things that need to be done as well.

2. External Drives – External drive backups are essential; keeping pictures and home videos only on your laptop or desktop is not realistic. Backing up to a single external drive is NOT recommended: that is a single point of failure, because the drive can fail with no other backups around. I use a dual (RAID 1) external drive, an enclosure that writes to two separate drives at the same time, so there are always two copies. I also keep two other copies on two separate USB drives (see the sketch after this list). I would be cautious with all-in-one NAS drives, as they add an additional layer of complexity, and when they fail, they fail miserably: often the NAS piece is not recoverable and the data is stranded on the drives. At that point, a data recovery specialist may have to be leveraged to recover the data, which can cost thousands of dollars.

3. Cloud Backup – There are many different cloud services out there and most of them are great. I use one that has no limit on how much I can back up to the cloud, so all of my files are backed up whenever my external drives are loaded with new data.
4. Cloud Storage – Cloud storage is different from cloud backup, as this service runs on the computers that I use. Whenever I add a file to my hard drive, it is instantly replicated to the cloud service. I use Dropbox at home and Microsoft SkyDrive for work, so everything is saved in the cloud as well as on all my computers. I also have access to my files via my smartphone or tablet; in a pinch, I can get to my files from any Internet browser. This has saved me on a few occasions.

5. Physical Off-Site Backup – Once a year, I copy my files onto one more external hard drive, and that drive goes to my brother-in-law's house. You can also use a safety deposit box. This way, if there is a flood or my house burns down, I have a physical copy off-site.
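
For the "two external copies" habit in item 2, here is a minimal Python sketch that copies a folder to two destinations and verifies each copy by checksum. The drive paths are placeholders for wherever your own external drives mount:

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB blocks so large files don't blow up memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def backup_and_verify(source: Path, destinations) -> None:
    """Copy every file under `source` to each destination and verify by checksum."""
    for file in source.rglob("*"):
        if not file.is_file():
            continue
        checksum = sha256_of(file)
        for dest_root in destinations:
            target = dest_root / file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)
            if sha256_of(target) != checksum:
                raise IOError(f"verification failed for {target}")

# Placeholder mount points -- substitute the paths of your own external drives.
backup_and_verify(Path.home() / "Pictures",
                  [Path("/mnt/external1/Pictures"), Path("/mnt/external2/Pictures")])
```

The verify step matters as much as the copy: a backup you have never read back is a backup you only hope you have.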

Data is irreplaceable and should be treated as such. My personal backup plan may sound a bit extreme, but I can sleep well at night. You don’t have to follow my plan but a variation of this plan will indeed enhance what you are already doing.

Far too many times I have bought something with much anticipation only to be disappointed. If it wasn't the way it looked or what it was promised to do, it was something else that fell short of my expectations. The few companies that go beyond my expectations are the ones I keep going back to. The one I like to talk about most is Apple: their products often surprise me (in a good way), with intangible touches that bring a satisfaction well beyond what is advertised. The "new drug" for me is Samsung and Hyundai (cars).

American marketing plays the leading role in setting this expectation. Marketing has become the "American" culture: having the newest, coolest, flashiest toys is what defines who we are. Unfortunately, the marketing of these products almost always promises more than the actual product delivers, yet we all seem to hang on the hope that these products will exceed our expectations. This is why "unboxing" videos are so popular on YouTube. Product reviews and blogs are also a good way to keep companies honest and to help us with our "addictions" to our toys. This marketing culture is not limited to personal electronics; it holds true for products in the business enterprise as well.

Marketing in the Business Enterprise

The Backup Tape

I remember having to buy backup tapes for my backups, and I often wondered how vendors could advertise 2x the native capacity of the tape. How can they make that claim? For example, an SDLT320 tape is really a 160GB tape (native capacity). How do they know that customers can fit 320GB on a 160GB tape? After doing some research, the conclusion I came to was that they really don't know! It was a surprise to me that they can make such a claim based on speculation. How can they do this and get away with it? It is easy… It is what I call the "Chaos Factor": when someone or something takes advantage of a situation to further their cause.
In the case of backup tapes, they capitalize on two things that facilitate the Chaos Factor:

1. The Backup Software and

2. The Business Requirements.

The Backup Tape “Chaos Factor”

1. The Backup Software

Tape manufacturers know this all too well. Backup software is very complex, and virtually all backup administrators are far too busy worrying about one thing: completing the backups successfully. Checking whether tapes are being filled to their advertised capacity is not something that is even thought about in day-to-day operations. In fact, the only time tape utilization ever comes up is if management asks for it, and when it is requested, it is usually a time-consuming exercise, because backup software does not have good reporting facilities to compile this information readily. Tape utilization is simply not a concern.
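
Most backup products can at least export a media or job report as CSV, and totaling bytes written per tape from such an export is straightforward. Here is a rough sketch; the column names (`barcode`, `bytes_written`) and file name are hypothetical, so adjust them to whatever your backup software actually produces:

```python
import csv
from collections import defaultdict

NATIVE_CAPACITY_GB = 160  # e.g. SDLT320: 160 GB native, "320 GB" assuming 2:1 compression

def tape_utilization(report_csv: str) -> None:
    """Sum bytes written per tape barcode from a (hypothetical) media report export."""
    written = defaultdict(int)
    with open(report_csv, newline="") as f:
        for row in csv.DictReader(f):          # expects columns: barcode, bytes_written
            written[row["barcode"]] += int(row["bytes_written"])
    for barcode, total in sorted(written.items()):
        pct = 100.0 * total / (NATIVE_CAPACITY_GB * 1024**3)
        print(f"{barcode}: {total / 1024**3:.1f} GB written ({pct:.0f}% of native capacity)")

tape_utilization("media_report.csv")   # hypothetical export file
```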

2. The Business Requirements

Another reason is how backup software uses tapes. Tape backups are scheduled as jobs, and most jobs complete before the tapes are filled up. Depending on the company's policy, most tapes are then ejected and stored off-site, so tapes are rarely ever filled because of this policy! This is normal for backup jobs; leaving tapes in the drive(s) just to fill them up goes against why companies do backups in the first place. Backup tapes are meant to be taken off-site to protect against disaster. A backup larger than a single tape is really the ONLY time a tape can actually be fully utilized.

This Chaos Factor is also used in the business of data storage. The SAN market is another area where the protection of data trumps our ability to efficiently manage the storage, and it is full of dirty secrets, as I will outline below.

The SAN “Chaos Factor”

One dirty secret of the storage industry is the use of marketing benchmark papers. Benchmark papers are designed to give the impression that a product can perform as advertised, and for the specific test in the paper that may be true, but sometimes these tests are "rigged" to give the product favorable results. In fact, sometimes these performance numbers are impossible in the real world. Let me illustrate. I can type about 65 words per minute; many people can, and most would view that as average. But if I wanted to "bend the truth," I could say I can type 300 words per minute. I can technically type "at" 300+ words per minute, but in the real world I don't type like that. What good is a book with one word ("at") printed on 300 pages? That kind of claim holds no water, but it is the same technique and concept used in some of these technical papers. When results are touted, keep the vendor honest by asking what their customers are seeing in day-to-day operation.

Here is another technique commonly used by vendors, what I call "smoke and mirrors" marketing. It is a tactic used to mimic a new technology, feature or product that is hot. The main goal is to deliver the feature at the best possible price and downplay the side effects: deliberate engineering that provides the feature set at the expense of existing features. Here is an example. I bought a new Hyundai Sonata last year. I love the car, but I am not crazy about the ECO feature that comes with it. I was told I would save gas with this mode, and although I think I get a few more miles per tank, the cost in lost power, torque and responsiveness is not worth using the feature at all. I believe this feature, along with a smaller gas tank, eventually led to a class-action lawsuit over Hyundai's gas mileage claims. So, for a vendor to incorporate new features, they sometimes have to leverage existing infrastructures and architectures, because that is what they already have. In doing so, they end up with an inferior product that emulates new features while masking or downplaying the side effects. Prospective customers are not going to know the product well enough to see the impact of these nuances; they often just see the feature set in a side-by-side comparison with other vendors and make decisions based on that. The details are in the fine print, which is almost never looked at before the sale. As a seasoned professional, I do my due diligence to research vendors' claims, and I am writing this to help you avoid these mistakes by asking questions and researching before making a major investment for your company.

Here are some questions you should ask:

• What trade magazines have you been featured in lately? (last year)
• What benchmarking papers are available for review?
• How does that benchmark compare to real-world workloads?
• What reference architectures are available?
• What customers can I talk to on specific feature set(s)?

Here are some things to do for research:

• Look through the Administrator's Guide for "Notes" and fine-print details. This will usually tell you what is impacted and/or restricted as a result of implementing the features
• Invite the vendors for a face-to-face meeting and talk about their features
• Have the vendor present their technologies and how they differ from the competition
• Have the vendor white-board how their technology will fit into your environment
• Ask the vendor to present the value of their technology in relation to your company’s business and existing infrastructure
• If something sounds too good to be true, ask them to provide proof in the form of a customer testimonial

I hope this is good information for you, because I have seen, time after time, companies buying into something that isn't the right fit and then being stuck with it for 3-5 years. Remember, the best price isn't always the best choice.

It is human nature to assume that if it looks like a duck, quacks like a duck and sounds like a duck, then it must be a duck. The same could be said about hard drives. They only come in 2.5" and 3.5" form factors, but when we dig deeper, there are distinct differences and developments in the storage industry that will define and shape the future of storage.

The Rotating Disk or Spinning Disk

There were many claims in the 90s that "the mainframe is dead," but the reality is that the mainframe is alive and well. In fact, many corporations still run on mainframes and have no plans to move off of them. There are many other factors that may not be apparent on the surface, but they are reason enough to continue with the technology, because it provides a "means to an end."

Another claim, from the mid-2000s, was that "tape is dead," but again, the reality is that tape is very much alive and kicking. Although there have been many advances in disk and tape alternatives, tape IS the final line of defense in data recovery. Although it is slow, cumbersome and expensive, it is also a "means to an end" for most companies that can't afford to lose ANY data.

When it comes to rotating or spinning disk, many are rooting for its disappearance. Some will even say it is going the way of the floppy disk, but just when you think there isn't any more that can be developed for the spinning disk, there are some amazing new developments. The latest is…

The 6TB Helium Filled hard drive from HGST (a Western Digital Company).

Yes, this is no joke. It is a hermetically sealed, waterproof hard drive packed with more platters (seven) so it runs faster and more efficiently than a conventional spinning hard drive, once again injecting new life into the spinning disk industry.

What is fueling this kind of innovation in a supposedly "dying" technology? For one, solid state drives (SSDs) are STILL relatively expensive. Their cost has not dropped (as much as I would have hoped) the way most traditional electronic components do, which keeps the spinning disk breed alive. The million-dollar question is, "How long will it be around?" It is hard to say, because when we look deeper into the drives, there are differences, and they are still fulfilling that "means to an end" purpose for most. Here are some differences…

1. Capacity
As long as there are ways to keep increasing capacity and keep the delta between SSDs and spinning disk wide enough, the appetite for SSDs will be diluted. This trumps the affordability factor, because it is about value, or "cost per gigabyte." We are now up to 6TB in a 3.5" form factor while SSDs are around 500GB. This is the single most hindering factor for SSD adoption.

2. Applications
Most applications do not need high-performance storage. Most home-user storage holds digital pictures, home movies and static PDF files and documents, and those files are perfectly fine on large 7.2k multi-terabyte drives. In the business world or enterprise, it is actually quite similar: most companies' data is somewhat static. In fact, on average, about 70% of all data is hardly ever touched again once it is written; I have personally seen customers with 90% of their data static after it is written for the first time. Storage vendors have been offering storage tiering (Dell EqualLogic, Compellent, HP 3PAR) that automates the movement of data based on its usage characteristics without any user intervention. This type of virtualized storage management maximizes the ROI (return on investment) and TCO (total cost of ownership) of spinning disk in the enterprise, and it has extended the life of spinning disk by exploiting the performance characteristics of both spinning disk and SSDs.
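
The arrays mentioned above do this at the block level inside the SAN without any user intervention. Purely to illustrate the idea, here is a file-level Python sketch that demotes anything not read in the last 90 days to a "cold" tier; the paths and the 90-day threshold are my own placeholders, and it assumes access times are actually tracked on the filesystem:

```python
import shutil
import time
from pathlib import Path

COLD_AFTER_DAYS = 90  # illustrative policy: demote data untouched for ~3 months

def demote_cold_files(hot_tier: Path, cold_tier: Path) -> None:
    """Move files whose last access time is older than the threshold to the cold tier."""
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for file in hot_tier.rglob("*"):
        # st_atime is only meaningful if the filesystem is not mounted with noatime
        if file.is_file() and file.stat().st_atime < cutoff:
            target = cold_tier / file.relative_to(hot_tier)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(file), str(target))

demote_cold_files(Path("/storage/hot"), Path("/storage/cold"))  # placeholder tier paths
```

Real tiering engines work on sub-volume chunks and promote hot data back up as well; the point here is simply that a usage-based policy, not a person, decides where the data lives.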

3. Mean Time Between Failures (MTBF)
All drives have an MTBF rating. I don't know exactly how vendors come up with these numbers, but they do: it is a rating of how long a device is expected to be in service before it fails. I wrote in a past blog called "The Perfect Storm" about how SATA drives would fail in bunches because of the MTBF; many of these drives are put into service in massive quantities at the same time, doing virtually the same thing all of the time. MTBF is a theoretical number, and depending on how the drives are used, "mileage will vary." MTBF ratings for these drives are so high that most drives that run for a few years will continue to run for many more. In general, if a drive is defective, it will fail fairly early in its operational life, which is why there is a "burn-in" period for drives; I personally run them for a week before I put them into production. Drives that last for years eventually make it back onto the resale market, only to run reliably for many more. MTBF for an SSD, on the other hand, is different. Although SSDs are rated for a lifespan like spinning disks, their wear characteristics are different: flash cells degrade a little with every program/erase cycle (a problem compounded by write amplification) and are eventually rendered unusable, although wear-leveling software compensates for that as long as it can. So compared to a spinning disk, which has no such cell wear, an SSD's failure point is measurably predictable. This is a good and a bad thing: good for predicting failure, bad for reusability. If you can measure the remaining life of a drive, that directly affects the value of the drive.
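
Because flash wear is measurable, an SSD's remaining life can be roughly estimated from its rated endurance. Here is a back-of-the-envelope sketch; the capacity, P/E cycle count, daily write volume and write amplification factor are all illustrative assumptions, not any vendor's specification:

```python
def ssd_life_years(capacity_tb: float, pe_cycles: int,
                   daily_writes_tb: float, write_amplification: float = 2.0) -> float:
    """Rough SSD lifetime estimate: rated endurance divided by effective daily writes."""
    rated_endurance_tb = capacity_tb * pe_cycles            # total TB the cells can absorb
    effective_daily_tb = daily_writes_tb * write_amplification
    return rated_endurance_tb / effective_daily_tb / 365.0

# Illustrative inputs only: a 1 TB drive, 3,000 P/E cycles, 0.5 TB written per day.
print(f"~{ssd_life_years(1.0, 3000, 0.5):.0f} years")       # ~8 years with a WA of 2.0
```

You cannot do this kind of arithmetic for a spinning disk, which is exactly why flash failure is predictable and mechanical failure is a statistics game.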

In the near future, it is safe to say that the spinning disk is going to be around for a while. Even if the cost of SSDs comes down, there are other factors that meet the needs of storage users. In the same way that other factors have kept the mainframe and tape technologies around, the spinning disk has earned its place.

Long live the spinning hard drive!

If you are amazed with technology today, you should know that the speed of technological advancement has been hindered for many years by many factors.

Things like copyright laws, marketing and corporate profits often constrain the speed of new products and innovations; we as a collective human race could develop and advance much faster if these obstacles were removed. The counter-force to these constraints has come from the "open source" community, with Linux, other operating systems and open standards, along with hobbyists, enthusiasts and hackers (ethical and unethical), bringing great benefits and improvements to the devices we all love so much. But with these leaps in technology and advancement comes a new inhibitor… increased regulation and compliance.

From the days of the Enron scandal to the insider trading of Martha Stewart, a number of regulatory rules and compliance requirements have come down on businesses. The Sarbanes-Oxley (SOX) Act, PCI DSS (Payment Card Industry Data Security Standard) and HIPAA (Health Insurance Portability and Accountability Act) are just a few examples.

These compliance rules do not directly inhibit innovation and advancement in technology, but they do slow it down. They force technology to stay around longer than it was intended to, and they "shift" innovation toward data preservation and accessibility. Regulations that span many years, like those in the financial sector, typically require 7 years of unaltered and verifiable financial data, with the possibility of expanding to 10 years or more. The medical industry is moving to 50+ years of unaltered and verifiable record keeping, and the portability of medical history is now driving retention to exceed the life of a person, in some cases 100+ years, depending on the treatment type. Finally, there are the data "packrats": although they may not be mandated by regulation yet, some institutions have self-imposed retention policies of "forever."
Reality? Yes and No.

Yes, in that we can set these rules today, but the reality is no… at least it is not proven yet; it is a work in progress. There are some innovative products designed to keep data past 100 years, but most companies' IT departments are not looking that far ahead. They are not looking to spend lots of money on unproven technology on the prospect of keeping data that long; they have more immediate issues to solve. So companies are faced with the challenge of keeping up with innovation and leading-edge technology while supporting data retention compliance, and much of that data sits on old tapes, optical platters and old disk storage systems.

Fortunately, there are aftermarket niche resellers that specialize in repurposed gear, and these businesses provide an essential service in these unique situations. Companies are making their storage subsystems last longer, usually past the vendor's EOL (end of life) support for the products. Some resort to eBay and user groups to find hard-to-get items, with varying degrees of success. One IT manager says, "When I buy my stuff from eBay for my workplace, I am playing Russian roulette. I prefer to go to these niche resellers because I know what I am getting and it's not from some guy in a garage somewhere." EOL disks in archival systems with compliance metadata, older servers with specific BIOS support for older drives, and SANs with EOL disks requiring a specific interface or firmware are generating steady demand for these components. Until it hurts enough for companies to invest in ultra-long-term, compliance-ready archival solutions, they will endure the pain and resort to leveraging older equipment to preserve their data.

As you may remember, when SATA drive technology came around several years ago, it was a very exciting time. This new low-cost, high-capacity, commodity disk drive revolutionized home computer data storage.

This fueled the age of the digital explosion. Digital photos and media quickly and affordably filled hard drives around the world. This digital explosion propelled companies like Apple and Google into the hundreds of billions in revenue, and it also propelled explosive data growth in the enterprise.

The SAN industry scrambled to meet this demand. SAN vendors such as EMC, NetApp and others saw the opportunity to move into a new market using these same affordable high-capacity drives to quench the thirst for storage.

The concept of using SATA drives in a SAN went mainstream. Companies that once could not afford a SAN could now buy one with larger capacity for a fraction of the cost of a traditional SAN. This was so popular that companies bought SATA-based SANs in bulk, often in multiple batches at a time.

As time progressed, these drives started failing. SATA drives were known for their low MTBF (mean time between failures) ratings. SATA SANs employed RAID 5 at first, which protects against a single drive failure but not a dual drive failure.

Companies then started to employ RAID 6 technology so that a dual drive failure would not result in data loss.

The “Perfect Storm” even with RAID 6 protection looks like this…

– Higher Capacity Drives = longer rebuild times: The industry has released 3TB drives. Rebuild times vary by SAN vendor; I have seen six days for the rebuild of a 2TB drive

– Denser Array Footprint = increased heat and vibrations: Dramatically reducing MTBF

– Outsourced drive manufacturing to third-world countries = increased rate of drive failures, particularly in batches or series: Quality control and management is lacking in outsourced facilities, resulting in mass defects

– Common MTBF in Mass Numbers = drives will fail around the same time: This is a statistical game (see the sketch after this list). For example, a 3% failure rate for a SAN in a datacenter is acceptable, but when there are mass quantities of these drives, 3% will approach and/or exceed the fault tolerance of RAID

– Virtualized Storage = Complexity in recovery: Most SAN vendors now have virtualized storage, but recovery will vary depending on how they do their virtualization

– Media Errors on Drives = failure to successfully rebuild RAID volumes: The larger the drive, the greater the chance of media errors. Media errors are errors on the drive that render small bits of data unreadable, and a RAID volume rebuild may be compromised or fail because of them
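
To see why the "statistical game" matters, here is a rough binomial estimate of seeing enough concurrent failures in a single RAID group to break even RAID 6 during a long rebuild window. The group size and per-drive failure probability are made-up inputs for illustration, not a vendor spec:

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k failures among n drives, each failing with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Illustration: a 14-drive RAID 6 group, and a 1% chance that any given drive
# dies during a week-long rebuild window (made-up numbers).
n_drives, p_fail = 14, 0.01
# Three or more concurrent failures in a RAID 6 group means data loss.
print(f"chance of 3+ failures: {prob_at_least(3, n_drives, p_fail):.4%}")
```

The per-group number looks tiny, but multiply it across hundreds of groups, years of service and aging drives from the same batch and it stops being tiny.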

Don't be fooled into a false sense of security by having just RAID 6. Employ good backups and data replication as an extension of a good business continuity or disaster recovery plan.

As the industry moves to different technologies, other new and interesting anomalies will develop.

In technology, life is never a dull moment.

Do you know we live in amazing times? When I was growing up, if I wanted to learn to dance, I would have to take lessons; if I wanted to learn construction, I would get an entry-level job as a construction worker. Today, you can Google virtually anything and learn almost anything from the Internet. I was thinking today about how far we have come with storing data, so I wanted to take this time to simplify this nifty technology so that anyone can "Google" RAID technology and understand it in about 5 minutes.

Although RAID has been around for a long time, most people who are not in IT won't know what it is. There are consumer versions of hardware RAID cards for the home, but they are not commonly used. Let's first start with what RAID stands for: "Redundant Array of Independent Disks." When RAID was first introduced, it stood for "Redundant Array of Inexpensive Disks"; the acronym was changed to reflect the changing nature of hard drives and RAID sets. Basically, RAID is a data protection method that employs different data-storing algorithms across a set of disks. There are different levels of RAID, each designated by a number following the term "RAID X" (X being the RAID level). I will break down the different RAID levels for you.

RAID 0
This RAID level is striping without parity. Striping is the ability to store data across multiple drives; parity is an error-correction method used in RAID and is the core mechanism for rebuilding failed drives. This RAID level offers no protection… yes, this is the one RAID level you probably don't want to use. Its only advantages are increased capacity and throughput, because there are more disk spindles in the RAID set. A minimum RAID 0 set contains 2 drives. Some external home drives use RAID 0 to increase capacity, which I DO NOT recommend unless you have another set of backups somewhere else and capacity is paramount. In my book, capacity never trumps reliability when it comes to data storage.

RAID 1
This RAID level is known as disk mirroring (without parity). Simply put, RAID 1 keeps a duplicate image of the main disk on another disk; when each disk has its own controller, this is called duplexing. This RAID level is usually implemented with a hardware controller, but it can also be done with operating systems that support disk mirroring or with third-party software. There are definite advantages to this RAID level, but it is the most costly, because you are essentially buying double the usable disk capacity.

RAID 5
This RAID level is disk striping with distributed parity, which means the data is distributed along with the parity data across all drives in the RAID set. This RAID level can tolerate a single drive failure; the failed drive must be replaced and then rebuilt from the surviving drives. During that rebuild, the failure of a second drive will result in data loss.
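
Here is a toy Python illustration of how striping with parity lets a RAID 5 set survive one lost drive: the parity block is the XOR of the data blocks, and the missing block is recovered by XOR-ing everything that survives. Real controllers rotate parity across whole disks and do this per stripe; this sketch only shows the arithmetic:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    result = bytearray(blocks[0])
    for block in blocks[1:]:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# One stripe spread across a 3-data + 1-parity layout.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# The "drive" holding d1 fails; rebuild it from the survivors plus parity.
rebuilt_d1 = xor_blocks(d0, d2, parity)
assert rebuilt_d1 == d1
print("lost block recovered:", rebuilt_d1)
```

Lose a second block before the rebuild finishes and there is nothing left to XOR against, which is exactly the failure mode described above.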

RAID 6
This RAID level is disk striping with double distributed parity, which means the RAID set can tolerate two failed drives and still be operational. Failure of a third drive will result in data loss.

HOT SPARE
A hot spare is a disk, usually powered on and spinning, that sits in an array doing nothing but waiting for a drive failure. When a drive fails, the hot spare is automatically rebuilt from the surviving disks in the RAID set, minimizing the window of exposure to data loss.

I will be expanding on RAID technologies in my future posts and hope this was helpful in understanding this complex but compelling technology.

It is amazing how our lives revolve around our computers. I actually felt helpless when I left my laptop at home for a vacation; of course, there was my BlackBerry, with which I could still get my fix of information. The days when all my worldly information could fit on a single 5.25" floppy disk are long gone. Like many people, you probably make backups of your data to an external hard drive, but is that enough to adequately protect your valuable data? How about all those tunes and movies you spent a fortune on, and all those years of digital pictures? What about your iPad or tablet? Is it protected from data loss? And what if you lost your portable device or portable hard drive?

I have so many friends who come to me with portable hard drives that stopped working because they either dropped them or the drives just croaked. The fatal mistake most people make is MOVING data to a portable hard drive instead of BACKING IT UP; that is when only one copy exists. I try to help my friends with some recovery techniques, but sometimes the drives are beyond my tools and abilities, and at that point some serious $$$ are needed to have the data recovered. So how can you protect your valuable data from different levels of threats?

1. Backup, Backup, Backup!
Backup to an external drive. Then back up to another external drive. That way, you have two copies on external media. If you move data off your computer, make sure you keep another copy on another drive. If you have super-sensitive information, it is a good idea to encrypt your external drives; that way, if you lose the drive, your data is unreadable. The other consideration is using RAID in an external drive. These units are more costly, but RAIDing your drives adds protection from failures at the drive level.
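
For the encryption advice, whole-disk tools such as BitLocker, FileVault or VeraCrypt are the usual route. Purely as an illustration of encrypting a single file before it leaves your computer, here is a sketch using the third-party Python cryptography package (an assumption; it is not in the standard library, so it needs `pip install cryptography`). The filenames and drive path are placeholders, and losing the key means losing the data, which is the key-management trade-off to keep in mind:

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

key = Fernet.generate_key()             # keep this key somewhere safe and SEPARATE from the drive
fernet = Fernet(key)

with open("tax_return_2015.pdf", "rb") as f:                       # placeholder filename
    ciphertext = fernet.encrypt(f.read())

with open("/mnt/external1/tax_return_2015.pdf.enc", "wb") as f:    # placeholder drive path
    f.write(ciphertext)

# To read it back later: fernet.decrypt(ciphertext), using the SAME key.
```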

2. Cloud Backup
Backing up to the "cloud" is a great way to add extra protection to your data. Cloud backup uses your Internet connection to upload your data to an off-site location. Not all cloud backup services are the same, so do your research carefully to see whether the service provides the security, capacity and cost you are comfortable with. Sometimes recovering to a USB drive or DVD is costly, so check before you commit. I like Backblaze and Mozy because they are easy to use and reliable.

3. Use MobileMe
I am not one to promote MobileMe, because I personally think it is an expensive service and the cloud backup space is minimal. The one thing I really like about MobileMe is that you can locate your iPad or iPhone remotely and issue a remote "wipe" of your data if you ever lose your device. As for Droid devices, I am sure a similar service either exists or is coming soon.

There are certainly additional ways you can back up your data, but I believe if you follow these three steps, you will be protected pretty well.

Don’t wait till it happens, do it today!