We all know digital data is stored… somewhere, but few of us truly understand what it takes to manage the digital data we all produce. And while data storage management may sound simple as a concept, it gets pretty complicated once it operates on a scale larger than your personal computer.
Not surprisingly, digital data is one area that has kept thriving, growing at an unbelievable pace even through the recession. A recent IDC Digital Universe study says there is over 1 zettabyte of data in the world right now. 1 zettabyte! That’s a 1 followed by 21 zeros, counted in bytes. You don’t hear that term very often, do you?
Curious as to how much data that really is?
Okay, say you started buying ordinary laptops and filling them up with the world’s digital data: documents, images, emails, movies and so on. Now imagine that you handed these laptops out to each of the approximately 7 billion people in the world. You would have to give each person about 20 of these laptops, each loaded with digital data, before you exhausted 1 zettabyte. That’s an incredible amount of data once you wrap your head around that picture.
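For anyone who wants to check the picture, here is a minimal back-of-the-envelope sketch. It takes the figures above at face value (1 ZB, 7 billion people, 20 laptops each) and assumes decimal units, so 1 ZB is 10^21 bytes. The function names are made up for illustration, and the per-laptop capacity is something the arithmetic produces, not a number quoted from the study.

```python
# Figures taken from the article; the per-laptop capacity is derived, not quoted.
ZETTABYTE_BYTES = 10**21          # 1 ZB in bytes (decimal SI definition)
GIGABYTE_BYTES = 10**9            # 1 GB in bytes (decimal SI definition)
WORLD_POPULATION = 7_000_000_000  # approximate, as stated in the article
LAPTOPS_PER_PERSON = 20           # as stated in the article


def share_per_person_gb(total_bytes: int, population: int) -> float:
    """Each person's share of the total data, in gigabytes."""
    return total_bytes / population / GIGABYTE_BYTES


def implied_laptop_capacity_gb(total_bytes: int, population: int,
                               laptops_each: int) -> float:
    """Capacity each laptop would need for the picture to add up, in GB."""
    return share_per_person_gb(total_bytes, population) / laptops_each


per_person = share_per_person_gb(ZETTABYTE_BYTES, WORLD_POPULATION)
per_laptop = implied_laptop_capacity_gb(
    ZETTABYTE_BYTES, WORLD_POPULATION, LAPTOPS_PER_PERSON
)

print(f"Share per person: about {per_person:.0f} GB")
print(f"Implied capacity per laptop: about {per_laptop:.1f} GB")
```

On these assumptions each person's share comes to roughly 143 GB, so the 20-laptop picture only adds up if each laptop holds about 7 GB. With bigger drives you would need fewer laptops per person to hold the same zettabyte.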
So how much would all that cost…?
Before you begin wondering about the cost of data storage, let’s check out some facts. About 75% of the total digital data out there is duplicate data. That ratio is probably higher for many giant data-producing enterprises, where every byte of original data can spawn 20 duplicates. That works out to about 95% duplicate data.
What is duplicate data?
Duplicate data are copies of original data created through backup files, computer recovery systems, digital archives and warehousing, or just plain copies stored on hundreds of thousands of computer systems around the world. If the 1 ZB of digital data out there were all original data, it would be like a straight road: a long road, but a useful one. The reality is that with 75% of the data being duplicate material, the digital universe is not a straight road at all. It is a complex maze where it’s easy to get lost and hard to find anything quickly. The digital universe is simply unmanageable and, as storage increases, storage management technologies are just not able to keep up.
So while storage cost is a natural concern, the true issue is how all this data is going to be managed. The good news is that storage costs have been decreasing steadily. Going by the example that follows, they have dropped roughly 100-million-fold in the last 50 years: storing 1 MB of data would have cost you $5,000 in 1960; it cost about $0.00005 in 2010. At that 2010 price, storing 1 ZB of data worldwide would come to roughly $50 billion, which is more manageable than it may sound.
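The cost arithmetic is easy to check for yourself. The sketch below uses only the two per-megabyte prices quoted above and assumes decimal units (1 MB = 10^6 bytes, 1 ZB = 10^21 bytes). The function names are illustrative, and it treats the price as flat across the whole zettabyte, which is a simplification.

```python
# Per-megabyte prices as quoted in the article (US dollars).
COST_PER_MB_1960 = 5000.0
COST_PER_MB_2010 = 0.00005

BYTES_PER_MB = 10**6   # decimal megabyte
BYTES_PER_ZB = 10**21  # decimal zettabyte


def price_drop_factor(old_price: float, new_price: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_price / new_price


def cost_to_store(total_bytes: int, cost_per_mb: float) -> float:
    """Cost of storing total_bytes at a flat per-megabyte price."""
    megabytes = total_bytes / BYTES_PER_MB
    return megabytes * cost_per_mb


drop = price_drop_factor(COST_PER_MB_1960, COST_PER_MB_2010)
zb_cost_2010 = cost_to_store(BYTES_PER_ZB, COST_PER_MB_2010)
zb_cost_1960 = cost_to_store(BYTES_PER_ZB, COST_PER_MB_1960)

print(f"Price drop, 1960 to 2010: about {drop:,.0f}-fold")
print(f"1 ZB at the 2010 price: about ${zb_cost_2010:,.0f}")
print(f"1 ZB at the 1960 price: about ${zb_cost_1960:,.0f}")
```

With these inputs the drop is on the order of 100 million, and a zettabyte at the 2010 price lands around $50 billion. At the 1960 price the same zettabyte would run to about $5 quintillion, which is why the price alone is no longer the scary part.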
But what about managing all that data: duplicates, originals, partial duplicates, recovery data, backups, and other random data? That’s a whole different ballgame. There are storage management technologies that may be able to do that someday, but right now they are at a very elementary stage of development. Storage virtualization and de-duplication techniques need to develop a lot more before they can handle the world’s total data.
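To give a feel for what de-duplication means in practice, here is a toy sketch of one common idea: fingerprint each file's contents with a hash and keep only one copy per fingerprint. This is an illustration under simplifying assumptions, not how any particular storage product works. Real systems typically work on blocks or chunks rather than whole files and have to cope with partial duplicates, which this sketch ignores. The function names and sample file names are invented for the example.

```python
import hashlib


def deduplicate(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Keep one copy of each distinct content.

    Returns (store, index): store maps a SHA-256 digest to the content,
    index maps each original file name to the digest of its content.
    """
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # only the first copy is kept
        index[name] = digest
    return store, index


def duplicate_fraction(files: dict[str, bytes], store: dict[str, bytes]) -> float:
    """Fraction of the original bytes that were redundant copies."""
    total = sum(len(content) for content in files.values())
    unique = sum(len(content) for content in store.values())
    return 0.0 if total == 0 else (total - unique) / total


# Hypothetical sample: one original and three copies of the same document.
sample = {
    "report.doc": b"quarterly report",
    "backup/report.doc": b"quarterly report",
    "archive/report.doc": b"quarterly report",
    "mail/attachment.doc": b"quarterly report",
}

store, index = deduplicate(sample)
print(f"Files seen: {len(sample)}, distinct contents kept: {len(store)}")
print(f"Duplicate fraction: {duplicate_fraction(sample, store):.0%}")
```

One original plus three copies gives a 75% duplicate fraction, the same ratio quoted earlier for the digital universe as a whole. The hard part at enterprise scale is not the hashing itself but doing this across millions of systems and formats.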
Besides that, most large enterprises have inadequate storage management policies in place, and few are even aware of what enterprise-level information lifecycle management (ILM) is. That’s the scale of the unattended work that storage management technology faces in the next few years: large data-producing enterprises churning out zettabytes of unorganized data (emails, documents, PowerPoint presentations, PDF files, images, movies and what have you), with no one able to find anything important in this maze. This is the area that we should be most concerned about.