Data Storage Management, the New Enemy

by Anya Jennings on May 26, 2010


We all know digital data is stored… somewhere, but few of us truly understand what it takes to manage the digital data we all produce. And while data storage management may sound simple as a concept, once it exists on a greater scale than just your personal computer, it can get pretty complicated.


Not surprisingly, digital data is one area that has kept thriving, growing at an unbelievable pace even through the recession. A recent IDC Digital Universe study says there is now over 1 zettabyte of data in the world. One zettabyte! You don’t hear that term very often, do you?

Curious as to how much data that really is?

Okay, say you started buying ordinary laptops and filling them up with the world’s digital data: documents, images, emails, movies and so on. Now imagine distributing these laptops to each of the roughly 7 billion people in the world. You would have to give every person about 20 data-filled laptops before you exhausted 1 zettabyte. That’s an incredible amount of data, if you can wrap your head around that picture.
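If you want to check the arithmetic behind that picture, here is a minimal back-of-envelope sketch in Python that uses only the figures quoted above; the per-laptop number it prints is simply what falls out of dividing 1 ZB across that many laptops, not a claim about real laptop capacities.

```python
# Back-of-envelope check of the laptop picture above, using only the
# figures quoted in the post; everything else is simple arithmetic.

ZETTABYTE = 10 ** 21              # 1 ZB in bytes (decimal units)
world_population = 7_000_000_000  # "approximately 7 billion people"
laptops_per_person = 20           # figure quoted in the post

total_laptops = world_population * laptops_per_person
bytes_per_laptop = ZETTABYTE / total_laptops

print(f"Laptops handed out: {total_laptops:,}")
print(f"Data per laptop:    {bytes_per_laptop / 10**9:.1f} GB")
# -> 140,000,000,000 laptops, with roughly 7.1 GB of data on each one
```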

So how much would all that cost…?

Before you begin wondering about the cost of data storage, let’s check out some facts. About 75% of the total digital data out there is duplicate data. That ratio is probably even higher for many of the giant data-producing behemoths, where for every byte of original data, 20 duplicates are produced.

What is duplicate data?

Duplicate data is copies of original data: copies created by backup files, computer recovery systems, digital archives and warehousing, or just plain copies stored in hundreds of thousands of computer systems around the world. If the 1 ZB of digital data out there were all original data, it would be like a straight road: a long road, but a useful one. The reality, with 75% of the data being duplicate material, is that the digital universe is not a straight road at all. It is a complex maze where it’s easy to get lost and hard to find anything quickly. The digital universe is simply unmanageable, and as storage keeps growing, storage management technologies are just not able to catch up.

So while storage cost is a natural concern, the true issue is how all this data is going to be managed. The good news is that storage costs have been falling steadily for the last 50 years, by roughly eight orders of magnitude: storing 1 MB of data would have cost you about $5,000 in 1960, versus roughly $0.00005 in 2010. So the cost of storing 1 ZB of data worldwide would be more manageable than it may sound.
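To put a rough number on that last claim, here is a quick Python calculation using only the 2010 per-megabyte price quoted above; it ignores redundancy, replacement hardware, power and administration, so treat it as an order-of-magnitude illustration rather than a real budget.

```python
# Rough cost of storing 1 ZB at the 2010 per-megabyte price quoted above.
# Ignores redundancy, replacement cycles, power and administration, so
# treat it purely as an order-of-magnitude illustration.

price_per_mb_2010 = 0.00005            # USD per MB, figure from the post
mb_per_zettabyte = 10 ** 21 / 10 ** 6  # 1 ZB expressed in MB (decimal units)

cost = price_per_mb_2010 * mb_per_zettabyte
print(f"Storing 1 ZB at 2010 prices: ~${cost:,.0f}")  # ~ $50,000,000,000
```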

But what about managing all that data: duplicates, originals, partial duplicates, recovery data, backups, and other random data? That’s a whole different ballgame. There are storage management technologies that may be able to do that someday, but right now they are at a very elementary stage of development. Storage virtualization and de-duplication techniques need to develop a lot more before they can handle the world’s total data.
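To give a sense of what de-duplication means in practice, here is a minimal Python sketch of the core idea behind hash-based (content-addressed) de-duplication. The class name, block size and example data are placeholders of my own; real systems add variable-size chunking, persistent indexes and far more careful bookkeeping.

```python
import hashlib

# Minimal sketch of hash-based de-duplication: store each unique block of
# data once, keyed by its content hash, and keep only references for copies.

class DedupStore:
    def __init__(self):
        self.blocks = {}   # content hash -> actual bytes (stored once)
        self.files = {}    # file name -> list of block hashes (references)

    def put(self, name, data, block_size=4096):
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # a duplicate block costs nothing extra
            hashes.append(digest)
        self.files[name] = hashes

    def get(self, name):
        return b"".join(self.blocks[h] for h in self.files[name])


store = DedupStore()
report = b"quarterly numbers..." * 1000
store.put("original.doc", report)
store.put("backup_copy.doc", report)      # the second copy adds no new blocks
assert store.get("backup_copy.doc") == report
print(f"Files stored: {len(store.files)}, unique blocks kept: {len(store.blocks)}")
```

The design point is simply that a copy costs almost nothing once its blocks are already in the store, which is why so much of that 75% duplicate data is, in principle, reclaimable.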

Besides that, there’s also the fact that most large enterprises have very inadequate storage management policies in place, and few are even aware of what enterprise-level information lifecycle management (ILM) is. That’s the scale of unattended work that storage management technology faces in the next few years: large data-producing enterprises churning out zettabytes of unorganized data, emails, documents, PowerPoint presentations, PDF files, images, movies and what have you, with no one able to find anything important in this maze. This is the area we should be most concerned about.


1 comment

Anya Jennings June 7, 2010 at 5:29 pm

Thanks for your thoughtful comments! You have an interesting idea about symbolic language and storage.

I usually look at everything through the web perspective, and I recall a time when websites were kept small and lean to provide fast load times on visitors’ slow dial-up Internet connections. Now that web connections are incredibly fast, sites are much larger and potentially much more bloated (Flash??). Less care is put into fast load times (although SEO and usability people would still argue for them). A bit closer to this topic, though, the same goes for software, and heck, even video games. When software or games were delivered on floppies or cartridges, extra care was put into coding so storage space wasn’t wasted on inefficient code. I think you touched on that point where you mentioned using C++ or C# vs. assembly at times. Even using the “right” language can make a difference.

It almost seems like now, as storage prices keep dropping, we don’t take as much care to store data as smartly as possible, because it can all be dumped somewhere and accessed (although perhaps not as efficiently), as the post points out.
