“Big data” is a widely used term in the IT business, and what it means depends on who’s doing the talking. It can mean one thing to the business analytics crowd and another to the storage people. The bottom line, however, is that “big data” refers to the fact that businesses large and small are generating ever-increasing volumes of data with almost every click and transaction, and that this information needs to be stored and processed.
Storage administrators today face the Sisyphean task of saving torrents of data and making it available quickly. Storage systems like RAID arrays now offer solid-state drives that greatly accelerate access to data, but must companies routinely add more of these systems to their networks as their existing arrays fill up? Fortunately, there are techniques that improve the efficiency of data storage by allowing more data to be stored within a fixed capacity. A particularly effective one is storage dedupe (short for deduplication).
The premise behind storage dedupe is simple: save only one copy of a file rather than multiple copies. A PowerPoint or Excel file might be sent to a workgroup of twenty, and each member will want to save their own copy. Traditionally, this means saving twenty copies of exactly the same file.
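To make the idea concrete, here is a minimal sketch in Python of file-level dedupe, assuming files are identified by a SHA-256 hash of their contents. The class name `FileDedupeStore`, its methods, and the file names are hypothetical and chosen only for illustration; real storage arrays implement this in their own ways.

```python
import hashlib


class FileDedupeStore:
    """Toy file-level dedupe: keeps one stored copy per unique content."""

    def __init__(self):
        self.blobs = {}  # content hash -> file bytes (stored once)
        self.names = {}  # user-visible file name -> content hash

    def save(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.names[name] = digest

    def load(self, name):
        # Every name resolves to the single shared copy.
        return self.blobs[self.names[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


# Twenty workgroup members each save the same presentation.
store = FileDedupeStore()
deck = b"quarterly results " * 1000
for i in range(20):
    store.save(f"user{i}/deck.pptx", deck)
```

In this sketch, twenty saves of identical content occupy the space of one file, while each member can still open the file under their own name.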
Advanced storage arrays will “dedupe” these files, meaning that they will save only one copy and use markers that enable all users to access it. All of this is transparent to users. The result is that you need only one-twentieth of the space to store everyone’s file. The savings will vary, of course, but they will generally be substantial. Moreover, when storage dedupe is done at the block level, the efficiencies are greater. This is because when a file is modified, the system saves only the changed blocks rather than the entire edited file. The system uses markers to point to the changes, enabling users to open the modified files as if they were stored in their entirety. Storage dedupe is not a gimmick or passing fad. The technology is mature and will be essential for storing data efficiently and cost-effectively.
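The block-level approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular array works: it assumes fixed-size 4 KB blocks and SHA-256 hashes as the “markers,” and the names `BlockDedupeStore`, `BLOCK_SIZE`, and the file names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


class BlockDedupeStore:
    """Toy block-level dedupe: each unique block is stored once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # block hash -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of block hashes

    def save(self, name, data):
        digests = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only blocks not already present consume new space.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def load(self, name):
        # Reassemble the file by following its list of markers.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


block_store = BlockDedupeStore()

# An eight-block file, then an edited version with one block overwritten.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
edited = original[:BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + original[2 * BLOCK_SIZE:]

block_store.save("report.xlsx", original)
before = block_store.stored_bytes()
block_store.save("report_v2.xlsx", edited)
after = block_store.stored_bytes()
```

Here, saving the edited version adds only the one changed block to storage, yet both versions can be read back in full. Note that fixed-size blocks work well for in-place changes like this one; an edit that inserts or removes bytes shifts every later block boundary, which is a limitation of this simplified sketch.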