Distributed Storage Distributes the Risks

Distributed storage arose to meet a growing array of needs confronting enterprises.  We’ve long known that data is the lifeblood of our businesses.  Leveraging data is how we prosper and grow.  Yet how do we keep our data safe and private?  We need to safeguard our data while keeping it always accessible.  We must meet regulatory demands as well as corporate governance requirements.  We need to keep our data secure from hackers and the scourge of ransomware attacks.  We could go on.

In antediluvian days, we stored our precious data on robust storage arrays in the data center, an approach that had its limitations.  For example, it was vulnerable to physical calamities such as fires and floods.  The risk was too great, so we began to store our data in the cloud.  But this was still a centralized approach that presented many of the same risks.

Therefore, distributed storage became the go-to strategy.  Though simple in concept, it can be complicated in practice.  A distributed storage system keeps data on physical servers that are geographically dispersed, ranging from the building next door to clouds and data centers on other continents.  Each server is called a node, and each node can house part or all of a dataset.  How you distribute your data across the nodes will depend on your particular needs.  If you replicate your data across nodes, you still have access to your entire dataset when a node is removed from service for any reason, so a failed server doesn’t disrupt the system.  Such an infrastructure is called a distributed data store.
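
To make the replication idea concrete, here is a minimal Python sketch.  It assumes a simple round-robin placement; the node names, chunk names, and both helper functions are hypothetical and chosen only for illustration.  Real distributed data stores use more sophisticated, policy-driven placement.

```python
def place_replicas(chunks, nodes, replication_factor):
    """Assign each chunk to `replication_factor` distinct nodes, round-robin.

    Hypothetical helper: the placement scheme is an assumption for this sketch.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    placement = {}
    for i, chunk in enumerate(chunks):
        # Consecutive offsets modulo the node count always pick distinct nodes.
        placement[chunk] = [
            nodes[(i + k) % len(nodes)] for k in range(replication_factor)
        ]
    return placement


def survives_failure(placement, failed_nodes):
    """Return True if every chunk still has a replica on a healthy node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical nodes: one on-premises server and two cloud regions.
nodes = ["on-prem-a", "cloud-eu", "cloud-us"]
chunks = ["chunk-0", "chunk-1", "chunk-2", "chunk-3"]

placement = place_replicas(chunks, nodes, replication_factor=2)

# With two copies of every chunk, losing any single node loses no data.
single_failures_ok = all(survives_failure(placement, [n]) for n in nodes)
```

With two replicas per chunk, any one node can fail without data loss, but two simultaneous failures can still make some chunks unavailable.  Raising the replication factor trades extra storage cost for more resilience.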

Of course, you will need a solution to replicate your data according to policies, but the advantages of distributed data stores are many.  They include:

  • Data protection: Even if a flood or tornado wipes out a data center, you’ll still have access to all your data. Distributed data stores offer strong disaster recovery and backup capabilities.
  • Constant availability: Gain reliable uptime. Because other nodes keep serving data, an outage, maintenance window, or malfunction at one node need not interrupt access.
  • Strong performance: Replicate production data to relatively nearby nodes for rapid reads and writes, while keeping colder data on slower, more distant, and less costly servers.
  • Tiered storage: Tier your storage on servers that offer high performance or greater economy, depending on the data they store.
  • Scalability: Simply add more nodes to the system as needed.
  • Flexibility: Distributed storage certainly works with object stores, but it also supports file and block storage.
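
The tiering idea in the list above can be sketched as a simple policy.  This Python example is illustrative only: the tier names, the idle-time thresholds, and the `choose_tier` function are assumptions, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_idle_days: float  # data idle for up to this many days belongs here


# Hypothetical tiers, ordered from fastest and costliest to slowest and cheapest.
TIERS = [
    Tier("hot-nearby", 7),
    Tier("warm-regional", 90),
    Tier("cold-archive", float("inf")),
]


def choose_tier(days_since_last_access, tiers=TIERS):
    """Pick the first tier whose idle-time limit covers the data."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    for tier in tiers:
        if days_since_last_access <= tier.max_idle_days:
            return tier.name
    raise ValueError("no tier accepts data idle for this long")


hot = choose_tier(2)     # recently used production data
warm = choose_tier(30)   # occasionally used data
cold = choose_tier(400)  # rarely touched data
```

A real policy would likely weigh more than idle time, such as access frequency, data size, and compliance rules, but the structure is the same: match each dataset to the cheapest tier that still meets its performance needs.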

Distributed storage does come with trade-offs, such as cost.  Running multiple servers in far-flung data centers and clouds can be expensive.  Moreover, you need to ensure each node is properly maintained and updated as required.  Even so, distributed storage can effectively meet many of today’s storage needs.