The Voracious Appetites of Machine Learning & Artificial Intelligence

Like the Internet of Things, machine learning (ML) and its more august sibling, artificial intelligence (AI), are upon us. The former is impacting business and IT operations, and the latter will impact nearly all of society. When most of us think of them, we envision incomprehensible algorithms and the brawny CPUs and GPUs all but running on nitro that bring them to life. Who thinks of quotidian storage?

The fact is that the very foundation of AI and ML is data, lots of it, and that data must be stored somewhere. The data coming out of AI and ML are only as good as the data going into them. And the more data, the better. The larger the datasets, the more accurate the pattern recognition, correlations, analyses, and decision-making will be. The more data, the smarter our machines will be. This is true for any use case or workload, from sequencing genomes, improving agricultural yields, and scientific research to fraud detection, customer support, and self-driving automobiles.
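
The relationship between data volume and model quality is easy to demonstrate. Below is a minimal, illustrative sketch using scikit-learn's learning_curve on synthetic data; the dataset, model, and sample sizes are assumptions chosen purely to show accuracy typically climbing as the training set grows, not a benchmark of any real workload.

```python
# Minimal sketch: model accuracy generally improves as the training set grows.
# Synthetic data and a simple classifier stand in for a real AI/ML workload.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# A synthetic classification dataset (a stand-in for stored enterprise data).
X, y = make_classification(n_samples=20000, n_features=20, n_informative=10,
                           random_state=0)

# Cross-validated accuracy at increasing training-set sizes.
train_sizes, _, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.05, 1.0, 6), cv=5)

for size, scores in zip(train_sizes, test_scores):
    print(f"{size:>6} training samples -> mean accuracy {scores.mean():.3f}")
```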

Additionally, AI and ML generate data. Once AI/ML applications process their source data, the results will need to be safely stored and reused for further analyses.

Feeding the gluttony of AI and ML for data presents challenges. Data will come from many sources, such as business operations within the enterprise and IoT and social media from outside the enterprise. Data repositories must be extremely scalable, while still being cost-efficient, which often means hybrid infrastructures combining on-premise and cloud storage. Object storage will be a common solution for its ability to present vast troves of data in a single namespace.

Additionally, investments in high-octane GPUs will be wasted if storage is the bottleneck. For this reason, AI and ML are best served by flash storage, particularly for real-time use cases like assessing financial transactions.

Finally, AI and ML will improve storage itself. Vendors have already started to include logic in their offerings to better understand and manage enterprise environments. Armed with AI and ML, administrators will be able to analyze usage, detect I/O patterns, and make more informed decisions about data lifecycles. They’ll more accurately project future capacity needs and perhaps even predict failures, permitting proactive measures to safeguard operations.
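
As a rough illustration of the capacity-planning idea, the sketch below fits a simple linear trend to a year of invented usage figures and projects when an array would fill up. Real AI-driven tools use far richer telemetry and models, so treat this only as a toy.

```python
# Toy sketch of ML-assisted capacity planning: fit a trend to historical usage
# and estimate when the array will run out of space. The numbers are invented
# for illustration; real tools use far richer telemetry and models.
import numpy as np

capacity_tb = 500.0                                   # assumed usable capacity
months = np.arange(12)                                # last 12 months
used_tb = np.array([210, 222, 231, 245, 258, 266,     # hypothetical usage history
                    280, 291, 305, 318, 327, 340], dtype=float)

slope, intercept = np.polyfit(months, used_tb, 1)     # linear growth trend (TB/month)
months_to_full = (capacity_tb - used_tb[-1]) / slope

print(f"Growth rate: {slope:.1f} TB/month")
print(f"Projected months until capacity is exhausted: {months_to_full:.1f}")
```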

The bottom line: if you’re planning on ML or AI applications in your enterprise, strongly consider the storage that will enable them. In storage, as in life itself, the one thing that never changes is that things are always changing.

Surviving the IoT Flood

Last month’s blog addressed edge computing and how it supports the Internet of Things. Now, let’s look a bit further into the storage demands of IoT. IoT storage can’t possibly be covered fully in a single blog post, but here are some thoughts.

We’ve all been introduced to an IoT-embellished future. Smart homes, buildings, cities, and cars. Industrial, transportation, environmental, and scientific sensors. Sensors that tell us what’s in the soil, what’s in the air, and what’s in the water. Sensors in our clothes and, eventually, even ourselves. A 2017 white paper by IDC forecast that by 2025, IoT devices worldwide will generate some 40 zettabytes of real-time data (www.seagate.com/files/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf). For those who are counting, that’s 40 billion terabytes. The data onslaught will only intensify.
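
For anyone who wants to verify the conversion, a few lines of Python (using decimal, base-10 units) do the arithmetic:

```python
# Decimal (base-10) storage units: 1 ZB = 10**21 bytes, 1 TB = 10**12 bytes.
zettabytes = 40
terabytes = zettabytes * 10**21 // 10**12
print(f"{zettabytes} ZB = {terabytes:,} TB")   # 40 ZB = 40,000,000,000 TB, i.e., 40 billion TB
```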

These data will need to be moved, stored, and shared. And, of course, value must be extracted from them. Otherwise, why waste the time and money to collect them in the first place?

Where will zettabytes of data reside? Where should your IoT data reside? It all depends on the data and how they’re used. Complicating things, there’s a vast diversity of IoT data, ranging from tiny file logs to huge video surveillance files.

Some data need to be processed immediately for safety or well-being. Think critical avionics or data exchanged between smart cars approaching an intersection. Similarly, healthcare providers must know immediately when a device detects that an at-home patient is suffering a medical crisis.

Some data need to be processed soon, such as readings from sensors in an industrial device that can indicate an impending failure. Some data can be processed later. An oil exploration firm’s geological data, for example, need to be analyzed carefully over time. Or a vending machine can ping the datacenter only when a purchase is made or inventory needs to be restocked. Evaluating such data for insights into consumer preferences and behaviors can be done periodically at the datacenter.

Data that are analyzed in real time may not need to be stored, other than in a temporary cache. Other data need to be retained for set periods of time, such as the aforementioned video surveillance files. Some data can be discarded right after analytics. When the value of the data is to reveal outliers, for example, the unremarkable data points can be discarded after analysis. Do you really need to store the sensor reading from your refrigerator informing you that you need milk?
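
One practical way to reason about these choices is to map each class of IoT data to a retention rule. The sketch below is purely illustrative; the data classes and retention periods are hypothetical assumptions, not recommendations.

```python
# Illustrative sketch: map IoT data classes to retention policies.
# The classes and periods are hypothetical examples, not recommendations.
from dataclasses import dataclass

@dataclass
class RetentionPolicy:
    store: bool          # keep beyond a temporary cache?
    retention_days: int  # 0 means discard after analysis

POLICIES = {
    "realtime_telemetry": RetentionPolicy(store=False, retention_days=0),    # cache only
    "video_surveillance": RetentionPolicy(store=True,  retention_days=90),   # regulatory window
    "outlier_alerts":     RetentionPolicy(store=True,  retention_days=365),  # the valuable signal
    "routine_readings":   RetentionPolicy(store=False, retention_days=0),    # e.g., "you need milk"
}

def disposition(data_class: str) -> str:
    policy = POLICIES.get(data_class)
    if policy is None or not policy.store:
        return "discard after analysis"
    return f"retain {policy.retention_days} days"

for cls in POLICIES:
    print(f"{cls:22s} -> {disposition(cls)}")
```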

The nature and purpose of the data will determine where they are stored and for how long. It would make sense to store video surveillance files in object storage in a cloud, but large data streams can congest pipelines to cloud repositories. This is also true of data streamed across substantial geographical distances to the enterprise datacenter. Consequently, with large and ongoing data streams, it can make sense, as discussed in last month’s blog, to adopt an edge computing strategy and store these data in mini-datacenters located in general proximity to the data’s sources. Processing can then be done locally.

For critical data that demand real-time analytics, edge computing is a practical option. Only the results of the analytics would be forwarded to the enterprise datacenter. Of course, you still need to determine whether these data must be retained after processing.

For critical data collection and transactions, fast solid-state storage is a must. How much latency is tolerable between two cars communicating with each other as they approach an intersection?

The point is you may have to address many kinds of data with many kinds of processing and storage needs. Presently, your storage options basically are clouds, edge computing, or your datacenter, where the value and insights from IoT data will probably be ultimately realized. But does your datacenter have the resources to house a steady flow of IoT data? Is your WAN up to the task without impeding business operations? Resorting to clouds is cost-effective for some IoT data, but moving data from many disparate sources to clouds still takes time.

Once you figure out your IoT management and storage needs, you must address security requirements. How important is each IoT data stream and what is a commensurate level of security?

Finally, you can start pondering what IoT data to back up and how. Once you figure all of this out, go home and spend some time with your family.

As you do so, carry this thought. The IoT certainly presents a host of challenges, but it is also a revolution in the making that offers unprecedented opportunities for prosperity and well-being.

Edge Computing—Bringing Value to IoT

There are already many billions of Internet of Things (IoT) devices in the business, consumer, civic, science, and industrial sectors. They’re scattered across the planet in assembly lines, vehicles, hospitals, cities, homes, environments, and even our clothing. They continuously generate vast amounts of data and their numbers will proliferate almost exponentially.

Edge computing exists to give meaning to that data. It is the IT infrastructure that connects datacenters and corporate offices to the real world that IoT devices sense. To understand edge computing is to understand the issues it addresses to make IoT useful.

JetStor 826AFA – Storage at the Edge

The first issue is what to do with all this remotely sensed data. There’s too much to stream to clouds or enterprise datacenters. The pipelines are too small, if available at all, to carry the output of countless devices.

Moreover, is every data point of value? Often, device owners need to know only of outliers, such as when a machine is being overtaxed or when a patient suffers a medical event.

And do all the data need to be saved? Video surveillance, for example, must be retained for specified periods of time, and compliance demands might require other data to be preserved. But is this true for everything? If so, storage costs would skyrocket.

Edge computing offers decentralized processing at the edge, in proximity to the sensors and devices, to power analytics. Analytics converts raw data into real-time, actionable intelligence, extracting business, scientific, industrial, and personal value from the streams. The analytics generally occur at small, local data centers that are linked to the IoT devices. At these edge-computing sites, unnecessary data are weeded out and only processed data—the information of real value—are sent on to clouds or corporate data centers. Edge computing helps to ensure insights are delivered rapidly and it lowers storage and networking costs.
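
To make the pattern concrete, here is a minimal sketch of edge-side processing: a window of raw sensor readings is reduced locally to a summary plus any alerts, and only that small payload is forwarded upstream. The sensor names, threshold, and forward_upstream() stub are hypothetical placeholders, not any particular product's API.

```python
# Minimal sketch of the edge-analytics pattern: process raw readings locally,
# forward only summaries and out-of-range alerts upstream. Names, thresholds,
# and the forward_upstream() stub are hypothetical placeholders.
from statistics import mean

TEMP_LIMIT_C = 85.0  # assumed alert threshold for an industrial sensor

def forward_upstream(payload: dict) -> None:
    # Placeholder for the uplink to a cloud or corporate datacenter.
    print("forwarding:", payload)

def process_window(sensor_id: str, readings: list[float]) -> None:
    """Reduce a window of raw readings to a summary plus any alerts."""
    summary = {
        "sensor": sensor_id,
        "count": len(readings),
        "mean_c": round(mean(readings), 2),
        "max_c": max(readings),
    }
    alerts = [r for r in readings if r > TEMP_LIMIT_C]
    if alerts:
        summary["alerts"] = alerts
    forward_upstream(summary)   # raw points stay (or are discarded) at the edge

# Example: six raw readings collapse into one small upstream message.
process_window("press-07", [71.2, 72.0, 70.8, 88.4, 73.1, 71.9])
```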

Of course, storage is key to the process. The local data centers, often consisting of a few racks, require enough capacity to store the data streams and support analytics. For these deployments, far-sighted organizations will rely on all-flash storage solutions. Flash storage lowers energy consumption and offers the reliability needed for sites that often lack on-premise staff. Moreover, flash delivers the performance needed for real-time analytics.

In summary, edge computing helps to make sense of IoT data to improve efficiencies, productivity, safety, health, and knowledge.

Virtual Desktop Infrastructures Need Very Fast Storage

Virtual desktop infrastructure (VDI), also offered from the cloud as desktop-as-a-service (DaaS), has been widely adopted. VDI simplifies IT management and backups, facilitates security, and reduces hardware and operating costs. VMware, Citrix, Amazon, Parallels, and others offer VDI solutions, each with its own implementation and features. Some run in the data center, some in the cloud, and some are hybrids. Some target small businesses, some large enterprises, and others organizations in between.

All, however, depend on robust storage to work well.

To reduce costs with VDI deployments, organizations generally place as many virtual machines as possible onto the fewest number of physical servers. They then connect the servers to a shared storage system. This creates I/O contention, which undermines the predictable performance that users came to expect when their OSs, applications, and files resided on their own workstations or laptops.

Over the course of the day, hundreds or thousands of users accessing applications and data, or storing and searching for files, means that shared storage must have substantial IOPS capabilities to keep pace. The biggest pressure comes from boot storms, the tsunami of I/O that hits when the bulk of users arrive in the morning and log in at roughly the same time, expecting their desktops to be instantly available.
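
A back-of-envelope calculation shows why boot storms dominate sizing. The per-desktop IOPS figures below are rough planning assumptions, not measurements, but the peak-to-steady ratio they imply is the point.

```python
# Back-of-envelope boot-storm sizing. The per-desktop IOPS figures are rough
# planning assumptions, not measurements from any particular deployment.
concurrent_boots = 1000   # desktops logging in within the same window
boot_iops_each   = 50     # assumed IOPS per desktop while booting
steady_iops_each = 10     # assumed IOPS per desktop in steady state

peak_iops   = concurrent_boots * boot_iops_each
steady_iops = concurrent_boots * steady_iops_each

print(f"Peak (boot storm): {peak_iops:,} IOPS")      # 50,000 IOPS
print(f"Steady state:      {steady_iops:,} IOPS")    # 10,000 IOPS
print(f"Peak-to-steady ratio: {peak_iops / steady_iops:.0f}x")
```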

Not long ago, the only storage solution was spinning disk. Yet, even with tricks like short-stroking spindles, traditional arrays were hard-pressed to keep up with VDI demands, especially boot storms. Hard drives may be relatively inexpensive, but they’re not the smart investment for VDI storage platforms. All-flash arrays are. They deliver far superior I/O capabilities and they keep getting bigger and less expensive. They also consume less power and run far cooler than traditional platforms, slashing energy costs.

For a while, hyper-converged infrastructures (HCIs) were used with VDI, but the shortcomings of HCI soon became apparent for this use case. HCI doesn’t scale well for specific workloads like VDI. Although vendors are trying to remedy this, you must pay for an entire HCI block even if you only need more storage or CPU power.

An all-flash array enables you to efficiently balance storage with your VDI workload. You’ll gain easy, cost-effective scalability, power consumption so low you’ll be hard-pressed to quantify it in the context of your data center, and the hyperspeed performance that dissipates boot storms and keeps users productive.

DropBox Turns to On-Premise Storage Rather than the Cloud, Saves $Millions

A story published on GeekWire should make many organizations rethink their storage strategies. It recounts how DropBox bucked industry trends by moving its popular file-storage service away from the cloud—AWS’s S3 storage service—to its own infrastructure (www.geekwire.com/2018/dropbox-saved-almost-75-million-two-years-building-tech-infrastructure/). By investing in its own data centers rather than spending on third-party infrastructure, DropBox saved $39.5 million in 2016 and $35.1 million in 2017. What DropBox discovered through its “Infrastructure Optimization” project is that on-premise storage designed for an enterprise’s specific needs can be much more efficient than relatively generic cloud offerings.

There are persistent arguments for on-premise storage. You don’t have the security worries that arise when your data leave the confines of your firewalls—and your control—to traverse the Internet to somebody else’s network. Nor do you have concerns about sketchy neighbors on multi-tenant clouds, which is what most commercial clouds are. Additionally, moving data across your on-premise resources is simpler and faster than moving data between clouds. Your compliance, governance, and peace-of-mind needs can be best met when you maintain local ownership of your data.

Data are far more rapidly accessible when stored locally rather than somewhere else in the country or, worse, somewhere else on the planet. Local storage benefits such established needs as time-sensitive transactional processing, backups, and data recovery, and there are emerging applications for which speed and safety will be paramount.

For example, analytics now go well beyond Hadoop-style big-data projects as even small organizations increasingly use analytics to extract more value from their data. Analytics offers the knowledge to improve everything from IT and operational efficiencies to marketing and customer relations. But to avoid latency, especially when real-time or near real-time analysis is required, compute and storage must be close together, not separated by the Internet. The argument for local storage becomes even stronger with the adoption of technologies like NVMe and NVMe over Fabrics (which can run over Ethernet), which will greatly speed data movement across local networks and expedite data analytics.

Examples like DropBox show that public clouds are not always the most cost-effective solution for data storage. The best bet is keeping data on premises in a private cloud or a hybrid cloud, replicating them to a second site for backup and recovery, and archiving everything to a slow but inexpensive cloud service, whether from a private cloud provider or Amazon Glacier. You’ll gain control, performance, and security. And you can economize on your IT expenses.

Hyper-convergence vs Convergence

Once upon a time, enterprises bought the components needed to deliver IT services, cobbled them together, and with a little sweat and aggravation, got them to work. Demands for more robust services prompted companies to turn to best-of-breed solutions, but this resulted in a mélange of systems that presented management and interoperability issues. These problems were exacerbated by virtualization technologies that must span devices.

In response, converged solutions arrived on the market. These are turnkey systems that include everything IT requires—servers, networking, storage, hypervisors, and management capabilities. The components come from various vendors, but they are all pre-tested to ensure interoperability and are supported by a single vendor. Converged solutions are quick to deploy and easier for IT staffs to maintain, although larger enterprises with separate server, storage, and networking teams may require organizational restructuring. Regardless, fully-converged, single-vendor solutions allow IT organizations to do more at less cost.

But a new approach—hyper-converged infrastructure (HCI)—further integrates components to better support virtualized services, particularly software-defined storage (SDS). HCI is appliance-based and software-driven, and like converged solutions is supported by a single vendor. The appliances are commodity hardware boxes, each integrating compute, storage, networking, and virtualization technologies. Unlike converged systems, the storage, compute, and networking of HCI are so tightly fused that they cannot be broken down into separate components. Each appliance is a node in the system and all services are centrally controlled. Storage is decoupled from the hardware so that the storage across all the nodes appears as one virtualized pool. Scaling means simply adding appliances.
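
The scale-by-adding-nodes model is easy to reason about with a little arithmetic. In the sketch below, the per-node capacity and replication factor are assumed values for illustration only.

```python
# Illustrative sketch: usable capacity of an HCI storage pool as nodes are added.
# Per-node capacity and replication factor are assumed values for illustration.
raw_tb_per_node = 20      # assumed raw capacity per appliance
replication     = 2       # assume each block is stored on 2 nodes for resilience

def usable_capacity(nodes: int) -> float:
    return nodes * raw_tb_per_node / replication

for nodes in (3, 4, 6, 8):
    print(f"{nodes} nodes -> ~{usable_capacity(nodes):.0f} TB usable")
```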

HCI isn’t always a slam dunk. When an IT department needs to scale just one resource, like computing or storage, it will have to pay for boxes that contain all the resources. Absolutely mission-critical applications might perform better on dedicated hardware, isolated from other apps that could consume essential bandwidth. HCI also might not make sense for remote and branch offices (ROBOs).

Yet, HCI offers many benefits over converged infrastructures, such as superior scalability, flexibility, control, and ease of use. IT can deploy the most advanced SDS functionality and automation, and achieve remarkable efficiencies. It reduces latency, better exploits the performance of solid-state drives, and leverages software-defined infrastructure. It might be your best choice…until something better comes along.

Cloud Storage Hosting

Storage has always been a primary reason why companies turn to cloud computing. Clouds are ideal for backing up data and storing archival data. This is underscored by the advent of object storage, which makes vast data stores practical. Over time, providers offered additional use cases such as Software as a Service (SaaS), which delivers applications from the cloud, and Infrastructure as a Service (IaaS), which augments or replaces the whole data center.

Solutions must be cost-effective for companies and yet profitable for providers. The keys are economies of scale and efficiencies. Virtualization is the secret sauce that makes clouds work. Providers use virtualization to efficiently pool storage resources and enable multi-tenancy, thus leveraging hardware and lowering the costs of storage and computing. They can deliver scalability and services like cloud bursting, increasing the value of their offerings to customers. Now it’s up to providers to deliver performance, security, and compliance to ensure clouds make more business sense than do-it-yourself data centers.

All-Flash Storage Promises a Go-To Strategy for MSPs

Managed service providers (MSPs) need to offer fast, affordable storage. Storage is a perennial concern for enterprises of all sizes, and many are considering offloading their storage to reduce costs and headaches. But to win storage business, MSPs face formidable players like AWS, Azure, and Google Cloud. To compete, they must offer high-value, cost-effective solutions.

To this end, forward-thinking MSPs should consider all-flash solutions, which are positioned to furnish the performance, scalability, and availability that enterprises demand. The capital costs of flash storage are still greater than those of spinning disk, but its price per gigabyte continues to drop and it’s only a matter of time before it achieves parity with the legacy technology.

Flash storage platforms can trim operating costs. They consume less power than their mechanical-drive counterparts and require less space, providing substantial cost-savings every month.
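
A back-of-envelope comparison illustrates both points. The per-drive IOPS, wattages, and electricity price below are rough assumptions, but they show how many fewer drives, and how much less power, an all-flash configuration needs to hit a given performance target.

```python
# Back-of-envelope: drives and monthly power cost needed to hit an IOPS target.
# Per-drive IOPS, wattages, and the electricity price are rough assumptions.
import math

target_iops = 50_000
hdd_iops, hdd_watts = 150, 8.0        # assumed 7200 RPM nearline HDD
ssd_iops, ssd_watts = 50_000, 5.0     # assumed enterprise SSD (conservative)
price_per_kwh, hours_per_month = 0.12, 730

hdd_drives = math.ceil(target_iops / hdd_iops)          # ~334 spindles
ssd_drives = max(math.ceil(target_iops / ssd_iops), 4)  # small RAID set

def monthly_power_cost(drives: int, watts: float) -> float:
    return drives * watts * hours_per_month / 1000 * price_per_kwh

print(f"HDD: {hdd_drives} drives, ~${monthly_power_cost(hdd_drives, hdd_watts):.0f}/month in power")
print(f"SSD: {ssd_drives} drives, ~${monthly_power_cost(ssd_drives, ssd_watts):.0f}/month in power")
```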

Flash also offers much greater IOPS than spinning disk, and this performance gap will only widen with the adoption of NVMe (Non-Volatile Memory Express) drives. NVMe eliminates the bottlenecks caused by storage subsystems engineered for slower spinning disks. Additionally, flash is getting denser all the time, which boosts scalability and further conserves space. Finally, the use of technologies like dual controllers and virtualization strategies will help ensure flash storage systems offer the levels of availability that the large cloud providers tout.

Increasingly, all-flash storage solutions present MSPs with the advantages and economies to compete against AWS and the other behemoth cloud repositories.

Storage Upheavals Are Opportunities for MSPs (Managed Service Providers)

For years, the storage business had been reasonably stable. Primary storage was local and backed-up data were nearby or at remote sites along with archived data. Production files were rapidly accessible, and governance and compliance demands were more or less met. The choices were finite and life was, for the most part, relatively orderly.

But things change over time. Clouds changed the economics of storage from a capital expense to an operating cost. They offer repositories that are less expensive than do-it-yourself solutions and are virtually infinite in capacity. Moreover, there are now public and private clouds to choose from, and the advent of object storage delivers a metadata-rich, flat namespace that is ideal for stashing billions or even trillions of files.

Add to this software-defined storage (SDS), which abstracts storage from the underlying hardware. SDS offers control and scalability but requires expertise to deploy effectively. Throw in solid-state and in-memory storage, which can greatly enhance the performance of storage systems, as well as the rise of containers, and the storage business has become a vexing mélange of technologies and options.

Which is fertile ground for MSPs. Rather than acquire new layers of IT know-how, enterprises of all sizes and in all verticals can turn to MSPs to navigate the new world of storage. MSPs can manage services in both the data center and cloud to meet the needs for primary, backup, and archival storage cost-effectively.

Opportunistic MSPs can assess organizations’ demands for performance and scalability, and oversee migrations of data to public or private clouds. They can ensure that security, compliance, and governance needs are always met, and implement disaster recovery strategies that satisfy each client’s business mandates. Moreover, MSPs can leverage hardware resources by using virtualization to consolidate multiple resources into a single, cost-efficient cloud offering.

By offering storage as a service, MSPs can find that change and upheaval present business opportunities.

Hybrid vs Public Clouds: Which Makes Sense for You?

If your organization is not utilizing some form of cloud, the odds are it soon will. Your question will be what kind of cloud—public, private, or hybrid? Private clouds can be dedicated clouds provided by vendors in which no resources are shared with any other customer. They also can be onsite solutions, but building and operating a cloud data center is costly and demands solid IT services. Large enterprises deploy private clouds when security, compliance, and control are paramount, requiring that data be kept within enterprise firewalls or a vendor’s dedicated firewalls. Public and hybrid clouds are for everyone else.

Public clouds offer compelling benefits. They let you outsource IT infrastructure, reducing both hardware expenses and the operational costs of maintaining skilled IT staff. They provide dependability, automated deployments, and scalability. They enhance cost-efficiencies by allowing you to pay only for the resources you use, when you use them. They enable testing environments to be quickly constructed and deconstructed. You can manage your resources and assets in the cloud yourself, or have a managed cloud services provider do it for you.

Moreover, public clouds can slash the relentless costs of storing ever-increasing data troves. Rather than continually invest in on-site storage capacity, you can convert short-term, long-term, backup, and archival storage costs to a more economical operating expense in public clouds. When users must access large files without latency, they can turn to emerging on-premise caching solutions that sync with cloud stores.

There isn’t one definition for hybrid clouds, but they generally offer the best of both worlds by combining public and private cloud services. You can still control key applications and data sets within the confines of the enterprise network while keeping other applications and long-term data sets in the cloud. You can use the public cloud as an insurance policy, expanding services onto the cloud during peak periods when your compute and/or storage demands exceed your onsite capabilities; this is known as cloud bursting. A hybrid cloud can also help maintain operational continuity during planned or unplanned outages, blackouts, and scheduled maintenance windows. Hybrid clouds also offer security options. You can keep the apps and data that demand vigorous protections onsite, where you have complete control, while running less security-sensitive apps in a public cloud. You can even use a hybrid cloud to experiment with incrementally migrating infrastructure offsite.
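
Conceptually, cloud bursting is just a placement decision driven by local utilization. The sketch below is a hypothetical illustration; the threshold and the two placement functions are placeholders, not a real orchestration API.

```python
# Illustrative sketch of a cloud-bursting decision: when on-site utilization
# crosses a threshold, overflow work is directed to the public cloud.
# The threshold and the two placement functions are hypothetical placeholders.
BURST_THRESHOLD = 0.80   # assumed: burst when local utilization exceeds 80%

def run_on_premises(job: str) -> None:
    print(f"{job}: running on-premises")

def run_in_public_cloud(job: str) -> None:
    print(f"{job}: bursting to public cloud")

def place_job(job: str, local_utilization: float) -> None:
    if local_utilization < BURST_THRESHOLD:
        run_on_premises(job)
    else:
        run_in_public_cloud(job)

place_job("nightly-report", 0.65)        # plenty of local headroom
place_job("quarter-close-batch", 0.92)   # peak period: burst to the cloud
```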

Hybrid clouds offer many substantial benefits but bear in mind that they can present management challenges. You’ll need to federate disparate environments (although some vendors argue that management is simplified when the datacenter and cloud both use similar technologies). Scripts need to span public and private infrastructure, policy changes must be applied consistently, and operations and tasks automated across multiple environments. Fortunately, there are cloud management solutions available, with more to come.

Align a cloud solution with your performance needs, security requirements, and budget. You don’t have to put everything in a cloud, but you’ll need to deploy and manage what you don’t. Make a balance sheet of your current and projected capital and operating costs to determine which solution makes the most business sense for your organization.