Hybrid Flash Storage

IT departments are struggling to keep up with the massive growth in data as well as the complexity of the types of data stored. From digital text files to videos to machine-generated data, a storage solution has to handle not just the volume but the variety of data. And with all of that data inevitably comes an I/O (input/output) bottleneck.

Flash technology is the most widely accepted solution for addressing the I/O bottleneck, but even flash arrays differ greatly. Depending on your business needs, investing in server-side flash or all-flash arrays can be justifiable, but it generally comes at a much steeper price.

That’s where the hybrid flash storage solution comes into play. For businesses looking for low latency, high-IOPS performance, and high storage capacity, a hybrid flash array fits the bill. The top advantages of a hybrid flash solution include:

1. Cost

2. Performance

3. Reliability and error reduction

4. Data protection

5. Scalability

Hybrid flash arrays benefit many environments by balancing performance and capacity needs at a lower cost than most all-flash systems.  Learn more about AC&NC’s Flash Hybrid solutions for your business.

Fibre Channel RAID: When Bandwidth Matters

Big data is commonly perceived as large data sets comprised of multitudinous small files. Often, however, big data’s data sets contain files that are relatively few in number, but very large in size. This is particularly true for companies in such businesses as video production. Video files have always been large, and the increasing resolution of video formats renders them even larger. Production houses can routinely generate terabytes of video every week. Editing these files, along with other post-production work, demands storage with exceptional I/O capabilities.

Another video application that is growing rapidly and challenging storage solutions is video surveillance. Recordings made by surveillance cameras can be hours if not days in length and their improving resolution further swells the size of the files. Video surveillance systems, especially those with multiple cameras, put great pressure on a storage system’s I/O capabilities.
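As a rough illustration (not a sizing guide), the short Python sketch below estimates the sustained ingest throughput and retention capacity a multi-camera deployment demands; the camera count, per-camera bitrate, and retention period are hypothetical placeholders, not figures from any particular installation.

```python
# Back-of-the-envelope sizing for a multi-camera surveillance deployment.
# The camera count, bitrate, and retention period are hypothetical
# placeholders -- substitute the figures from your own environment.

CAMERAS = 64          # number of recording cameras
BITRATE_MBPS = 8      # per-camera stream, megabits per second (e.g. 1080p H.264)
RETENTION_DAYS = 30   # how long recordings must be kept

# Aggregate write throughput the storage system must sustain.
ingest_mbps = CAMERAS * BITRATE_MBPS
ingest_MBps = ingest_mbps / 8                 # megabits -> megabytes per second

# Capacity needed to retain every stream for the full retention window.
seconds_per_day = 24 * 60 * 60
daily_TB = ingest_MBps * seconds_per_day / 1_000_000   # MB -> TB (decimal)
retention_TB = daily_TB * RETENTION_DAYS

print(f"Sustained ingest: {ingest_MBps:.0f} MB/s")
print(f"Storage per day:  {daily_TB:.1f} TB")
print(f"{RETENTION_DAYS}-day retention: {retention_TB:.1f} TB")
```

Even with these modest assumptions, the storage system must absorb tens of megabytes per second around the clock and retain well over a hundred terabytes, which is exactly the kind of sustained I/O pressure described above.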

What to do? Fibre Channel RAID platforms deliver the throughput to meet such needs. They can ingest and serve up vast amounts of data, making them ideal for storing and working with high-definition or multi-camera video feeds. Moreover, they feature lossless operation, eliminating such issues as dropped packets. With FC RAID arrays, users are assured of storing every single pixel.

Such reliable, high performance can serve other I/O-intensive applications as well, such as server and desktop virtualization. The infamous boot storms of virtual desktop infrastructures (VDI) can overwhelm all but the most robust storage solutions. If you want to do video editing or support scores of virtual applications on multiple physical servers, all without latency, look into FC RAID arrays. In addition to their impressive throughput, they can also be flexible, highly reliable solutions.

The Growing Popularity of IP SANs

The Internet Protocol, simply known as IP, is easily the most widely used networking standard on Earth. The technology is well proven, familiar to IT staffs, and affordable. IP is common within corporate networking infrastructures, except when it comes to storing data. Today, however, there is growing adoption of IP-based storage area networks (SANs).

An IP SAN uses the iSCSI protocol to transfer block-level data over a network, generally with Gigabit or 10 Gigabit Ethernet. The advantages of this approach are well defined. Deploying IP SANs is simpler than traditional strategies. Because the technologies are well known and used in most networks, IP SANs can reduce both interoperability issues and training costs. IP SANs also offer better performance than file-level network attached storage (NAS). They are formidable choices for supporting server and storage virtualization deployments using VMware, Citrix Xen, and Microsoft Hyper-V. They are also effective for video applications like CCTV IP video because they allow multiple networked video recorders to share a single storage device. Similarly, they are ideal for data backup because multiple backup servers can share one storage platform.
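As a sketch of how familiar these technologies are to work with, the following Python snippet drives the standard open-iscsi initiator tools on a Linux host to discover and log in to an iSCSI target. It assumes the iscsiadm utility is installed, and the portal address and target IQN shown are hypothetical examples rather than values from any particular array.

```python
# Minimal sketch of attaching a Linux host to an IP SAN volume using the
# open-iscsi initiator tools (assumes the iscsiadm utility is installed).
# The portal address and target IQN below are hypothetical examples.
import subprocess

PORTAL = "192.168.10.50:3260"                    # IP and port of the iSCSI target
TARGET = "iqn.2024-01.com.example:storage.lun0"  # target name advertised by the array

def run(cmd):
    """Run a command, echo it, and raise if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Discover the targets the portal advertises.
run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])

# 2. Log in to the target; the LUN then appears as a local block device
#    (e.g. /dev/sdX) that can be partitioned and formatted like any disk.
run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])

# 3. Confirm the session is established.
run(["iscsiadm", "-m", "session"])
```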

If you want a storage pool that delivers enterprise-class performance and reliability and is also cost-effective and relatively simple to deploy, consider building an IP SAN with robust RAID solutions. You’ll be working with familiar technologies and eliminate the need for alternatives that are more expensive and complicated.

JBOD RAID – Cost Effective Storage Solution

RAID gets all the attention nowadays, but there’s still a place for RAID’s less talented cousin, JBOD.  Both are comprised of multiple physical drives, so what’s the difference?  Relying on storage virtualization, RAID, or redundant array of inexpensive/independent disks, divides, replicates, and distributes data across the multiple drives.  JBOD, an acronym for just a bunch of disks, is precisely that—just a bunch of disks.  It has few of the merits of RAID, but there’s virtue in simplicity.

The individual disks of a JBOD array can each serve as a volume or they can be concatenated, or spanned, to form one single logical volume or LUN.  Whereas RAID demands that all drives be of similar capacity, JBOD drives can be of various sizes.

JBOD’s primary advantage is capacity utilization.  A JBOD array fully utilizes all the space on its drives.  A JBOD with four 300 gigabyte drives, for example, provides 1200 gigabytes of usable capacity.  In contrast, a similar RAID array will offer less than 1200 gigs of storage capacity because of the need to store redundant data.  How much space is needed for this redundancy depends on the particular RAID configuration.  RAID 1, for example, mirrors data on two or more drives, which means that only half the capacity of the disks is available for storage, the other half being used to store the duplicated copy.
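A minimal sketch of that capacity arithmetic in Python, using the four-drive example above; the RAID 5 line is included only as another common point of comparison and is not part of the example in the text.

```python
# Usable-capacity comparison for the four-drive example above.
# Drive sizes are in gigabytes; the RAID 1 figure assumes simple mirroring
# (half of the raw capacity holds the duplicate copy).

drives_gb = [300, 300, 300, 300]

jbod_usable = sum(drives_gb)                           # spanning uses every block: 1200 GB
raid1_usable = sum(drives_gb) // 2                     # mirroring halves it: 600 GB
raid5_usable = (len(drives_gb) - 1) * min(drives_gb)   # one drive's worth of parity: 900 GB

print(f"JBOD  : {jbod_usable} GB usable")
print(f"RAID 1: {raid1_usable} GB usable")
print(f"RAID 5: {raid5_usable} GB usable")
```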

As a result, JBOD is a cost-effective storage solution. JBOD controllers are less expensive than RAID controllers and you can mix and match disks while using every block for primary storage.

But JBODs have downsides.  The read/write operations of RAID can be much faster than those of JBODs.  With RAID, the data stream can be divided and stored on multiple disks concurrently, whereas JBODs store the data stream on one disk at a time.  RAID 0, for example, stripes data across two or more disks to accelerate read/write operations, as the sketch below illustrates.  Note, however, that it provides no redundancy.
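A toy illustration of the difference in data placement, assuming arbitrary illustrative chunk and disk counts:

```python
# Toy illustration of where consecutive data chunks land: RAID 0 striping
# rotates chunks across all disks so they can be read/written in parallel,
# while a spanned JBOD fills one disk before moving to the next.
# Chunk and disk counts are arbitrary illustrative values.

DISKS = 3
CHUNKS_PER_DISK = 4
TOTAL_CHUNKS = DISKS * CHUNKS_PER_DISK

def raid0_disk(chunk):
    return chunk % DISKS              # round-robin across disks

def jbod_disk(chunk):
    return chunk // CHUNKS_PER_DISK   # fill disk 0, then disk 1, ...

for chunk in range(TOTAL_CHUNKS):
    print(f"chunk {chunk:2d} -> RAID 0 disk {raid0_disk(chunk)}, JBOD disk {jbod_disk(chunk)}")
```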

More importantly, JBOD does not offer the redundancy of RAID.  If a disk is corrupted in a JBOD array, all your data is at risk, including what’s on the other drives.  With RAID, you can lose one or more disks and still preserve your data, depending on the configuration.

So when are JBODs useful?  If you have a lot of data to store, particularly if just temporarily, JBODs are an economical solution.  Make sure, however, that the data is not critical or you have an effective backup scheme in place.

Data Deduplication Efficiency

Every day, companies generate or acquire more data, often at alarming rates. This data needs to be stored on disk or, eventually, tape. Reliable data storage is not cheap, however, especially when ancillary costs like electricity, cooling, maintenance, and floor space are factored in. This is why data deduplication is gaining in popularity: it reduces the amount of data that must be stored. Data deduplication does this by ensuring that only a single instance of data is saved. For example, imagine that a PowerPoint presentation is distributed to the ten members of a workgroup. Each member saves a copy of the presentation, so ten copies of the same file are stored, which is hardly efficient. Data deduplication comes to the rescue by replacing nine of those copies with pointers to the one unique file. When a user accesses the file, the pointer leads back to that unique copy, so the user assumes the file still resides on a local laptop or desktop while the enterprise greatly stretches its storage resources. In addition to reducing the need for storage capacity, data deduplication can improve recovery time objectives (RTOs) and lessen the need for tape backups.
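A minimal sketch of that pointer mechanism, using in-memory dictionaries to stand in for a real deduplication index and data store; the file names and contents are made up for illustration.

```python
# Minimal sketch of file-level deduplication: each stored file is identified
# by a hash of its full contents, only the first copy is kept, and later
# copies become pointers (references) to that single stored instance.
import hashlib

store = {}      # content hash -> the single stored copy of the file data
catalog = {}    # file name    -> content hash (the "pointer")

def save(name: str, data: bytes) -> None:
    digest = hashlib.sha256(data).hexdigest()
    if digest not in store:          # first time we've seen this content
        store[digest] = data
    catalog[name] = digest           # every copy just points at the stored instance

def load(name: str) -> bytes:
    return store[catalog[name]]      # follow the pointer back to the unique file

# Ten users save the same presentation: one copy stored, ten pointers kept.
deck = b"Q3 results slide deck ..."
for user in range(10):
    save(f"user{user}/results.pptx", deck)

print("files cataloged:", len(catalog))   # 10
print("copies stored:  ", len(store))     # 1
```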

There are several flavors of data deduplication. File level data deduplication eliminates entire files that are duplicated, such as the PowerPoint example above. Block level data deduplication, on the other hand, is much more granular. It identifies and preserves only the blocks in a file that are unique and discards all the redundant ones. When a file is updated, just the changed data is saved. By saving only unique blocks rather than entire unique files, block level data deduplication is much more efficient than file level data deduplication.
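A companion sketch of block-level deduplication, with a deliberately tiny block size so the effect is visible; real systems chunk data into kilobyte-sized or larger blocks.

```python
# Sketch of block-level deduplication: files are split into fixed-size
# blocks, each block is hashed, and only blocks never seen before are
# written. When a file changes, only its changed blocks consume new space.
import hashlib

BLOCK_SIZE = 8        # bytes; deliberately tiny for illustration
blocks = {}           # block hash -> block data (stored once)

def dedupe_write(data: bytes) -> list:
    """Store a file's unique blocks and return its recipe (list of hashes)."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        blocks.setdefault(digest, block)   # only new blocks consume space
        recipe.append(digest)
    return recipe

original = b"AAAAAAAABBBBBBBBCCCCCCCC"
updated  = b"AAAAAAAABBBBBBBBDDDDDDDD"      # only the last block changed

r1 = dedupe_write(original)
r2 = dedupe_write(updated)
print("blocks stored:", len(blocks))        # 4, not 6: shared blocks saved once
```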

Another way that data deduplication strategies differ is where they are actually performed. Source data deduplication is performed at the source, on the primary storage or backup client, before the data is forwarded to the backup system. This approach reduces the bandwidth needed to perform backups, but there can be interoperability issues with existing systems and applications, and it consumes more CPU cycles at the source, which can impact performance elsewhere.

The other strategy is target data deduplication, which occurs in the backup system, such as on a RAID storage array. Target data deduplication is simpler to deploy and is available in two modes. Post-process data deduplication occurs after the data has been stored and, consequently, requires greater storage capacity. In-line data deduplication occurs before the data is written to disk and, as a result, requires less storage capacity.

Data deduplication can do nothing to stave off the torrents of data that need to be saved, but it can make storage more cost-effective.  A robust RAID array with inline target data deduplication can reduce the quantity of data that is stored with minimal impact on other systems, making for greater storage efficiencies.

Big Data and HPC

One consequence of “big data” is that high-performance computing is creeping into the enterprise space. High-performance computing, known as HPC, was once confined to scientific and engineering endeavors that require immense number crunching, such as modeling weather systems or nuclear reactions. Now enterprises are deploying their own big data analytics as they seek to process and understand ever-increasing torrents of data. Their troves are fed by such sources as online shopping, customer interactions, social media clicks, product development, marketing initiatives, and events on their sprawling networks.

There are two basic strategies for attaining HPC. One is deploying supercomputers, but this big iron approach is very costly. The other is using clusters of computers that are linked by high-speed connections. This approach leverages commodity hardware, which reduces costs, and is thus gaining in popularity. Making it more practical are solutions like Hadoop MapReduce, which enable the distributed processing of big data sets on clusters of commodity servers.

HPC deployments also use parallel file systems like IBM’s General Parallel File System™ (GPFS™) that provide CPUs with shared access to big data in parallel and address the extreme I/O demands of HPC. The need to move huge amounts of data in and out of processing clusters is why these implementations are often anchored by high-speed RAID solutions. By using robust RAID arrays that support file systems such as GPFS and the Zettabyte File System (ZFS), enterprises get affordable, extremely fast throughput to support the number crunching of scores if not hundreds of compute nodes.

We live in a data-intensive age and those businesses and organizations that best derive insight and meaning from their big data will prosper. To do so, however, they need solutions for big data analytics that not only deliver the horsepower to do the job, but also are practical and cost-efficient to buy, maintain, and operate.

Public, Private & Hybrid Clouds

Most people by now have a sense of what clouds are. When IT services and resources are hosted offsite, they are often in clouds. But clouds have certain characteristics that define them, as opposed to simple offsite hosting. Clouds are elastic in that their services can scale up and down according to their customers’ needs. When you require additional storage, for example, you can get more capacity quickly and easily in a cloud. Conversely, you can flexibly decrease capacity. By extension, clouds meter and bill service usage so you pay only for what you use. Additionally, your users, such as workgroups, have self-service provisioning that enables them to rapidly obtain resources for projects or other needs. These capabilities make clouds flexible, cost-effective, and attractive for many enterprises.

There are several kinds of clouds, however, and some are more appropriate for particular circumstances than others. Public clouds are virtualized data centers outside of their customers’ private networks. Companies access their resources via the Internet and these resources reside on virtualized servers that share physical devices with other customers in what is known as a multi-tenant architecture. Public clouds are useful for enterprises with distributed employees working collaboratively on projects, or for testing and developing applications.

However, although many enterprises value the versatility offered by clouds, they are wary of security concerns when their precious data must traverse the Internet to be hosted offsite by third parties. This has led to the rise of private clouds, which have the characteristics of public clouds but are hosted locally within a company’s data center(s). Enterprises must buy and manage their cloud resources, which can be costlier than outsourcing to public clouds. But companies also have greater control over their data and applications, and because everything is behind their firewalls, they can enjoy greater security. Private clouds are appropriate in industries with stringent regulatory requirements.

Many companies are turning to a combination of public and private clouds in a model appropriately called hybrid clouds. Hybrid clouds enable businesses to keep vital resources like financial and proprietary data local in private clouds within the company firewalls, but they permit the outsourcing of less sensitive functions like email services to public clouds for cost savings. Moreover, they allow cloud bursting to meet spikes in demand. When seasonal sales or major projects exceed the compute or storage capabilities of the private cloud, public clouds can be called upon temporarily to meet the extra demand.

The bottom line is clouds not only offer remarkable flexibility, but they also can be flexibly deployed depending on each company’s needs and resources.

Hadoop & Big Data Analysis

Enterprises generate vast volumes of data every day, and they need to extract business value from this information. However, data troves are now so huge that traditional business intelligence tools like relational databases and math packages are no longer effective. What has recently emerged as the de facto standard for big data analysis is Apache Hadoop, a free, Java-based programming framework. Hadoop is effective because it distributes the processing to where the big data is stored. A large data set is broken up and distributed across hundreds or thousands of nodes, where the computing is actually done. This makes workloads extraordinarily scalable, and because Hadoop replicates big data across the cluster, the failure of any node does not impact processing. This enables Hadoop jobs to be conducted, and big data to be stored, on commodity hardware. Hadoop is said to be a little like RAID: instead of replicating data across many inexpensive disks, it replicates data across many inexpensive servers.

Hadoop basically consists of two parts—the Hadoop Distributed File System (HDFS) and MapReduce, both of which are derived from Google technologies. HDFS enables the distributed architecture and uses a component called the NameNode to track where data resides across the nodes. MapReduce is the not-so-secret sauce at the core of Hadoop. The Map function distributes the processing to the individual nodes, and Reduce collates the work from all the nodes to produce a single result.
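A minimal, single-process sketch of the Map and Reduce phases, applied to the classic word-count example; in Hadoop the same map and reduce logic runs in parallel across the nodes of a cluster rather than in one Python process.

```python
# Single-process sketch of the MapReduce idea: a map function emits
# key/value pairs from each input record, the pairs are shuffled (grouped
# by key), and a reduce function collates each group into one result.
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) for every word in the record -- the classic word count.
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Collate all the counts emitted for one word into a single total.
    return key, sum(values)

records = ["Big data needs big storage", "Hadoop distributes big data"]

# Shuffle: group every emitted value by its key.
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

results = dict(reduce_phase(k, v) for k, v in groups.items())
print(results)   # {'big': 3, 'data': 2, 'needs': 1, 'storage': 1, 'hadoop': 1, 'distributes': 1}
```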

Hadoop is a platform upon which specific applications can be created and run to process and analyze even petabytes of big data. It is a practical, cost-effective big data solution for data mining, financial analyses, and scientific simulations. If it isn’t already, Hadoop will someday directly or indirectly impact your business.

Big Data Footprint

Definitions of big data, like those of clouds, vary, but at its essence big data is simply very big data sets.  How big is relative.  On one extreme, big science initiatives like the Large Hadron Collider or global climate studies routinely produce data sets that are many petabytes in size.  Large enterprises also confront petabyte-sized data troves, although smaller organizations may find that terabyte-sized stores can qualify as big data.

What’s for sure is big data will only get bigger.  Every day, increasingly large amounts of data are being generated from computer clicks, software logs, transactions, mobile devices, cameras, and sensors.  According to IBM, we create 2.5 quintillion bytes of data daily.  That’s over two million terabytes.  Ninety percent of the world’s data was created in just the last two years.

What’s also certain is that much of this data needs to be stored.  There have always been reasons for preserving information, such as meeting industry regulations, corporate governance requirements, and best business practices.  But what’s upsetting the apple cart is the recognition that large data sets can hold invaluable information.  While this has always been true in big science, organizations of all kinds are discovering that data, when aggregated in large sets, can reveal invaluable trends and intelligence about their processes, customers, and industries.  Traditional data processing and database tools, however, are not always effective with very large data sets.  As a result, new analytics tools are being developed to coax insights, knowledge, and value out of caches of big data.

In our next blog, we’ll peek at how big data sets can be stored, processed, and analyzed.

Resilient File System (ReFS) in Windows Server 8

Cost-conscious storage administrators will want to note a feature in Windows Server 2012, popularly known as “Windows Server 8,” called Resilient File System (ReFS). Working with Storage Spaces, another Windows Server 2012 feature that provides data protection similar to RAID arrays, ReFS makes the use of commodity disks for storage more feasible. Organizations can reduce their expenses by safely storing and protecting their data, regardless of the underlying hardware and software stack. ReFS relies on existing NTFS code to ensure strong compatibility and offers data integrity and availability. For example, when the file system is used with a mirrored Storage Space, corruption of either metadata or user data can be repaired automatically using the mirror copy in Storage Spaces. Corrupt files are deleted and restored from the backup. Even better, repairs are localized to the area of corruption and are done online, eliminating the burdensome need to take a volume offline for repairs.

Moreover, ReFS supports petabytes of storage, making it a highly scalable solution able to meet burgeoning storage needs well into the future. As a result, enterprises large and small can use the file server with a JBOD configuration of Serial ATA (SATA) or Serial Attached SCSI (SAS) drives for storage that is both safe and cost-effective.

ReFS does have some limits. You cannot boot from it, and the file system is not supported on removable media. ReFS also does not offer data deduplication, but third-party dedupe solutions will continue to work.