Big Data and HPC

One consequence of “big data” is that high-performance computing is creeping into the enterprise space. High-performance computing, known as HPC, was once confined to scientific and engineering endeavors that require immense number crunching, such as modeling weather systems or nuclear reactions. Now enterprises are deploying their own big data analytics as they seek to process and understand ever-increasing torrents of data. Their troves are fed by sources such as online shopping, customer interactions, social media clicks, product development, marketing initiatives, and events on their sprawling networks.

There are two basic strategies for attaining HPC. One is deploying supercomputers, but this big-iron approach is very costly. The other is using clusters of computers linked by high-speed interconnects. This approach leverages commodity hardware, which reduces costs, and is thus gaining in popularity. Making it more practical are frameworks like Hadoop MapReduce, which enable the distributed processing of large data sets on clusters of commodity servers.
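To make the MapReduce idea concrete, here is a minimal word-count sketch written in the Hadoop Streaming style, where the mapper and reducer are ordinary scripts that read from standard input and emit tab-separated key-value pairs. The script name and the "map"/"reduce" argument convention are choices made for this example, not part of Hadoop itself.

```python
#!/usr/bin/env python3
"""Minimal word-count sketch in the Hadoop Streaming style.

Pass "map" or "reduce" as the first argument. Hadoop Streaming would
invoke each stage as a separate task across the cluster and sort the
mapper output by key before the reduce phase.
"""
import sys


def mapper(lines):
    # Emit one "word<TAB>1" pair per word; the framework shuffles and
    # sorts these pairs by key between the map and reduce phases.
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")


def reducer(lines):
    # Input arrives sorted by key, so all counts for a word are contiguous.
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if stage == "map" else reducer)(sys.stdin)
```

On a single machine, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` approximates the whole pipeline; on a cluster, the framework runs many mapper and reducer tasks in parallel across commodity nodes, which is where the cost advantage comes from.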

HPC deployments also use parallel file systems such as IBM’s General Parallel File System™ (GPFS™), which give compute nodes shared, parallel access to data and address the extreme I/O demands of HPC. The need to move huge amounts of data in and out of processing clusters is why these implementations are often anchored by high-speed RAID solutions. By using robust RAID arrays that support file systems such as GPFS and the Zettabyte File System (ZFS), enterprises gain affordable, extremely fast throughput to feed the number crunching of scores, if not hundreds, of computing nodes.
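The access pattern a parallel file system serves can be illustrated with a toy, single-machine sketch: several worker processes each read a disjoint byte range of one shared file concurrently. This is not GPFS or ZFS code, and the file name, worker count, and chunking scheme are arbitrary choices for the example; it only mimics how cluster nodes each pull their own slice of a striped file.

```python
#!/usr/bin/env python3
"""Toy illustration of parallel chunked reads from a shared file."""
import os
from multiprocessing import Pool

PATH = "big_dataset.bin"   # hypothetical input file
WORKERS = 4                # stand-in for compute nodes


def read_chunk(args):
    offset, length = args
    # Each worker opens the shared file and reads only its byte range,
    # much as cluster nodes each read their stripe of a parallel file system.
    with open(PATH, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return offset, len(data)


if __name__ == "__main__":
    size = os.path.getsize(PATH)
    chunk = -(-size // WORKERS)  # ceiling division
    ranges = [(i * chunk, max(0, min(chunk, size - i * chunk)))
              for i in range(WORKERS)]
    with Pool(WORKERS) as pool:
        for offset, nbytes in pool.map(read_chunk, ranges):
            print(f"offset {offset}: read {nbytes} bytes")
```

In a real deployment the aggregate bandwidth of the underlying RAID arrays, not the CPUs, typically determines how fast such concurrent reads can be satisfied, which is why storage throughput anchors these architectures.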

We live in a data-intensive age, and the businesses and organizations that best derive insight and meaning from their data will prosper. To do so, however, they need big data analytics solutions that not only deliver the horsepower to do the job but are also practical and cost-efficient to buy, maintain, and operate.