Big Data Footprint


Definitions of big data, like those of clouds, vary, but at its essence big data simply means very large data sets. How big is relative. At one extreme, big science initiatives like the Large Hadron Collider or global climate studies routinely produce data sets that are many petabytes in size. Large enterprises also confront petabyte-sized data troves, while smaller organizations may find that terabyte-sized stores qualify as big data.

What is certain is that big data will only get bigger. Every day, ever-larger amounts of data are generated by computer clicks, software logs, transactions, mobile devices, cameras, and sensors. According to IBM, we create 2.5 quintillion bytes of data daily. That is over two million terabytes. Ninety percent of the world’s data was created in just the last two years.
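For a quick sanity check on that figure, here is a back-of-the-envelope conversion, a minimal sketch in Python assuming decimal (SI) units, where one terabyte is 10^12 bytes:

    bytes_per_day = 2.5e18                    # 2.5 quintillion bytes created daily (IBM figure)
    terabytes_per_day = bytes_per_day / 1e12  # decimal terabytes: 1 TB = 10^12 bytes
    print(f"{terabytes_per_day:,.0f} TB per day")  # prints: 2,500,000 TB per day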

What is also certain is that much of this data needs to be stored. There have always been reasons for preserving information, such as meeting industry regulations, satisfying corporate governance requirements, and following best business practices. But what is upsetting the apple cart is the recognition that large data sets can hold invaluable information. While this has always been true in big science, organizations of all kinds are discovering that data, when aggregated in large sets, can reveal valuable trends and intelligence about their processes, customers, and industries. Traditional data processing and database tools, however, are not always effective with very large data sets. As a result, new analytics tools are being developed to coax insights, knowledge, and value out of caches of big data.

In our next blog, we’ll peek at how big data sets can be stored, processed, and analyzed.