The amount of information in the world is currently doubling roughly every 20 months. Is the emergence of huge external memory capacities driving this information explosion or resulting from it?
Much as we attribute erratic weather to global warming, I attribute the need, nay the demand, for ever-increasing external memory capacity to Big Data.
We may unequivocally state that the era of Big Data has begun. Computer scientists, physicists, businesses, economists, mathematicians, political scientists, governments, intelligence entities, law enforcement, bio-informaticists, sociologists, and many others are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. To obtain this data, they mine and analyze information from Twitter, Google, Verizon, Facebook, Wikipedia, and every other space where large groups of people leave digital traces and deposit data (Boyd & Crawford, 2011).
A typical business practice for large-scale data analysis is the use of an Enterprise Data Warehouse (EDW) that is queried by Business Intelligence (BI) software. These BI applications produce reports and interactive interfaces that summarize data via aggregation functions, so as to facilitate business decisions (Cohen et al., 2009).
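To make "summarize data via aggregation functions" concrete, the following is a minimal sketch of a BI-style roll-up query, using an in-memory SQLite database as a stand-in for an EDW. The sales table, its columns, and the sample figures are hypothetical and not drawn from Cohen et al. (2009).

```python
# A minimal sketch of a BI-style aggregation query, using an in-memory
# SQLite database as a stand-in for an Enterprise Data Warehouse (EDW).
# The "sales" table, its columns, and the sample rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "widget", 120.0), ("East", "gadget", 75.5),
     ("West", "widget", 200.0), ("West", "gadget", 90.0)],
)

# Typical BI reports roll raw rows up with aggregation functions
# (SUM, COUNT, AVG, ...) grouped by a business dimension such as region.
for region, total, orders in conn.execute(
    "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region"
):
    print(f"{region}: total={total:.2f}, orders={orders}")
```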
As an example, the oil and gas industry has long utilized high-performance computing (HPC) systems to analyze large data sets and to model underground reserves from seismic data. Critical to this work are redundant commodity servers with direct-attached storage (DAS), which provide the input/output operations per second (IOPS) required to move the data to the analytics tools (Adshead, 2014).
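The emphasis on IOPS can be illustrated with a back-of-the-envelope calculation: sustained throughput is roughly the IOPS rate multiplied by the I/O request size, which in turn bounds how quickly a large survey can be streamed to the analytics tools. The dataset size, request size, and IOPS figures below are illustrative assumptions, not numbers from Adshead (2014).

```python
# A back-of-the-envelope sketch of why IOPS matter when feeding analytics:
# sustained throughput is roughly IOPS multiplied by the I/O request size.
# All figures below are illustrative assumptions, not vendor numbers.

def scan_time_hours(dataset_tb: float, iops: float, io_size_kb: float) -> float:
    """Rough time to stream a dataset of `dataset_tb` terabytes."""
    throughput_mb_s = iops * io_size_kb / 1024            # MB per second
    seconds = dataset_tb * 1024 * 1024 / throughput_mb_s  # TB -> MB
    return seconds / 3600

# e.g., a hypothetical 100 TB seismic survey read with 256 KB requests:
for iops in (10_000, 50_000, 200_000):
    print(f"{iops:>7} IOPS -> {scan_time_hours(100, iops, 256):5.1f} hours")
```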
Furthermore, we find that the world's data is doubling every three years and is now measured in exabytes. According to the How Much Information project, print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. That is equivalent to 37,000 new libraries the size of the Library of Congress, with its 17 million books. Of this new data, roughly 92% resides on magnetic media, mostly on hard drives ("Executive Summary," 2014).
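A quick arithmetic check of that comparison, assuming decimal units (1 exabyte = 1,000,000 terabytes), puts each Library-of-Congress-sized collection on the order of 135 terabytes:

```python
# A quick sanity check of the scale comparison: 5 exabytes spread over
# roughly 37,000 Library-of-Congress-sized collections implies each
# collection is on the order of 135 terabytes (decimal units assumed).
five_exabytes_tb = 5 * 1_000_000      # 1 EB = 1,000,000 TB (decimal)
libraries = 37_000
print(five_exabytes_tb / libraries)   # ~135 TB per library
```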
Intel predicts the "Era of Tera" will necessitate systems that process teraflops (a trillion floating-point operations per second), terabits per second of bandwidth, and terabytes (1,024 gigabytes) of data storage.
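For orientation, the sketch below spells out those tera-scale quantities. Note that the "1,024 gigabytes" parenthetical uses the binary convention, which differs from the decimal terabyte by roughly 10%.

```python
# A small sketch of the "tera" scale mentioned above. Storage vendors
# usually count in decimal (1 TB = 10**12 bytes), while operating systems
# often count in binary (1 TiB = 2**40 bytes = 1,024 GiB), which is the
# "1,024 gigabytes" convention used in the text.
teraflop = 10**12          # floating-point operations per second
terabit_per_s = 10**12     # bits per second of bandwidth
terabyte_binary = 2**40    # bytes (1,024 * 1,024**3)
terabyte_decimal = 10**12  # bytes

print(terabyte_binary / terabyte_decimal)  # ~1.0995, about a 10% difference
```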
To handle all this information, people will need systems that can help them understand and interpret data, and they will find that search engines will not be up to the task.
As anyone who has searched for anything on the Web knows, a query will often yield tens of thousands of results with no relevance to the search. Thus we need computers that can "see" data the way we do, look beyond the 0s and 1s, identify what is useful to us, and assemble it for our review (Dubey, 2005).
Thus future computer and data innovations must account for the fact that: "The great strength of computers is that they can reliably manipulate vast amounts of data very quickly. Their great weakness is that they don't have a clue as to what any of that data actually means" (Cass, 2004).
References
Adshead, A. (2014). Big data storage: Defining big data and the type of storage it needs. Retrieved from http://www.computerweekly.com/podcast/Big-data-storage-Defining-big-data-and-the-type-of-storage-it-needs
Boyd, D., & Crawford, K. (2011). Six provocations for big data.
Cass, S. (2004). Fountain of knowledge [analysis engine]. IEEE Spectrum, 41(1), 68-71.
Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J. M., & Welton, C. (2009). MAD skills: New analysis practices for big data. Proceedings of the VLDB Endowment, 2(2), 1481-1492.
Dubey, P. (2005). Recognition, mining and synthesis moves computers to the era of tera. Technology@Intel Magazine, 1-10.
Executive Summary. (2014). How Much Information Project. Retrieved from http://www2.sims.berkeley.edu/research/projects/how-much-info/summary.html