In a world of unabashed corporate antagonism, replete with umpteen “founding” and first-mover claims to breakthrough ideas, concepts or methodologies, certain mavericks stand out for their quiet authority.
Like computer scientist John Mashey, creator of the ASSIST assembler-language teaching software and author of the PWB Unix shell, or “Mashey Shell”. He’s arguably the father of the term Big Data, having christened it in 1994 in a remarkably matter-of-fact fashion while he was chief scientist with Silicon Graphics, then a hot and happening Valley player working on Hollywood special effects and spy surveillance systems and hence playing with a lot of data.
Devoid of any academic attribution save for numerous technical talks, thankfully available on websites devoted to technical research, Mashey has only his unflinching conviction to fall back on. Not that he needs to, simply because he’s not staking any claim. Instead, he selflessly right-sizes the imagination of people keen to confer the founding title on him, humbly summarising the coinage as only an attempt to settle on an all-inclusive phrase to convey the explosive growth and advancement in computing. This hiking, biking, skiing enthusiast is too busy with his intellectual and creative pursuits to seek reverence for his prescience. The introduction slide from one of his technical presentations (http://www.slideshare.net/amhey/big-data-yesterday-today-and-tomorrow-by-john-mashey-techviser) is a good window into his talent and temperament.
Like data analyst Douglas Laney, who first recalled Mashey’s name, in the context of big data, through a media correspondence. Laney is the author of the pioneering 2001 research note “3-D Data Management: Controlling Data Volume, Velocity and Variety” and was among the earliest to discern that, more than growing volumes, it was the speed of data flows, driven by the combined boom in e-commerce and post-Y2K ERP applications, that posed a real challenge to data management teams worldwide. As expected, several vultures from the unabashedly ambitious marketplace claimed Laney’s research as their own, peddling muddled replications and variations of his 3-V (Volume, Velocity and Variety) framework. Laney’s retort befits his nonconformist nature. He’s posted the contents of his original paper (sadly no longer available in Gartner archives) “for anyone to reference and attribute”. Here it is: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/
Like etymologist, editor and Yale researcher Fred Shapiro, who traces the origin, development and spread of words as a means to study intellectual evolution, not for academic posterity.
Like University of Pennsylvania economist Francis X. Diebold, who initially claimed to have coined the term in his paper “Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting,” but later wrote another research paper to humbly reverse the claim, circuitously acknowledging Mashey’s contribution. To quote him, “The term ‘Big Data,’ which spans computer science and statistics/econometrics, probably originated in lunch-table conversations at Silicon Graphics Inc. (SGI) in the mid 1990s, in which John Mashey figured prominently.”
And last but not least, like award-winning journalist Steve Lohr, author of the definitive software chronicle “Go To: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Maverick Scientists and Iconoclasts — The Programmers Who Created the Software Revolution” and “Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else.”
Mashey’s deep connection with Big Data came to light through Lohr’s perceptive 2012 search for the term’s origins across a vast number of digital archives. It was at Lohr’s behest that Shapiro dug out several digital references to trace the origin of Big Data. When Shapiro could not come up with anything conclusive, Lohr approached people with knowledge of the subject matter, and Diebold and Laney were among the many who responded.
Unfazed by the inconclusive results of his hunt, Lohr kept it going, looking for the two words not merely used as a pair, but used in a manner that would connote the essence as we know it today: massive volumes of structured and unstructured data that move too fast and call for new ways of management. Such usage, Lohr believed, could only be steered by someone with a computing context. That is precisely why he zeroed in on Mashey, not on other intriguing but out-of-context references like these two lines from bestselling author Erik Larson’s Harper’s Magazine piece on mailbox junk spread by the direct-marketing industry: “The keepers of big data say they do it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended.”
Hats off to Lohr for his inquisitive and informed search for the origin of a phenomenon that’s now a household name across spheres. Companies flaunting their smallest of Big Data initiatives would do well to learn from Mashey’s prolific nonchalance and Laney’s altruistic activism. Armed with the duo’s frame of mind, they would be in a better position to lock horns with multihued Big Data challenges, including curation, updating and integration. Read Lohr’s full account in this dated but delightful piece: http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/?_r=0