Is big data snake oil?

The business world is hypnotized by the big-money promises of big data. But is it just another empty trend? To get some real answers, we need to ask the right questions.



Acclaimed by the Crowned Heads of Europe

There’s a scene in The Wizard of Oz where Dorothy, fresh from the farm, comes across Professor Marvel and his wagon-of-wonders. Marvel, obviously a con man, uses common tricks to convince Dorothy she should return home.

Big data is like Professor Marvel in The Wizard of Oz: a snake oil salesman with a latent sense of conscience. He hawks empty promises, but he can also use his illusions for a higher purpose.

There is some good in Marvel, but you need to present him with the right situation to uncover it.

Like Professor Marvel, big data promises to predict the future. “Look into the mighty crystal ball of user behavior,” big data seems to say, “and you’ll discover a miraculous gold road leading to an Emerald City of profits.” Unfortunately, Professor Marvel faked it, and the world of big data is faking it too. Professor Marvel scammed Dorothy by extrapolating information from a photograph and using gimmicks. Big data fakes it by answering the wrong questions.

Bad Structure and Even Worse Philosophy

Here’s the typical scene in boardrooms today: the CEO tells the CIO to hire some “data people.” The data people, mostly from either a marketing or a programming background, set up shop somewhere in the business or IT department. They use Python, R, Hadoop, and similar languages and software built to handle millions, maybe billions, of tiny bits of information. “OK, we’re ready,” the data scientists say. “What do you want to know?”

Oops! We’ve already taken a wrong turn on the yellow-brick road. Did you catch it?

Data science is SCIENCE. It is not meant to be housed under sales or IT. A data science team should report to R&D. If your company outsources its research and development, then it should outsource its data science. Without a well-mapped reporting hierarchy and a centralized research structure, any data team is set up to fall into very common error traps, ones the academic world knows all too well.

Businesses like AT&T and Procter & Gamble were founded on research principles and have, by many reports, succeeded in wrangling big data. This isn’t a coincidence. In my career, I’ve hopped between academic research and business/IT; I can tell you science has its own culture, one that needs protection if it is to eke out trustworthy results.

Good science can’t be conducted in the tornado fields of quarterly goals. If a lab’s culture is dominated by the profit-is-king sentiment, the research is more likely to be tainted by common fallacies and decision errors. The very questions asked will be wrongly posed. In journal publishing, we see this often when a study’s funders “magically” come up with supporting results. Human biases are so predictable that transparency in funding is an explicit and strict tenet in academic research. Academics judge a study’s results on many factors, one of the most important being the source of the study’s financial support.

Knowledge for knowledge’s sake, or even a simple search for truth over illusion, is the purpose of any data inquiry.

But when misplaced data scientists are working under business mores, the path to Oz starts to crack. The data scientists don’t feel the fissures forming because their focus is on the data. Middle management doesn’t notice anything amiss because their eyes and ears are awaiting actionable information. Top management doesn’t look at much besides profit margins and quarterly goals. But these cracks grow, and what eventually comes out is an expensive wagonload of reports that have no more predictive power over the future than Professor Marvel’s crystal ball.

Start Simply and Give It Time

The reporting structure and the philosophy are the keys to successfully turning big data into information you can use. Treating data science as R&D and pursuing truth over fantasy are the core values on which to grow any big data endeavor. Once the team is in the right place with the right attitude, start them off with data gathering. Most companies aren’t even capturing half of the data that’s available to them. Run the collection on a cloud server and capture as many data points as possible. You have to make decisions, obviously, because each user transaction contains a practically unlimited amount of data. Still, I’m positive your company isn’t anywhere near exhausting its data mining capability. It will take time to hit the sweet spot, and as the world changes, so will your data points. Data is a living thing, fluid and constantly changing. There will be some basics that will always be relevant (e.g., rudimentary demographics), but be prepared to re-map your route on a regular basis.

No Place Like Home

At the end of The Wizard of Oz, the Wizard (aka Professor Marvel) ends up floating away without Dorothy. Dorothy finds another way back to Kansas, one that had been with her all along. We, too, have it in us to find a truthful and honest way to handle big data. Treat it like the science that it is and integrate it carefully into the business structure, and you’ll discover that all you ever needed to know was right there with you the whole time.


Photo credit: Wizard of Oz still, by Insomnia Cured Here on Flickr