With few exceptions, articles about Big Data open with promises of being smarter, running more efficiently, or making more money. As proof, each article cites the standard examples: how data analytics and robotics have transformed warehouse operations, how IBM's Watson mastered the game show Jeopardy!, and how firms will make decisions more effectively.
Genuine successes may be far fewer than we realize; most examples describe a hoped-for future state rather than actual case studies like the few cited above. Real or not, we may learn more from stories of failure when gauging how much progress remains to be made.
Big Data requires an infrastructure that does not yet exist in its entirety. That infrastructure is evolving rapidly but still sits at the lower end of the S-curve in its development and sophistication; in other words, it is still an immature concept. What is this infrastructure?
A robust Big Data infrastructure requires the following:
- Skilled knowledge workers – quantitative and qualitative
- A set of business standards succinctly defining Big Data
- Well-defined data sets of structured and unstructured data
- Data scrubbing capabilities: in-house or vendor-based
- Efficacious and repeatable operating standards allowing for industry adoption as opposed to one-off solutions
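The data-scrubbing capability listed above usually amounts, in practice, to a set of small normalization scripts. As a minimal sketch of the idea (the field names and cleaning rules here are illustrative assumptions, not drawn from any particular vendor or case):

```python
import re

def scrub_record(record):
    """Normalize one raw record; return None if it is unrecoverable."""
    cleaned = {}
    # Trim surrounding whitespace and collapse internal runs of spaces.
    for key, value in record.items():
        cleaned[key] = re.sub(r"\s+", " ", value.strip())
    # Reject records missing the (hypothetical) required identifier.
    if not cleaned.get("customer_id"):
        return None
    # Normalize an inconsistently formatted phone field, if present.
    if "phone" in cleaned:
        digits = re.sub(r"\D", "", cleaned["phone"])
        cleaned["phone"] = digits if len(digits) == 10 else ""
    return cleaned

raw = {"customer_id": " 42 ", "phone": "(617) 555-0100", "note": "two   spaces"}
print(scrub_record(raw))
```

The point is less the code than the operating standard around it: rules like these must be written down, versioned, and applied repeatably, or every project becomes the kind of one-off solution the list above warns against.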
This framework is not intended to be exhaustive. Its intent is to suggest that Big Data may evolve along a path parallel to that of cloud computing, which is likewise in its infancy as an industry.
There is a major race to gear up and develop talent for what may become one of the largest growth industries of the 21st century. At a recent business conference in Boston, real case studies demonstrated both the successes and the obstacles to realizing the potential of Big Data.
A major tax preparer sought to learn more about its customers' needs to drive new product development.
Opportunity: a high velocity/volume business (data rich); high security IT (demographic data); high contact (good historical data).
Challenges: complex software (multiple versions); multiple SKUs (inconsistent data); high levels of text data (unstructured); data set definition (lack of taxonomy defining key data); recycle results (continuous trial and error cycles)
Outcome: Long cycle project; steep learning curve; continuous restarts
A web-based start-up for mothers, focused on child development.
Opportunity: Multiple data collectors (suite of apps used to collect a variety of data); baby social network (user-generated data); adaptive learning (behavioral patterns discernible)
Challenges: Lack of real-time data processing (suboptimal feedback); missing data (gaps in clean data); lack of end-to-end clarity (cause and effect of change); project length (Big Data projects are costly and time-consuming; start small); lack of specialists to code scrubbing scripts (business acumen)
Outcome: Costs exceeded budget; redundant processes; lack of appropriate skills to complete project
These case stories represent a small sample of the not-so-successful implementations of Big Data, and small samples should never be used to predict outcomes. They do, however, provide useful and sobering information, and they deserve a place alongside the touted benefits of Big Data.
Here are a few additional observations:
- The cost of storing Big Data is large
- What is the net present value of Big Data? ROI may be hard to quantify
- The tools available to system developers for processing Big Data effectively are still very immature
- Redundancy of effort is a problem; but may be unavoidable due to immature processes
- A gap remains between technical expertise, which exists, and a well-defined business vision for Big Data
- Bioinformatics skills are in short supply today, where they exist at all
- Understanding which data will solve a specific business problem
- Deciding early on whether the right data even exists to solve that problem
- Start small
- Organize around small data upfront to ensure that Big Data produces reliable outcomes
- The legal and regulatory environment may not keep up with technical product cycles – limits on trademark and intellectual property will be challenged
Looking back from the future, these observations may turn out to be mere speed bumps on the road to Big Data. Unimagined new industries may well follow, yet much work is needed to build a sustainable framework in support of Big Data. Its failures warn us not to become complacent.
The art and science of Big Data, transformational or not, is here to stay as a tool for converting data into information. How we use and build the tools of Big Data will ultimately depend on the infrastructure that supports those efforts.
The 2010 Flash Crash is a perfect example of the disastrous unintended consequences of Big Data and the use of data analytics to perform tasks humans are unable or no longer willing to do.
Traders and investors far removed from algorithmic trading lost thousands if not millions of dollars in a matter of minutes because of unknown triggers that sent the Dow plummeting 600 points.
Precisely because of these challenges, a number of financial engineers are seeking ways to anticipate systemic risks in the economy before they materialize. The complexity of algorithmic trading and the interrelationships across global economies and markets will require a better understanding of the cascading effects of these systems.
Andrew Lo, a professor at Sloan Business School and director of MIT’s Laboratory for Financial Engineering, kicked off his talk, called “Measuring and Managing the Complexity of the Financial System,” by showing two charts that neatly illustrate the complexity and interdependency of the current financial system.
“The first shows relationships between various major financial institutions roughly 20 years ago:”
“The second shows the same relationships just 10 years later:”