Big Data and Big Content – Just Hype or a Real Opportunity?

By: John Mancini on March 15th, 2012



I've been thinking about how content and unstructured information relate to the seemingly never-ending parade of articles about Big Data. Part of this thinking is frankly opportunistic: "Hey, if EVERYONE is going to talk about Big Data, I want a piece of that." But there is also the stubborn reality that unstructured information is the red-headed stepchild of the Big Data equation - and the source of so much untapped value and intelligence in organizations. And our community - users, solution providers, and consultants - knows something about this whole messy question of unstructured information.

I think understanding our role begins with some of the work we have done relative to Systems of Record and Systems of Engagement. Yuchun Lee, a VP with IBM, described the relationship this way:

For the past decade, companies have been accumulating data in what we call a system of record. Those who survive going forward will also have systems of engagement, which start with evaluating how you can have a relevant conversation with each individual customer across all channels. And ensuring you have the analytical capability and the data to support that analysis. That is where the linkage is between the system of record data to system of engagement. On the technology side, we believe the future of handling this volume lies in leveraging the capability of the cloud. A lot of the analysis is done behind a firewall, but the analysis, platform, and architecture is really a hybrid. That is how you solve the problem and get the most value out of the data.


This leads then to the kinds of business applications that are dramatically changed - or even made feasible - by tapping into the power of aggregating and interpreting large volumes of information. I like this list, which comes in part from Cloudera. All of these are applications with a fairly high connection to the land of Systems of Engagement - but are improved by tapping into the information - especially unstructured - hidden away in the land of Systems of Record.

  1. Modeling risk and failure prediction
  2. Analyzing customer churn
  3. Web recommendations (à la Amazon)
  4. Web ad targeting
  5. Point of sale transaction analysis
  6. Threat analysis
  7. Compliance and search effectiveness

So, I decided to try and get my thinking about the connections between Big Content and Big Data onto one slide. Here it is...

[Figure: The relationship between big data and big content]

Here's how to walk through my little chart by the numbers...

1. We have spent the past few decades focused on bringing information with a high “value-density” under management. I saw this notion of high-value information first in some good work done by Freeform Dynamics. Translating this concept into my Systems of Record and Systems of Engagement framework, the systems that we have deployed until now have largely centered on trying to identify, store, preserve, and act upon information that has intrinsic value, usually directly in the context of a business process.

2. We have done a better job of managing this high-value Systems of Record information on the structured side than on the unstructured side. This is true not only in terms of the percentage of information under some sort of governance (think of our usual lament that 80% of the information in an organization is unstructured, and most of this is unmanaged), but also in terms of the lack of tools to actively interpret and mine all of this unstructured information in any meaningful way.

3. Systems of Engagement are generating massive volumes of new structured and unstructured information. Per Fortune, by 2020, Internet-connected devices will grow from 400 million today to 50 billion. These devices will be talking to each other and to the Internet. By 2020, it is also predicted that our smartphones will have the capability of storing and accessing as much information as IBM’s Watson and supercomputers can. The core difference between this "low-value-density" information and all of the high-value information in Systems of Record is that this new information tends to have value in the aggregate or as it is interpreted rather than intrinsically. In other words, it is easy to see the value in storing a document or a piece of data that documents a specific transaction or process. It is more difficult - and it has been too expensive in the past - to do so with vast quantities of digital flotsam and jetsam that have value only as they are aggregated and analyzed.

4. Cloud technologies such as Hadoop and NoSQL have dramatically changed the cost of analyzing large volumes of information, making analysis of large amounts of this information affordable for the first time.

5. Advances in semantics, search, content and text analytics, and print stream analytics are now making analysis of large amounts of information practical for the first time - especially all of that unstructured information hidden away in digital landfills. In addition, for the first time, natural language processing and visualization technologies are moving the analysis of all of this data and information out of technical back rooms and into the executive suite (to help solve the vexing business problems listed above).

6. Lastly, the opportunity that exists now - as reflected in the opening IBM quote - is the marriage of the cloud technologies that are making large-scale information analysis affordable for the first time with new analytic and reporting technologies that are making all of this information comprehensible for the first time. It is a marriage with rich opportunities to move the management of large aggregations from a pure cost calculus (whether hard dollars or risk-based) to one that is balanced by the potential value hidden away in digital landfills.
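
For readers who want to see what points 4 and 5 look like in miniature, here is a toy sketch in plain Python. It borrows the map-and-reduce pattern that Hadoop popularized, but it does not use any real Hadoop API. The sample documents, the stop-word list, and the function names (`map_document`, `reduce_counts`, `top_terms`) are all made up for illustration. The only point is that individual scraps of unstructured text, which say little on their own, start to show a pattern once they are counted in aggregate.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical unstructured records, standing in for a "digital landfill".
DOCUMENTS = [
    "Customer complained about a late invoice and a late delivery.",
    "Invoice dispute resolved; the customer is satisfied.",
    "Delivery delayed again, and the customer threatens to cancel.",
]

# Hypothetical list of words too common to be worth counting.
STOP_WORDS = {"a", "about", "and", "is", "the", "to"}


def map_document(text):
    """Map step: turn one document into a Counter of its meaningful terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


def reduce_counts(total, partial):
    """Reduce step: merge one document's term counts into the running total."""
    total.update(partial)
    return total


def top_terms(documents, n=3):
    """Map every document, reduce to one combined count, return the top n terms."""
    mapped = map(map_document, documents)
    combined = reduce(reduce_counts, mapped, Counter())
    return combined.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(DOCUMENTS):
        print(term, count)
```

In a real deployment the map step would run in parallel across many machines and the text analytics would be far richer than counting words, but the shape of the work is roughly the same: cheap aggregation first, interpretation on top.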

 


About John Mancini

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation, and intelligent automation. John is a frequent keynote speaker and author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn, and Facebook as jmancini77.

Recent keynote topics include: "The Stairway to Digital Transformation"; "Navigating Disruptive Waters - 4 Things You Need to Know to Build Your Digital Transformation Strategy"; "Getting Ahead of the Digital Transformation Curve"; "Viewing Information Management Through a New Lens"; and "Digital Disruption: 6 Strategies to Avoid Being 'Blockbustered'".

Specialties: Keynote speaker and writer on AI, RPA, Intelligent Information Management, Intelligent Automation, and Digital Transformation. Consensus-building with Boards to create strategic focus, action, and accountability. Extensive public speaking and public relations work. Conversant and experienced in major technology issues and trends. Expert on inbound and content marketing, particularly in an association environment and on the HubSpot platform.

John is a Phi Beta Kappa graduate of the College of William and Mary, and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.