By: John Mancini on April 16th, 2012

Big Data – Does Size Really Matter?

According to IDC, the amount of information in the Digital Universe will grow by a factor of 44 between now and the end of the decade. Even more challenging, the number of containers or files will grow by a factor of 67. The subset of information that needs to be secured is growing almost twice as fast. And the amount of UNPROTECTED yet sensitive data is growing even faster.

And while all of this is going on, the number of IT professionals in the world will grow only by a factor of 1.4. [IDC, Digital Universe]

Well, hell, that's a lot of bits and bytes.
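
To put those IDC numbers side by side, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the work of managing information is spread evenly across IT professionals; the function name is made up for this example and is not anything IDC publishes.

```python
# Growth factors quoted from IDC in the text above.
INFORMATION_GROWTH = 44   # total information in the Digital Universe
FILE_GROWTH = 67          # number of containers or files
IT_STAFF_GROWTH = 1.4     # number of IT professionals


def growth_per_professional(volume_growth, staff_growth=IT_STAFF_GROWTH):
    """Return how many times larger the load per IT professional becomes.

    Assumption (not from IDC): the load is shared evenly across all staff.
    """
    return volume_growth / staff_growth


info_load = growth_per_professional(INFORMATION_GROWTH)
file_load = growth_per_professional(FILE_GROWTH)

print(f"Information per IT professional: about {info_load:.0f}x")
print(f"Files per IT professional: about {file_load:.0f}x")
```

Under that simplifying assumption, each IT professional ends the decade responsible for roughly 31 times as much information and roughly 48 times as many files as today.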

The shift to Systems of Engagement dramatically increases the complexity and volume of data and information that must be managed within an organization. IT, we understand that not everything can or should be saved forever because of the litigation risk associated with saving "everything." We also understand that there is a growing value in mining the huge masses of information we are accumulating. We also understand that these two statements are not necessarily consistent with each other. Help us make sense of our irrationality.

On the "risk" side of the equation, the volume of information coming at us is making it clear that manual information retention and disposition processes simply extended from the world of Systems of Record will no longer suffice. Aside from the sheer enormity of the task, a lack of clarity about what content is valuable is the main obstacle, along with the fear of getting it wrong and a sense that there is no immediate ROI from getting rid of outdated information.

The reality in most organizations is that traditional approaches to information governance are a joke, and it's not for lack of effort. It was never realistic to assume that knowledge workers would assist in manually classifying documents according to a complex records retention schedule, and it is equally unrealistic to assume that we will manage the firehose of data and unstructured ephemeral social content with the same kinds of records rigor that we applied to retention of a life insurance policy for the life of the policyholder.

Clearly, adapting to the world that is upon us is proving problematic:

Two-thirds of organizations have an Information Management Strategy, but only 22% use it.
79% of organizations have an Information Retention policy, but only 32% enforce it.
70% have rules for mobile devices and social media, but only 30% enforce them.
58% of organizations say that a single enterprise records management model underlying all content systems is their goal, yet only 9% have achieved this.

Yuchon Lee, a VP with IBM, describes the "value" side of the equation this way:

"For the past decade, companies have been accumulating data in what we call a system of record. Those who survive going forward will also have systems of engagement, which start with evaluating how you can have a relevant conversation with each individual customer across all channels. And ensuring you have the analytical capability and the data to support that analysis. That is where the linkage is between the system of record data to system of engagement. On the technology side, we believe the future of handling this volume lies in leveraging the capability of the cloud. A lot of the analysis is done behind a firewall, but the analysis, platform, and architecture is really a hybrid. That is how you solve the problem and get the most value out of the data."

Many analytical solutions were not possible previously in the world of unstructured information because: 1) they were too costly to implement; 2) they were not capable of handling the large volumes of data involved in a timely manner; or 3) the required data simply did not exist in an electronic form. New tools now bring the capabilities of business intelligence and the benefits of optimization, asset management, pattern detection, and compliance monitoring to the world of unstructured information. Technologies such as Hadoop and NoSQL databases, often deployed in the cloud, have dramatically changed the cost of analyzing large volumes of information, making analysis of large amounts of this information affordable for the first time.
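
For readers who have not seen it, the processing pattern that Hadoop popularized can be sketched in a few lines. What follows is only a toy, single-machine illustration of the map-and-reduce idea, not Hadoop's own API; the sample messages and function names are invented for this example. Hadoop's contribution is running the same two steps across many inexpensive machines, which is where the cost change comes from.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample of short, unstructured messages (made up for illustration).
messages = [
    "invoice late again",
    "love the new invoice portal",
    "portal login failed",
    "invoice portal is slow",
]


def map_phase(message):
    """Map step: turn one message into its own per-word counts."""
    return Counter(message.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into a single count."""
    return left + right


# Each message is processed independently, then the partial results are merged.
partial_counts = map(map_phase, messages)
totals = reduce(reduce_phase, partial_counts, Counter())

print(totals.most_common(2))
```

No single message says much on its own, but the merged counts show which terms keep coming up across all of them.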

The systems that we have deployed until now have largely centered on trying to identify, store, preserve, and act upon information that has intrinsic value, usually directly in the context of a business process. We have done a better job of managing this high-value Systems of Record information on the structured side than on the unstructured side. This is true not only in terms of the percentage of information under some sort of governance (think of our usual lament that 80% of the information in an organization is unstructured, and most of this is unmanaged), but also in terms of the lack of tools to actively interpret and mine all of this unstructured information in any meaningful way.

Systems of Engagement are generating massive volumes of new structured and unstructured information. Per Fortune, by 2020, the number of Internet-connected devices will grow from 400 million today to 50 billion. These devices will be talking to each other and to the Internet. By 2020, it is also predicted that our smartphones will have the capability of storing and accessing as much information as IBM’s Watson and supercomputers can. The core difference between this "low-value-density" information and all of the high-value information in Systems of Record is that this new information tends to have value in the aggregate or as it is interpreted rather than intrinsically. In other words, it is easy to see the value in storing a document or a piece of data that documents a specific transaction or process. It is more difficult - and it has been too expensive in the past - to do so with vast quantities of digital flotsam and jetsam that have value only as they are aggregated and analyzed.

Advances in semantics, search, content and text analytics, and print stream analytics are now making analysis of large amounts of information practical for the first time - especially all of that unstructured information hidden away in digital landfills. In addition, for the first time, natural language processing and visualization technologies are moving the analysis of all of this data and information out of technical back rooms and into the executive suite (to help solve the vexing business problems listed above).

The business needs to acknowledge that the old world of paper-driven records management approaches is dead, and we need IT's help in mitigating the risks associated with the death of this nice, predictable world. We also desperately need to get more value out of all the "stuff" we are gathering -- and use this intelligence to improve customer responsiveness and anticipate and predict where the business will go next.

About John Mancini

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation and intelligent automation. John is a frequent keynote speaker and author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn and Facebook as jmancini77.

Recent keynote topics include: "The Stairway to Digital Transformation"; "Navigating Disruptive Waters - 4 Things You Need to Know to Build Your Digital Transformation Strategy"; "Getting Ahead of the Digital Transformation Curve"; "Viewing Information Management Through a New Lens"; and "Digital Disruption: 6 Strategies to Avoid Being 'Blockbustered'."

Specialties: Keynote speaker and writer on AI, RPA, Intelligent Information Management, Intelligent Automation and Digital Transformation. Consensus-building with Boards to create strategic focus, action, and accountability. Extensive public speaking and public relations work. Conversant and experienced in major technology issues and trends. Expert on inbound and content marketing, particularly in an association environment and on the Hubspot platform.

John is a Phi Beta Kappa graduate of the College of William and Mary, and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.