I've been thinking about the relationship between content and unstructured information and the seemingly never-ending parade of articles about Big Data. Part of this thinking is, frankly, opportunistic: "Hey, if EVERYONE is going to talk about Big Data, I want a piece of that." But there is also the stubborn reality that unstructured information is the red-headed stepchild of the Big Data equation, and yet it is the source of so much untapped value and intelligence in organizations. And our community of users, solution providers, and consultants knows something about this whole messy question of unstructured information.
I think understanding our role begins with some of the work we have done on Systems of Record and Systems of Engagement. Yuchon Lee, a VP with IBM, described the relationship this way:
For the past decade, companies have been accumulating data in what we call a system of record. Those who survive going forward will also have systems of engagement, which start with evaluating how you can have a relevant conversation with each individual customer across all channels. And ensuring you have the analytical capability and the data to support that analysis. That is where the linkage is between the system of record data to system of engagement. On the technology side, we believe the future of handling this volume lies in leveraging the capability of the cloud. A lot of the analysis is done behind a firewall, but the analysis, platform, and architecture is really a hybrid. That is how you solve the problem and get the most value out of the data.
This leads to the kinds of business applications that are dramatically changed, or even made feasible, by tapping into the power of aggregating and interpreting large volumes of information. I like this list, which comes in part from Cloudera. All of these applications are closely tied to the world of Systems of Engagement, but they are improved by tapping into the information (especially unstructured information) hidden away in Systems of Record.
So I decided to try to get my thinking about the connections between Big Content and Big Data onto one slide. Here it is...
Here's how to walk through my little chart by the numbers...
1. We have spent the past few decades focused on bringing information with a high "value-density" under management. I first saw this notion of high-value information in some good work done by Freeform Dynamics. Translated into my Systems of Record and Systems of Engagement framework, the systems we have deployed until now have largely centered on identifying, storing, preserving, and acting upon information that has intrinsic value, usually directly in the context of a business process.
2. We have done a better job of managing this high-value Systems of Record information on the structured side than on the unstructured side. This is true not only of the percentage of information under some sort of governance (think of our usual lament that 80% of the information in an organization is unstructured, and most of it is unmanaged), but also of the lack of tools to actively interpret and mine all of this unstructured information in any meaningful way.
3. Systems of Engagement are generating massive volumes of new structured and unstructured information. Per Fortune, by 2020 the number of Internet-connected devices will grow from 400 million today to 50 billion. These devices will be talking to each other and to the Internet. It is also predicted that by 2020 our smartphones will be able to store and access as much information as IBM's Watson and other supercomputers can. The core difference between this "low-value-density" information and the high-value information in Systems of Record is that the new information tends to have value in the aggregate, or as it is interpreted, rather than intrinsically. In other words, it is easy to see the value in storing a document or a piece of data that records a specific transaction or process. It is more difficult, and in the past it has been too expensive, to do the same with vast quantities of digital flotsam and jetsam that have value only as they are aggregated and analyzed.
4. Cloud technologies such as Hadoop and NoSQL have dramatically lowered the cost of analyzing large volumes of information, making analysis at this scale affordable for the first time.
5. Advances in semantics, search, content and text analytics, and print stream analytics are now making analysis of large amounts of information practical for the first time, especially all of that unstructured information hidden away in digital landfills. In addition, natural language processing and visualization technologies are, for the first time, moving the analysis of all of this data and information out of technical back rooms and into the executive suite (to help solve the vexing business problems listed above).
6. Lastly, the opportunity that exists now, as reflected in the opening IBM quote, is the marriage of the cloud technologies that are making large-scale information analysis affordable with the new analytic and reporting technologies that are making all of this information comprehensible, both for the first time. It is a marriage with rich opportunities to move the management of large aggregations of information from a pure cost calculus (whether in hard dollars or risk) to one balanced by the potential value hidden away in digital landfills.