Overcoming the Challenge of Unstructured Information

By: Kevin Craine on August 26th, 2019


According to AIIM research, 75% of the organizations we surveyed view digital transformation as “important” or “very important.” Survey respondents point to techniques such as advanced data capture, machine learning, and process automation as offering powerful potential to reengineer and improve core business processes.

The trouble, however, is that the majority of information capture and content management solutions on the market were built to work with highly structured, predetermined information and workflows. Feedback from our AIIM community of practitioners tells us that working with unstructured information is one of the biggest barriers to digital transformation.

Structured and Unstructured Information

So how can you begin to overcome the challenge? One place to start is to clearly differentiate between structured and unstructured information. We can draw this distinction in several key ways, and we need to, because how we manage information depends significantly on whether it is structured or unstructured.

Structured information has a fixed structure, hence the name: it consists fundamentally of columns and rows of data in a table, or of data spread across several or many linked tables. A spreadsheet is a simple example. A form can also be considered structured, insofar as the purpose of most forms is to gather information that is then put into this sort of structure. Most structured data is stored and managed in a database. In fact, most information repositories combine this sort of structured data with a place to store the binary files associated with it.
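To make "columns and rows spread across linked tables" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The department and employee tables, their columns, and the sample rows are hypothetical; the point is only that every value lives in a known column, so a query can join the tables and return exactly the fields requested.

```python
import sqlite3

# In-memory database holding two linked tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);
""")
conn.executemany("INSERT INTO department (id, name) VALUES (?, ?)",
                 [(1, "Finance"), (2, "IT")])
conn.executemany("INSERT INTO employee (name, department_id) VALUES (?, ?)",
                 [("Ada", 1), ("Grace", 2)])

# Because the structure is fixed, the link between tables can be
# followed by a simple join on the shared key.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON d.id = e.department_id
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Ada', 'Finance'), ('Grace', 'IT')]
```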

In contrast, unstructured information is much more variable in both format and content. Consider a contract, a project initiation document, or a personnel review. Each of these might be created or captured in a variety of formats. While each may follow some rules that guide its content, these documents will vary greatly in form, format, content, and business context.

Capturing Structured and Unstructured Information

Capturing structured information can be accomplished in several ways. Data can be entered manually or extracted from structured forms. It can also be extracted from the structured output of another system, for example an HR or accounting application. The data may need to be transformed from one syntax to another, but structured applications in the form of databases are designed to ingest structured content and apply appropriate access controls, business rules and logic, and lifecycle management.
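As a rough illustration of capturing structured output from another system, the sketch below reads a CSV export, transforms one field from the source syntax to the target syntax, and applies a simple business rule before the data would be loaded. The export contents, column names, MM/DD/YYYY source date format, and the "IDs must be positive" rule are all assumptions made for the example, not features of any particular HR product.

```python
import csv
import io
from datetime import datetime

# Hypothetical CSV export from an HR application. The source system writes
# dates as MM/DD/YYYY; the target store is assumed to expect ISO 8601.
EXPORT = """employee_id,name,hire_date
101,Ada Lovelace,08/26/2019
102,Grace Hopper,01/15/2018
"""

def transform(row: dict) -> dict:
    """Convert one exported row to the target syntax and apply a simple rule."""
    employee_id = int(row["employee_id"])
    if employee_id <= 0:  # illustrative business rule: IDs must be positive
        raise ValueError(f"invalid employee_id: {employee_id}")
    hire = datetime.strptime(row["hire_date"], "%m/%d/%Y").date()
    return {
        "employee_id": employee_id,
        "name": row["name"].strip(),
        "hire_date": hire.isoformat(),  # e.g. 08/26/2019 -> 2019-08-26
    }

records = [transform(r) for r in csv.DictReader(io.StringIO(EXPORT))]
print(records[0])
```

Keeping the transformation in one function means the same syntax conversion and rule checks run on every row, which is what lets the receiving database apply its controls consistently.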

But capturing unstructured information is more challenging. Email is a common example. Email may appear to have structure and context: it is addressed to people and sits in an inbox, perhaps in a filing category or a private folder. But the emails in a user’s email system are not controlled; there are no rules governing the retention and disposition of that information over time.

Current practice is usually to send emails to those who need them and, more often than not, also to those who may merely be interested in the content. This creates many copies and makes control far less likely.
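The split between email's apparent structure and its unstructured content can be seen by parsing a message with Python's standard email package. The message below is invented for illustration. The headers parse into named fields, and counting the To and Cc addresses shows how one message fans out into several mailbox copies, but the body is just free text with nothing marking what matters or how long it must be kept.

```python
from email import message_from_string
from email.utils import getaddresses

# A hypothetical message, as it might be exported from a user's mailbox.
RAW = """From: pat@example.com
To: lee@example.com, sam@example.com
Cc: finance-team@example.com
Subject: Q3 vendor contract
Date: Mon, 26 Aug 2019 09:30:00 -0700

Hi all, attached is the revised contract. Note the new renewal terms in
section 4; we need sign-off before Friday.
"""

msg = message_from_string(RAW)

# Headers are structured: named fields that software can read reliably.
headers = {"from": msg["From"], "subject": msg["Subject"], "date": msg["Date"]}
recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))

# The body is unstructured free text: no field says which sentence is
# business-critical or what retention rule should apply to it.
body = msg.get_payload()

print(headers["subject"], "->", len(recipients), "recipient mailboxes")
```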

Effective information management provides a clear policy and structure, along with the ability to capture and save all types of unstructured information so that it can be protected, retained, and searched.
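One small piece of such a policy is a retention schedule: once captured content is assigned a category, the date it becomes eligible for disposition can be computed rather than left to individual users. The sketch below assumes a hypothetical schedule; the category names and retention periods are illustrative only, not recommendations for any regulatory regime.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: record category -> years to retain.
# The categories and periods are illustrative, not recommendations.
RETENTION_YEARS = {"contract": 7, "personnel_review": 5, "general_correspondence": 2}

@dataclass
class Record:
    record_id: str
    category: str
    created: date

def disposition_date(record: Record) -> date:
    """Date on which the record becomes eligible for review and disposal."""
    target_year = record.created.year + RETENTION_YEARS[record.category]
    try:
        return record.created.replace(year=target_year)
    except ValueError:  # created on Feb 29 and the target year is not a leap year
        return record.created.replace(year=target_year, day=28)

def due_for_disposition(records: list[Record], today: date) -> list[str]:
    """IDs of records whose retention period has elapsed as of `today`."""
    return [r.record_id for r in records if disposition_date(r) <= today]

records = [
    Record("email-001", "general_correspondence", date(2019, 8, 26)),
    Record("ctr-042", "contract", date(2019, 8, 26)),
]
print(due_for_disposition(records, today=date(2022, 1, 1)))  # ['email-001']
```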

Facing File Formats

How we capture and manage unstructured digital information is closely tied to the file format used to store it. Most organizations don’t give much thought to the file formats used to store their information, and this can cause problems in both the short and long term. Many file formats are highly proprietary and can only be manipulated using a specific software application, or even a specific version of that application. Even when formats are less proprietary, so that more applications can work with them, the resulting files may not be 100% compatible across every application.
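Because file extensions can be missing or wrong, a capture process often identifies a file's format from its leading bytes (its "magic number") before deciding how to handle it. The sketch below checks a few well-known signatures; the table is deliberately short and illustrative, and real capture tools recognize many more formats. Note that DOCX, XLSX, and ODT all begin as ZIP containers, so distinguishing among them would require looking inside the archive.

```python
# Illustrative table of well-known file signatures ("magic numbers").
# Not exhaustive; real capture tools recognize many more formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container (e.g. DOCX, XLSX, ODT)"),
]

def sniff_format(header: bytes) -> str:
    """Guess a file's format from its first bytes rather than its extension."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

def sniff_file(path: str) -> str:
    """Read just enough of a file on disk to check its signature."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))

print(sniff_format(b"%PDF-1.7\n"))          # PDF
print(sniff_format(b"PK\x03\x04\x14\x00"))  # ZIP container (e.g. DOCX, XLSX, ODT)
print(sniff_format(b"plain text"))          # unknown
```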

The better approach is to determine the appropriate file formats for creating and/or capturing information based on a number of factors. Who is the intended audience? Are there any specific regulatory requirements to maintain information in a certain format, or in a non‐proprietary, open, or standard format? And perhaps most importantly, what’s the value of the information over time?

Digital Asset Management

Rich media such as audio, video, photographs, and infographics is an increasingly important type of enterprise content that must be managed. Design documents, marketing assets, logos, and architectural and engineering documents are all commonly held in rich media formats. Rich media also often carries extended metadata indicating camera type, geographic location, or resolution. Metadata can also record license or copyright restrictions. For example, a digital photo might be licensed from its owner for a one-year campaign, with the rights to use it online expiring after that period.
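The licensed-photo example can be sketched as a metadata record plus a rights check. The field names, the asset values, and the convention that rights lapse on `license_end` (exclusive) are assumptions for illustration; a real DAM system defines its own metadata schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class AssetMetadata:
    """Hypothetical metadata record for one rich media asset."""
    asset_id: str
    camera: str                        # descriptive metadata, e.g. camera model
    resolution: tuple[int, int]        # width, height in pixels
    licensed_channels: frozenset[str]  # where use is permitted, e.g. {"online"}
    license_start: date
    license_end: date                  # rights lapse on this date (exclusive)

def may_use(asset: AssetMetadata, channel: str, on: date) -> bool:
    """True if the license covers this channel on the given date."""
    return (channel in asset.licensed_channels
            and asset.license_start <= on < asset.license_end)

photo = AssetMetadata(
    asset_id="IMG-2019-0042",
    camera="ExampleCam X1",
    resolution=(6000, 4000),
    licensed_channels=frozenset({"online"}),
    license_start=date(2019, 9, 1),
    license_end=date(2020, 9, 1),  # a one-year campaign license
)

print(may_use(photo, "online", date(2020, 3, 1)))   # True: within the campaign
print(may_use(photo, "online", date(2020, 9, 15)))  # False: license has expired
print(may_use(photo, "print", date(2020, 3, 1)))    # False: channel not licensed
```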

Advanced users may use a dedicated digital asset management (DAM) system, and many solution providers offer additional modules or extended packages for the DAM power user. Another important consideration is size: most rich media file formats produce very large files, and videos may easily run to dozens if not hundreds of megabytes. If your system stores many rich media files, you may need additional storage and retrieval infrastructure to provide sufficient capacity and acceptable response times.

Moving Forward

We believe that every organization should be on a digital transformation journey. Reaching your destination of innovation, efficiency, and process improvement will require thoughtful strategies to better understand and manage unstructured data. It will take strong executive support as well as focused technical expertise to get it done. Look for providers and partners with the right mix of capability, experience, and vision to help you make the most of your efforts.

About Kevin Craine

Kevin Craine is a professional writer, an internationally respected technology analyst, and an award-winning podcast producer. He was named the #1 Enterprise Content Management Influencer to follow on Twitter and has listeners and readers worldwide. Kevin creates strategic content for the web, marketing, social media, and more. He is the written voice for some of North America's leading brands and his interviews feature today's best thought leaders. His client list includes many well-known global leaders like IBM, Microsoft and Intel, along with a long list of individuals and start-ups from a wide variety of industries. Kevin's podcasts have been heard around the world, including the award-winning weekly business show "Everyday MBA". He is also the host and producer of "Bizcast" on C-Suite Radio and the producer behind podcasts for Epson, Canon, IBM and AIIM International, among others. Prior to starting Craine Communications Group, Kevin was Director of Document Services for Regence BlueCross BlueShield where he managed high volume document processing operations in Seattle, Portland and Salt Lake City. He also spent time at IKON as an Enterprise Content Management consultant working with national and major accounts. He was the founding editor of Document Strategy magazine. Kevin has also been, at one point or another, an adjunct university professor, a black belt martial artist, and a professional guitarist. Kevin holds an MBA in the Management of Science and Technology as well as a BA in Communications and Marketing.