I think we need to talk about what we call “unstructured data.”
Unstructured data used to mean whatever couldn't be fit into a standard format, stored in a relational database, or queried with SQL. Transaction data, customer data, sales data, headcount data: that was structured. Everything else got labeled "unstructured."
That distinction made sense when our tools were limited. Extracting meaning from a video, an image, or a free-text survey response required significant manual effort or specialized processing. I've sat through hundreds of survey responses, building bigrams and trigrams, running NLP analysis just to find patterns. Only then could we use that data for any real purpose.
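For readers who haven't done this by hand, here is a minimal sketch of what counting bigrams and trigrams across survey responses can look like. It is an illustration only, not the workflow described above: the sample responses, the simple regex tokenizer, and the helper names `ngrams` and `top_ngrams` are all assumptions made up for this example.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(responses, n, k=3):
    """Count n-grams across all responses and return the k most common."""
    counts = Counter()
    for text in responses:
        # Naive tokenizer (an assumption): lowercase words and apostrophes only.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical free-text survey responses.
responses = [
    "The onboarding process was slow",
    "Onboarding process needs better documentation",
    "Slow onboarding process and unclear documentation",
]

print(top_ngrams(responses, n=2, k=1))  # [(('onboarding', 'process'), 3)]
print(top_ngrams(responses, n=3, k=3))
```

Even in this toy case, the counts only surface a recurring phrase. Deciding what that phrase means, and what to do about it, was still manual work.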
But the technology landscape has shifted dramatically. Mobile phones have vastly expanded the volume of images, videos, and text-based data that organizations generate. Cloud storage has made it economical to keep that data rather than discard it. And most importantly, AI has changed what's possible. Generative AI, large language models, and optical character recognition now allow us to work with this data as it is, without first converting it into structured formats. We can query a document. We can extract insights from video. We can analyze thousands of survey responses in minutes.
According to MIT Sloan, multiple analyst estimates put unstructured data at 80% to 90% of an organization's total data. That's not a small category we can afford to ignore.
The term "unstructured data" creates a sense of mystery. It signals "I don't know what to do with this, so let me just not use it."
It also creates a lack of ownership. Unstructured data becomes a heap of stuff lying out there that nobody wants to claim, because nobody wants to put a name to it. And that's where information management becomes critical.
Sitting on Lost Opportunity
By classifying 80% to 90% of an organization's data as "I don't know what to do with it," we're sitting on a huge pile of lost opportunity.
We're paying to store it. We're paying to organize it. We're paying to manage and eventually dispose of it. But we're not using it.
The reason I want to push back on the term itself is that data is always for a purpose. And data needs to be owned by somebody.
I'm a strong advocate of data democratization. If one team in a company has access to data, that data should be accessible to other teams, subject to policies and confidentiality requirements. Organizations shouldn't be creating multiple duplicates of the same data.
Unstructured data needs to fit into that same model. If it's invoices, finance should own them. If it's videos, websites, or marketing content, corporate relations or external communications teams should own them. We can't just carve out one category of data and leave it to the information management team to create context on their own. All the pillars of data readiness apply regardless of whether data is structured or unstructured.
Transaction data is transaction data, not "fruit data" or "miscellaneous data." We name it for what it is and who owns it.
Maybe we should do the same here. Data is for a purpose. Data needs an owner. The name should reflect that.
The views expressed by Subhadra Dutta are her own and do not necessarily reflect the views of her employer.
This blog post is based on an original AIIM OnAir podcast. When recording podcasts, AIIM uses AI-enabled transcription in Zoom. We then use that transcription as part of a prompt with Claude Pro, Anthropic’s AI assistant. AIIM staff (aka humans) then edit the output from Claude for accuracy, completeness, and tone. In this way, we use AI to increase the accessibility of our podcast and extend the value of great content.