By: Tori Miller Liu on April 30th, 2024
The Difference Between Unstructured Data and Structured Data
If you are new to AIIM, you might be wondering what AIIM means when we say "information," which we admittedly say a lot. My favorite explanation of information is from Steve Weissman, CIP, who told me that he simply refers to information as "stuff in a box." Information represents all the data you manage within your organization. Information means both structured and unstructured data.
Structured Data
Structured data has a fixed structure, hence the name, and consists of columns and rows of data in a table or spread across several linked tables. For example, data found in spreadsheets or Customer Relationship Management systems is typically structured.
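To make "linked tables" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for illustration; a real CRM schema would be far larger.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total       REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5);
        """
    )
    return conn


def totals_by_customer(conn):
    """Join the two tables on the customer id and sum order totals per customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(totals_by_customer(build_demo_db()))
```

Because every row follows the same fixed columns, a question like "how much has each customer ordered?" is a single query. That predictability is what makes structured data comparatively easy to collect and analyze.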
Unstructured Data
Conversely, unstructured data comes in a variety of formats, like emails, documents, videos, audio, text messages, images, and more. Due to this variety, it's harder to collect, process, and analyze. Hilariously, our analyst friends at Deep Analysis often refer to unstructured data as "ugly data." Unstructured data is vital to organizational operations, but it's messy and undeniably harder to manage.
Unstructured data is typically stored in unstructured repositories or embedded inside structured systems. It is also typically much larger in volume than structured data: some industry experts estimate that unstructured data makes up 80% of an organization's total data.
Importantly, unstructured data is the primary fuel for generative AI applications, and because of this it has been receiving more attention lately.
Who cares?
Ouch...but this question is legitimately important to answer. Ultimately, the difference between structured and unstructured data may have no bearing on business outcomes, but it's important to understand the differences between these two types of data during any sort of technology project involving data, particularly AI implementation.
Our Certified Information Professional Study Guide includes an interesting example of a migration involving unstructured data, such as a migration to Microsoft SharePoint. Any migration involving unstructured data, that is, individual files, is bound to run into issues migrating certain file formats. These issues include:
- Proprietary formats. The older these files are, the more likely they are to have issues. Special care should be taken to confirm that they need to be migrated and, if so, that they were migrated successfully.
- Complex formats. These are similar to proprietary file formats; in fact, most proprietary formats are also complex.
- Linked formats. Engineering drawings with linked external reference drawings, spreadsheets or PDFs that link to each other, or any other types of linked files often run into issues with the paths to the linked documents.
- Unknown formats. Most repositories can store any kind of digital data, but if you run into unknown formats there is a question as to whether to even bother migrating them.
- Duplicate files. Most organizations grapple with an average of 3-10 copies of every document they store. During a migration, it's important to identify those duplicate files, determine which one should be the official copy, and mark the other copies for deletion.
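The duplicate-file step in particular lends itself to a short script. The following is a rough sketch, not a method taken from the Study Guide: it assumes duplicates are byte-for-byte identical copies and groups files by a SHA-256 hash of their contents. The function names and the root folder are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash; return groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # "." is a placeholder; point this at the folder being migrated.
    for group in find_duplicates("."):
        print("Possible duplicates:", ", ".join(str(p) for p in group))
```

A script like this only flags exact copies. Near-duplicates, such as the same document saved in two formats or with minor edits, will not match, and deciding which copy is the official one remains a judgment call for the people who own the content.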
Solutions for Structuring the Unstructured
Intelligent information management solutions can help you structure the unstructured. For example, software can apply natural language processing techniques to transform free-format text within documents into core elements, terms, and characteristics. There are also solutions on the market that can scan images, like identification cards or passports, and use optical character recognition to translate the data into a structured format. Take a look at AIIM's Buyers Guide to find a solution provider to help you solve your structured and unstructured data quandaries.
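As a rough illustration of what "structuring the unstructured" means, the sketch below pulls a few fields out of free-format text and returns them as a record with fixed keys. It uses simple regular expressions as a stand-in for the natural language processing a commercial product would apply. The field names, patterns, and sample message are all hypothetical.

```python
import re

# Hypothetical patterns for three fields we want to capture.
PATTERNS = {
    "invoice_number": re.compile(r"\bINV-(\d+)\b"),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "amount": re.compile(r"\$(\d+(?:\.\d{2})?)"),
}


def extract_fields(text):
    """Return a dict with one entry per pattern; missing fields are None."""
    record = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record


if __name__ == "__main__":
    message = (
        "Hi team, invoice INV-1042 dated 2024-04-30 is attached. "
        "The total due is $1250.00."
    )
    print(extract_fields(message))
```

Every message, however it is worded, comes out with the same three keys, so the results can be loaded into a table as rows and columns. Real-world text is far less predictable than this sample, which is why dedicated solutions rely on more capable techniques than fixed patterns.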
About Tori Miller Liu
Tori Miller Liu, MBA, FASAE, CAE, CIP is the President & CEO of the Association for Intelligent Information Management. She is an experienced association executive, technology leader, speaker, and facilitator. Previously, she served as the Chief Information Officer of the American Speech-Language-Hearing Association (ASHA) and has 16+ years of experience in association management. Tori is a former member of the ASAE Technology Professional Advisory Council and a founding Board Member of Association Women Technology Champions. She was named a 2020 Association Trends Young & Aspiring Professional and 2021 Association Forum Forty under 40 award recipient. She is also an alumna of the ASAE NextGen program. She is a Certified Association Executive and holds an MBA from George Washington University. In 2023, Tori was named as a Fellow of the American Society of Association Executives (ASAE).