8 Things to Consider When Using Semantics in Your Information Management Strategy
By: John Mancini on October 20th, 2010

Metadata  |  Enterprise Content Management (ECM)  |  Enterprise Search

1. No One Wants Unfettered Information

Unless your IT budgets are growing faster than your enterprise content volumes, you need an approach to manage, surface, and control information that does not mandate adding more storage, staff, or restrictions. The systems responsible for content understand nothing about the subject or domain of the information under their management. Search engines, content management systems, and process engines are all blind to meaning and context. In real life, the meaning of a piece of information determines its usefulness, relevance, and treatment. Semantics adds a layer of intelligence by describing what the content is about using structured data, a.k.a. metadata. That metadata can then be used to drive workflows, archiving policy, search, compliance, access control, and discovery.
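To make the last point concrete, here is a minimal Python sketch of metadata driving retention, access control, and workflow routing. Everything in it is an assumption made for illustration: the `Document` class, the `type` and `topics` metadata fields, and the policy tables are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    metadata: dict = field(default_factory=dict)


# Hypothetical policy tables keyed on metadata values (illustrative only).
RETENTION_YEARS = {"contract": 7, "invoice": 6, "press-release": 1}
RESTRICTED_TOPICS = {"litigation", "personnel"}


def handling_for(doc: Document) -> dict:
    """Derive handling decisions purely from the document's metadata."""
    doc_type = doc.metadata.get("type")
    topics = set(doc.metadata.get("topics", []))
    return {
        # None means no retention policy is known for this document type.
        "retention_years": RETENTION_YEARS.get(doc_type),
        # Restricted if any topic appears in the restricted set.
        "restricted": bool(topics & RESTRICTED_TOPICS),
        # Route to legal review only when the content is about litigation.
        "workflow": "legal-review" if "litigation" in topics else "standard",
    }


doc = Document(
    "Supplier agreement",
    {"type": "contract", "topics": ["procurement", "litigation"]},
)
print(handling_for(doc))
```

An untagged document falls through to the defaults (no retention period, unrestricted, standard workflow), which is exactly the blindness described above: without metadata, the system cannot treat content according to what it means.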

2. All Applications can Benefit from Semantic Enrichment

Any system connected to enterprise information can use the facilities of a semantic platform. Systems like SharePoint need automatic classification facilities; content and records management systems need precise metadata to run the lifecycle; search needs enhanced indices to offer quality facets; websites need SEO-optimized pages to rank at the top; workflow needs to send information to the right next process, based on its meaning. The semantic platform approach offers these capabilities across enterprise systems: it is deployed once, maintained centrally, and used many times.

3. Make the Semantic Layer Part of your Infrastructure

By adopting a platform approach to semantics, existing information management investments can all benefit from one semantic implementation. Embedded products that offer a point solution for a specific application but are inaccessible to others cause work to be repeated, opportunities to be missed, and benefits to be diminished. Documented, open interfaces are key to easy integration; high performance and reliability are needed to support enterprise volumes; and adherence to industry standards avoids vendor ‘lock-in’.

4. Delivering Semantics Requires a Blend of Capabilities

A modular platform that allows licensing of separate constituent parts yields flexibility and budgetary pragmatism. As each phase is completed and new modules are added, the whole becomes greater than the sum of its parts. Ontology management is needed for model management and governance; text mining is useful for quick-fire model building; rules-based classification is needed for transparent, understandable, and accurate metadata tagging; a user experience engine ensures information is surfaced in context; an application framework embodying best practices means projects are off the ground in days; and out-of-the-box integrations with popular systems, such as SharePoint, take the legwork out of systems integration.

5. No One Wants a Theoretical Approach to Ontologies

Ontologies are important to semantic systems because they make a great container for a domain or subject-based model. Historically, enterprise taxonomies (a subset of ontologies) have often lacked application and become academic shelf-ware. If an organization accepts that labeling its content accurately and at a sufficiently granular level will improve findability, then it needs to take the ontology and apply it to the content as metadata. The semantic platform is the technical enabler that embeds the labels and links driving context, meaning, and discovery in the systems that can benefit.

6. Building an Ontology does not have to be a Man-Years Project

A pragmatic approach to model building will deliver early results. Define a skeleton structure; acquire third-party ontologies as a starting point; import data lists and assets from around the organization; use the search logs as a guide to which subjects are important; text mine content and gather feedback.

There is no one right answer, and there is no perfect ontology. Putting a good framework in place at the outset allows the model to be delivered incrementally, and users will be more inclined to contribute as they see the model improving search and content management.
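As one illustration of the pragmatic steps above, the sketch below uses a search log to show which frequently searched subjects the model already covers and which are gaps worth adding next. The function name, the sample log, and the `known_terms` set are hypothetical; a real search log would need more cleaning than simple lower-casing.

```python
from collections import Counter


def rank_candidate_subjects(search_log, known_terms, top_n=5):
    """Split the most frequent queries into covered subjects and gaps."""
    # Normalize queries so that case and stray whitespace do not split counts.
    counts = Counter(q.strip().lower() for q in search_log if q.strip())
    covered, gaps = [], []
    for query, n in counts.most_common(top_n):
        target = covered if query in known_terms else gaps
        target.append((query, n))
    return covered, gaps


# Hypothetical sample data for illustration only.
search_log = [
    "Expenses policy",
    "vpn setup",
    "maternity leave",
    "VPN setup",
    "expenses policy ",
    "vpn setup",
]
known_terms = {"expenses policy", "travel"}

covered, gaps = rank_candidate_subjects(search_log, known_terms)
print("Already in the model:", covered)
print("Candidate subjects to add:", gaps)
```

Run periodically, a report like this supports incremental delivery: each iteration adds the subjects people are actually searching for, rather than aiming for a complete model up front.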

7. Poor Semantic Tagging is Worse than no Semantic Tagging

Content management systems typically rely on users to tag their content manually, i.e., by populating document properties or adding labels in a library. Most organizations want their information labeled in a specific way or to a standard, so this process becomes a burden on users. Yet this metadata is what allows search to work, workflows to operate, and records policies to be enacted, so it makes sense to automate the task. The tags returned by an automatic classifier must be highly accurate. Unfortunately, language is complex, so care is needed in choosing the classification approach.

A statistical approach (often based on a Bayesian probabilistic model) promises an almost artificial-intelligence way to tag content. However, this black-box approach can be hard to train, and its statistical guesswork can be just that: guesswork. Entity extraction is another approach: words in a document are identified, and if they match a linguistic rule or a dictionary entry, the corresponding metadata value is returned. It has no concept of relevance, so passing mentions are matched and, with no sense of 'aboutness', many erroneous tags are returned alongside the good ones.

What works best is a deterministic approach, in which classification follows explicit rules. The investment needed is in building the ontologies that provide the evidence driving those rules. The result is accurate, transparent, and auditable classification that yields high-quality metadata.
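A toy sketch of the deterministic idea follows. It is not any vendor's rule engine: the mini-ontology, the evidence weights, and the threshold are all invented for illustration. A concept is applied as a tag only when enough weighted evidence from the ontology appears in the text, and the evidence is returned with the tag so the decision can be audited.

```python
import re

# Hypothetical mini-ontology: concept -> evidence terms with assumed weights.
ONTOLOGY = {
    "Data Protection": {"gdpr": 3, "personal data": 2, "data subject": 2, "consent": 1},
    "Records Retention": {"retention schedule": 3, "disposal": 2, "archive": 1},
}
THRESHOLD = 4  # Assumed minimum score before a tag is applied.


def classify(text):
    """Return each concept whose weighted evidence meets the threshold."""
    lowered = text.lower()
    tags = {}
    for concept, evidence in ONTOLOGY.items():
        hits = {}
        for term, weight in evidence.items():
            # Whole-word (or whole-phrase) matches only.
            pattern = r"\b" + re.escape(term) + r"\b"
            occurrences = len(re.findall(pattern, lowered))
            if occurrences:
                hits[term] = occurrences * weight
        score = sum(hits.values())
        if score >= THRESHOLD:
            # Keep the evidence so every tag can be explained and audited.
            tags[concept] = {"score": score, "evidence": hits}
    return tags


text = (
    "Under GDPR, a data subject may withdraw consent at any time. "
    "Old files go to the archive."
)
print(classify(text))
```

In this example the single passing mention of "archive" scores below the threshold, so the document is not tagged as Records Retention, whereas a plain dictionary match would have returned that tag. The same rules always produce the same tags, which is what makes the output transparent.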

8. Is a Semantic Platform a ‘Nice to Have’, or does it Deliver a Real RoI?

  • Media companies use semantics to improve the quality of their information feeds, boosting distribution, readership & subscriptions.
  • Government authorities use semantics to tag information according to their standards for compliance, intelligence processing & citizen self-service.
  • Healthcare companies use semantics to boost the level of web self-service and improve the quality of critical health information they provide to patients.
  • Investment banks use semantics to consolidate their information costs, better promote their primary research, and automate information compliance.
  • Research organizations use semantics to speed up their time to market by re-using information.
  • Online directories use semantics to increase their advertising revenues.
  • Corporate intranets & websites use semantics to boost their use and maximize their return on information assets.

 

About John Mancini

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation and intelligent automation. John is a frequent keynote speaker and author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn and Facebook as jmancini77.

Recent keynote topics include:

  • The Stairway to Digital Transformation
  • Navigating Disruptive Waters: 4 Things You Need to Know to Build Your Digital Transformation Strategy
  • Getting Ahead of the Digital Transformation Curve
  • Viewing Information Management Through a New Lens
  • Digital Disruption: 6 Strategies to Avoid Being “Blockbustered”

Specialties:

  • Keynote speaker and writer on AI, RPA, Intelligent Information Management, Intelligent Automation and Digital Transformation.
  • Consensus-building with Boards to create strategic focus, action, and accountability.
  • Extensive public speaking and public relations work.
  • Conversant and experienced in major technology issues and trends.
  • Expert on inbound and content marketing, particularly in an association environment and on the HubSpot platform.

John is a Phi Beta Kappa graduate of the College of William and Mary, and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.