The AIIM Blog - Overcoming Information Chaos

8 Things to Consider When Using Semantics in Your Information Management Strategy

Written by John Mancini | Oct 20, 2010 11:41:10 AM

1. No One Wants Unfettered Information

Unless your IT budgets are growing faster than your enterprise content volumes, you need an approach to managing, surfacing, and controlling information that does not mandate adding more storage, staff, or restrictions. The systems responsible for content understand nothing about the subject or domain of the information under their management. Search engines, content management systems, and process engines are all blind to meaning and context. In real life, the meaning of a piece of information determines its usefulness, relevance, and treatment. Semantics add a layer of intelligence by describing what the content is about, using structured data, a.k.a. metadata. Metadata can be used to drive workflows, archiving policy, search, compliance, access control, and discovery.
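To make the last point concrete, here is a minimal Python sketch of metadata driving an archiving policy and a workflow decision. Everything in it is an assumption made for illustration: the `Document` class, the `subject` and `confidential` metadata fields, the retention periods, and the workflow step names are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Semantic metadata describing what the content is about (hypothetical fields).
    metadata: dict = field(default_factory=dict)


# Hypothetical retention policy keyed on the "subject" metadata value.
RETENTION_YEARS = {"contract": 7, "invoice": 6, "press-release": 1}
DEFAULT_RETENTION_YEARS = 3


def retention_years(doc: Document) -> int:
    """Pick an archiving period from the document's subject, not its location."""
    subject = doc.metadata.get("subject")
    return RETENTION_YEARS.get(subject, DEFAULT_RETENTION_YEARS)


def next_workflow_step(doc: Document) -> str:
    """Route a document to a process based on what it is about."""
    if doc.metadata.get("confidential"):
        return "legal-review"
    if doc.metadata.get("subject") == "invoice":
        return "accounts-payable"
    return "general-filing"


doc = Document("Supplier agreement", {"subject": "contract", "confidential": True})
print(retention_years(doc), next_workflow_step(doc))
```

The point of the sketch is that both decisions read only the metadata, so any system that can see the tags can apply the same policy.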

2. All Applications can Benefit from Semantic Enrichment

Any system connected to enterprise information can use the facilities of a semantic platform. Systems like SharePoint need automatic classification facilities; content and records management systems need precise metadata to run the lifecycle; search needs enhanced indices to offer quality facets; websites need SEO-optimized pages to rank at the top; workflow needs to send information to the right next process, based on its meaning. The semantic platform approach offers these capabilities across enterprise systems: it is deployed once, maintained centrally, and used many times.

3. Make the Semantic Layer Part of your Infrastructure

By adopting a platform approach to semantics, existing information management investments can all benefit from a single semantic implementation. Embedded point solutions that serve one specific application but are inaccessible to others cause work to be repeated, opportunities to be missed, and benefits to be diminished. Documented, open interfaces are key to easy integration. High performance and reliability are needed to support enterprise volumes. Adherence to industry standards avoids vendor ‘lock-in’.

4. Delivering Semantics Requires a Blend of Capabilities

A modular platform that allows licensing of separate constituent parts yields flexibility and budgetary pragmatism. As each phase is completed and new modules are added, the whole becomes greater than the sum of its parts. Ontology management is needed for model management and governance; text mining is useful for quick-fire model building; rules-based classification is needed for transparent, understandable, and accurate metadata tagging; a user experience engine ensures information is surfaced in context; an application framework embodying best practices means projects are off the ground in days; out-of-the-box integrations with popular systems, such as SharePoint, take the legwork out of systems integration.

5. No One Wants a Theoretical Approach to Ontologies

Ontologies are important to semantic systems because they make a great container for a domain or subject-based model. Historically, enterprise taxonomies (a subset of ontologies) have often lacked application and become academic shelf-ware. If an organization accepts that labeling its content accurately and at a sufficiently granular level will make a difference to findability, then it needs to take the ontology and apply it to the content as metadata. The semantic platform is the technical enabler that embeds the labels and links that drive context, meaning, and discovery in the systems that can benefit.

6. Building an Ontology does not have to be a Man-Years Project

A pragmatic approach to model building will deliver early results. Define a skeleton structure; acquire third-party ontologies as a starting point; import data lists and assets from around the organization; use the search logs as a guide to which subjects are important; text mine content and gather feedback.
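As an illustration of the search-log step, the following Python sketch counts normalized queries to suggest which subjects deserve a place in the model first. The log entries are invented for the example, and `top_subjects` is a hypothetical helper; a real log would be read from a file or database and would probably need more cleaning than simple lower-casing.

```python
from collections import Counter


def top_subjects(queries, limit=3):
    """Count normalized search queries and return the most frequent ones."""
    normalized = (q.strip().lower() for q in queries if q.strip())
    return Counter(normalized).most_common(limit)


# Hypothetical search-log extract, used only for illustration.
search_log = [
    "expense policy", "Expense Policy", "travel booking",
    "expense policy ", "pension scheme", "travel booking",
]

print(top_subjects(search_log, limit=2))
# prints [('expense policy', 3), ('travel booking', 2)]
```

The most frequent queries are candidates for early branches of the skeleton structure; rare ones can wait for a later increment.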

There is no one right answer, and there is no perfect ontology. Putting a good framework in place, to begin with, allows the model to be delivered incrementally, and users will be more inclined to contribute as they see the model improving search and content management.

7. Poor Semantic Tagging is Worse than no Semantic Tagging

Typically, content management systems rely on users manually tagging their content, i.e., populating document properties or adding labels in a library. Most organizations want their information labeled in a specific way or to a standard, so this process becomes a burden on the users. Yet this metadata is important for search to work, for a workflow to operate, and for records policies to be enacted, so it makes sense to automate this important task.

The tags returned by the automatic classifier must be of the highest accuracy. Unfortunately, language is complex, so care needs to be taken to choose the best classification approach. A statistical approach (often using a Bayesian probabilistic model) promises an almost artificial-intelligence way to tag content. However, this black-box approach can be hard to train, and so the statistical guesswork can be just that. Entity extraction is another approach, whereby the words in a document are identified and, if they match a linguistic rule or an entry in a dictionary, return the metadata value. There is no concept of relevance, many passing mentions are matched, and ultimately there is no 'aboutness', so many erroneous tags are returned along with the good ones.

What works best is a deterministic approach, whereby classification follows explicit rules. The investment needed is in building the ontologies that provide the evidence that drives the rules. The result is an accurate, transparent, auditable, and precise classification that yields high-quality metadata.
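The difference between matching passing mentions and classifying by rules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not any vendor's algorithm: the ontology fragment, the evidence weights, and the threshold of 5 are all hypothetical. Each concept lists evidence terms, a tag is applied only when the weighted evidence reaches the threshold, and the evidence is returned with the tag so the decision can be audited.

```python
import re

# Hypothetical ontology fragment: each concept lists evidence terms with weights.
ONTOLOGY = {
    "Mergers and Acquisitions": {"merger": 3, "acquisition": 3, "takeover": 2},
    "Data Protection": {"personal data": 3, "privacy": 2, "consent": 1},
}
THRESHOLD = 5  # assumed minimum score before a tag is applied


def classify(text, ontology=ONTOLOGY, threshold=THRESHOLD):
    """Return {concept: {"score": int, "evidence": {term: hits}}} for concepts that pass."""
    lowered = text.lower()
    tags = {}
    for concept, terms in ontology.items():
        evidence = {}
        score = 0
        for term, weight in terms.items():
            # Whole-word match, so a term does not fire inside a longer word.
            hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            if hits:
                evidence[term] = hits
                score += weight * hits
        # A single passing mention scores below the threshold and is ignored.
        if score >= threshold:
            tags[concept] = {"score": score, "evidence": evidence}
    return tags


doc = "The merger was approved. The acquisition closes in May; privacy was mentioned once."
print(classify(doc))
# prints {'Mergers and Acquisitions': {'score': 6, 'evidence': {'merger': 1, 'acquisition': 1}}}
```

Here the single mention of "privacy" is not enough to tag the document as being about data protection, whereas a pure dictionary match would have returned that tag.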

8. Is a Semantic Platform a ‘Nice to Have’, or does it Deliver a Real ROI?

  • Media companies use semantics to improve the quality of their information feeds, boosting distribution, readership & subscriptions.
  • Government authorities use semantics to tag information according to their standards for compliance, intelligence processing & citizen self-service.
  • Healthcare companies use semantics to boost the level of web self-service and improve the quality of critical health information they provide to patients.
  • Investment banks use semantics to consolidate their information costs, better promote their primary research, and automate information compliance.
  • Research organizations use semantics to speed up their time to market by re-using information.
  • Online directories use semantics to increase their advertising revenues.
  • Corporate intranets & websites use semantics to boost their use and maximize their return on information assets.