Search engines have come a long way in their ability to help users find desired information among electronic content within an enterprise. Technologies vary, but no matter the search method, all kinds of searches can be enhanced if a taxonomy is also brought into play. Regardless of how content is indexed (simple crawling, concept extraction, use of algorithms, use of rule sets, etc.), taxonomies can enhance the search, by boosting accuracy in retrieval results and by improving the user experience. More specifically, a taxonomy can support search in the following eight ways.
Retrieve Matched Concepts, Not Just Words.
Search engines match user-entered queries, or search strings, with words or phrases within the text of documents in order to retrieve matching documents. Whatever the method or technology, search engines alone are far from perfect in their retrieval results. This is due to two issues concerning text matching: (1) a single concept may be worded in multiple ways, and (2) the same word may have multiple meanings.
A taxonomy or controlled vocabulary addresses both of these issues, and this is the clearest benefit of any taxonomy associated with search. A taxonomy can contain multiple variants for each term, so a query worded in any of those ways retrieves the same concept. Each concept also carries its fullest name, so that concepts sharing the same wording can be told apart.
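As a minimal sketch of the idea above, the following Python maps every variant wording to its preferred term. The sample taxonomy and the names `TAXONOMY`, `build_variant_index`, and `resolve` are hypothetical and chosen only for illustration; a real system would typically normalize text more thoroughly.

```python
# Hypothetical taxonomy: each preferred term (with its fullest,
# disambiguated name) lists the variant wordings that should match it.
TAXONOMY = {
    "Mercury (planet)": ["mercury"],
    "Mercury (element)": ["mercury", "quicksilver"],
    "Automobiles": ["cars", "autos"],
}


def build_variant_index(taxonomy):
    """Map every lowercased label (preferred or variant) to its preferred terms."""
    index = {}
    for preferred, variants in taxonomy.items():
        for label in [preferred, *variants]:
            index.setdefault(label.lower(), set()).add(preferred)
    return index


def resolve(query, index):
    """Return the preferred terms a query string could refer to, sorted."""
    return sorted(index.get(query.strip().lower(), set()))


INDEX = build_variant_index(TAXONOMY)
```

Here `resolve("Cars", INDEX)` leads to the single concept "Automobiles" (issue 1, many wordings for one concept), while `resolve("mercury", INDEX)` returns two candidate concepts whose full names let the user pick the intended meaning (issue 2, one word with many meanings).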
Leverage Metadata.
Within any enterprise, some content is structured, for example residing in databases or carrying designated metadata, and some content is unstructured. The structured content is the “low hanging fruit” for accessing with search. Metadata is standardized administrative or descriptive data about a document that is common for all documents in a given repository. Examples of metadata elements include title, author, publication date, organization, audience, document type, language, description, and subject. If your content records or documents already have metadata fields assigned to them, you should take advantage of these fields and support them with controlled vocabularies where applicable.
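One way to support metadata fields with controlled vocabularies is to check each record against the allowed values for a field. The sketch below assumes hypothetical field names and vocabularies (`CONTROLLED_VOCABULARIES`, `validate_metadata`); free-text fields such as title are simply left unchecked.

```python
# Hypothetical controlled vocabularies for two metadata fields.
CONTROLLED_VOCABULARIES = {
    "document_type": {"Policy", "Report", "Presentation"},
    "language": {"English", "French", "German"},
}


def validate_metadata(record, vocabularies):
    """Return (field, value) pairs whose value is not in the field's vocabulary.

    Fields without a vocabulary, and fields missing from the record,
    are not reported.
    """
    problems = []
    for field, allowed in vocabularies.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append((field, value))
    return problems


record = {"title": "Q3 summary", "document_type": "report", "language": "English"}
problems = validate_metadata(record, CONTROLLED_VOCABULARIES)
```

In this example the lowercase value "report" is flagged because it does not match the controlled term "Report" exactly, which is the kind of inconsistency that makes metadata less useful for search.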
Enhance Web Search Engine Optimization (SEO).
Search engine optimization (SEO) includes all the methods used to make a site or a single page within a web site more likely to be picked up by search engines when users query for information relevant to the site. Certain SEO features will help make all relevant pages more easily found with a search engine, so aspects of SEO should be considered for internal content and not just for public web sites. Keywords for SEO are found in HTML pages tagged as keywords, within titles, headings, and linked text, among other places. Taxonomies on web pages can also figure prominently into SEO, especially if the taxonomy terms are hyperlinked, which they often are in navigation links.
Support Browsing through Hierarchies.
The best known or “classic” taxonomies are hierarchical structures of terms arranged in parent/child or broader/narrower relationships. The idea is to create a logical structure of the terms so that users can easily locate the most suitable term, even if they do not know ahead of time exactly how that term is worded. Information-seekers may search for something specific and know what it is called, or they may browse for information on a subject area without being sure exactly what to call it, or with several possible wordings in mind. A hierarchical taxonomy specifically helps this latter kind of information-seeking.
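A broader/narrower hierarchy can be sketched as a simple mapping from each term to its narrower terms. The terms and the helper names `narrower_terms` and `path_to` are assumptions made for this illustration.

```python
# Hypothetical hierarchy: each broader term maps to its narrower terms.
NARROWER = {
    "Vehicles": ["Automobiles", "Bicycles"],
    "Automobiles": ["Electric cars", "Trucks"],
}


def narrower_terms(term):
    """Return the terms one level below the given term (what a user sees when browsing)."""
    return NARROWER.get(term, [])


def path_to(target, term="Vehicles"):
    """Return the browse path from `term` down to `target`, or [] if not found."""
    if term == target:
        return [term]
    for child in narrower_terms(term):
        sub_path = path_to(target, child)
        if sub_path:
            return [term, *sub_path]
    return []
```

A user who does not know the exact wording can start at "Vehicles", see its narrower terms, and drill down; `path_to("Trucks")` reproduces the route such a user would follow.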
Support Faceted Search.
Although sometimes a search query is simple (find all there is about x), often users want more precise information that involves a more complex query. A simple query of one to three words, even if it matches a taxonomy term, will often return too many search results to easily evaluate and select. Facets let the user narrow results by selecting terms from several independent aspects of the content, such as topic, document type, or region. Searching on a combination of words and phrases alone, without the structure of facets, will retrieve some of the same results, but they won’t be as accurate, and additional irrelevant results will also be retrieved. The use of facets thus reduces ambiguity in queries and enhances the user experience. The presence of facets removes the guesswork and frustration of wording a complex search.
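The narrowing effect of facets can be illustrated with a small sketch. The documents, facet names, and the functions `filter_by_facets` and `facet_counts` are hypothetical; a production search engine would compute the same thing from its index.

```python
from collections import Counter

# Hypothetical documents, each tagged with one term per facet.
DOCUMENTS = [
    {"id": 1, "facets": {"topic": "Security", "doc_type": "Policy", "region": "Europe"}},
    {"id": 2, "facets": {"topic": "Security", "doc_type": "Report", "region": "Europe"}},
    {"id": 3, "facets": {"topic": "Security", "doc_type": "Policy", "region": "Asia"}},
    {"id": 4, "facets": {"topic": "Travel", "doc_type": "Policy", "region": "Europe"}},
]


def filter_by_facets(documents, **selected):
    """Keep only documents that match every selected facet value."""
    return [
        doc for doc in documents
        if all(doc["facets"].get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(documents, facet):
    """Count how many documents carry each value of one facet."""
    return dict(Counter(doc["facets"][facet] for doc in documents if facet in doc["facets"]))


security_policies = filter_by_facets(DOCUMENTS, topic="Security", doc_type="Policy")
remaining_regions = facet_counts(security_policies, "region")
```

Selecting two facet values narrows four documents to two, and the counts for the remaining facet show the user which further refinements are available without any need to guess at query wording.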
Leverage Text Mining and Auto-Classification.
The more sophisticated search methods use text mining or auto-classification to identify concepts and not just words. In order to do this, though, a set of concepts needs to be developed, and this set of concepts constitutes a taxonomy. Suggesting candidate taxonomy terms is typically a component of text mining. There is a synergy between taxonomy and text mining or auto-classification, as each supports the other: the technologies suggest taxonomy terms, and the presence of a taxonomy improves the search experience and accuracy, as explained previously.
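As a deliberately simple stand-in for auto-classification, the sketch below tags text with taxonomy terms by counting whole-word matches of each term's variants. This rule-based approach is only one possible technique, not a description of any particular product, and the names `TERM_VARIANTS` and `classify` are assumptions.

```python
import re

# Hypothetical taxonomy terms with the variant words that signal each concept.
TERM_VARIANTS = {
    "Mergers and acquisitions": ["merger", "acquisition", "takeover"],
    "Earnings": ["earnings", "quarterly results", "profit"],
}


def classify(text, term_variants, min_hits=1):
    """Return {taxonomy term: hit count} for terms with at least `min_hits` matches.

    Matching is case-insensitive and whole-word only, so "merger" does not
    match "mergers"; a real system would usually add stemming.
    """
    lowered = text.lower()
    tags = {}
    for term, variants in term_variants.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(variant) + r"\b", lowered))
            for variant in variants
        )
        if hits >= min_hits:
            tags[term] = hits
    return tags


tags = classify("The takeover follows last year's merger talks.", TERM_VARIANTS)
```

The sample sentence never contains the phrase "mergers and acquisitions", yet it is tagged with that concept, which is the difference between identifying concepts and matching words.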
Support Discovery.
Search alone is not always a sufficient operation to satisfy the information-gathering needs of individuals. Sometimes individuals are interested in obtaining information on something that they do not know how to define at all, or, more significantly, that they do not even know exists. In other words, they don’t know exactly what they are looking for, but they’ll know it when they find it. A taxonomy can support discovery by suggesting terms and associated content of potential interest that the searcher was not originally searching for. It can do so by including related terms, by displaying additional terms tagged to the document, or by dynamically suggesting related content that shares auto-categorization keywords.
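Two of the discovery mechanisms described above, related terms and additional terms tagged to the same documents, can be combined in a short sketch. The data and the function `suggest_terms` are hypothetical.

```python
# Hypothetical "related term" links maintained in the taxonomy.
RELATED = {
    "Solar power": ["Wind power", "Energy storage"],
}

# Hypothetical documents and the taxonomy terms tagged to each.
DOC_TAGS = {
    "doc-1": {"Solar power", "Tax incentives"},
    "doc-2": {"Solar power", "Energy storage"},
    "doc-3": {"Wind power"},
}


def suggest_terms(term):
    """Suggest terms the searcher did not ask for but may find useful.

    Combines the taxonomy's related terms with terms that are tagged
    to the same documents as the searched term.
    """
    suggestions = set(RELATED.get(term, []))
    for tags in DOC_TAGS.values():
        if term in tags:
            suggestions |= tags - {term}
    return sorted(suggestions)
```

A search on "Solar power" surfaces "Tax incentives" only because a document happens to carry both tags, which is the kind of connection a searcher would not have thought to query for directly.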
Facilitate Semantic Search.
If a taxonomy is sufficiently structured, to the point that it might be called an “ontology,” it facilitates further search benefits that go beyond merely finding data records. The user can make inferences, explore tangential searches, and learn more about the broader subject area, whether by discovery or in a more directed way. Taking the notion of “related terms” a step further, terms in a taxonomy (or set of taxonomies) are not merely related but are related to each other in certain ways, depending on what kind (or “class”) of term they are. A semantic taxonomy supports accuracy of results, because documents are retrieved on the basis of these relationships rather than merely due to the co-occurrence of terms.
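Typed relationships between classed terms can be sketched as subject, relation, object triples. All terms, classes, relation names, and functions here are placeholders invented for illustration, not part of any specific ontology standard or tool.

```python
# Hypothetical classes for each term.
TERM_CLASS = {
    "Drug A": "Drug",
    "Drug B": "Drug",
    "Drug C": "Drug",
    "Company X": "Company",
    "Company Z": "Company",
    "Company Q": "Company",
    "Condition Y": "Condition",
    "Condition W": "Condition",
}

# Hypothetical typed relationships: (subject, relation, object).
RELATIONS = [
    ("Drug A", "treats", "Condition Y"),
    ("Drug B", "treats", "Condition Y"),
    ("Drug C", "treats", "Condition W"),
    ("Company X", "manufactures", "Drug A"),
    ("Company Z", "manufactures", "Drug B"),
    ("Company Q", "manufactures", "Drug C"),
]


def subjects(relation, obj):
    """Return all subjects linked to `obj` by the given relation type."""
    return sorted(s for s, r, o in RELATIONS if r == relation and o == obj)


def makers_of_treatments(condition):
    """Infer which companies manufacture a drug that treats the condition."""
    drugs = subjects("treats", condition)
    return sorted({company for drug in drugs for company in subjects("manufactures", drug)})
```

The answer to `makers_of_treatments("Condition Y")` is inferred by following two typed relationships in sequence. A co-occurrence search for the company and condition names could miss these links or return documents where the names merely appear together.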