By: John Mancini on June 28th, 2013

Why Can't Document Auto-Classification Be as Much Fun as Google Image Recognition?

A recent AIIM Industry Watch report, "Information Governance - records, risks and retention in the litigation age," highlighted some of the challenges of managing information that is accumulating at an ever-increasing rate.

Despite good intentions, the delete button isn’t being pressed. Electronic records aren’t being deleted even when retention periods are set.
The answer to the data problem is to let the computer do the filing. 14% are already doing auto-classification of electronic records, and 37% are keen to do it.
Something has to be done about content accumulation. For 29%, the response to the information deluge is "buy more discs."
The content may be electronic, but the e-discovery mechanisms are still manual. 53% are still reliant on manual processes for e-discovery searches across file shares, email, and physical records.
45% of organizations plan to increase their records management spend over the next two years. In particular, automated classification is set for strong growth, along with enterprise search, RM modules, e-discovery, and email management.

Of course, auto-classification is one of those things that everyone is talking about, but few have actually implemented it. I came across an amazing example -- at least to me -- of the power of auto-classification in the consumer world. I was recently listening to a TWIG podcast (yes, I understand how incredibly nerdy this is), and they were talking about auto image recognition and classification within images stored in Google.

This is actually quite fun. One thing -- you need to have your pictures stored in Picasa.

So if your pictures are on Picasa and you are logged in, then try this for fun and amaze your friends.

Do a Google search on "my pictures of XXX." (On second thought, just a reminder to not actually put XXX in the search string. Put a noun in there. There is no telling what may turn up if you type in XXX.)

So, a couple of examples.

Here are my pictures that contain images of our devil cat, Snickers, who is the worst pet in the world.

My pictures of cats

This search result is looking for pictures that have wine in them.

My pictures of wine

And this one has found pictures with houses in them.

My pictures of houses

All of this is really quite amazing when you think about it. These results do not come from tags on the pictures or from anyone manually identifying what's in them. They come from the software interpreting the words in the search, knowing what those words mean and what they "look like," and then finding matches in the pictures. And it does this across a fairly significant number of pictures in about a second.
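For readers who want to picture the mechanics, here is a minimal sketch of the general idea in Python. It is not how Google actually does it. It assumes some image-recognition model has already assigned labels with confidence scores to each picture; the `Photo` class, the `RELATED_LABELS` table, the file names, and the scores are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Photo:
    name: str
    # Labels that a (hypothetical) recognition model assigned, with confidence scores.
    labels: dict[str, float] = field(default_factory=dict)


# Hypothetical mapping from a search noun to the labels a model might emit.
RELATED_LABELS = {
    "cats": {"cat", "kitten"},
    "wine": {"wine", "wine glass", "wine bottle"},
    "houses": {"house", "cottage"},
}


def parse_query(query: str) -> str:
    """Pull the noun out of a search like 'my pictures of cats'."""
    prefix = "my pictures of "
    cleaned = query.strip().lower()
    if not cleaned.startswith(prefix):
        raise ValueError("expected a query of the form 'my pictures of <noun>'")
    return cleaned[len(prefix):]


def search(photos: list[Photo], query: str, min_score: float = 0.5) -> list[str]:
    """Return names of photos whose labels match the query, best match first."""
    noun = parse_query(query)
    wanted = RELATED_LABELS.get(noun, {noun})
    hits = []
    for photo in photos:
        # Best confidence among the labels we care about (0.0 if none match).
        best = max(
            (score for label, score in photo.labels.items() if label in wanted),
            default=0.0,
        )
        if best >= min_score:
            hits.append((best, photo.name))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


# Made-up example data: no tags were typed in by a person.
EXAMPLE_PHOTOS = [
    Photo("snickers.jpg", {"cat": 0.97, "sofa": 0.60}),
    Photo("tasting.jpg", {"wine glass": 0.88}),
    Photo("home.jpg", {"house": 0.91, "cat": 0.30}),
]

print(search(EXAMPLE_PHOTOS, "My pictures of cats"))
```

The point of the sketch is that the search never looks at hand-entered tags. The labels come from the software, and the query only has to be translated into the labels worth looking for. The `min_score` cutoff is an arbitrary choice here; it keeps a weak guess (the 0.30 "cat" in `home.jpg`) out of the results.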

These next two are even more interesting. What this indicates is that the software has an understanding of what kinds of foods are served at breakfast and what kinds of foods are served at dinner and what a dinner might look like compared to a breakfast.

My pictures of breakfast

My pictures of dinner

So give it a try this weekend. Amaze your friends. And think about what is increasingly possible not only in terms of finding images of a cat, but in auto-classifying and interpreting content in a business context.

About John Mancini

John Mancini is the President of Content Results, LLC and the Past President of AIIM. He is a well-known author, speaker, and advisor on information management, digital transformation and intelligent automation. John is a frequent keynote speaker and author of more than 30 eBooks on a variety of topics. He can be found on Twitter, LinkedIn and Facebook as jmancini77.

Recent keynote topics include:

The Stairway to Digital Transformation
Navigating Disruptive Waters - 4 Things You Need to Know to Build Your Digital Transformation Strategy
Getting Ahead of the Digital Transformation Curve
Viewing Information Management Through a New Lens
Digital Disruption: 6 Strategies to Avoid Being "Blockbustered"

Specialties:

Keynote speaker and writer on AI, RPA, Intelligent Information Management, Intelligent Automation and Digital Transformation.
Consensus-building with Boards to create strategic focus, action, and accountability.
Extensive public speaking and public relations work.
Conversant and experienced in major technology issues and trends.
Expert on inbound and content marketing, particularly in an association environment and on the Hubspot platform.

John is a Phi Beta Kappa graduate of the College of William and Mary, and holds an M.A. in Public Policy from the Woodrow Wilson School at Princeton University.