Why Can't Document Auto-Classification Be as Much Fun as Google Image Recognition?

Written by John Mancini | Jun 28, 2013 11:10:49 AM

A recent AIIM Industry Watch, "Information Governance - records, risks and retention in the litigation age," highlighted some of the challenges of managing information that is accumulating at an ever-faster rate.

Despite good intentions, the delete button isn’t being pressed. Electronic records aren’t being deleted even when retention periods are set.
The answer to the data problem is to let the computer do the filing. 14% are already doing auto-classification of electronic records, and 37% are keen to do it.
Something has to be done about content accumulation. For 29%, the response to the information deluge is "buy more discs."
The content may be electronic, but the e-discovery mechanisms are still manual. 53% are still reliant on manual processes for e-discovery searches across file shares, email, and physical records.
45% of organizations plan to increase their records management spend over the next two years. In particular, automated classification is set for strong growth, along with enterprise search, RM modules, e-discovery, and email management.

Of course, auto-classification is one of those things that everyone is talking about, but few have actually implemented. I came across an amazing example -- at least to me -- of the power of auto-classification in the consumer world. I was recently listening to a TWIG podcast (yes, I understand how incredibly nerdy this is), and they were talking about automatic recognition and classification of what is in the images you have stored with Google.

This is actually quite fun. One thing -- you need to have your pictures stored in Picasa.

So if your pictures are on Picasa and you are logged in, then try this for fun and amaze your friends.

Do a Google search on "my pictures of XXX." (On second thought, just a reminder to not actually put XXX in the search string. Put a noun in there. There is no telling what may turn up if you type in XXX.)

So, a couple of examples.

Here are my pictures that contain images of our devil cat, Snickers, who is the worst pet in the world.

This search result is looking for pictures that have wine in them.

And this one has found pictures with houses in them.

All of this is really quite amazing when you think about it. These results do not come from tags on the pictures or from anyone manually identifying what is in them. They come from the software interpreting the words in the search, knowing what those words mean and what they "look like," finding them in the pictures, and doing so across a fairly significant number of pictures in about a second.

These next two are even more interesting. What they indicate is that the software has some understanding of what kinds of foods are served at breakfast and at dinner, and of what a dinner might look like compared to a breakfast.
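For the curious, here is a toy sketch in Python of the general idea, and nothing more. It is not how Google does it. It simply assumes that some image classifier has already attached labels with confidence scores to each picture, and that a small concept map links a broad word like "breakfast" to things you can actually see in a photo. Every file name, label, score, and concept entry below is made up for illustration.

```python
from typing import Dict, List

# Hypothetical classifier output: file name -> {visual label: confidence}.
# All values here are invented for illustration.
IMAGE_LABELS: Dict[str, Dict[str, float]] = {
    "img_001.jpg": {"cat": 0.94, "sofa": 0.61},
    "img_002.jpg": {"wine glass": 0.88, "table": 0.70},
    "img_003.jpg": {"pancakes": 0.81, "coffee": 0.77},
    "img_004.jpg": {"steak": 0.85, "wine glass": 0.66},
}

# Hypothetical concept map: a broad search word -> visual labels that suggest it.
CONCEPTS: Dict[str, List[str]] = {
    "wine": ["wine glass", "wine bottle"],
    "breakfast": ["pancakes", "coffee", "eggs"],
    "dinner": ["steak", "wine glass", "pasta"],
}


def search(query: str, threshold: float = 0.5) -> List[str]:
    """Return the pictures whose labels match the query, best match first."""
    # Accept the query word itself plus any labels the concept map ties to it.
    wanted = set(CONCEPTS.get(query, [])) | {query}
    hits = []
    for name, labels in IMAGE_LABELS.items():
        # Score a picture by its most confident matching label.
        score = max(
            (conf for label, conf in labels.items() if label in wanted),
            default=0.0,
        )
        if score >= threshold:
            hits.append((score, name))
    return [name for _, name in sorted(hits, reverse=True)]


if __name__ == "__main__":
    for word in ("cat", "wine", "breakfast"):
        print(word, "->", search(word))
```

Notice that nobody tagged a picture as "breakfast." In this sketch the match comes from the concept map tying pancakes and coffee to the word, which is roughly the kind of leap the breakfast and dinner searches seem to be making.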

So give it a try this weekend. Amaze your friends. And think about what is increasingly possible not only in terms of finding images of a cat, but in auto-classifying and interpreting content in a business context.
