Data Privacy in the Age of AI

Written by Steve Weissman | Mar 27, 2024 3:30:00 PM

Data privacy and Artificial Intelligence (AI) are two of the biggest issues in the information space today. However, despite the enormous coverage they receive in the trade and general media, what is not yet well understood is how tightly intertwined they are, and how risky it can be to address them without a proper foundation. Here are a few points to ponder to help you avoid the most common risks.

To begin, let’s establish two primary touch-points to guide our conversation:

As a practice, data privacy refers to safeguarding the personally identifiable information (PII) of the people you serve, whether they are internal (e.g., employees, investors) or external (e.g., customers, suppliers) to your organization. It’s not a new issue, but thanks to the European Union’s passage of the GDPR several years ago, the spate of state laws now coming online in the U.S., and the spiraling costs of compliance and penalties, it is becoming a major management concern.
As a technology, AI as we tend to think of it (e.g., smart robots that will take over our jobs and then the world) doesn’t exist, and its absence is especially pertinent in the information governance (infogov) realm, where what’s actually being used is the subset called machine learning. Here, the computer is fed a corpus of information and a set of rules for analyzing that information for a predefined purpose. And by and large, it works ridiculously well.

The Good, the Bad, and the Ugly

Because it works so well, machine learning is often used to find PII in piles of text so the sensitive information can be removed, redacted, or otherwise remediated. For instance, a rule telling the engine to look for the numerical pattern XXX-XX-XXXX can flag each match as a likely Social Security number, and the document containing it as something requiring privacy protection. This is what’s known in the profession as a Very Good Thing.
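To make the XXX-XX-XXXX example concrete, here is a minimal sketch of that kind of rule in Python. It is a plain regular-expression rule, not a trained model, and the function names (`flag_ssns`, `needs_privacy_protection`, `redact_ssns`) are hypothetical, chosen only for illustration. It assumes SSNs are written with hyphens, so it will miss ones written without them and may flag other numbers that happen to share the format.

```python
import re

# Hypothetical rule: three digits, two digits, four digits, separated by hyphens
# (the XXX-XX-XXXX pattern). \b stops it from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def flag_ssns(text: str) -> list[str]:
    """Return every substring that looks like a Social Security number."""
    return SSN_PATTERN.findall(text)


def needs_privacy_protection(text: str) -> bool:
    """Flag the whole document if it contains at least one likely SSN."""
    return SSN_PATTERN.search(text) is not None


def redact_ssns(text: str, mask: str = "XXX-XX-XXXX") -> str:
    """Replace each likely SSN with a fixed mask (one form of remediation)."""
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    doc = "Employee 123-45-6789 called about invoice 2024-03-27."
    print(flag_ssns(doc))                 # ['123-45-6789']
    print(needs_privacy_protection(doc))  # True
    print(redact_ssns(doc))               # Employee XXX-XX-XXXX called about invoice 2024-03-27.
```

Note that the date in the sample is not flagged, because its digit groups do not fit the 3-2-4 shape. In practice, a rule like this would be one of many an engine applies, and its false positives and misses are part of why the vetting described later still matters.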

However, there is a Very Bad Side to the AI equation as well, rooted in the popular perception that the technology is great at generating new content. Generative AI, as it is aptly known, really is good at taking people’s inputs and turning them into something fresh. BUT it’s only as “good” (i.e., accurate, timely) as the inputs it receives. So if that input happens to contain PII, the resulting output likely will as well, as will any subsequent documents that repurpose what the AI created (a practice I call “Ruminant AI”).

Where it gets Ugly is when the generative AI engine you’re using draws on your inputs to expand the pool of information it uses to improve its algorithms. This means any PII you feed into it becomes source material for any and every other piece of content it generates from that point forward, which is why I cringe every time I hear a client say “I signed up for ChatGPT and pasted some stuff into it as a head start.”

Three Takeaways

As is the case with all new technologies, there are definite benefits and risks associated with AI, and nowhere is this clearer than in the realm of data privacy, as just outlined. To maximize the former and minimize the latter, here are three takeaways to consider when plotting your path forward:

Don’t put ANYTHING sensitive into a public commercial engine, EVER. If you do, there’s no way to keep others from seeing or using it, even if they don’t mean to.
Do license and implement AI behind your own organizational walls if you want to investigate its use or develop an application pilot. In theory, this will allow you to maintain control over your data and thereby ensure it stays private and secure.
And do engage in information governance beforehand to ensure the source material you’re feeding into the engine is as accurate, timely, and privacy-controlled as possible. For at the end of the day, the better-vetted information you put in, the higher confidence you can have in the results you get out.
