Data privacy and Artificial Intelligence (AI) are two of the biggest issues in the information space today. However, despite the enormous amount of coverage they receive in the trade and general media, what is not yet well understood is how tightly intertwined they are, and how risky it can be to address them without a proper foundation. Here are a few points to ponder to help you avoid the most common risks.
To begin, let’s establish two primary touch-points to guide our conversation:
As a technology, AI the way we tend to think of it (e.g., smart robots that will take over our jobs and then the world) doesn’t exist – and its absence is especially pertinent in the information governance (infogov) realm, where what’s actually being used is the piece called machine learning. Here, the computer is fed a corpus of information and a set of rules for analyzing that information for a predefined purpose. And by and large, it works ridiculously well.
Because it works so well, machine learning is often used to find personally identifiable information (PII) in piles of text so the sensitive information can be removed, redacted, or subjected to some other form of remediation. For instance, a rule telling the engine to look for the numerical pattern XXX-XX-XXXX can flag any match as a likely Social Security number, and the document containing it as something requiring privacy protection. This is what’s known in the profession as a Very Good Thing.
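To make the XXX-XX-XXXX rule concrete, here is a minimal sketch in Python. It is a simple rule-based illustration (a single regular expression), not a trained machine-learning model, and the names `SSN_PATTERN`, `flag_ssns`, `needs_privacy_protection`, and `redact_ssns` are hypothetical, invented for this example. Real PII-detection tools combine many more signals; a bare pattern like this will also flag any other number that happens to share the same shape, which is why flagged documents are usually routed for review rather than trusted blindly.

```python
import re

# Hypothetical rule: three digits, two digits, four digits, separated by hyphens
# (the XXX-XX-XXXX shape of a U.S. Social Security number).
# The \b word boundaries keep it from matching inside longer runs of digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def flag_ssns(text: str) -> list[str]:
    """Return every substring that matches the SSN-shaped pattern."""
    return SSN_PATTERN.findall(text)


def needs_privacy_protection(text: str) -> bool:
    """Flag the whole document if it contains at least one likely SSN."""
    return SSN_PATTERN.search(text) is not None


def redact_ssns(text: str, mask: str = "XXX-XX-XXXX") -> str:
    """Replace each likely SSN with a fixed mask (one form of remediation)."""
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    doc = "Employee 42 (SSN 123-45-6789) called on 2023-10-05."
    print(flag_ssns(doc))
    print(needs_privacy_protection(doc))
    print(redact_ssns(doc))
```

Running the script prints the one matched number, `True`, and the sentence with that number masked; the date is left alone because its digit groups (4-2-2) do not fit the 3-2-4 pattern.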
However, there is also a Very Bad Side to the AI equation, rooted in the popular perception that the technology is great at generating new content. In fairness, generative AI (as it is aptly known) really is good at taking people’s inputs and turning them into something fresh. BUT – its output is only as “good” (i.e., accurate, timely) as the inputs it receives. So if that input happens to contain PII, then the resulting output likely will as well, as will any subsequent documents that repurpose what the AI created (a cycle I call “Ruminant AI”).
Where it gets Ugly is when the generative AI engine you are using retains your inputs as additional material for improving its algorithms. What this means is that any PII fed into it can then become source material for content it generates from that point forward – which is why I cringe every time I hear a client say “I signed up for ChatGPT and pasted some stuff into it as a head start.”
As with all new technologies, AI brings definite benefits and risks, and, as just outlined, nowhere is this clearer than in the realm of data privacy. To maximize the former and minimize the latter, here are three takeaways to consider when plotting your path forward: