Unstructured Data Defense: Navigating AI Compliance and Secure Data Architecture

Written by Tori Miller Liu, CIP | Oct 10, 2024 5:45:54 PM

At the AIIM Information and Data Leadership Symposium on October 1, 2024, in Arlington, VA, James Crifasi (COO & CTO, RedZone Technologies) and Jay Leask (Principal Technical Architect, Microsoft) engaged in a fascinating discussion about defending and protecting unstructured data.

Here are the key takeaways from their conversation:

Understanding AI's Use of Unstructured Data

Leask and Crifasi explored how AI models use enterprise unstructured data. They emphasized that information leaders need to understand how data is used in AI training in order to make informed decisions. Leask clarified, "Microsoft does not train models with enterprise data. Instead, Microsoft uses enterprise data to surface information."

The speakers stressed that governance policies should guide how employees leverage AI output. "Microsoft Copilot is a digital assistant," Leask explained. "You still have to make critical decisions and check sources."

Security and Governance: Starting from the Foundation

Leask highlighted the need to understand how AI developers secure input data and whether they take ownership of their security practices. He quipped, "A CIO will ask a developer, 'How did you set this up,' and the developer will say a friend of a friend set it up." Such deflection is insufficient for maintaining enterprise security.

Both speakers emphasized incorporating information governance at the outset of any AI project. Leask advised, "You have to figure out your risks and build mitigation into your structure. You need a data hygiene strategy and to build governance around the data."

Crifasi added, "IT leaders should not be the voice of no, but should be asking project teams if they considered unstructured data defense before implementation."

The Role of Auto-Tagging in Security

Tagging content is crucial for ensuring security and permissions enforcement. Leask noted, "Realistically, if your data is tagged, you can act because of that tag." However, it's vital that AI systems respect auto-tagging.
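
As a rough illustration of acting on a tag, the sketch below filters documents by sensitivity tag before they would be handed to an AI assistant. The tag names, the clearance levels, and the `visible_to` helper are all hypothetical and are not taken from the session or from any Microsoft product. Untagged content is withheld, on the assumption that an unlabeled document should fail closed.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Document:
    name: str
    tags: frozenset = field(default_factory=frozenset)

# Hypothetical mapping from sensitivity tag to the minimum clearance required.
REQUIRED_CLEARANCE = {"public": 0, "internal": 1, "confidential": 2}

def visible_to(user_clearance, docs):
    """Return the documents a user may see, judged only by their tags.

    A document is withheld if it has no tags or carries any tag that is
    unknown or requires more clearance than the user holds.
    """
    allowed = []
    for doc in docs:
        if not doc.tags:
            continue  # untagged content fails closed
        if all(REQUIRED_CLEARANCE.get(tag, float("inf")) <= user_clearance
               for tag in doc.tags):
            allowed.append(doc)
    return allowed
```

In this sketch an assistant would only ever receive the output of `visible_to`, so the tag check happens before retrieval rather than after an answer has been generated.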

Balancing Data Retention

Crifasi explained the need to weigh business needs against legal and ethical considerations when determining data retention policies. He stressed the importance of practical, user-friendly policies to ensure adoption, stating, "The only policy worse than having no policy is having a policy that no one follows."

Leask offered these tips for managing data retention:

Ask, "What type of information older than five years do I need to make decisions?"
Look for trivial information to target for deletion
Focus on areas where deletable data is concentrated
Set and automate your deletion or retention policy
Collaborate with other departments like legal and IT to develop and update policies
Document when policy is not met to inform future updates
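
The tips above can be sketched as a small automated rule. This is a minimal example under stated assumptions: the category names, the retention periods, and the `legal_hold` flag are hypothetical, and the five-year default simply echoes the question in the first tip. Real retention periods would come from legal and records teams.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical retention periods in years, keyed by content category.
RETENTION_YEARS = {"contract": 10, "invoice": 7, "trivial": 1}
DEFAULT_YEARS = 5  # assumed default for categories not listed above

def retention_action(category, last_modified, today, legal_hold=False):
    """Return "delete" or "retain" for one item."""
    if legal_hold:
        return "retain"  # a legal hold always overrides the schedule
    years = RETENTION_YEARS.get(category, DEFAULT_YEARS)
    cutoff = today - timedelta(days=365 * years)  # approximate, ignores leap days
    return "delete" if last_modified < cutoff else "retain"

def deletable_by_folder(records, today):
    """Count deletable items per folder to show where they are concentrated.

    Each record is a (folder, category, last_modified) tuple.
    """
    counts = Counter()
    for folder, category, last_modified in records:
        if retention_action(category, last_modified, today) == "delete":
            counts[folder] += 1
    return counts
```

Calling `deletable_by_folder(records, date.today()).most_common(5)` would list the five folders holding the most deletable items, which is one way to decide where to focus first.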

Evolving AI Governance Practices

The speakers discussed how permissions practices are becoming more granular. Historically, executives were often granted unlimited access to data. However, this approach can be inappropriate and unnecessary. Leask explained that Active Directory serves as the base for security with Microsoft, allowing the creation of granular groups and permissions based on roles and responsibilities.

Key considerations include:

Who has the right to see data to ensure safe utilization?
Securing the structure inside datasets and preparing for least-privilege access
AI's impact on least-privilege access within applications
Understanding that platform compliance doesn't guarantee compliant usage
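
To make the idea of role-based groups concrete, here is a minimal sketch of a group-based access check. It does not use any Active Directory API; the group names, users, and resources are invented for illustration. The point is that access follows group membership tied to a responsibility, so an executive title alone grants nothing.

```python
# Hypothetical groups and their members.
GROUP_MEMBERS = {
    "finance-readers": {"alice"},
    "hr-readers": {"bob"},
    "executives": {"carol"},
}

# Hypothetical access control list: resource -> groups allowed to read it.
RESOURCE_ACL = {
    "payroll.xlsx": {"hr-readers"},
    "q3-forecast.docx": {"finance-readers", "executives"},
}

def groups_of(user):
    """Return the set of groups the user belongs to."""
    return {group for group, members in GROUP_MEMBERS.items() if user in members}

def can_read(user, resource):
    """Allow access only if the user shares a group with the resource's ACL.

    Resources with no ACL entry are denied by default (least privilege).
    """
    return bool(groups_of(user) & RESOURCE_ACL.get(resource, set()))
```

Here `can_read("carol", "payroll.xlsx")` is false even though Carol is in the executives group, because that group is not on the payroll file's access list. An AI assistant answering on a user's behalf would need to apply the same check to every source it draws from.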

Up-at-Night Issues in Information Management

The speakers asked the audience about issues keeping them awake at night. Responses included:

Protected data isn't findable
Increasing difficulty in finding data across expanding technology stacks
Concern that current compliance efforts may become obsolete as AI regulations evolve
Risk of treating AI compliance as a one-time effort
Need to address data hygiene issues at the point of data creation
Misunderstanding Copilot as a universal solution for specific issues

Staying Informed on Best Practices

When considering the defense of unstructured data, it's crucial to stay aware of current practices and relevant frameworks. The speakers recommended familiarizing oneself with NIST's AI Framework and the AI ISO Standard, which provide guidelines for responsible AI use.

I used Claude.ai Pro to help identify the specific standards and frameworks you may want to reference when considering your organization’s approach to AI governance.

ISO/IEC 23894: This standard provides guidelines for AI risk management. It aims to help organizations identify, evaluate, and manage risks related to the development and use of AI systems.
ISO/IEC 38507: This standard focuses on the governance implications of AI for organizations. It provides guidance on how to effectively govern AI within an organizational context.
ISO/IEC 42001: This is a management system standard for artificial intelligence. It aims to provide a framework for organizations to develop, implement, and improve their AI management practices.
ISO/IEC 25059: This standard provides a framework for assessing the quality of AI systems.
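NIST AI Risk Management Framework (AI RMF): This voluntary framework helps organizations manage the risks of AI systems. Its companion Generative AI Profile (NIST AI 600-1) “can help organizations identify unique risks posed by generative AI and proposes actions for generative AI risk management that best aligns with their goals and priorities.”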

In conclusion, as AI continues to reshape the landscape of information management, professionals must remain vigilant in protecting unstructured data, implementing robust governance practices, and staying informed about evolving standards and regulations.
