Unauthorized reidentification and the Mosaic Effect in the EU AI Act



A key concern in today's digital era is the amplified risk of unauthorized reidentification brought on by artificial intelligence, specifically by the large and diverse data sets used to train generative AI models, such as large language models. However, these risks can be effectively mitigated. By adopting technology solutions that uphold legal mandates, organizations can harness the power of AI to realize commercial and societal objectives without compromising data security and privacy.


This underscores the timeliness of the 4 July Court of Justice of the European Union ruling on Meta, which arrived just as the trilogue negotiations between the European Commission, Council and Parliament to finalize the AI Act begin. The trilogue has been expedited with the goal of finalizing the AI Act by the end of the year, so it can be enacted before the 2024 European Parliament elections.


While the CJEU ruling specifically addressed Meta's personalized advertising practices, its implications extend much further. It clarified the EU General Data Protection Regulation's data use minimization requirements, casting data use minimization as a mandatory principle in its own right and a prerequisite for justifying the various legal bases for lawfully processing EU personal data. Therefore, the CJEU ruling offers essential insights for trilogue negotiators, guiding them in the critical considerations that form the backbone of the EU AI Act's risk-based approach.


The European AI Alliance summarizes the act's risk-based approach as follows: While the mass adoption of AI can have many social and economic benefits, using these systems poses novel risks if not managed effectively. As a result, governments around the world are starting to propose legislation to manage these risks.


The EU AI Act, predicted to become the global standard for AI regulation, sets out a risk-based approach, where the obligations for a system are proportionate to one of four levels of risk:



  • "Low-risk systems.

  • "Limited or minimal risk systems.

  • "High-risk systems - systems that can have a significant impact on the life chances of a user. Eight types of systems fall into this category. These systems are subject to stringent obligations and must undergo conformity assessments before being put on the EU market.

  • "Systems with unacceptable risk - these systems are not permitted to be sold on the EU market."


Generative AI and LLMs are trained on massive amounts of data and have seemingly infinite memory and pattern recognition capabilities, enabling them to establish connections between disparate data points. Without adequate technologically enforced controls, the use of AI exponentially increases the risk of unauthorized reidentification because of the sheer amount of data processed and AI's inherent ability to find connections in that data.


Recent research involving 2,000 information technology decision-makers across the U.S., Canada, the U.K., France, Germany, the Netherlands, Japan and Australia indicates that, while three out of four global commercial enterprises see great opportunities in embracing AI, they are banning popular AI systems until technology solutions — not just word-based promises in contracts, terms and conditions, and privacy policies — are put in place to enforce legal requirements.


Determination of the level of risk associated with different AI systems under the EU AI Act must take into account technological measures — such as GDPR-compliant data use minimization, as highlighted in the ruling — to defeat unauthorized reidentification risks via the "Mosaic Effect." The Mosaic Effect occurs when multiple data sets are combined to identify individuals within those data sets, even if they were anonymized within each individual set.


"As GenAI and LLM become more widely adopted and AI becomes common in applications and application environments, it dramatically increases the risk of unauthorized reidentification," IDC Research Vice President Carl Olofson said.


"This is because AI is all about 'finding patterns' on a massive scale. The risk of unauthorized reidentification by merging different datasets that, when combined, uniquely identify an individual, even if no single dataset alone contains personally identifiable information (aka the 'Mosaic Effect'), is no longer limited to the data in any one or several data stores but from the combination of near-unlimited data sources with pattern recognition capabilities that go beyond the reach and capabilities of humans."


Mosaic Effect reidentification risks and lawful processing of EU personal data


Mosaic Effect reidentification risks, caused by AI's data collection and pattern-finding prowess, underscore the importance of embracing fundamental GDPR requirements for legally processing EU personal data. The CJEU clarified these requirements in its ruling.


Even if a data subject consents to the collection of data under GDPR Article 6(1)(a), it is doubtful that the consent will be "informed." Data subjects are unlikely to fully understand the extent of potential AI uses, many of which could be unknown at the time of collection and will develop as the technology continues to evolve. In most instances, the requirements for contract as a legal basis for AI processing of EU personal data under GDPR Article 6(1)(b) will not be satisfied, due to the existence of less intrusive, more "data minimized" means of achieving the proper performance of the contract without requiring AI processing.


When consent and contract are unlikely to be sufficient lawful grounds for AI processing, legitimate interest processing under GDPR Article 6(1)(f) is the most likely remaining alternative for organizations seeking to process data in compliance with the requirements of the EU AI Act. This may shape the future of AI and data privacy in the EU, and potentially worldwide.


"As a trusted global systems integrator and technology partner, we're acutely aware of AI's challenges regarding unauthorized reidentification and data privacy," Cognizant Data Management, Governance and Privacy Practice Lead, Diptesh Singh. "However, these challenges are not insurmountable. Through innovative technology solutions that enforce legal requirements, these risks can be effectively mitigated to enable the benefits of transformative AI."


Conclusion


The CJEU ruling constitutes essential input for trilogue negotiators to consider by highlighting the importance of foundational GDPR principles, like data use minimization, for AI systems to be classified as "low, limited, or minimal risk." Following the logic of the CJEU analysis, the most likely scenario for lower-risk designations for systems under the EU AI Act is processing EU data that is truly "anonymous" under the strictest reading of GDPR Recital 26. This means the data is not subject to Mosaic Effect unauthorized reidentification risks, is not within the jurisdiction of the GDPR and does not present risks under the EU AI Act.


Another scenario that emerges for lower-risk designations for systems is using GDPR Article 6(1)(f) legitimate interests processing, provided:



  • The controller informs data subjects of the legitimate interest being pursued by the data controller or third party at the time of data collection.

  • The least identifiable, technologically enforced, "minimized" version of data is processed, to protect the fundamental rights of data subjects and ensure the processing is adequate, relevant and limited to what is necessary, e.g., processing dynamically deidentified data that cannot be relinked to individuals without access to additional information kept separately and securely (see the sketch after this list).

  • The controller implements auditable and demonstrably effective data use minimization controls, and disclosures regarding such controls, the scale of the processing and its impact on data subjects are adequate to avoid the interests of data subjects taking precedence over the legitimate interests of the controller or third party.
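The "dynamically deidentified data" described in the second point above can be implemented with keyed pseudonymization. The minimal Python sketch below, built on the standard library's hmac module, is one illustrative approach under stated assumptions: pseudonyms are derived per processing epoch from a secret key the controller stores separately, so the processing pipeline never sees direct identifiers and relinking is impossible without that key. All identifiers, key values and field names are hypothetical.

```python
# Minimal sketch of dynamic deidentification via keyed pseudonymization.
# Assumption: the secret key is stored separately and securely by the
# controller and is never shipped with the minimized data.
import hmac
import hashlib

SECRET_KEY = b"held-separately-by-the-controller"  # hypothetical key

def pseudonymize(identifier: str, epoch: str) -> str:
    """Derive a pseudonym that changes each processing epoch, so outputs
    from different runs cannot be trivially linked to one another."""
    message = f"{epoch}:{identifier}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]

record = {"user_id": "jane.doe@example.com", "spend_eur": 42.50}

# The AI pipeline only ever receives the minimized, pseudonymized record.
minimized = {
    "pseudonym": pseudonymize(record["user_id"], epoch="2023-Q3"),
    "spend_eur": record["spend_eur"],
}
print(minimized)
```

Because the pseudonym rotates per epoch and relinking requires the separately held key, the processed data resists exactly the cross-data-set pattern matching that drives Mosaic Effect reidentification.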




