Generative AI and Data Protection: Is Data Protection Merely a False Concern?

The end of 2022 marked the explosive rise of ChatGPT. Every channel was clogged with news about how ChatGPT could revolutionize the world, render certain professions obsolete, simplify our lives, and open up endless possibilities. At the same time, there was a noticeable lack of concern about whether ChatGPT, as an example of generative AI, could fully adhere to data protection requirements, especially those outlined in the GDPR, and whether such a tool poses a real risk to privacy. So how did the initial excitement turn into worries about our privacy and insufficient data protection? Or is this yet another hot topic seized upon by data experts, generating unnecessary noise?

Artificial intelligence is a machine’s ability to display human-like capabilities such as reasoning, learning, planning, and creativity. Generative AI is a broad label for any type of artificial intelligence that can create new text, images, sounds, animation, 3D models, synthetic data, or other types of content. ChatGPT, the best-known example of generative AI, is a free chatbot that can generate an answer to almost any question we ask. Sounds perfect, right? But what about issues such as the following?

  • generative AI models are trained on large amounts of data, taken from different sources, sometimes without the knowledge of the data’s original owner;
  • are companies developing generative AI entitled to collect personal data?
  • are data subjects informed about the use of their data?
  • how much data does generative AI actually need, and how does that square with the data minimization principle?
  • how can generative AI models be protected from so-called model inversion attacks?
  • how will companies ensure that all data used to train generative AI is erased if a supervisory authority demands it?

Concerns regarding ChatGPT in Poland

There are many more issues, and if you examine them closely, you can easily see that the concerns raised by various data experts and data protection authorities have a solid basis. In addition to the already publicly known cases concerning the interaction between generative AI and data protection, the Polish data protection authority publicly announced on 20 September 2023 that it had opened an investigation into ChatGPT. The authority is handling a complaint about ChatGPT in which the complainant accuses the tool’s creator, OpenAI, of, among other things, processing data in an unlawful and unreliable manner, and of failing to be transparent about the rules under which this is done.

The complaint was filed by privacy and security researcher Lukasz Olejnik. One accusation was that ChatGPT failed to respond appropriately to his subject access request; the company also failed to inform him of the source of the data held on him, or of the recipients or categories of recipients of those data.

“The case involves the breach of a number of data protection provisions, so we will ask OpenAI to answer a number of questions in order to be able to conduct the administrative proceedings thoroughly,” says Jan Nowak, President of the Polish supervisory authority. He assures that the Office is taking the matter very seriously.

Of course, we do not know what the Polish data protection authority will decide in this case; however, it seems that OpenAI could be in trouble if it does not have answers to the issues raised in the complaint.

Transparency requirements and artificial intelligence

As can be seen, many different issues should be considered when using generative AI or planning to create such a model. In addition to the concerns mentioned above, generative AI tools like ChatGPT would also have to comply with transparency requirements:

  • disclosing that the content was generated by AI;
  • designing the model to prevent it from generating illegal content;
  • publishing summaries of copyrighted data used for training.

Now that the initial excitement and fear have subsided, we are beginning to understand that AI has unsolved issues with data protection and privacy. Companies should also consider the requirements arising from other areas of law, such as copyright, as well as their liability, both under the law and to society.

The content of this article is intended to provide a general guide to the subject matter. If you need assistance regarding a specific situation related to the use of AI and its compliance with data protection, or any other question concerning personal data protection, please consult the experts of ECOVIS ProventusLaw.

This review was prepared by internationally certified ECOVIS ProventusLaw data protection expert Milda Šlekytė.
