Artificial intelligence (AI) has existed for several decades, but only recently has it begun to play an important role in our daily lives. ChatGPT is one of the latest developments in this field. The AI revolution is underway…
A conversational agent
At the end of 2022, the arrival of a new conversational robot caused a stir: ChatGPT, created by the American artificial intelligence company OpenAI, whose CEO is Sam Altman. OpenAI was already behind other artificial intelligence software, such as DALL-E, which creates images from a written description. With ChatGPT, the company moves into text generation.
Writing of texts and content
ChatGPT aims to answer all of our requests. It can answer our questions in detail and write a text to order, whether a completely invented piece of fiction, a cover letter, an essay or a recipe. It can also correct the mistakes in our foreign-language writing or even in our code… the possibilities are endless. For now, the basic version is free. On February 1, 2023, OpenAI launched ChatGPT Plus, which for $20 per month gives access to a faster version that is available at all times, avoiding the frequent periods of saturation that make the site inaccessible.
The particularity of this conversational robot is its ability to respond to these requests in a language that we understand, with a truly impressive writing capacity. Unlike Siri or Alexa, with whom conversations quickly reach the limits of their capabilities, it is possible to have a real discussion with ChatGPT.
Large Language Model: GPT-3.5
To achieve this result, OpenAI has developed what is called an “LLM”, which stands for Large Language Model: a language model that aims to imitate human language. The model used to date is GPT-3.5 (for “Generative Pre-trained Transformer 3.5”), one of the most powerful available, and OpenAI is already preparing the next model, GPT-4. The developers of GPT-3.5 trained it on an enormous amount of text published on the internet, which allows it to reproduce human language in a way that is impressive but also deceptive. This is both its strength and its weakness.
Indeed, while ChatGPT makes many tasks easier for us, the quality of the results it provides is not impeccable, and the consequences of its errors can lead to real misinformation. Moreover, while this new technology seems at first glance simple and very useful, heralding the advent of Web 4.0, many questions arise in terms of intellectual property, personal data protection, and cybersecurity.
Competing with search engines on efficiency
The emergence and success of ChatGPT raise the question of how relevant search engines will remain in the years to come. Unlike a search engine, ChatGPT does not return a list of links that may contain the answer to our query, but a single direct answer. The conversational robot has already ingested the core content of major knowledge bases such as Wikipedia, which allows it to answer directly.
Artificial intelligence replacing humans?
Furthermore, the ability to interact with ChatGPT makes it easier to obtain answers to our questions than when using a search engine. Indeed, conversations with ChatGPT allow us to obtain the answer to our question in just a few exchanged sentences, whereas keyword searches on search engines would have taken much more time. It is therefore much more efficient than a human in processing large amounts of data.
This raises the question of whether future conversational robots will be intended to replace humans in certain jobs, thus devaluing human capital, or on the contrary, to assist humans to make them more efficient and thus, to revalue the work involved through a modification of the work process.
The integration of ChatGPT into the Bing search engine
That is why Microsoft has entered into a partnership with OpenAI, investing $10 billion to integrate ChatGPT into its Bing search engine. This gives Bing a chance to relaunch in a market where Google is far ahead. Microsoft’s competitors have since entered the race: on February 6, 2023, Google unveiled Bard, its own conversational robot integrated into its search engine, while Baidu, the Chinese equivalent of Google, announced the launch of Ernie Bot. Based on the first versions unveiled by Google and Microsoft, the results page would be split between the answer provided by the conversational robot and the list of links as we currently know it. Large companies therefore seem to be trying to integrate conversational robots into their search engines before being overtaken by these new artificial intelligences.
ChatGPT also raises important intellectual property questions, as the answers it provides are based on data available on the internet. However, when ChatGPT responds to questions, it does not cite its sources, and it presents information as though the answers were the product of its own research. This raises several questions. Who owns the text generated (created or rearranged) by AI? Who holds the intellectual property rights, and more specifically the copyright? The AI itself does not appear to have any rights. Is it the person who asked the question? For this to be the case, it would have to be shown that this person played a significant creative role in generating the text, but given that ChatGPT is trained on an astronomical amount of text and that the user simply asks a question, this seems unlikely. Is it, then, the authors whose works inspired the AI? How could these rights be attributed to them when the robot does not cite its sources?
Plagiarism and infringement
Should we then speak of plagiarism or copyright infringement? The question has not yet been brought before a court with regard to ChatGPT. It has, however, been raised for other AIs, notably those that generate images or code, although no court has yet given a clear answer. Several outcomes are therefore conceivable, but an explicit ruling from the courts is needed, pending the essential regulation of these new AIs.
Another response to protect copyright could be for AIs to pay the authors of the sources used. This is what could justify abandoning the free nature of these services.
The implications for the GDPR:
At the moment, ChatGPT is a free service. However, it is now common knowledge that if a service is free, it is because users are actually paying for it with their data. And in the case of ChatGPT, the service is expensive…
Protection of personal data: the applicable law
The AI will then have to comply with the various principles of the Regulation, and it seems that this will not be easy as things stand at present.
ChatGPT vs. GDPR
Several principles of personal data protection must be respected, including:
– The minimisation principle: only data that is adequate, relevant and limited to what is necessary for the purposes for which it is processed may be collected, and no more.
– The principle of transparency: users must be informed that their data is being collected and of the use that will be made of it, so that they can judge whether this processing may harm them. This principle seems to be respected, in that the service does request consent for the processing of data and explains on its “privacy” page which data is concerned and how it will be processed. OpenAI can nevertheless be criticized for remaining very vague about this processing, for example about which third parties may have access to the data.
– The fair processing of data: this involves requesting the user’s consent before collecting data, and not collecting it without informing them. This is not a problem as far as cookies are concerned. However, no authorization at all is sought for collecting the information contained in the conversations we have with the AI, which is a major problem.
– The lawful processing of data: this involves collecting and processing data in accordance with applicable rules. This is where more problems arise, as currently, the only applicable law seems to be Californian law, to the exclusion of the GDPR, which does not allow EU citizens to benefit from the protection normally afforded to them under the GDPR. According to this regulation, processing is only lawful if the user has consented to the processing of their data, or if the processing is necessary for one of the reasons listed by the regulation.
– The principle of accuracy: personal data used must be accurate and, as far as possible, kept up-to-date. It is not unlikely that data controllers may retrieve personal data through ChatGPT. How can one ensure the truthfulness and accuracy of such data?
The exercise of rights related to the protection of personal data
Firstly, two major points need to be made:
– On the other hand, it is important to distinguish between data collected through cookies and data collected during conversations with the robot. ChatGPT requests its users’ consent for the use of cookies, but regarding other data… nothing. The AI uses that data without asking anyone’s permission.
Secondly, more specific questions arise for the different rights protected by the GDPR:
– The right to be forgotten: this is the right for a person to request the deletion of data collected about them. This right, also known as the “right to erasure,” took on particular importance with the ruling handed down by the Court of Justice of the European Union (CJEU) on May 13, 2014, Google Spain v. Costeja González, which upheld the protection of personal data against Google, one of the five companies that make up the GAFAM (Google, Amazon, Facebook (now Meta), Apple, and Microsoft). This decision came even though the GDPR did not yet exist and Google was not yet expressly subject to a right to be forgotten. Thus, although ChatGPT does not claim to comply with the GDPR, it appears that it will still be possible to assert this right before the CJEU.
– The GDPR also provides for the right to rectification of one’s personal data. Given that ChatGPT feeds on information provided to it or that will be submitted in the future, how will it be possible to exercise this right when such information may be provided by users other than the one concerned? And can this right truly be exercised (like the right to be forgotten) regarding data collected up to now, i.e. before compliance with the GDPR was imposed on OpenAI?
– The right not to be subject to automated decision-making: this right allows any person not to be subject to a decision based solely on automated processing when that decision produces legal effects concerning them or significantly affects them. On this point, the question is already being raised about the use of ChatGPT in judicial matters: a first instance has been identified in Colombia, where a judge partly based his decision on the response the AI gave to his question. Admittedly, that decision was not based exclusively on the answer provided by ChatGPT, but the question arises as to whether this could happen in the future (or has already happened), given the current lack of regulation.
Cybersecurity implications:
Generation of new strains of malware
First of all, the conversational robot allows malicious actors to initiate cyberattacks even if they have no expertise in cybersecurity. Indeed, being capable of creating code, it can generate malicious code if requested to do so. In late December 2022, a thread was created on a hacking forum titled “ChatGPT – Benefits of Malware.” In this thread, it was revealed that it was possible to use the AI to recreate strains of malware and techniques using online articles about malware, which may be included in the training data.
Improved phishing techniques
Furthermore, ChatGPT could also be misused to write higher-quality and therefore more credible phishing emails (phishing being a common cyberattack technique in which the fraudster poses as someone trustworthy, such as our bank, in order to request confidential information, validate a banking operation, etc.). Today, many phishing emails can be identified by their unconvincing phrasing and spelling mistakes. Without these telltale signs, phishing attacks will be much more difficult to detect.
Finally, new types of cyberattacks are emerging, such as the so-called “poisoning” attack. This consists of introducing corrupted data into an AI, either during its development or during its learning phase, in order to modify its behavior. ChatGPT has been trained on data found on the internet, and there is no doubt that much of that data is far from irreproachable. Moreover, ChatGPT’s knowledge stops at its last update (it is not aware of the most recent events), a limitation that OpenAI will have to address by training it on newer data. The risk is then that malicious actors introduce corrupted data into the sources on which ChatGPT is likely to be trained, so that it can subsequently help them carry out new cyberattacks.
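To make the idea of a poisoning attack more concrete, here is a minimal toy sketch in Python. It is not how ChatGPT is trained: the word-count classifier, the tiny datasets and every name in it (`train`, `predict`, `clean_samples`, `poison`) are hypothetical and chosen only for illustration. It shows how a handful of mislabeled examples slipped into the training data can flip the verdict a model gives on the same message.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in samples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Return the label whose training texts share the most words with `text`."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    # Sorting the labels first makes ties resolve alphabetically (deterministic).
    return max(sorted(scores), key=scores.get)


# Hypothetical clean training data for a toy phishing filter.
clean_samples = [
    ("verify your account now", "phishing"),
    ("urgent verify your password", "phishing"),
    ("meeting notes attached", "safe"),
    ("lunch at noon tomorrow", "safe"),
]

# The attacker injects phishing-like text deliberately mislabeled as "safe".
poison = [("verify your account", "safe")] * 3
poisoned_samples = clean_samples + poison

message = "please verify your account"

clean_model = train(clean_samples)
poisoned_model = train(poisoned_samples)

print(predict(clean_model, message))     # phishing
print(predict(poisoned_model, message))  # safe
```

Only three corrupted examples are enough here because the dataset is tiny; against a model trained on a very large corpus, an attacker would presumably need to plant far more data, but the mechanism is the same.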