Rethinking Plagiarism in the Age of AI
by Shouvik Paul, Copyleaks COO

August 9, 2023

Let’s start with a fundamental question: Why do people even plagiarize?

Typically it’s been one of three reasons:

Not wanting to do the work
Time management issues
Procrastination

This isn’t the definitive list, but historically, the reasoning tends to fall within these three categories.

But regardless of the reason, traditional plagiarism takes work. In school, if a student was going to plagiarize, they still had to do the research, copy the passages, and then tweak the words or thoroughly paraphrase portions to make them not appear as a direct copy.

Then came what is known as the “essay mill,” where students could pay someone else to write an original essay for them. Was that cheating? Yes. Plagiarized? Technically no. It was an original work, after all.

Cue generative AI.

Generative AI has undoubtedly revolutionized our world in less than a year. You see it in the arts, across enterprises worldwide, and within the halls and classrooms of educational institutions everywhere because of its ability to produce human-like long-form essays, blogs, articles, and more.

These AI models, such as ChatGPT, are trained on vast amounts of data, including content from across the internet, absorbing and studying the nuances of human language.

This raises the question: Why are we even worried about plagiarism if these models can generate human-like content?

Plagiarism and AI

Did AI end plagiarism as we know it?

Not quite. (But it has severely slowed the essay mill industry: why pay someone to write an essay when AI can generate it in seconds for free?)

However, if you’re only looking for traditional plagiarism, you’re not looking at the whole picture.

First, let’s set aside the argument around whether or not using AI-generated content is plagiarism. Instead, let’s take it a step further and dive into an issue already on the rise: plagiarism and the ethics of AI model training.

Some argue that content generated by AI models is original since the combination of words produced is specific to the AI and has never existed before in that particular form. It is, in theory, a product of the AI model’s creative process. Therefore, wouldn’t that make it an original work for the person requesting the content?

Ah, but it’s more about what you don’t see.

As mentioned previously, these AI models are trained on vast amounts of data, from user input to content from the internet. According to the developers of these models, they are designed to prioritize originality, yet complete passages of original text are very often reproduced verbatim within generated content.

The rise of generative AI raises ethical considerations that traditional plagiarism did not. While AI-generated content is not the original work of a human author, it can often contain text from a human source without proper attribution. That leads to arguments that AI-generated content is plagiarized, potentially infringes copyright, and is therefore unethical. The trouble is that there are still no specific laws pertaining to AI, plagiarism, and copyright infringement. At least, not yet.

In a lawsuit filed in early 2023, a trio of artists sued several generative AI platforms, alleging that the platforms used the artists’ original works without a license to train models to mimic their unique styles, which could allow public users to generate pieces that resemble the existing, protected works. Later that year, several novelists and comedian Sarah Silverman sued OpenAI, the creator of ChatGPT, alleging that the company used their work to train its models without permission, resulting in the models generating plagiarized text and infringing on copyright.

But that doesn’t mean traditional plagiarism has disappeared altogether. Generative AI did not kill plagiarism; it simply changed its shape.

The New Age of Plagiarism

Our data shows that people, including students, still plagiarize.

But what they aren’t doing is digging through research journals, textbooks, and other sources to find content to copy. They’re asking AI. Assuming otherwise would be the equivalent of thinking everyone still uses paper maps in the age of GPS. Why do so much work when it can easily be done for you?

Plagiarism has become more nuanced with generative AI, so we must also become more nuanced. As AI expands its reach, we learn more about its capabilities, benefits, and risks, and we have to adapt right along with it or get left behind holding a paper map, wondering where everyone else went.

It’s not enough to detect whether something is AI-generated; you need solutions that go beyond that to determine whether the content is plagiarized and possibly infringing on a copyright.

The first step is detecting whether the content is AI-generated or human-written. If it is AI-generated, the natural next step is to run a scan to verify where the content is sourced from, to avoid potential copyright infringement and plagiarism accusations. This is particularly important for enterprises whose reputations could be on the line. If it is human-written, you still need to verify its originality to mitigate possible plagiarism, especially in student assignments. Either way, you need to know more than who or what wrote it.

Looking at things through the traditional lens isn’t going to work, nor is outright banning generative AI without putting the right solutions in place. If the history of plagiarism has taught us anything as we move into the age of AI, it’s that if people want to do it, they will find a way. Schools and organizations that have yet to adopt tools that detect both AI and plagiarism, or that rely on tools that do only one or the other, are getting left behind.

Circling back to the argument we set aside, whether or not using AI-generated content is considered plagiarism: that is a discussion that needs to happen within organizations and educational institutions. Because each one will have its own view on the matter, one decision does not fit all in this case.

But what does apply to everyone, from students to CISOs, is the need for the right tools to provide the insight necessary for making informed decisions about generative AI when having those discussions. It is imperative to stay aware of the generative AI content being produced by implementing solutions that can help mitigate potential risks. It’s also essential to have a multi-pronged solution that helps ensure the originality of the content generated.

Remember, with AI-generated content, easy isn’t always better, and what you see isn’t always what you get.
