SEO – How to prevent AI from taking your content
Artificial intelligence offers exciting opportunities but also raises understandable concerns – including the potential for generative AI models to “take” or misuse content created by human writers and marketers.
This article aims to clearly define these risks, analyze scenarios of how AI could replicate or plagiarize your work, and offer practical tips to protect yourself in a generative AI-powered world while still benefiting from emerging technologies.
Why do we care about AI taking our content?
Let’s define our terms. When we speak of AI “taking content,” we are actually discussing multiple distinct risks to us as individuals, the marketing campaigns that we run, and the work that we’ve created.
To help us navigate this web of scary possibilities, I’ve made a graphic summarizing a couple of ways in which AI could take your content:
Let’s think through all of these risks in more detail.
What can AI take from your content?
First, what can generative AI even take from us? I see five categories of what we’d actually stand to lose:
Your entire work or multiple works
Generative AI can take an entire piece you may have produced, like a blog post, a video, a social post, an image, or a combination of the above.
Potentially, generative AI might index your entire website and all the content you’ve published on any of your social media accounts.
Your words or elements of your work
Generative AI could copy whole pieces of your work: reproducing word-for-word quotes from your content, reusing an image you created, or replicating frames from your videos.
This type of direct plagiarism could also include small changes in the words or colors within your content.
Your ideas or style
Generative AI could plagiarize you more indirectly by stealing your content’s ideas, format, or aesthetics. If you’ve done research combining a few sources in a new way, the AI could suddenly make the same comparison.
If you’ve made a clever observation about the future of SEO, generative AI might suddenly predict the same scenario but in its own words. Someone could also prompt generative AI to create content in your style, as many tools allow people to do.
Your marketing results
Instead of copying your work, generative AI could take the results your content would have earned.
You might lose your audience if AI content suddenly floods the SERPs for the same queries you were targeting or if other companies begin filling social media feeds with AI-generated posts.
You may also have difficulty standing out, earning trust, or converting to sales.
Your job, tasks or budget
Generative AI could take your place by producing content without you. Your tasks, role, or team might get cut and replaced by generative AI that is supposed to produce the same content that humans like you were producing.
How can this happen?
But how can generative AI even take those things from us?
If we are afraid of losing something, we should understand the mechanisms by which we could lose it. I would split those scenarios into four categories:
- Your content gets included in the training dataset for a large language model (LLM).
Generating responses to users
- Generative AI generates direct quotes or parts of your original content in its output to users.
- Alternatively, generative AI adapts your content but copies the main ideas or your style in its output to users.
Competing for marketing results
- AI-created content ranks on search, gets traffic on marketing channels, etc.
Influencing the economy and available work
- Work that was previously done by humans like you gets outsourced to AI instead.
How could it impact you?
Losing something is always unpleasant, but what impact would AI taking our content even have on us? I see three core types of harm that we’d face as content marketers:
Financial loss
You might lose money in the form of:
- Royalties or compensation for any profit that your work may generate for the organization that owns the LLM that gets trained on your content.
- Profits from your work getting copied and presented as someone else’s, whether via direct plagiarism or adapting your ideas and style.
Marketing, emotional, and reputation loss
You may lose out on recognition and opportunities when:
- The marketing tactics and channels that you used no longer bring your organization the same types of returns.
- The thoughts, emotions, and experiences that went into your work get claimed by someone else for money or attention.
- You have to compete against AI content for your own marketing campaigns to achieve results.
Job success and security
Your work as a marketer may get affected if:
- Your job gets cut or changed as your tasks get outsourced to generative AI instead.
- Your team gets laid off and your responsibilities are increased, as you are expected to “leverage AI” to produce the same output as an entire team of humans.
- Your skills get devalued as AI can seemingly produce the same quality of work in seconds for a fraction of the cost.
A lot of these risks aren’t unique to content marketers. But while we might fear some of the same things as other professionals, all of these harms could affect our work directly and in unique ways.
On the other hand, are those risks necessarily unique to AI taking our content? Plagiarism can be damaging whether it is done by humans or software. Is there a difference between a human stealing your content and a large language model doing the same?
What is the difference between another human being stealing your content and generative AI?
The easy answer would be to say that a human knows what they are doing and can be held responsible.
A human has agency and free will. They have to actively decide to find your work, take some of it for themselves, and present it as their own.
Generative AI, on the other hand, is a program with a level of randomness and unpredictability in its output. When generative AI steals, it feels accidental.
There is no culpability because AI cannot have criminal intent or a mental state. Without a mind that could understand the consequences of its actions, generative AI is not culpable for the results of what it does.
But is that true? How different is plagiarism via AI from the old-school, 100% human version? Let’s think through some hypothetical scenarios and see where that leads us.
Scenario 1: Old-school human plagiarism
What does conscious and deliberate plagiarism look like?
Let’s say that a human writer named Jane Doe decides to take the article “12 SEO Tips to Boost Your Organic Rankings & Traffic” by Connor Lahey from the Semrush blog.
Doe may change up the title, intro and some headings, and then publish it under her own name on hottestseotipsblog.com. In this case, Doe knows what she is doing and actively chooses to misrepresent someone else’s work as hers.
What is the problem with Doe presenting Semrush’s piece as her own original content? Her plagiarism causes two types of harm: harm to Semrush and harm to people visiting Doe’s site.
Harm to the original author
Doe harms Semrush because when she steals their content, she might also steal its marketing and business impact.
Doe may end up taking traffic away from the original piece. She could end up:
- Getting backlinks and credit for the ideas she presents.
- Earning money off ads displayed on that page, affiliate links that she might include in the text, or business opportunities resulting from that content.
It’s reasonable to say that the money and traffic should have gone to the original author and the Semrush site instead.
Harm to the audience
Doe harms people visiting her site because she is misrepresenting her own expertise. By putting her own name on that content, Doe gives off the false impression that the words on that webpage are her original ideas and experience.
Doe may sell SEO consulting services, and she could get clients who read the plagiarized piece, liked her insights, and decided to hire her based on the knowledge implied through that content. Essentially, by plagiarizing Semrush’s piece, Doe is stealing trust and credibility.
Scenario 2: Hiring a ghostwriter with no oversight
Let’s say that instead of plagiarizing an existing piece of content, Doe hires a ghostwriter to write it for her. She tells the ghostwriter that she needs a blog post on the topic “10 Best SEO Tips” and that she wants it to be around 2,000 words long.
When the writer sends over a complete draft, Doe publishes it on hottestseotipsblog.com under her own name. Is this plagiarism?
Even if the writer crafted that piece as fully their original work, we can argue that the same harm from Scenario 1 is still present:
Harm to the original author
The ghostwriter might not be compensated for any resulting revenue from publishing that piece on Doe’s site.
They might not be allowed to use that content in their portfolio, and they could lose out on opportunities to get additional clients even if their piece takes off and becomes popular.
For the public, the original writer is invisible and functionally does not exist.
Harm to the audience
Any users visiting hottestseotipsblog.com are still being given the impression that Doe is the author of that content. Doe is still borrowing credibility and trust from ideas and skills that were not hers.
Note: Ghostwriting arrangements are common. If the writer consented to giving up any credit or additional compensation for a fee, they should be allowed to do so.
This scenario is not necessarily “stealing,” but it is still misrepresenting who did the work that went into creating that content. Whether this is plagiarism or an acceptable practice depends on our own definitions and morals.
On the other hand, if the writer actually copied the piece from an existing source like Semrush, then this piece is definitely plagiarism. In that case, we could argue that Doe was responsible for cross-checking the final draft and ensuring it was original work.
But most of the responsibility would lie with the ghostwriter, who misrepresented the content as their own original work to Doe.
Scenario 3: Generic AI prompt that generates plagiarized quotes
What happens when AI enters the picture?
Doe might open ChatGPT and prompt it with something like:
- “You are an SEO expert familiar with the best industry practices for content creation, marketing, technical SEO, keyword research, and writing for professional audiences. Draft a thorough and original blog post for the topic ‘10 Best SEO Tips’ with level 1-3 subheadings and specific examples.”
Let’s assume that, in our scenario, ChatGPT has been trained on Semrush’s blog content. When ChatGPT generates the blog post for Doe, that draft copies entire paragraphs and most of the main points from the Semrush article with 12 SEO tips.
If Doe keeps that AI-created draft unaltered and posts it on hottestseotipsblog.com under her own name – did she plagiarize?
The end result is the same as in our first scenario – there is now a published blog post under Jane Doe’s name on another site using the exact same words as an article on Semrush’s website.
Hottestseotipsblog.com might not include any disclaimers that the content was produced with the help of AI. How does that fit into our framework?
Harm to the original author
In this situation, Semrush is still losing potential traffic, revenue, and other business opportunities.
Harm to the audience
To any user visiting Doe’s site, that content appears as if it is completely original and written based on Doe’s own expertise.
However, it doesn’t feel right to place the same level of blame on Doe as in the first scenario. After all, Doe doesn’t know that the content she published was plagiarized.
She may have never seen Semrush’s piece. She may sincerely believe that ChatGPT gave her a completely original blog post that has never been published elsewhere.
Yet, Doe still misrepresented her work. She didn’t write the piece – ChatGPT did. Doe might not have plagiarized from Semrush deliberately, but she did steal the expertise of ChatGPT and the data it was trained on.
Scenario 4: Specific AI prompt to copy someone’s style
What if Doe opened ChatGPT, copied the text of Semrush’s article, and wrote the following prompt:
- “Draft a thorough and original blog post for the topic ‘10 Best SEO Tips’ with level 1-3 subheadings and specific examples in the same style as the following text [pasted text of ‘12 SEO Tips to Boost Your Organic Rankings & Traffic’ from Semrush]”
In this case, ChatGPT may not copy any of the exact phrasing or points from Semrush. But instead, Doe is asking the AI to copy the more nebulous notion of style and format from Semrush’s piece.
ChatGPT’s output would still sound similar to Semrush, and some of the ideas might even echo the original.
Doe is still stealing something, even if it’s harder to pin down. So, how does that fit into our framework?
Harm to the original author
Semrush is still not getting recognition for their original content. Style is the result of hard work, writing skill, and creativity. Doe is taking that without permission, and any traffic or revenue she generates is still relying on something that wasn’t hers.
Harm to the audience
Doe is presenting those stylistic choices as her own, and might come across as a much better writer or thinker than she actually is.
If the Semrush article had a particularly unique structure and ChatGPT copied it, then Doe is benefiting from multiple creative choices that now come across as her own.
Sure, copying the style of someone else’s content could be described as inspiration. But if no credit is given and that inspiration is heavily relied upon to the point of both pieces resembling each other, that is still likely plagiarism.
And Doe holds a level of responsibility for that plagiarism – she directly prompted ChatGPT to copy the style of someone else’s content.
Scenario 5: Specific AI prompt to paraphrase someone’s ideas
What if Doe prompts ChatGPT to copy the ideas from Semrush directly? She could write a prompt like:
- “Adapt the following text and rephrase it into a thorough and original blog post for the topic ‘10 Best SEO Tips’ with level 1-3 subheadings. Keep the same ideas but draft your own unique examples in the same style: [pasted text of ‘12 SEO Tips to Boost Your Organic Rankings & Traffic’ from Semrush]”
Like with Scenario 4, including the word “original” in the generative AI prompt does not magically erase the intent to steal from others.
Doe is prompting ChatGPT to directly copy someone else’s original work and explicitly asking for it to copy not only the style but the main points of the Semrush piece.
Even if all of the words and phrases in the final draft are different, the resulting article is still plagiarized.
Here’s how it would fit with our framework:
Harm to the original author
Semrush and the article author are still not getting any credit, recognition, or revenue from their work.
Any hours the original author spent researching the piece, thinking through the ideas, exploring SEO best practices, and gathering information are simply lifted by Doe for her own benefit.
She is taking credit while stealing the time and effort required to come up with good content.
Harm to the audience
Anyone reading Doe’s piece will still believe that it’s her own.
Readers would trust Doe not simply because of her exact word choice but also because of the sophistication that she shows in her understanding of SEO and her ability to explain best practices to others.
When Doe steals ideas from Semrush, she is stealing all of those implications, even if the exact words are changed.
In this case, most of the responsibility for plagiarism is with Doe.
This scenario is nearly identical to Scenario 1: Doe deliberately chose to copy the content from Semrush and present it as her own with some modifications.
The only difference is the method she used: while in Scenario 1, Doe made those modifications manually, in this case, Doe used ChatGPT to do the dirty work for her.
Scenario 6: Creating a generative AI model to write blog posts based on existing content
In our discussions, we have not addressed one key party in scenarios involving AI: the people who built that AI model in the first place.
Let’s say that Jane Doe is actually a skilled developer who can build her own machine learning applications. She builds her own algorithm that can shuffle and automatically paraphrase language. Then, Doe prompts that algorithm to rephrase the Semrush blog post with 12 SEO tips. Is that plagiarism?
Functionally, this scenario is identical to Scenario 5. The only difference is that now Doe has also built her own paraphrasing software instead of relying on something like ChatGPT.
What if Doe builds an algorithm that is trained on all of Semrush’s blog posts and then she asks it to generate a draft for “10 Best SEO Tips”?
In this case, Doe isn’t stealing from one article. However, her software is still reshuffling and paraphrasing the language and ideas from Semrush. The resulting piece is still plagiarized, but pinning down where each part comes from becomes harder.
Now, let’s say that Doe takes all of Semrush’s blog content and also includes blog posts from Ahrefs, Moz, Search Engine Land, and 50 other SEO websites.
She uses all of that content to train her own LLM and then prompts the AI to generate a blog post with “10 Best SEO Tips”. Then, Doe publishes that AI-created piece on hottestseotipsblog.com under her own name. How does this scenario fit within our framework?
Harm to the original author
We are now dealing with multiple original authors, not simply Semrush. But all of the sites whose content was used in the training dataset have had their ideas, style, and word choice copied to some degree.
Doe’s piece might still generate revenue, business opportunities, and traffic for her site. And none of the original authors and websites are getting credited or compensated for those results.
Harm to the audience
Anyone visiting Doe’s site still thinks that the content presented to them is a reflection of Doe’s original work, ideas, and expertise.
Sure, Doe has put effort into building generative AI that created the content. But if a reader is simply reading SEO tips and not learning about the intricacies of Doe’s AI software – is the reader actually witnessing any of Doe’s own work?
At the end of the day, if someone believes that Doe is an SEO expert because of that content, Doe still steals that trust from others. None of her SEO advice is her own. Doe deserves credit for creating a fancy stealing machine, not for the content that it produces.
Plagiarism is the same whether AI is involved or not
It doesn’t matter if AI is involved: as we see from the scenarios outlined above, plagiarism is always done by humans when they take credit for the work of other humans. AI is simply a technology that provides new ways of stealing content, but it hasn’t created a fundamentally new type of plagiarism or content creation to begin with.
You can think of AI as a particularly high-tech blender: you can stuff it with a bunch of pieces, press a button, and then get a homogeneous mixture to post on your own site. But that mixture is still made up of the same content that you put in there.
A blender doesn’t create anything new. It simply puts some pre-existing substance in a different form. If you make a carrot smoothie, can you say that you grew those carrots? The same could be said for content.
If you worry about AI taking your content, remember to look behind the curtain at who is actually pulling the strings. While the software itself may not be capable of a criminal mental state, the humans who created or prompted it certainly are. AI cannot steal from you, but humans using AI might.
How can we minimize risks posed by AI to our content?
OK, all of that sounds extremely depressing. You may be ready to throw up your hands and give up on marketing as a concept. And I don’t blame you – the threats that AI poses to our content are scary.
The way people obfuscate their culpability in stealing content, ideas, or jobs is dangerous. People using generative AI can cause real harm to you, me, or many people we work with and care about.
However, that doesn’t mean we should give up. Humans have been stealing ideas as long as there have been ideas to steal. Just like we’ve had ways of combating old-school human plagiarism, we can work to prevent AI-assisted plagiarism too.
Let’s wrap up by looking at how we can minimize risks and reduce the harms of generative AI to our content.
No, ‘becoming a generative AI expert’ isn’t a solution
Before I dive into specific suggestions, a short disclaimer: I won’t recommend that you become a generative AI expert or learn how to work with machine learning.
Sure, those kinds of skills could be valuable. But they are a distraction from the real question we’re wrestling with:
- How can we protect the work we’ve done and the skills we’ve built as content marketers in a world with AI?
AI, like any technology, will have some impact on the kinds of tasks we can do. But it doesn’t change the essence of “marketing” or “content”.
Saying that you can only remain a marketer if you learn about AI is tantamount to telling you that you must change careers entirely. A generative AI expert is not a marketer. A marketer might also be a generative AI expert, and a generative AI expert could be a specialist in content.
But these professions are not interchangeable and the skillset behind them is distinct. Think of it this way: does going into Excel now and then to prepare reports make you a data analyst? Probably not.
The work we do as marketers will remain valuable even if AI becomes a core part of how we do that work. So, as we think about minimizing risks from AI taking our content, we should think about the unique skills and challenges of marketing work in particular.
Avoid financial losses by opting out of AI scraping
The first thing you can do is prevent your content from appearing in some AI datasets. This is the most brute-force option, and it won’t be practical for most marketers or web publishers.
Keep in mind that opting out of any AI indexing also means that your site and any associated information will not appear in that AI’s output. If any user is looking for information about your business by prompting a generative AI chatbot, they might not see you in the results.
By protecting your privacy this way, you are essentially sacrificing generative AI as a marketing channel. So make sure that you are ready for any consequences of taking that step.
You can opt out of OpenAI’s GPTBot crawling by adding these lines to your robots.txt file:
User-agent: GPTBot
Disallow: /
You can also block Google’s Bard and Vertex AI using the Google-Extended control, by adding the following to your robots.txt file:
User-agent: Google-Extended
Disallow: /
Unfortunately, Google-Extended won’t block your site from getting indexed in Google’s AI-powered Search Generative Experience (SGE). It seems that the only way to avoid appearing in SGE is to opt out of Google’s indexing entirely.
You can read more about Google’s crawlers and editing your permissions in their own documentation.
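If you want to sanity-check robots.txt rules for GPTBot and Google-Extended before publishing them, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt text, the example.com URL, and the crawler_access helper are my own illustrative assumptions, not something taken from OpenAI's or Google's documentation. The check only shows how a standards-following parser reads the file; it cannot guarantee that any given crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: blocks the two AI-related tokens
# discussed in this section and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt, user_agents, url="https://example.com/blog/post"):
    """Return {user_agent: bool} saying whether each agent may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


# Print the parser's verdict for the two blocked tokens and for Googlebot.
for agent, allowed in crawler_access(
    ROBOTS_TXT, ["GPTBot", "Google-Extended", "Googlebot"]
).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With rules like these, the two AI tokens should come back as blocked while Googlebot stays allowed, which is the trade-off this section describes: you keep regular search indexing but give up those AI systems as a channel.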
Some other publishing tools might let you control these settings within the platform itself. For example, Substack includes a setting within its writer dashboard to “block AI training.”
This appears to mostly opt you out of GPTBot, but perhaps it will become a more robust blocker in the future.
Reclaim a marketing advantage by switching to harder-to-replicate marketing approaches
For most of us, blocking AI crawlers or opting out of SEO entirely isn’t exactly an option (otherwise, why would you be reading this on Search Engine Land?). So, what can we do instead?
Accept that someone may try to plagiarize your content using AI. Even if you could opt out of every training dataset, someone can still copy-paste your work into ChatGPT and steal it that way.
When someone wants to steal, they will find a way. Trying to prevent your content from getting copied will quickly become an endless game of whack-a-mole or lead you to stop publishing on the open web entirely.
How can you reduce harm even if others copy your content? By thinking about marketing approaches that are hard, if not impossible, to copy.
If Jane Doe did steal a piece from Semrush, and you came across both versions of that article – who would you be more likely to trust? If you’re a marketer, you are likely already familiar with Semrush as a company and a trusted source of information.
When seeing content that has been plagiarized, you are likely to assume that Semrush was the original author. Even if Semrush may lose some traffic or revenue, in the long run, Jane Doe cannot steal their brand, reputation, content creation process, team, or experience.
Others might be able to steal some of your work, but they cannot steal your expertise. If your content is truly original, trustworthy, and helpful – people will continue to trust you.
You can continue to:
- Make content that earns traffic.
- Leverage multiple marketing channels.
- Build your own brand and the personal reputations of people within your organization.
No AI can steal that.
Protect your job and budget by leaning on your humanity
What is your essence that no AI or other human could ever take from you?
What makes you, as an individual, a department, or an organization – you? Lean into your true differentiators from your unique blend of experience, positioning, and connections.
Someone else might be able to copy your idea or automate some of your skills. But if your job or marketing program is built on your fundamental humanity and unique point of view, it won’t be replaced.
If you want to be resilient in the face of AI, stay human. Invest in true thought leadership, developing relationships, expressing strong opinions, and telling the stories that only you could tell.
Don’t give in to AI – marketing isn’t going anywhere
Humans trust other humans. AI won’t change that, despite all the current hype. A thief might be able to borrow your reputation, but true expertise will always become apparent. A plagiarist won’t be able to build on your ideas or recreate your success.
So, if you’re worried about AI taking your content, make sure that you keep building content that is worth stealing. Your ability to create good content in the first place will be the best defense you could possibly have.
February 22, 2024