Should we block AI scraper bots?

By admin

Michael Gearon

Published on October 9, 2023

The BBC said last week that it is blocking OpenAI’s web crawler from scraping its content. This follows similar moves by Reuters, Getty Images and other content providers.

Rhodri Talfan Davies, who is the director of nations at the BBC, said:

We do not believe the current ‘scraping’ of BBC data without our permission . . . to train ‘gen AI’ models is in the public interest and we want to agree a more structured and sustainable approach with technology companies.

So how do we prevent OpenAI and other AI software from scraping our content? OpenAI said that if you want to discourage its GPTBot (the name of the bot that crawls websites), you have to add this to your robots.txt file:

User-agent: GPTBot
Disallow: /

What about other AI scraper bots?

Although OpenAI’s ChatGPT is the best-known AI product right now, other companies like Google and Facebook are also using bots to scrape content from the web. If you want to try to discourage these bots as well, you have to add these rules to your robots.txt file:

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Amazonbot
Disallow: /
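If you would rather generate these rules and check them before publishing, here is a minimal sketch using Python's standard `urllib.robotparser` module. The helper names `build_robots_txt` and `blocked_bots` and the `https://example.com/any-page` URL are made up for illustration. The check only shows how a crawler that follows robots.txt would read the rules; it says nothing about whether a given bot actually obeys them.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the robots.txt rules in this post.
AI_BOTS = [
    "CCBot",
    "ChatGPT-User",
    "GPTBot",
    "Google-Extended",
    "Omgilibot",
    "FacebookBot",
    "Amazonbot",
]


def build_robots_txt(bots):
    """Return robots.txt text that disallows the whole site for each bot."""
    groups = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


def blocked_bots(robots_txt, bots, url="https://example.com/any-page"):
    """Return the bots that this robots.txt text forbids from fetching url.

    The URL is a placeholder; any path on your own site would do.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


robots_txt = build_robots_txt(AI_BOTS)
print(robots_txt)
print(blocked_bots(robots_txt, AI_BOTS))
```

Keeping the bot names in one list makes it easier to add or remove a crawler later without hand-editing the file.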

But why should we block them?

I’m not trying to convince you that you should go and block all of these bots from scraping your content today. Instead, it raises the question: should all of these bots be able to scrape your content without your permission and without referencing you as the source?

Chris Coyier recently blogged that:

If a huge company sent a robot to your door to ask for a lock of your hair, would you give it to them? If they asked for one square inch of your land, would you sign it over? If they asked you to run on a treadmill for one minute a day for them, would you hop to it? What if they didn’t ask?

Also, we must remember that disallowing these bots doesn’t mean they will stop scraping your content. A robots.txt file is a request that crawlers follow voluntarily, so it will discourage the well-behaved ones, but there are lots of bots out there and scraping content is a popular activity at the moment.