In early March of this year I started talking about the importance of implementing robots.txt rules and the “nofollow” attribute, because the great AI transcoding was in full effect.
The time has clearly come to use code as an explicit statement around IP. It looks like this:
User-agent: GPTBot
Disallow: /
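To see the effect of those two lines, here is a minimal sketch using Python's standard `urllib.robotparser`: it parses the same rules shown above and checks which crawlers are blocked. The example.com URL is just an illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two-line policy shown above: block OpenAI's GPTBot site-wide.
rules = """User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is disallowed everywhere; crawlers with no matching
# User-agent group default to allowed.
print(parser.can_fetch("GPTBot", "https://example.com/any-page"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/any-page"))  # True
```

Note that robots.txt is advisory: it only works against crawlers, like GPTBot, that choose to honor it.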
A Verge piece covers the NYT implementing this: “The New York Times blocks OpenAI’s web crawler / The NYT’s robots.txt page that controls how it appears to automated bots built to index the internet now specifically disallows OpenAI’s GPTBot.”