AI crawler

Last updated June 29, 2026 · by Tal Gerafi

An AI crawler is an automated bot that fetches web pages to feed AI systems — training data for models or live answers in tools like ChatGPT and Perplexity. Examples: GPTBot, ClaudeBot, PerplexityBot.

AI crawlers do one of two basic jobs, and the difference matters for how you treat them. Some collect text to train future models. Others fetch a page in real time, the moment a user asks a question, so the AI can quote your site in its answer. If you block the second kind, you can disappear from AI answers.

How does an AI crawler work?

A well-behaved AI crawler reads your robots.txt, follows links, and downloads the HTML of pages it's allowed to fetch, much like Googlebot. What it does with that content depends on its operator. A training crawler stores the text to teach a model. A live "answer" crawler grabs the page during a search and passes the relevant passage straight into the model's reply, often with a citation back to you.

Each crawler has its own user-agent name, so you control them one by one in robots.txt. Knowing the names is half the work. A published llms.txt file complements this by pointing crawlers at your best content. The cleaner and faster your pages, the further your crawl budget goes.

User-agent	Operator	Main job
GPTBot	OpenAI	Model training
OAI-SearchBot	OpenAI	ChatGPT live answers
ClaudeBot	Anthropic	Model training
PerplexityBot	Perplexity	Live answers + index
Google-Extended	Google	Gemini / AI training control
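
A minimal sketch of per-bot control, assuming a hypothetical policy that blocks the training crawlers from the table and allows the live-answer ones. It renders one robots.txt group per user agent, then checks the result with Python's standard urllib.robotparser. The POLICY mapping, the helper names, and the example.com URL are illustrative assumptions, and the standard-library parser only approximates how each operator's crawler reads the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy (an assumption, not a recommendation):
# True = allow the crawler, False = block it.
POLICY = {
    "GPTBot": False,          # model training
    "ClaudeBot": False,       # model training
    "Google-Extended": False, # Gemini / AI training control
    "OAI-SearchBot": True,    # ChatGPT live answers
    "PerplexityBot": True,    # live answers + index
}


def build_robots_txt(policy):
    """Render one robots.txt group per user agent."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/pricing"):
    """Ask the stdlib parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(POLICY)
print(robots_txt)
for agent in POLICY:
    status = "allowed" if is_allowed(robots_txt, agent) else "blocked"
    print(f"{agent}: {status}")
```

Crawlers with no matching group, such as Googlebot in this sketch, stay allowed because the file has no catch-all rule. Keep in mind that robots.txt is a request: it only works for crawlers that choose to honor it.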

Why does it matter for B2B sites?

For B2B and SaaS, buyers now ask ChatGPT and Perplexity "what's the best tool for X" before they ever reach Google. If the live-answer crawlers can't reach your pages, you're invisible in that conversation — no logo, no mention, no citation. That's why generative engine optimization starts with one boring check: are the right crawlers actually allowed?

The honest move is to decide per-bot. You might allow the answer crawlers (so you get cited) while blocking the pure training ones, or allow both — it's a business choice, not a default. In our experience the most common mistake is a blanket block left over from a nervous IT policy, quietly costing visibility. To turn access into citations, the guide to ranking in ChatGPT and Perplexity walks through the content and structure that get quoted.
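
One way to catch a leftover blanket block is to test an existing robots.txt against the answer crawlers you care about. The sketch below does that with Python's standard urllib.robotparser. The ANSWER_CRAWLERS list, the function name, and the sample rule sets are assumptions made for illustration, so swap in your own file contents and bot names.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of live-answer crawlers to audit; extend as needed.
ANSWER_CRAWLERS = ["OAI-SearchBot", "PerplexityBot"]


def blocked_answer_crawlers(robots_txt, url="https://example.com/"):
    """Return the answer crawlers that this robots.txt text blocks from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in ANSWER_CRAWLERS
            if not parser.can_fetch(agent, url)]


# A blanket block catches every crawler, including the ones that cite you.
blanket = "User-agent: *\nDisallow: /\n"
print(blocked_answer_crawlers(blanket))

# A per-bot rule that stops only a training crawler leaves them alone.
selective = "User-agent: GPTBot\nDisallow: /\n"
print(blocked_answer_crawlers(selective))
```

An empty list means none of the listed answer crawlers are blocked from the tested URL. The check covers only what robots.txt says, so a firewall or CDN rule that blocks bots would not show up here.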

FAQ

Is an AI crawler the same as Googlebot?

No. Googlebot indexes pages for traditional search results. An AI crawler feeds AI systems instead — either training a model or fetching a live passage to quote inside an AI answer. They use different user-agent names, so you can allow or block them separately.

How do I block AI crawlers?

Add their user-agent names to your robots.txt with a Disallow rule. Block them individually (for example GPTBot or ClaudeBot) rather than all at once, so you can keep the live-answer crawlers that cite you while stopping the training-only ones.

Will blocking AI crawlers hurt my SEO?

It won't change your normal Google rankings, since those rely on Googlebot. But blocking the live-answer crawlers means AI tools like ChatGPT and Perplexity can't fetch and cite your pages, so you lose visibility in AI search.

Go deeper

How Do You Rank in ChatGPT and Perplexity? A GEO Playbook for B2B/SaaS →