AI crawler
Last updated June 29, 2026 · by Tal Gerafi
An AI crawler is an automated bot that fetches web pages to feed AI systems — training data for models or live answers in tools like ChatGPT and Perplexity. Examples: GPTBot, ClaudeBot, PerplexityBot.
AI crawlers do one of two basic jobs, and the difference matters for how you treat them. Some collect text to train future models. Others fetch a page in real time, the moment a user asks a question, so the AI can quote your site in its answer. If you block the second kind, your site can disappear from AI answers.
How does an AI crawler work?
An AI crawler reads your robots.txt, follows links, and downloads the HTML of the pages it's allowed to fetch, much like Googlebot. What it does with that content depends on the crawler's purpose, not just its operator: OpenAI, for example, runs both a training crawler and an answer crawler. A training crawler stores the text to teach a model. A live "answer" crawler grabs the page during a search and passes the relevant passage straight into the model's reply, often with a citation back to you.
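To make the fetch cycle concrete, here is a minimal Python sketch of one crawl step: collect the links on a page, then keep only those that robots.txt lets a given user-agent fetch. It uses only the standard library (html.parser, urllib.robotparser, urllib.parse). The function name links_to_follow, the sample robots.txt and HTML, and example.com are hypothetical, and real crawlers also handle downloading, deduplication, rate limits, and their own reading of robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_follow(html: str, page_url: str, robots_txt: str, user_agent: str) -> list[str]:
    """Hypothetical helper: absolute links on the page that robots.txt allows for user_agent."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded

    extractor = LinkExtractor()
    extractor.feed(html)
    extractor.close()

    allowed = []
    for href in extractor.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links against the page URL
        if robots.can_fetch(user_agent, absolute):
            allowed.append(absolute)
    return allowed


# Hypothetical inputs: GPTBot is kept out of /internal/, other bots have no rules.
ROBOTS = "User-agent: GPTBot\nDisallow: /internal/\n"
HTML = '<a href="/pricing">Pricing</a> <a href="/internal/roadmap">Roadmap</a>'

print(links_to_follow(HTML, "https://example.com/", ROBOTS, "GPTBot"))
# → ['https://example.com/pricing']
```

A bot with no matching group (and no `User-agent: *` group) is treated as allowed, so the same call with "ClaudeBot" returns both links.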
Each crawler has its own user-agent name, so you control them one by one in robots.txt. Knowing the names is half the work; a published llms.txt file complements this by pointing crawlers at your best content. The cleaner and faster your pages, the better your crawl budget is spent.
| User-agent | Operator | Main job |
|---|---|---|
| GPTBot | OpenAI | Model training |
| OAI-SearchBot | OpenAI | ChatGPT live answers |
| ClaudeBot | Anthropic | Model training |
| PerplexityBot | Perplexity | Live answers + index |
| Google-Extended | Google | Gemini / AI training control (a robots.txt token, not a separate bot) |
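You can check how a robots.txt file treats each crawler in the table with Python's standard-library urllib.robotparser. The sample file below is hypothetical: it blocks the two training crawlers and allows the two live-answer crawlers. Pass the short token (for example GPTBot) rather than a full User-Agent header, because this parser only compares the part before the first "/". It implements the basic robots.txt rules, so a given crawler's own interpretation may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawlers, allow the live-answer ones.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /
"""

BOTS = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"]


def check_bots(robots_txt: str, bots: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


for bot, allowed in check_bots(ROBOTS_TXT, BOTS, "https://example.com/pricing").items():
    print(f"{bot:<14} {'allowed' if allowed else 'blocked'}")
```

Each bot matches its own group, so the four decisions stay independent of one another.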
Why does it matter for B2B sites?
For B2B and SaaS, buyers now ask ChatGPT and Perplexity "what's the best tool for X" before they ever reach Google. If the live-answer crawlers can't reach your pages, you're invisible in that conversation — no logo, no mention, no citation. That's why generative engine optimization starts with one boring check: are the right crawlers actually allowed?
The honest move is to decide per-bot. You might allow the answer crawlers (so you get cited) while blocking the pure training ones, or allow both — it's a business choice, not a default. In our experience the most common mistake is a blanket block left over from a nervous IT policy, quietly costing visibility. To turn access into citations, the guide to ranking in ChatGPT and Perplexity walks through the content and structure that get quoted.
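The blanket-block mistake is easy to test for. This minimal Python sketch, assuming the live-answer crawlers you want to keep are OAI-SearchBot and PerplexityBot from the table above, reports which of them a robots.txt file shuts out. A crawler with no group of its own falls back to the `User-agent: *` group, so a leftover `Disallow: /` there blocks it; giving the crawler a named group overrides the wildcard. The LEGACY and FIXED files and the function name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of live-answer crawlers to keep; extend it to match your own policy.
ANSWER_CRAWLERS = ["OAI-SearchBot", "PerplexityBot"]


def blocked_answer_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the answer crawlers that robots.txt does not allow to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in ANSWER_CRAWLERS if not parser.can_fetch(bot, url)]


# Hypothetical leftover policy: every bot without its own group is blocked.
LEGACY = "User-agent: *\nDisallow: /\n"
# Same file plus a named group for the answer crawlers, which overrides the wildcard.
FIXED = LEGACY + "\nUser-agent: OAI-SearchBot\nUser-agent: PerplexityBot\nAllow: /\n"

print(blocked_answer_crawlers(LEGACY))  # ['OAI-SearchBot', 'PerplexityBot']
print(blocked_answer_crawlers(FIXED))   # []
```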
FAQ
Is an AI crawler the same as Googlebot?
No. Googlebot indexes pages for traditional search results. An AI crawler feeds AI systems instead — either training a model or fetching a live passage to quote inside an AI answer. They use different user-agent names, so you can allow or block them separately.
How do I block AI crawlers?
Add their user-agent names to your robots.txt with a Disallow rule. Block them individually (for example GPTBot or ClaudeBot) rather than all at once, so you can keep the live-answer crawlers that cite you while stopping the training-only ones.
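If you manage robots.txt from code, a per-bot policy can be written out as one group per crawler, so each decision stays separate instead of hiding behind a single wildcard rule. Below is a minimal Python sketch; the build_robots_txt helper and the policy values are hypothetical, and the standard-library parser is used only as a quick sanity check of the generated file.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Hypothetical helper: one group per bot, 'Allow: /' if True, 'Disallow: /' if False."""
    groups = []
    for bot, allowed in policy.items():
        # An explicit Allow is redundant for a bot with no rules, but it documents the decision.
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {bot}\n{rule}\n")
    return "\n".join(groups)  # a blank line separates the groups


# Hypothetical policy: block training-only crawlers, keep the ones that cite you.
policy = {"GPTBot": False, "ClaudeBot": False, "OAI-SearchBot": True, "PerplexityBot": True}
robots_txt = build_robots_txt(policy)
print(robots_txt)

# Sanity check: parse the generated file and compare each bot's access with the policy.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print({bot: parser.can_fetch(bot, "https://example.com/") for bot in policy})
```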
Will blocking AI crawlers hurt my SEO?
It won't change your normal Google rankings, since those rely on Googlebot. But blocking the live-answer crawlers means AI tools like ChatGPT and Perplexity can't fetch and cite your pages, so you lose visibility in AI search.