Top 12 LLM Crawlers and What They Do (2026 Directory)
Every major LLM crawler in 2026 — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and 7 more. User-agents, behavior, robots.txt rules.
TL;DR: In 2026, twelve LLM crawlers matter for AI discoverability: GPTBot, OAI-SearchBot, ChatGPT-User (OpenAI); ClaudeBot, Claude-User, Claude-SearchBot (Anthropic); PerplexityBot, Perplexity-User (Perplexity); Google-Extended, GoogleOther (Google); Applebot-Extended (Apple); BingBot/Copilot (Microsoft). Each has a distinct user-agent, role (training vs answer-citing vs both), and robots.txt rule. This directory covers all twelve. Test your site against every one of them with our free LLM Bot Checker.
If you're optimizing for AI search in 2026, the first question isn't "what should I write?" — it's "are the right crawlers reaching me?" This directory lists every major LLM crawler, what it does, how to allow or block it, and how it cites content.
Want to skip to results? Test your URL with our free LLM Bot Checker →
OpenAI (ChatGPT)
OpenAI runs three distinct crawlers, each with a different role.
1. GPTBot
- User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
- Role: Training data collection for future models
- Citation behavior: Indirect — content fed into training may be reflected in model responses, but not as live citations
- Allow:
  User-agent: GPTBot
  Allow: /
- Block:
  User-agent: GPTBot
  Disallow: /
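One way to sanity-check the two GPTBot rules above is to parse them with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not a description of how OpenAI evaluates robots.txt: the helper name `is_allowed` and the `https://example.com/` URL are placeholders, and the standard-library parser is only a stand-in for the crawler's own matching logic.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, token: str, url: str = "https://example.com/") -> bool:
    """Return True if `token` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


ALLOW_RULE = "User-agent: GPTBot\nAllow: /"
BLOCK_RULE = "User-agent: GPTBot\nDisallow: /"

print(is_allowed(ALLOW_RULE, "GPTBot"))         # True
print(is_allowed(BLOCK_RULE, "GPTBot"))         # False
# A rule that names only GPTBot does not affect other crawlers.
print(is_allowed(BLOCK_RULE, "OAI-SearchBot"))  # True
```

Pass the short product token (for example `GPTBot`), not the full `Mozilla/5.0 ...` header, because this parser matches on the text before the first slash.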
2. OAI-SearchBot
- User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
- Role: Builds the index for ChatGPT search
- Citation behavior: Direct — pages that match a user query may be cited inline in ChatGPT answers
- Recommendation: Allow if you want ChatGPT to cite you
3. ChatGPT-User
- User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
- Role: On-demand fetcher when a user pastes a URL or asks ChatGPT to read a page
- Citation behavior: Direct — content is summarized in real time and cited in the answer
- Recommendation: Allow — blocking this hurts user-initiated reads, not bulk crawling
Anthropic (Claude)
Three crawlers as of 2026. Claude-SearchBot was added this year.
4. ClaudeBot
- User-agent: Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
- Role: Training data collection
- Citation behavior: Indirect via training
- Recommendation: Allow if you want long-term presence in Claude responses
5. Claude-User
- User-agent: Mozilla/5.0 (compatible; Claude-User/1.0; +Claude-User@anthropic.com)
- Role: On-demand fetch when a Claude user references a URL
- Citation behavior: Direct, real-time
- Recommendation: Allow
6. Claude-SearchBot (new in 2026)
- User-agent: Mozilla/5.0 (compatible; Claude-SearchBot/1.0; +Claude-SearchBot@anthropic.com)
- Role: Index for Claude search citations
- Citation behavior: Direct, with inline citations
- Recommendation: Allow — this is the equivalent of OAI-SearchBot for Claude
Perplexity
7. PerplexityBot
- User-agent: Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
- Role: Builds Perplexity's primary search index
- Citation behavior: Direct, prominent citations
- Recommendation: Allow — Perplexity is the most citation-heavy AI search engine in 2026
8. Perplexity-User
- User-agent: Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
- Role: On-demand fetch when users invoke a URL
- Citation behavior: Direct
- Recommendation: Allow
Google
Google's AI crawling story is more nuanced. There are two relevant signals.
9. Google-Extended
- User-agent: Not a real crawler — it's a robots.txt-only signal
- Role: Controls whether Google can use your content for Gemini training and Vertex AI
- Citation behavior: Affects Gemini quality and Vertex AI grounding
- Blocking does NOT remove you from Google search
User-agent: Google-Extended
Allow: /
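The claim that blocking Google-Extended leaves regular search crawling untouched can be illustrated with a small robots.txt check. The sketch below assumes a hypothetical robots.txt that opts out of Google-Extended and allows everything else. It uses Python's `urllib.robotparser` as a stand-in for Google's own rule matching, and the `example.com` URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, leave everything else open.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(token: str, url: str = "https://example.com/pricing") -> bool:
    """Check one product token against ROBOTS_TXT without any network access."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(token, url)


print(can_fetch("Google-Extended"))  # False: the AI-training signal is opted out
print(can_fetch("Googlebot"))        # True: regular search crawling is unaffected
```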
10. GoogleOther
- User-agent: GoogleOther
- Role: Experimental and product-related crawls (including AI Overviews features)
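- Citation behavior: May influence AI Overviews appearance, though Google documents GoogleOther mainly as a generic crawler for product-team and research fetches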
- Recommendation: Allow
Apple
11. Applebot-Extended
- User-agent: Applebot-Extended (robots.txt signal)
- Role: Controls whether Apple can use your content for Apple Intelligence training
- Citation behavior: Indirect via Apple Intelligence
- Recommendation: Allow if you want presence in Apple's AI surfaces
Microsoft (Bing + Copilot)
12. Bingbot / BingPreview
- User-agent: bingbot (and BingPreview for snapshots)
- Role: Powers Bing search index, which feeds Microsoft Copilot
- Citation behavior: Direct in Copilot answers
- Recommendation: Allow — this is the Copilot equivalent of OAI-SearchBot
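The directory above can also be expressed as a small lookup table for labeling requests in server logs. The following is a sketch under stated assumptions: the `Crawler` dataclass, the `identify` function, and the role labels are illustrative names, not part of any vendor's API. Google-Extended and Applebot-Extended are left out because they are robots.txt-only signals and do not appear in request headers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Crawler:
    token: str   # substring expected in the User-Agent header
    vendor: str
    role: str    # label taken from the directory above


DIRECTORY = (
    Crawler("GPTBot", "OpenAI", "training"),
    Crawler("OAI-SearchBot", "OpenAI", "search index"),
    Crawler("ChatGPT-User", "OpenAI", "on-demand fetch"),
    Crawler("ClaudeBot", "Anthropic", "training"),
    Crawler("Claude-User", "Anthropic", "on-demand fetch"),
    Crawler("Claude-SearchBot", "Anthropic", "search index"),
    Crawler("PerplexityBot", "Perplexity", "search index"),
    Crawler("Perplexity-User", "Perplexity", "on-demand fetch"),
    Crawler("GoogleOther", "Google", "experimental crawls"),
    Crawler("bingbot", "Microsoft", "search index"),
)


def identify(user_agent: str) -> Optional[Crawler]:
    """Return the first directory entry whose token appears in the header."""
    lowered = user_agent.lower()
    for crawler in DIRECTORY:
        if crawler.token.lower() in lowered:
            return crawler
    return None


ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.2; +https://openai.com/gptbot)")
print(identify(ua).token)             # GPTBot
print(identify("Mozilla/5.0 (X11)"))  # None
```

User-agent headers can be spoofed, so a match here only labels traffic by what it claims to be.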
A robots.txt template for 2026
# Allow all major AI crawlers
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Claude-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: GoogleOther
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: bingbot
Allow: /
After deploying, test with our free LLM Bot Checker — it pings every one of the user-agents above against your robots.txt and surfaces any block that slipped through.
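A local approximation of that check can be sketched with Python's standard library. This is an illustrative sketch, not the LLM Bot Checker's implementation: `AI_TOKENS`, `audit`, and the sample robots.txt are assumptions made for the example, and `urllib.robotparser` may match rules differently from each vendor's crawler.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser

# The twelve product tokens covered in this directory.
AI_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "GoogleOther",
    "Applebot-Extended", "bingbot",
]


def audit(robots_txt: str, url: str = "https://example.com/") -> Dict[str, bool]:
    """Map each token to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in AI_TOKENS}


# Hypothetical robots.txt where a wildcard block catches every bot not named.
SAMPLE = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

results = audit(SAMPLE)
blocked = [token for token, allowed in results.items() if not allowed]
print(results["GPTBot"])  # True
print(len(blocked))       # 11
```

The sample shows the kind of block that slips through: a catch-all `Disallow` written years ago silently covers every crawler that is not named explicitly.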
Frequently Asked Questions
Which AI crawlers respect robots.txt?
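GPTBot, OAI-SearchBot, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, and Google-Extended are all documented by their operators as honoring robots.txt. User-initiated fetchers such as ChatGPT-User and Perplexity-User act on a specific user request, and their operators may not apply robots.txt rules to them in the same way, so check each vendor's current documentation. Compliance is high but not universal; for strict enforcement, use Cloudflare's AI Scrapers rule or a similar control at the CDN layer.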
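Where a CDN rule is not available, the same enforcement idea can be approximated at the origin server. The sketch below is a minimal WSGI middleware that refuses requests whose User-Agent header contains a blocked token. The names `block_ai_agents` and `hello_app` are hypothetical, and this is an origin-side stand-in for the idea, not a reproduction of Cloudflare's rule.

```python
from typing import Callable, Iterable, List


def block_ai_agents(app: Callable, blocked_tokens: Iterable[str]) -> Callable:
    """Wrap a WSGI app and answer 403 when the User-Agent names a blocked token."""
    lowered = [token.lower() for token in blocked_tokens]

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in lowered):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Placeholder application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


guarded = block_ai_agents(hello_app, ["GPTBot"])

# Call the wrapped app directly with fake requests to see both outcomes.
statuses: List[str] = []
record = lambda status, headers: statuses.append(status)
guarded({"HTTP_USER_AGENT": "Mozilla/5.0 (compatible; GPTBot/1.2)"}, record)
guarded({"HTTP_USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64)"}, record)
print(statuses)  # ['403 Forbidden', '200 OK']
```

Header matching only stops crawlers that identify themselves honestly, which is why network-level controls remain the stricter option.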
What's the difference between GPTBot and OAI-SearchBot?
GPTBot collects training data; OAI-SearchBot builds the live search index for ChatGPT. If you want ChatGPT to cite you in answers, you need OAI-SearchBot allowed. If you also want OpenAI to use your content in future model training, allow GPTBot.
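The split described here (cited in ChatGPT search, excluded from training) comes down to two robots.txt groups. As a hedged sketch, the hypothetical `render_robots` helper below builds those groups from a policy dictionary and then re-parses the result with `urllib.robotparser` to confirm the outcome. The parser is a stand-in for OpenAI's own matching, and the URL is a placeholder.

```python
from typing import Dict
from urllib.robotparser import RobotFileParser


def render_robots(policy: Dict[str, bool]) -> str:
    """Render one robots.txt group per token: True allows /, False disallows /."""
    groups = []
    for token, allow in policy.items():
        rule = "Allow: /" if allow else "Disallow: /"
        groups.append(f"User-agent: {token}\n{rule}")
    return "\n\n".join(groups) + "\n"


# Be citable in ChatGPT search, but opt out of training collection.
robots_txt = render_robots({"OAI-SearchBot": True, "GPTBot": False})
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("OAI-SearchBot", "https://example.com/"))  # True
print(parser.can_fetch("GPTBot", "https://example.com/"))         # False
```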
Does blocking Google-Extended hurt my Google search ranking?
No. Google-Extended only controls AI training (Gemini, Vertex AI). Regular Googlebot is separate, and your search ranking is unaffected by Google-Extended decisions.
Is Claude-SearchBot a new crawler?
Yes — Anthropic added Claude-SearchBot in 2026 specifically for Claude's search citation feature. If your robots.txt was last updated before 2026, it almost certainly doesn't address Claude-SearchBot explicitly.
Should I allow or block all of these?
Default to allow for all twelve unless you have a specific reason to block. The economics of AI citation in 2026 favor visibility — being recommended in ChatGPT, Claude, Perplexity, Gemini, and Copilot answers is the new top-of-funnel.
How do I test that my robots.txt actually allows these crawlers?
Use our free LLM Bot Checker. It tests every user-agent in this directory against your robots.txt and shows you the allow/block result for each. Takes 5 seconds.
Test your robots.txt against 12 AI crawlers
The free Hyperleap LLM Bot Checker covers GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and 7 more — including the 2026 additions like Claude-SearchBot.
Run the LLM Bot Checker
Related Resources
- LLM Bot Checker — free tool, tests every crawler in this directory
- How to Check if AI is Crawling Your Site — practical audit guide
- llms.txt Explained — the AI-discoverability companion to robots.txt
- Will ChatGPT and Perplexity Recommend Your Business? — broader GEO strategy
- Free Schema Generator — structured data for FAQ, Article, Product