Top 12 LLM Crawlers and What They Do (2026 Directory)

TL;DR: In 2026, twelve LLM crawlers matter for AI discoverability: GPTBot, OAI-SearchBot, ChatGPT-User (OpenAI); ClaudeBot, Claude-User, Claude-SearchBot (Anthropic); PerplexityBot, Perplexity-User (Perplexity); Google-Extended, GoogleOther (Google); Applebot-Extended (Apple); Bingbot (Microsoft, feeding Copilot). Each has its own robots.txt token, a role (training, answer citation, or both), and an allow/block rule. This directory covers all twelve. Test your site against every one of them with our free LLM Bot Checker.

If you're optimizing for AI search in 2026, the first question isn't "what should I write?" — it's "are the right crawlers reaching me?" This directory lists every major LLM crawler, what it does, how to allow or block it, and how it cites content.

OpenAI (ChatGPT)

OpenAI runs three distinct crawlers, each with a different role.

1. GPTBot

User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
Role: Training data collection for future models
Citation behavior: Indirect — content fed into training may be reflected in model responses, but not as live citations
To allow:
User-agent: GPTBot
Allow: /

To block:
User-agent: GPTBot
Disallow: /
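To confirm what a rule like this does before deploying it, Python's standard-library `urllib.robotparser` can evaluate a robots.txt file locally. A minimal sketch, assuming a site that blocks GPTBot and allows everyone else; the file contents, the example URL, and the `can_crawl` helper are illustrative. This parser applies the first matching rule and matches tokens by substring, which is simpler than the longest-match rules in RFC 9309, so treat its answers as an approximation for complex files.

```python
from urllib.robotparser import RobotFileParser

# The GPTBot block rule from above, plus a catch-all group for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, token: str, url: str) -> bool:
    """Return True if robots_txt lets the crawler named by `token` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the robots.txt token ("GPTBot"), not the full user-agent string:
    # the stdlib parser only looks at the text before the first "/", which
    # for a full string like "Mozilla/5.0 ... GPTBot/1.2" is just "Mozilla".
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for token in ("GPTBot", "OAI-SearchBot"):
        print(token, can_crawl(ROBOTS_TXT, token, "https://example.com/blog/post"))
```

Running it prints `GPTBot False` and `OAI-SearchBot True`: blocking the training crawler leaves the ChatGPT search crawler untouched.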

2. OAI-SearchBot

User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Role: Builds the index for ChatGPT search
Citation behavior: Direct — pages that match a user query may be cited inline in ChatGPT answers
Recommendation: Allow if you want ChatGPT to cite you

3. ChatGPT-User

User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
Role: On-demand fetcher when a user pastes a URL or asks ChatGPT to read a page
Citation behavior: Direct — content is summarized in real time and cited in the answer
Recommendation: Allow — blocking this hurts user-initiated reads, not bulk crawling

Anthropic (Claude)

Anthropic runs three crawlers as of 2026. Claude-SearchBot was added this year.

4. ClaudeBot

User-agent: Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
Role: Training data collection
Citation behavior: Indirect via training
Recommendation: Allow if you want long-term presence in Claude responses

5. Claude-User

User-agent: Mozilla/5.0 (compatible; Claude-User/1.0; +Claude-User@anthropic.com)
Role: On-demand fetch when a Claude user references a URL
Citation behavior: Direct, real-time
Recommendation: Allow

6. Claude-SearchBot (new in 2026)

User-agent: Mozilla/5.0 (compatible; Claude-SearchBot/1.0; +Claude-SearchBot@anthropic.com)
Role: Index for Claude search citations
Citation behavior: Direct, with inline citations
Recommendation: Allow — this is the equivalent of OAI-SearchBot for Claude
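Because OpenAI and Anthropic both split training, search indexing, and on-demand fetching into separate tokens, a site can opt out of training while staying citable. A minimal Python sketch, assuming you only manage the six tokens covered so far; the `CRAWLERS` mapping, its role labels, and the `build_robots_txt` helper are hypothetical names for illustration.

```python
# Tokens and roles for the OpenAI and Anthropic crawlers described above.
CRAWLERS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "on-demand",
    "ClaudeBot": "training",
    "Claude-User": "on-demand",
    "Claude-SearchBot": "search",
}


def build_robots_txt(crawlers: dict[str, str], blocked_roles: set[str]) -> str:
    """Emit one robots.txt group per token, disallowing every path for blocked roles."""
    groups = []
    for token, role in crawlers.items():
        rule = "Disallow: /" if role in blocked_roles else "Allow: /"
        groups.append(f"# role: {role}\nUser-agent: {token}\n{rule}")
    # Blank lines separate groups; end the file with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Opt out of model training while staying reachable for citations.
    print(build_robots_txt(CRAWLERS, blocked_roles={"training"}))
```

Keeping roles as data rather than hand-editing each group makes the policy a single decision ("which roles do I block?") instead of six separate ones.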

Perplexity

7. PerplexityBot

User-agent: Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Role: Builds Perplexity's primary search index
Citation behavior: Direct, prominent citations
Recommendation: Allow — Perplexity is the most citation-heavy AI search engine in 2026

8. Perplexity-User

User-agent: Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
Role: On-demand fetch when a user's question or shared URL requires visiting a page in real time
Citation behavior: Direct
Recommendation: Allow

Google

Google's AI crawling story is more nuanced. There are two relevant signals.

9. Google-Extended

User-agent: Not a real crawler — it's a robots.txt-only signal
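Role: Controls whether content Google crawls from your site can be used to train Gemini models, including those served through Vertex AI
Citation behavior: Also controls grounding, where Gemini Apps and Grounding with Google Search on Vertex AI use your content at prompt time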
Blocking does NOT remove you from Google search

User-agent: Google-Extended
Allow: /

10. GoogleOther

User-agent: GoogleOther
Role: Generic crawler that Google product teams use for one-off fetches, such as internal research and development crawls
Citation behavior: No documented effect on AI Overviews, which draw on the regular Googlebot search index
Recommendation: Allow

Apple

11. Applebot-Extended

User-agent: Applebot-Extended (robots.txt signal)
Role: Controls whether Apple can use your content for Apple Intelligence training
Citation behavior: Indirect via Apple Intelligence
Recommendation: Allow if you want presence in Apple's AI surfaces

Microsoft (Bing + Copilot)

12. Bingbot / BingPreview

User-agent: bingbot (and BingPreview for snapshots)
Role: Powers Bing search index, which feeds Microsoft Copilot
Citation behavior: Direct in Copilot answers
Recommendation: Allow — this is the Copilot equivalent of OAI-SearchBot
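Since each crawler announces itself with its own token, server access logs show which of them actually visit. A rough Python sketch that maps user-agent strings to the vendor and role listed in this directory; the table, the role labels, the `classify` and `count_roles` helpers, and the token-boundary regex are illustrative assumptions. Google-Extended and Applebot-Extended are left out because they are robots.txt tokens, not crawlers that send requests. User-agent strings can be spoofed, so treat the counts as approximate.

```python
import re
from collections import Counter

# (token, vendor, role) as described in this directory. Google-Extended and
# Applebot-Extended are robots.txt-only tokens, so they never show up in logs.
CRAWLERS = [
    ("GPTBot", "OpenAI", "training"),
    ("OAI-SearchBot", "OpenAI", "search index"),
    ("ChatGPT-User", "OpenAI", "on-demand fetch"),
    ("ClaudeBot", "Anthropic", "training"),
    ("Claude-User", "Anthropic", "on-demand fetch"),
    ("Claude-SearchBot", "Anthropic", "search index"),
    ("PerplexityBot", "Perplexity", "search index"),
    ("Perplexity-User", "Perplexity", "on-demand fetch"),
    ("GoogleOther", "Google", "product crawls"),
    ("bingbot", "Microsoft", "search index"),
]


def classify(user_agent):
    """Return (token, vendor, role) for the first listed token in the string, else None."""
    for token, vendor, role in CRAWLERS:
        # The token must not be glued to other letters, digits, or hyphens,
        # so "ClaudeBot" is never read out of a longer bot name.
        pattern = rf"(?<![\w-]){re.escape(token)}(?![\w-])"
        if re.search(pattern, user_agent, flags=re.IGNORECASE):
            return token, vendor, role
    return None


def count_roles(user_agents):
    """Tally requests per (vendor, role); browsers and unknown bots are skipped."""
    counts = Counter()
    for user_agent in user_agents:
        match = classify(user_agent)
        if match:
            counts[(match[1], match[2])] += 1
    return counts


if __name__ == "__main__":
    log_user_agents = [
        "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
        "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    print(count_roles(log_user_agents))
```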

A robots.txt template for 2026

# Allow all major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Bingbot
Allow: /

After deploying, test with our free LLM Bot Checker: it checks each user-agent above against your live robots.txt and flags any block that slipped through.
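The core of such a check can be approximated offline. A minimal sketch using Python's `urllib.robotparser`, not the LLM Bot Checker's actual implementation; `TOKENS`, `check_tokens`, and `check_site` are hypothetical names, `check_site` needs network access, and the stdlib parser's substring-based token matching may differ from how each vendor reads your file.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens for the twelve crawlers in this directory.
TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "GoogleOther",
    "Applebot-Extended", "Bingbot",
]


def check_tokens(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each token to whether robots_txt allows it to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, path) for token in TOKENS}


def check_site(robots_url: str, path: str = "/") -> dict[str, bool]:
    """Same check against a live file, e.g. "https://example.com/robots.txt" (needs network)."""
    parser = RobotFileParser(robots_url)
    parser.read()
    return {token: parser.can_fetch(token, path) for token in TOKENS}


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for token, allowed in check_tokens(sample).items():
        print(f"{token:18} {'allowed' if allowed else 'BLOCKED'}")
```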

Frequently Asked Questions

Which AI crawlers respect robots.txt?

GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, and Google-Extended all respect robots.txt. Because robots.txt is a voluntary standard, compliance across the wider crawler ecosystem is high but not universal. For strict enforcement, use Cloudflare's AI Scrapers rule or a similar control at the CDN layer.
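Enforcement that does not rely on crawler goodwill means refusing requests before they reach your pages. As a minimal sketch of the idea (not how Cloudflare's rule works internally), here is Python WSGI middleware that returns 403 when the User-Agent header contains a blocked token; `BlockCrawlers`, `hello_app`, and the choice of GPTBot and ClaudeBot as the blocked set are hypothetical. User-agent strings can be spoofed, so this only stops crawlers that identify themselves honestly.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical policy: refuse the two training crawlers, allow everything else.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot")


class BlockCrawlers:
    """WSGI middleware: 403 for requests whose User-Agent contains a blocked token."""

    def __init__(self, app, blocked=BLOCKED_TOKENS):
        self.app = app
        # Compare case-insensitively, as plain substrings of the header.
        self.blocked = tuple(token.lower() for token in blocked)

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


if __name__ == "__main__":
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal, valid WSGI environ
    environ["HTTP_USER_AGENT"] = "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
    body = BlockCrawlers(hello_app)(environ, lambda status, headers: print(status))
    print(body)
```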

What's the difference between GPTBot and OAI-SearchBot?

GPTBot collects training data; OAI-SearchBot builds the live search index for ChatGPT. If you want ChatGPT to cite you in answers, you need OAI-SearchBot allowed. If you also want OpenAI to use your content in future model training, allow GPTBot.

Does blocking Google-Extended hurt my Google search ranking?

No. Google-Extended only controls AI training (Gemini, Vertex AI). Regular Googlebot is separate, and your search ranking is unaffected by Google-Extended decisions.

Is Claude-SearchBot a new crawler?

Yes — Anthropic added Claude-SearchBot in 2026 specifically for Claude's search citation feature. If your robots.txt was last updated before 2026, it almost certainly doesn't address Claude-SearchBot explicitly.

Should I allow or block all of these?

Default to allow for all twelve unless you have a specific reason to block. The economics of AI citation in 2026 favor visibility — being recommended in ChatGPT, Claude, Perplexity, Gemini, and Copilot answers is the new top-of-funnel.

How do I test that my robots.txt actually allows these crawlers?

Use our free LLM Bot Checker. It tests every user-agent in this directory against your robots.txt and shows you the allow/block result for each. Takes 5 seconds.

How is AI search different from traditional SEO?

Traditional SEO ranks pages for keyword queries. AI search (ChatGPT, Perplexity, Google AI Overviews) cites passages from pages to compose answers. Optimizing for AI search — sometimes called GEO or AEO — focuses on citable, well-structured passages and schema markup rather than just keyword density.

Do I need to do anything different for ChatGPT and Perplexity vs Google?

The fundamentals overlap (quality content, schema markup, crawler accessibility), but AI engines lean heavily on structured data and clear passage-level answers. Make sure the crawlers that feed citations (OAI-SearchBot, Claude-SearchBot, PerplexityBot) are not blocked in robots.txt, along with GPTBot and ClaudeBot if you also want to be included in training data, and keep your key answers in scannable paragraphs and lists.

What is llms.txt and do I need one?

llms.txt is a proposed standard that helps AI crawlers find your most important content (similar in spirit to robots.txt or sitemap.xml). It is not yet a hard ranking signal, but it costs little to publish and signals intent — early-adopter sites are seeing it referenced.
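Publishing one is mostly a formatting exercise. A minimal Python sketch that follows the layout described in the llms.txt proposal at llmstxt.org (an H1 title, a blockquote summary, then H2 sections of markdown links); the site name, URLs, and the `build_llms_txt` helper are placeholders, and the result would be served at `/llms.txt` on your domain.

```python
def build_llms_txt(title, summary, sections):
    """Render llms.txt markdown: an H1 title, a blockquote summary, then H2 link lists.

    `sections` maps a heading to a list of (link text, URL, optional note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)
        lines.append("")  # blank line after each section
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co makes scheduling software for small clinics.",
        {"Key pages": [
            ("Pricing", "https://example.com/pricing", "Plans and what each includes"),
            ("FAQ", "https://example.com/faq", ""),
        ]},
    ))
```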
