Top 12 LLM Crawlers and What They Do (2026 Directory)
GEO

Every major LLM crawler in 2026 — GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and 7 more. User-agents, behavior, robots.txt rules.

Gopi Krishna Lakkepuram
May 3, 2026
6 min read

TL;DR: In 2026, twelve LLM crawlers matter for AI discoverability: GPTBot, OAI-SearchBot, ChatGPT-User (OpenAI); ClaudeBot, Claude-User, Claude-SearchBot (Anthropic); PerplexityBot, Perplexity-User (Perplexity); Google-Extended, GoogleOther (Google); Applebot-Extended (Apple); and Bingbot (Microsoft, which feeds Copilot). Each has its own robots.txt token, a role (training, answer citing, or both), and its own robots.txt rule. This directory covers all twelve. Test your site against every one of them with our free LLM Bot Checker.

If you're optimizing for AI search in 2026, the first question isn't "what should I write?" — it's "are the right crawlers reaching me?" This directory lists every major LLM crawler, what it does, how to allow or block it, and how it cites content.

Want to skip to results? Test your URL with our free LLM Bot Checker →

OpenAI (ChatGPT)

OpenAI runs three distinct crawlers, each with a different role.

1. GPTBot

  • User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
  • Role: Training data collection for future models
  • Citation behavior: Indirect — content fed into training may be reflected in model responses, but not as live citations
  • Allow: "User-agent: GPTBot" followed by "Allow: /" on the next line
  • Block: "User-agent: GPTBot" followed by "Disallow: /" on the next line

2. OAI-SearchBot

  • User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
  • Role: Builds the index for ChatGPT search
  • Citation behavior: Direct — pages that match a user query may be cited inline in ChatGPT answers
  • Recommendation: Allow if you want ChatGPT to cite you

3. ChatGPT-User

  • User-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
  • Role: On-demand fetcher when a user pastes a URL or asks ChatGPT to read a page
  • Citation behavior: Direct — content is summarized in real time and cited in the answer
  • Recommendation: Allow — blocking this hurts user-initiated reads, not bulk crawling

Anthropic (Claude)

Three crawlers as of 2026. Claude-SearchBot was added this year.

4. ClaudeBot

  • User-agent: Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
  • Role: Training data collection
  • Citation behavior: Indirect via training
  • Recommendation: Allow if you want long-term presence in Claude responses

5. Claude-User

  • User-agent: Mozilla/5.0 (compatible; Claude-User/1.0; +Claude-User@anthropic.com)
  • Role: On-demand fetch when a Claude user references a URL
  • Citation behavior: Direct, real-time
  • Recommendation: Allow

6. Claude-SearchBot (new in 2026)

  • User-agent: Mozilla/5.0 (compatible; Claude-SearchBot/1.0; +Claude-SearchBot@anthropic.com)
  • Role: Index for Claude search citations
  • Citation behavior: Direct, with inline citations
  • Recommendation: Allow — this is the equivalent of OAI-SearchBot for Claude

Perplexity

7. PerplexityBot

  • User-agent: Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
  • Role: Builds Perplexity's primary search index
  • Citation behavior: Direct, prominent citations
  • Recommendation: Allow — Perplexity is the most citation-heavy AI search engine in 2026

8. Perplexity-User

  • User-agent: Mozilla/5.0 (compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
  • Role: On-demand fetch when users invoke a URL
  • Citation behavior: Direct
  • Recommendation: Allow

Google

Google's AI crawling story is more nuanced. There are two relevant signals.

9. Google-Extended

  • User-agent: Not a real crawler — it's a robots.txt-only signal
  • Role: Controls whether Google can use your content for Gemini training and Vertex AI
  • Citation behavior: Affects Gemini quality and Vertex AI grounding
  • Blocking does NOT remove you from Google search
  • Allow: "User-agent: Google-Extended" followed by "Allow: /" on the next line
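
Because Google-Extended is only a robots.txt token, disallowing it should leave Googlebot's access untouched. A minimal sketch with Python's standard-library urllib.robotparser illustrates this; the robots.txt content and the example.com URL are hypothetical placeholders:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Gemini training/grounding only.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

# Parse the text directly; nothing is fetched over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/blog/post"  # placeholder URL

# Googlebot has no matching group and there is no "*" group, so it is allowed.
print("Googlebot:", parser.can_fetch("Googlebot", url))
# Google-Extended matches its own group, which disallows every path.
print("Google-Extended:", parser.can_fetch("Google-Extended", url))
```

Python's parser applies the first matching rule in a group rather than the most specific one, as Google's parser does, so treat it as a quick sanity check rather than a reproduction of Google's behavior.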

10. GoogleOther

  • User-agent: GoogleOther
  • Role: Generic crawler that Google product teams use for one-off fetches, such as internal research and development
  • Citation behavior: No direct citation role documented by Google; AI Overviews draw on the regular Googlebot index
  • Recommendation: Allow

Apple

11. Applebot-Extended

  • User-agent: Applebot-Extended (robots.txt signal)
  • Role: Controls whether Apple can use your content for Apple Intelligence training
  • Citation behavior: Indirect via Apple Intelligence
  • Recommendation: Allow if you want presence in Apple's AI surfaces

Microsoft (Bing + Copilot)

12. Bingbot / BingPreview

  • User-agent: bingbot (and BingPreview for snapshots)
  • Role: Powers Bing search index, which feeds Microsoft Copilot
  • Citation behavior: Direct in Copilot answers
  • Recommendation: Allow — this is the Copilot equivalent of OAI-SearchBot
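
Apart from Google-Extended and Applebot-Extended, which are robots.txt tokens and never appear in request headers, each crawler above identifies itself with a product token inside its User-Agent string. A minimal sketch, assuming case-insensitive substring matching is good enough for log triage, could map a User-Agent header to an operator and role. The role labels are shorthand for this directory's descriptions, and because headers can be spoofed, this labels traffic rather than verifying it.

```python
from typing import NamedTuple, Optional


class Crawler(NamedTuple):
    token: str     # product token as it appears in the User-Agent header
    operator: str  # company running the crawler
    role: str      # shorthand for the role described in this directory


# Google-Extended and Applebot-Extended are robots.txt-only tokens,
# so they never show up in a User-Agent header and are omitted here.
CRAWLERS = [
    Crawler("GPTBot", "OpenAI", "training"),
    Crawler("OAI-SearchBot", "OpenAI", "search index"),
    Crawler("ChatGPT-User", "OpenAI", "on-demand fetch"),
    Crawler("ClaudeBot", "Anthropic", "training"),
    Crawler("Claude-User", "Anthropic", "on-demand fetch"),
    Crawler("Claude-SearchBot", "Anthropic", "search index"),
    Crawler("PerplexityBot", "Perplexity", "search index"),
    Crawler("Perplexity-User", "Perplexity", "on-demand fetch"),
    Crawler("GoogleOther", "Google", "other crawls"),
    Crawler("bingbot", "Microsoft", "search index"),
    Crawler("BingPreview", "Microsoft", "snapshot"),
]


def classify(user_agent: str) -> Optional[Crawler]:
    """Return the first crawler whose token appears in the header, else None.

    Matching is a case-insensitive substring test only; a spoofed header
    is labeled exactly like the real crawler.
    """
    ua = user_agent.lower()
    for crawler in CRAWLERS:
        if crawler.token.lower() in ua:
            return crawler
    return None


sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "OAI-SearchBot/1.0; +https://openai.com/searchbot)")
print(classify(sample))
```

For real verification, pair this with the reverse-DNS or published IP ranges each operator documents.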

A robots.txt template for 2026

# Allow all major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: bingbot
Allow: /

After deploying, test with our free LLM Bot Checker; it checks every user-agent above against your robots.txt and surfaces any block that slipped through.
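
A similar offline check is easy to run yourself. The sketch below uses Python's standard-library urllib.robotparser against a hypothetical robots.txt that names only GPTBot and ClaudeBot and ends with a catch-all "User-agent: *" group; every other token falls through to that catch-all and comes back blocked. Swap in your own file's contents to audit it.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens from this directory (not full User-Agent strings).
TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "GoogleOther", "Applebot-Extended", "bingbot",
]

# Hypothetical robots.txt: two named bots plus a catch-all block.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /
"""


def audit(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in TOKENS}


results = audit(ROBOTS_TXT, "https://example.com/")
blocked = [token for token, allowed in results.items() if not allowed]
print("Blocked:", ", ".join(blocked))
```

A crawler that has its own group ignores the "*" group entirely, which is why GPTBot and ClaudeBot stay allowed here while Claude-SearchBot, which has no group of its own, is blocked.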

Frequently Asked Questions

Which AI crawlers respect robots.txt?

GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, and PerplexityBot all respect robots.txt, and Google-Extended (a robots.txt token rather than a crawler) is honored by Google's crawlers. Compliance is high but not universal; for strict enforcement, use Cloudflare's AI Scrapers rule or a similar control at the CDN layer.

What's the difference between GPTBot and OAI-SearchBot?

GPTBot collects training data; OAI-SearchBot builds the live search index for ChatGPT. If you want ChatGPT to cite you in answers, you need OAI-SearchBot allowed. If you also want OpenAI to use your content in future model training, allow GPTBot.
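
For example, a site that wants to stay citable in ChatGPT search while opting out of training would disallow GPTBot and allow OAI-SearchBot. A minimal sketch that confirms such a hypothetical policy with Python's standard-library urllib.robotparser (the example.com URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: stay citable in ChatGPT search, opt out of training.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/pricing"  # placeholder URL
for token in ("GPTBot", "OAI-SearchBot"):
    # Pass the robots.txt product token, not the full User-Agent header:
    # urllib.robotparser only looks at the text before the first "/".
    verdict = "allowed" if parser.can_fetch(token, url) else "blocked"
    print(f"{token}: {verdict}")
```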

Does blocking Google-Extended hurt my Google search ranking?

No. Google-Extended only controls AI training (Gemini, Vertex AI). Regular Googlebot is separate, and your search ranking is unaffected by Google-Extended decisions.

Is Claude-SearchBot a new crawler?

Yes — Anthropic added Claude-SearchBot in 2026 specifically for Claude's search citation feature. If your robots.txt was last updated before 2026, it almost certainly doesn't address Claude-SearchBot explicitly.

Should I allow or block all of these?

Default to allow for all twelve unless you have a specific reason to block. The economics of AI citation in 2026 favor visibility — being recommended in ChatGPT, Claude, Perplexity, Gemini, and Copilot answers is the new top-of-funnel.

How do I test that my robots.txt actually allows these crawlers?

Use our free LLM Bot Checker. It tests every user-agent in this directory against your robots.txt and shows you the allow/block result for each. Takes 5 seconds.

Test your robots.txt against 12 AI crawlers

The free Hyperleap LLM Bot Checker covers GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and 7 more — including the 2026 additions like Claude-SearchBot.

Run the LLM Bot Checker


Gopi Krishna Lakkepuram

Founder & CEO

Gopi leads Hyperleap AI with a vision to transform how businesses implement AI. Before founding Hyperleap AI, he built and scaled systems serving billions of users at Microsoft on Office 365 and Outlook.com. He holds an MBA from ISB and combines technical depth with business acumen.
