
AI Crawler Permissions

26 checks · Weight: 8% of overall score

1 critical · 4 high · 21 medium
2.1 GPTBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, GPTBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

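Blocking GPTBot tells OpenAI not to use your content to train its models. OpenAI documents GPTBot as its training crawler; whether your pages appear in ChatGPT search results is governed separately by OAI-SearchBot (check 2.16). Explicitly allowing GPTBot signals that your public content may be used by OpenAI, whose ChatGPT is one of the most widely used AI assistants.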

How to Fix

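Add a User-agent: GPTBot group with Allow: / to your robots.txt file. A crawler that finds a group naming it ignores the User-agent: * group entirely, so copy any Disallow lines you rely on (such as /admin/) into this group as well.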

Example

User-agent: GPTBot
Allow: /
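The Pass / Warn / Fail outcome can be approximated with a short script. This is a hypothetical sketch, not this specification's actual checker: it assumes Pass means a group naming the token allows /, Warn means the token only falls back to User-agent: *, and Fail means / is disallowed. It follows RFC 9309's group selection (matching groups are merged) and longest-match precedence, but for brevity it ignores the * and $ wildcards inside paths. The names parse_groups, is_allowed and classify are illustrative.

```python
"""Hypothetical sketch: classify how robots.txt treats one AI crawler token."""

Rule = tuple[str, str]                # ("allow" | "disallow", path prefix)
Group = tuple[list[str], list[Rule]]  # (lower-cased user-agents, rules)


def parse_groups(robots_txt: str) -> list[Group]:
    """Split robots.txt into (user_agents, rules) groups, following RFC 9309."""
    groups: list[Group] = []
    agents: list[str] = []
    rules: list[Rule] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a user-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Longest matching prefix wins; on a tie, Allow wins (RFC 9309)."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, prefix in rules:
        if prefix and path.startswith(prefix):  # simplified: no * or $ in paths
            if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
                best_len, allowed = len(prefix), kind == "allow"
    return allowed


def classify(robots_txt: str, token: str) -> str:
    """Hypothetical Pass / Warn / Fail mapping for one crawler token."""
    groups = parse_groups(robots_txt)
    token = token.lower()
    if any(token in agents for agents, _ in groups):
        # Every group naming the token is merged into one rule set.
        own = [rule for agents, rules in groups if token in agents for rule in rules]
        return "Pass" if is_allowed(own, "/") else "Fail"
    # No dedicated group: the crawler falls back to the User-agent: * group.
    wildcard = [rule for agents, rules in groups if "*" in agents for rule in rules]
    return "Warn" if is_allowed(wildcard, "/") else "Fail"


print(classify("User-agent: GPTBot\nAllow: /\n", "GPTBot"))  # Pass
```

The same function works for every token in checks 2.1 to 2.21; only the second argument changes.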
Effort: Trivial (minutes)
Tags: robots-txt, openai, crawler-permissions
2.2 Google-Extended allowed

Severity: medium · Outcomes: Pass / Warn / Fail

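Google-Extended is a robots.txt control token, not a separate crawler: Google's existing crawlers fetch your pages, and the token tells Google whether that content may be used for its generative AI models. Without a dedicated group, the token falls back to your User-agent: * rules; an explicit allow rule states your intent directly and keeps it unchanged when you later edit the wildcard rules.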

Why This Matters

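Blocking Google-Extended stops Google from using your content to improve Gemini Apps and the Vertex AI generative APIs. Google states that the token does not affect a site's inclusion or ranking in Google Search, so AI features in Search, such as AI Overviews, are not controlled by it. Allowing it keeps your content available to Google's generative AI products.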

How to Fix

Add an explicit User-agent: Google-Extended with Allow: / rule in your robots.txt file.

Example

User-agent: Google-Extended
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, google, crawler-permissions
2.3 anthropic-ai / ClaudeBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without dedicated robots.txt groups, anthropic-ai and ClaudeBot fall back to your User-agent: * rules, and may crawl your whole site if you have none. Explicit allow rules state your intent directly and keep their access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking anthropic-ai / ClaudeBot tells Anthropic not to collect your content for training its Claude models. Explicitly allowing them makes your public content available as Claude training data.

How to Fix

Add explicit User-agent rules for both anthropic-ai and ClaudeBot with Allow: / in your robots.txt file. The two tokens can also share a single group: list both User-agent lines, then one Allow: / line.

Example

User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, anthropic, crawler-permissions
2.4 PerplexityBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, PerplexityBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking PerplexityBot keeps your content out of Perplexity, an AI answer engine that builds its answers from web results and cites its sources. Allowing it lets Perplexity surface and link to your pages.

How to Fix

Add an explicit User-agent: PerplexityBot with Allow: / rule in your robots.txt file.

Example

User-agent: PerplexityBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, perplexity, crawler-permissions
2.5 Applebot-Extended allowed

Severity: medium · Outcomes: Pass / Warn / Fail

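Applebot-Extended is a robots.txt control token, not a separate crawler: Applebot fetches your pages, and the token tells Apple whether that content may be used to train its generative AI models. Without a dedicated group, the token falls back to your User-agent: * rules; an explicit allow rule states your intent directly and keeps it unchanged when you later edit the wildcard rules.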

Why This Matters

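Blocking Applebot-Extended tells Apple not to use your content to train the foundation models behind Apple Intelligence and its other generative AI features. Apple states that pages which disallow it can still appear in its search results, which are built from Applebot data. Allowing it makes your content available across Apple's generative AI features.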

How to Fix

Add an explicit User-agent: Applebot-Extended with Allow: / rule in your robots.txt file.

Example

User-agent: Applebot-Extended
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, apple, crawler-permissions
2.6 CCBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, CCBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking CCBot prevents your content from being included in the Common Crawl dataset, which is a foundational training data source for many AI models. Allowing it broadens your content's reach across multiple AI systems.

How to Fix

Add an explicit User-agent: CCBot with Allow: / rule in your robots.txt file.

Example

User-agent: CCBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, common-crawl, crawler-permissions
2.7 Meta-ExternalAgent allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Meta-ExternalAgent falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Meta-ExternalAgent tells Meta not to crawl your content for uses such as training its AI models and indexing content for its products, which include the AI features in Facebook, Instagram, and WhatsApp. Allowing it keeps your content available to those features.

How to Fix

Add an explicit User-agent: Meta-ExternalAgent with Allow: / rule in your robots.txt file.

Example

User-agent: Meta-ExternalAgent
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, meta, crawler-permissions
2.8 Amazonbot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Amazonbot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Amazonbot prevents your content from appearing in Alexa AI answers and Amazon's AI-powered search features. Allowing it gives your content visibility in Amazon's voice and commerce AI ecosystem.

How to Fix

Add an explicit User-agent: Amazonbot with Allow: / rule in your robots.txt file.

Example

User-agent: Amazonbot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, amazon, crawler-permissions
2.9 Bytespider allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Bytespider falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Bytespider prevents your content from being used by ByteDance's AI products including TikTok search and Doubao AI. Allowing it extends your content's reach to ByteDance's large user base.

How to Fix

Add an explicit User-agent: Bytespider with Allow: / rule in your robots.txt file.

Example

User-agent: Bytespider
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, bytedance, crawler-permissions
2.10 cohere-ai allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, cohere-ai falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking cohere-ai prevents your content from being used by Cohere's enterprise AI models, which power search and RAG applications for many businesses. Allowing it ensures your content is available in Cohere-powered AI search products.

How to Fix

Add an explicit User-agent: cohere-ai with Allow: / rule in your robots.txt file.

Example

User-agent: cohere-ai
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, cohere, crawler-permissions
2.11 YouBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, YouBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking YouBot prevents your content from appearing in You.com AI search results. Allowing it gives your content visibility in this AI-native search engine that generates direct answers for users.

How to Fix

Add an explicit User-agent: YouBot with Allow: / rule in your robots.txt file.

Example

User-agent: YouBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, you-com, crawler-permissions
2.12 Diffbot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Diffbot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Diffbot prevents your content from being included in Diffbot's Knowledge Graph, which powers structured data extraction for many AI applications. Allowing it ensures your content is properly indexed for AI-powered entity extraction.

How to Fix

Add an explicit User-agent: Diffbot with Allow: / rule in your robots.txt file.

Example

User-agent: Diffbot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, diffbot, crawler-permissions
2.13 AI2Bot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, AI2Bot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking AI2Bot keeps your content from the Allen Institute for AI (AI2), which crawls the web to build open research models and semantic search tools. Allowing it contributes to open AI research and broadens your content's reach.

How to Fix

Add an explicit User-agent: AI2Bot with Allow: / rule in your robots.txt file.

Example

User-agent: AI2Bot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, ai2, crawler-permissions
2.14 ChatGPT-User allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, ChatGPT-User falls back to your User-agent: * rules, and may fetch any page if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking ChatGPT-User prevents ChatGPT from fetching your pages in real time when a user asks about or links to them, so that content cannot be read or cited in the conversation, and you lose the referral traffic those citations can bring.

How to Fix

Add an explicit User-agent: ChatGPT-User with Allow: / rule in your robots.txt file.

Example

User-agent: ChatGPT-User
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, openai, realtime, crawler-permissions
2.15 Claude-User allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Claude-User falls back to your User-agent: * rules, and may fetch any page if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Claude-User prevents Claude from browsing your site in real-time when users ask it to visit your pages. This blocks your content from being cited in Claude conversations with web access enabled.

How to Fix

Add an explicit User-agent: Claude-User with Allow: / rule in your robots.txt file.

Example

User-agent: Claude-User
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, anthropic, realtime, crawler-permissions
2.16 OAI-SearchBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, OAI-SearchBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking OAI-SearchBot keeps your content out of ChatGPT's search results (the feature OpenAI first previewed as SearchGPT). OpenAI states that this crawler is not used to train its models, so allowing it makes your site discoverable through ChatGPT search without opting into training.

How to Fix

Add an explicit User-agent: OAI-SearchBot with Allow: / rule in your robots.txt file.

Example

User-agent: OAI-SearchBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, openai, realtime, crawler-permissions
2.17 Meta-ExternalFetcher allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Meta-ExternalFetcher falls back to your User-agent: * rules, and may fetch any page if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Meta-ExternalFetcher prevents Meta's AI from fetching your content in real-time for AI-powered features across Facebook, Instagram, and WhatsApp. Allowing it ensures your content can be surfaced in Meta's real-time AI experiences.

How to Fix

Add an explicit User-agent: Meta-ExternalFetcher with Allow: / rule in your robots.txt file.

Example

User-agent: Meta-ExternalFetcher
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, meta, realtime, crawler-permissions
2.18 Bravebot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Bravebot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Bravebot prevents your content from appearing in Brave Search AI answers and Brave Leo AI assistant responses. Allowing it gives your content visibility in the privacy-focused Brave browser ecosystem.

How to Fix

Add an explicit User-agent: Bravebot with Allow: / rule in your robots.txt file.

Example

User-agent: Bravebot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, brave, realtime, crawler-permissions
2.19 DuckAssistBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, DuckAssistBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking DuckAssistBot prevents your content from appearing in DuckDuckGo's AI-powered DuckAssist feature, which generates instant answers from crawled web pages. Allowing it ensures visibility in this privacy-first AI search experience.

How to Fix

Add an explicit User-agent: DuckAssistBot with Allow: / rule in your robots.txt file.

Example

User-agent: DuckAssistBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, duckduckgo, realtime, crawler-permissions
2.20 MistralAI-User allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, MistralAI-User falls back to your User-agent: * rules, and may fetch any page if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking MistralAI-User prevents Mistral AI's Le Chat from browsing your site in real-time when users ask it to visit your pages. Allowing it ensures your content can be cited in Mistral-powered AI conversations.

How to Fix

Add an explicit User-agent: MistralAI-User with Allow: / rule in your robots.txt file.

Example

User-agent: MistralAI-User
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, mistral, realtime, crawler-permissions
2.21 Claude-SearchBot allowed

Severity: medium · Outcomes: Pass / Warn / Fail

Without a dedicated robots.txt group, Claude-SearchBot falls back to your User-agent: * rules, and may crawl your whole site if you have none. An explicit allow rule states your intent directly and keeps its access unchanged when you later edit the wildcard rules.

Why This Matters

Blocking Claude-SearchBot prevents your content from appearing in Claude's web search results. Allowing it ensures your site is included when Claude searches the web to answer user questions.

How to Fix

Add an explicit User-agent: Claude-SearchBot with Allow: / rule in your robots.txt file.

Example

User-agent: Claude-SearchBot
Allow: /
Effort: Trivial (minutes)
Tags: robots-txt, anthropic, realtime, crawler-permissions
2.22 No blanket AI block

Severity: critical · Outcomes: Pass / Warn / Fail

A blanket Disallow: / under User-agent: * blocks every crawler that lacks its own User-agent group, which on most sites includes all AI agents.

Why This Matters

Crawlers without a group of their own all fall back to User-agent: *, so a single Disallow: / there makes your site invisible to AI search engines, ChatGPT's browsing, Perplexity, Claude, and every other AI-powered discovery tool that lacks a dedicated group.

How to Fix

Replace the blanket Disallow: / with targeted path blocks for sensitive areas only. Allow the root path and block only private directories like /api/ and /admin/.

Example

User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /internal/
Allow: /
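To see which AI agents a wildcard block actually shuts out, you can evaluate the file with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the site URL https://example.com/ is a placeholder, the agent list is a sample of tokens from this section, and blocked_agents is an illustrative name. Agents that have their own group are judged by that group instead of User-agent: *. Note that urllib.robotparser applies the first matching rule in file order rather than RFC 9309's longest match, so its answers can differ from major crawlers' on files that depend on rule precedence.

```python
from urllib.robotparser import RobotFileParser

# A sample of product tokens from this section; extend as needed.
AI_AGENTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
             "Claude-User", "PerplexityBot", "CCBot"]


def blocked_agents(robots_txt: str, site: str = "https://example.com/") -> list[str]:
    """Return the agents that may not fetch the site's root URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, site)]


blanket = "User-agent: *\nDisallow: /\n"
print(blocked_agents(blanket))  # every listed agent is blocked

with_exception = blanket + "\nUser-agent: GPTBot\nAllow: /\n"
print(blocked_agents(with_exception))  # GPTBot has its own group, so it is not listed
```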
Effort: Trivial (minutes)
Tags: robots-txt, critical, crawler-permissions
2.23 Sensitive paths protected

Severity: high · Outcomes: Pass / Warn / Fail

Without Disallow rules in robots.txt, AI crawlers may request sensitive paths like /api/ and /admin/. This can expose internal endpoints, admin panels, or debug information in AI training data and search results.

Why This Matters

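Without Disallow rules for sensitive paths, compliant AI crawlers may fetch your /api/ and /admin/ endpoints, and anything they can read there could end up in AI training data or AI-powered search results. robots.txt is only advisory and is publicly readable: it keeps well-behaved crawlers out, but it does not stop other clients, and listing a path reveals that it exists. Protect truly private endpoints with authentication as well.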

How to Fix

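Add Disallow rules in robots.txt for sensitive paths like /api/, /admin/, /internal/, and any other private directories that should not be crawled. A crawler that has its own User-agent group ignores the User-agent: * group entirely, so repeat these lines in every dedicated group you added for checks 2.1 to 2.21.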

Example

User-agent: *
Disallow: /api/
Disallow: /admin/
Disallow: /internal/
Allow: /
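The group-override rule is easy to miss, so here is a small check using Python's standard urllib.robotparser against a placeholder example.com site. The first file gives GPTBot its own group containing only Allow: /, which lets it into /admin/ even though User-agent: * disallows that path; the second repeats the Disallow lines inside the GPTBot group. The Disallow lines come before Allow: / because urllib.robotparser uses the first matching rule in file order, whereas RFC 9309 crawlers use the longest match; this ordering gives the same answer under both. The helper can_fetch is an illustrative wrapper.

```python
from urllib.robotparser import RobotFileParser

SENSITIVE = "Disallow: /api/\nDisallow: /admin/\nDisallow: /internal/\n"

# GPTBot gets its own group, so the wildcard Disallow lines no longer apply to it.
leaky = "User-agent: *\n" + SENSITIVE + "Allow: /\n\nUser-agent: GPTBot\nAllow: /\n"

# Fix: repeat the sensitive-path rules inside every dedicated group.
fixed = ("User-agent: *\n" + SENSITIVE + "Allow: /\n\n"
         "User-agent: GPTBot\n" + SENSITIVE + "Allow: /\n")


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate one URL for one agent against an in-memory robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


admin = "https://example.com/admin/"
print(can_fetch(leaky, "GPTBot", admin))                        # True
print(can_fetch(fixed, "GPTBot", admin))                        # False
print(can_fetch(fixed, "GPTBot", "https://example.com/blog/"))  # True
```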
Effort: Trivial (minutes)
Tags: robots-txt, security, crawler-permissions
2.24 Crawl-delay is reasonable

Severity: high · Outcomes: Pass / Fail

Excessive Crawl-delay values (over 10 seconds) dramatically slow AI indexing, meaning your latest content may take days or weeks to appear in AI search results.

Why This Matters

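At Crawl-delay: 10, a crawler that honors the directive can fetch at most 8,640 pages a day (86,400 seconds ÷ 10), so on a large site new or updated pages may wait days or weeks to be re-crawled while competitors with lower delays are indexed faster. Crawl-delay is also non-standard: it is not part of RFC 9309 and Google ignores it, so its effect varies by crawler.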

How to Fix

Reduce Crawl-delay to 5 seconds or less, or remove it entirely if your server can handle the crawl load. Most modern servers can handle AI crawler traffic without throttling.

Example

User-agent: *
Crawl-delay: 5
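The cost of a long delay is simple arithmetic: a crawler that waits Crawl-delay seconds between requests can make at most 86,400 ÷ delay requests per day. The sketch below reads the delay with Python's standard urllib.robotparser (its crawl_delay method) and estimates how long one full pass over a hypothetical 50,000-page site would take. It assumes the crawler honors the directive, fetches one page per interval, and spends its whole budget on your pages; days_per_full_crawl is an illustrative name.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

SECONDS_PER_DAY = 86_400


def days_per_full_crawl(robots_txt: str, agent: str, page_count: int) -> Optional[float]:
    """Days needed to fetch page_count pages at the agent's Crawl-delay, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    if not delay:
        return None
    fetches_per_day = SECONDS_PER_DAY / delay
    return page_count / fetches_per_day


for delay in (5, 10, 30):
    robots = f"User-agent: *\nCrawl-delay: {delay}\n"
    days = days_per_full_crawl(robots, "GPTBot", 50_000)
    print(f"Crawl-delay {delay:>2}s: {days:.1f} days per full crawl")
# Crawl-delay  5s: 2.9 days per full crawl
# Crawl-delay 10s: 5.8 days per full crawl
# Crawl-delay 30s: 17.4 days per full crawl
```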
Effort: Trivial (minutes)
Tags: robots-txt, performance, crawler-permissions
2.25 <meta name="robots"> not blocking

Severity: high · Outcomes: Pass / Warn / Fail

Pages with <meta name="robots" content="noindex"> are hidden from all search engines, including AI-powered ones. Ensure your important content pages do not have this tag.

Why This Matters

Pages with <meta name="robots" content="noindex"> are hidden from all search engines, including AI-powered ones. These pages will not appear in AI search results or be referenced by AI assistants, making their content effectively invisible.

How to Fix

Remove the noindex directive from pages you want AI agents to discover. Replace with "index, follow" or remove the meta robots tag entirely.

Example

<meta name="robots" content="index, follow">
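A quick way to audit pages for this problem is to parse their HTML and look for noindex (or none, which Google treats as noindex, nofollow) in the robots meta tag. The sketch uses Python's standard html.parser; the class and function names are illustrative. It only inspects the HTML, so it does not check the X-Robots-Tag HTTP header or crawler-specific tags such as <meta name="googlebot">, either of which can carry the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser lower-cases tag and attribute names
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def blocks_indexing(html: str) -> bool:
    """True if a robots meta tag asks search engines not to index the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives or "none" in finder.directives


print(blocks_indexing('<meta name="robots" content="noindex, follow">'))  # True
print(blocks_indexing('<meta name="robots" content="index, follow">'))    # False
```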
Effort: Trivial (minutes)
Tags: meta-tags, indexing, crawler-permissions
2.26 No aggressive bot-detection blocking agents

Severity: high · Outcomes: Pass / Warn / Fail

Bot-detection services like Cloudflare Turnstile, DataDome, and reCAPTCHA can block legitimate AI agents from accessing your content. Configure your service to allowlist known AI user-agents.

Why This Matters

Bot-detection services like Cloudflare Turnstile, DataDome, and reCAPTCHA can block legitimate AI agents from accessing your content. When agents are challenged, they cannot complete page fetches, making your content inaccessible to AI-powered search and assistants.

How to Fix

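Configure your bot-detection service to allowlist known AI agent user-agents (GPTBot, ChatGPT-User, Claude-User, PerplexityBot) so they bypass challenges while malicious bots are still filtered. User-agent strings are easy to spoof, so where the operator publishes IP ranges or supports reverse-DNS verification, check the request's source address as well.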

Example

// Allowlist these AI agent user-agents in your WAF/CDN config:
// GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI)
// ClaudeBot, Claude-User, Claude-SearchBot (Anthropic)
// PerplexityBot, Bravebot, DuckAssistBot
// Google-Extended is omitted: it is a robots.txt token and never appears in request headers.
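Because anyone can send a GPTBot user-agent string, an allowlist is safer when it also checks where the request comes from. The sketch below is hypothetical: the ranges are RFC 5737 documentation prefixes standing in for the IP lists that operators such as OpenAI publish, the user-agent string only illustrates the usual format, and the names PUBLISHED_RANGES and should_bypass_challenge are illustrative. It uses only Python's standard re and ipaddress modules.

```python
import ipaddress
import re

# PLACEHOLDER ranges (RFC 5737 documentation prefixes), not real crawler IPs.
# Replace them with the ranges each operator publishes for its crawler.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
}


def has_token(user_agent: str, token: str) -> bool:
    """True if the token appears as a standalone product name in the UA string."""
    pattern = rf"(?<![\w-]){re.escape(token)}(?![\w-])"
    return re.search(pattern, user_agent, re.IGNORECASE) is not None


def should_bypass_challenge(user_agent: str, client_ip: str) -> bool:
    """Allowlist only when both the UA token and the source IP check out."""
    ip = ipaddress.ip_address(client_ip)
    for token, cidrs in PUBLISHED_RANGES.items():
        if has_token(user_agent, token):
            return any(ip in ipaddress.ip_network(cidr) for cidr in cidrs)
    return False  # unknown agents go through the normal bot checks


ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1"
print(should_bypass_challenge(ua, "192.0.2.10"))   # True
print(should_bypass_challenge(ua, "203.0.113.7"))  # False: UA claims GPTBot, IP does not match
```

In a real WAF or CDN the same two-step test is usually expressed as a rule on both the user-agent field and a managed IP list, rather than in application code.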
Effort: Moderate (hours)
Tags: security, bot-detection, crawler-permissions