AI Crawler Permissions
26 checks · Weight: 8% of overall score
Checks in this category
GPTBot allowed
Without an explicit robots.txt rule, GPTBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking GPTBot prevents your content from being used by OpenAI's models and appearing in ChatGPT responses. Explicitly allowing it signals that your site welcomes AI indexing for the largest AI platform by user base.
How to Fix
Add an explicit User-agent: GPTBot with Allow: / rule in your robots.txt file.
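One way to confirm the rule after editing robots.txt is a short script. This is a minimal sketch using Python's standard urllib.robotparser module; the helper name explicitly_allowed is hypothetical, and it assumes a rule counts as "explicit" only when a User-agent line names the token exactly.

```python
from urllib.robotparser import RobotFileParser


def explicitly_allowed(robots_txt: str, token: str, path: str = "/") -> bool:
    """Return True if robots_txt names `token` in a User-agent line
    and the parsed rules permit `token` to fetch `path`."""
    named = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent" and value.strip().lower() == token.lower():
            named = True
            break

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return named and parser.can_fetch(token, path)


sample = "User-agent: GPTBot\nAllow: /\n"
print(explicitly_allowed(sample, "GPTBot"))
```

An empty robots.txt permits the fetch but names no agent, so the helper reports it as not explicitly allowed, which mirrors the distinction this check draws.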
Example
User-agent: GPTBot
Allow: /
Google-Extended allowed
Google-Extended is a robots.txt control token rather than a separate crawler: Google fetches pages with its existing user agents and uses this token to decide whether the content may be used for Gemini. Without an explicit rule that use is permitted by default, but your robots.txt carries no signal that it is welcome. Adding an explicit allow rule makes your intent unambiguous.
Why This Matters
Blocking Google-Extended prevents your content from being used to train and ground Google's Gemini models. It does not control inclusion in Google Search, so AI Overviews remain governed by your Googlebot rules. Allowing it keeps your content available to Gemini-powered experiences.
How to Fix
Add an explicit User-agent: Google-Extended with Allow: / rule in your robots.txt file.
Example
User-agent: Google-Extended
Allow: /
anthropic-ai / ClaudeBot allowed
Without an explicit robots.txt rule, anthropic-ai / ClaudeBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking anthropic-ai / ClaudeBot prevents your content from being used by Anthropic's Claude models. Explicitly allowing it ensures your site is indexed for Claude's training data and knowledge base.
How to Fix
Add explicit User-agent rules for both anthropic-ai and ClaudeBot with Allow: / in your robots.txt file.
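When several tokens need the same rule, the groups can be generated rather than typed by hand. The sketch below is only an illustration; render_allow_groups is a hypothetical helper, and it assumes each token gets its own group separated by a blank line.

```python
def render_allow_groups(tokens: list[str]) -> str:
    """Build one 'User-agent / Allow: /' group per token,
    separated by blank lines."""
    groups = [f"User-agent: {token}\nAllow: /\n" for token in tokens]
    return "\n".join(groups)


print(render_allow_groups(["anthropic-ai", "ClaudeBot"]))
```

The output can be pasted into robots.txt as-is or appended to existing groups.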
Example
User-agent: anthropic-ai
Allow: /
User-agent: ClaudeBot
Allow: /
PerplexityBot allowed
Without an explicit robots.txt rule, PerplexityBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking PerplexityBot prevents your content from appearing in Perplexity AI search results, one of the fastest-growing AI answer engines. Allowing it gives your content visibility in AI-native search.
How to Fix
Add an explicit User-agent: PerplexityBot with Allow: / rule in your robots.txt file.
Example
User-agent: PerplexityBot
Allow: /
Applebot-Extended allowed
Applebot-Extended does not crawl pages itself; it is a robots.txt token that controls how Apple may use content already fetched by Applebot. Without an explicit rule that use is permitted by default, but your site gives no signal that it is welcome. Adding an explicit allow rule makes your intent unambiguous.
Why This Matters
Blocking Applebot-Extended prevents your content from being used in Apple Intelligence features, Siri AI answers, and Safari Highlights. Allowing it ensures visibility across Apple's AI ecosystem.
How to Fix
Add an explicit User-agent: Applebot-Extended with Allow: / rule in your robots.txt file.
Example
User-agent: Applebot-Extended
Allow: /
CCBot allowed
Without an explicit robots.txt rule, CCBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking CCBot prevents your content from being included in the Common Crawl dataset, which is a foundational training data source for many AI models. Allowing it broadens your content's reach across multiple AI systems.
How to Fix
Add an explicit User-agent: CCBot with Allow: / rule in your robots.txt file.
Example
User-agent: CCBot
Allow: /
Meta-ExternalAgent allowed
Without an explicit robots.txt rule, Meta-ExternalAgent may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Meta-ExternalAgent prevents your content from being used in Meta's AI features across Facebook, Instagram, and WhatsApp. Allowing it ensures visibility in Meta's AI-powered recommendations and summaries.
How to Fix
Add an explicit User-agent: Meta-ExternalAgent with Allow: / rule in your robots.txt file.
Example
User-agent: Meta-ExternalAgent
Allow: /
Amazonbot allowed
Without an explicit robots.txt rule, Amazonbot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Amazonbot prevents your content from appearing in Alexa AI answers and Amazon's AI-powered search features. Allowing it gives your content visibility in Amazon's voice and commerce AI ecosystem.
How to Fix
Add an explicit User-agent: Amazonbot with Allow: / rule in your robots.txt file.
Example
User-agent: Amazonbot
Allow: /
Bytespider allowed
Without an explicit robots.txt rule, Bytespider may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Bytespider prevents your content from being used by ByteDance's AI products including TikTok search and Doubao AI. Allowing it extends your content's reach to ByteDance's large user base.
How to Fix
Add an explicit User-agent: Bytespider with Allow: / rule in your robots.txt file.
Example
User-agent: Bytespider
Allow: /
cohere-ai allowed
Without an explicit robots.txt rule, cohere-ai may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking cohere-ai prevents your content from being used by Cohere's enterprise AI models, which power search and RAG applications for many businesses. Allowing it ensures your content is available in Cohere-powered AI search products.
How to Fix
Add an explicit User-agent: cohere-ai with Allow: / rule in your robots.txt file.
Example
User-agent: cohere-ai
Allow: /
YouBot allowed
Without an explicit robots.txt rule, YouBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking YouBot prevents your content from appearing in You.com AI search results. Allowing it gives your content visibility in this AI-native search engine that generates direct answers for users.
How to Fix
Add an explicit User-agent: YouBot with Allow: / rule in your robots.txt file.
Example
User-agent: YouBot
Allow: /
Diffbot allowed
Without an explicit robots.txt rule, Diffbot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Diffbot prevents your content from being included in Diffbot's Knowledge Graph, which powers structured data extraction for many AI applications. Allowing it ensures your content is properly indexed for AI-powered entity extraction.
How to Fix
Add an explicit User-agent: Diffbot with Allow: / rule in your robots.txt file.
Example
User-agent: Diffbot
Allow: /
AI2Bot allowed
Without an explicit robots.txt rule, AI2Bot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking AI2Bot prevents your content from being used by the Allen Institute for AI (AI2), which powers research models and semantic search tools. Allowing it contributes to open AI research and ensures broader content visibility.
How to Fix
Add an explicit User-agent: AI2Bot with Allow: / rule in your robots.txt file.
Example
User-agent: AI2Bot
Allow: /
ChatGPT-User allowed
ChatGPT-User is not an automatic crawler; it fetches a page on demand when a ChatGPT user asks about it. Without an explicit robots.txt rule, it may still fetch your pages but has no signal that it is welcome. Adding an explicit allow rule makes clear that user-initiated visits are permitted.
Why This Matters
Blocking ChatGPT-User prevents ChatGPT from browsing your site in real-time when users ask it to visit your pages. This blocks your content from being cited in ChatGPT Browse conversations, losing a significant source of AI-driven traffic.
How to Fix
Add an explicit User-agent: ChatGPT-User with Allow: / rule in your robots.txt file.
Example
User-agent: ChatGPT-User
Allow: /
Claude-User allowed
Claude-User is not an automatic crawler; it fetches a page on demand when a Claude user asks about it. Without an explicit robots.txt rule, it may still fetch your pages but has no signal that it is welcome. Adding an explicit allow rule makes clear that user-initiated visits are permitted.
Why This Matters
Blocking Claude-User prevents Claude from browsing your site in real-time when users ask it to visit your pages. This blocks your content from being cited in Claude conversations with web access enabled.
How to Fix
Add an explicit User-agent: Claude-User with Allow: / rule in your robots.txt file.
Example
User-agent: Claude-User
Allow: /
OAI-SearchBot allowed
Without an explicit robots.txt rule, OAI-SearchBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking OAI-SearchBot prevents your content from appearing in OpenAI's SearchGPT and ChatGPT web search results. Allowing it ensures your site is discoverable through OpenAI's real-time search features.
How to Fix
Add an explicit User-agent: OAI-SearchBot with Allow: / rule in your robots.txt file.
Example
User-agent: OAI-SearchBot
Allow: /
Meta-ExternalFetcher allowed
Without an explicit robots.txt rule, Meta-ExternalFetcher may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Meta-ExternalFetcher prevents Meta's AI from fetching your content in real-time for AI-powered features across Facebook, Instagram, and WhatsApp. Allowing it ensures your content can be surfaced in Meta's real-time AI experiences.
How to Fix
Add an explicit User-agent: Meta-ExternalFetcher with Allow: / rule in your robots.txt file.
Example
User-agent: Meta-ExternalFetcher
Allow: /
Bravebot allowed
Without an explicit robots.txt rule, Bravebot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Bravebot prevents your content from appearing in Brave Search AI answers and Brave Leo AI assistant responses. Allowing it gives your content visibility in the privacy-focused Brave browser ecosystem.
How to Fix
Add an explicit User-agent: Bravebot with Allow: / rule in your robots.txt file.
Example
User-agent: Bravebot
Allow: /
DuckAssistBot allowed
Without an explicit robots.txt rule, DuckAssistBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking DuckAssistBot prevents your content from appearing in DuckDuckGo's AI-powered DuckAssist feature, which generates instant answers from crawled web pages. Allowing it ensures visibility in this privacy-first AI search experience.
How to Fix
Add an explicit User-agent: DuckAssistBot with Allow: / rule in your robots.txt file.
Example
User-agent: DuckAssistBot
Allow: /
MistralAI-User allowed
Without an explicit robots.txt rule, MistralAI-User may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking MistralAI-User prevents Mistral AI's Le Chat from browsing your site in real-time when users ask it to visit your pages. Allowing it ensures your content can be cited in Mistral-powered AI conversations.
How to Fix
Add an explicit User-agent: MistralAI-User with Allow: / rule in your robots.txt file.
Example
User-agent: MistralAI-User
Allow: /
Claude-SearchBot allowed
Without an explicit robots.txt rule, Claude-SearchBot may still crawl your site but has no signal that it is welcome. Adding an explicit allow rule improves your visibility in AI-powered search and ensures consistent crawler behavior.
Why This Matters
Blocking Claude-SearchBot prevents your content from appearing in Claude's web search results. Allowing it ensures your site is included when Claude searches the web to answer user questions.
How to Fix
Add an explicit User-agent: Claude-SearchBot with Allow: / rule in your robots.txt file.
Example
User-agent: Claude-SearchBot
Allow: /
No blanket AI block
A blanket Disallow: / under User-agent: * blocks every crawler that has no more specific group of its own, which includes most AI agents. Your site becomes invisible to AI search engines, ChatGPT Browse, Perplexity, and others.
Why This Matters
A blanket Disallow: / under User-agent: * blocks every crawler, including all AI agents. Your site becomes completely invisible to AI search engines, ChatGPT Browse, Perplexity, Claude, and all other AI-powered discovery tools.
How to Fix
Replace the blanket Disallow: / with targeted path blocks for sensitive areas only. Allow the root path and block only private directories like /api/ and /admin/.
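To find out whether a robots.txt file contains such a blanket block, the file can be scanned group by group. The following is a rough sketch, not a full parser: has_blanket_block is a hypothetical helper, and it assumes a group is "blanket" when it names * and contains a bare Disallow: / line, ignoring any Allow lines that might partially re-open the site.

```python
def has_blanket_block(robots_txt: str) -> bool:
    """Return True if a group that applies to '*' contains 'Disallow: /'."""
    agents: list[str] = []   # user-agents named by the group being read
    in_rules = False         # True once the current group has a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new group.
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


print(has_blanket_block("User-agent: *\nDisallow: /\n"))
```

Running it against the targeted example in this check reports no blanket block, because each Disallow there names a specific directory.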
Example
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /internal/
Sensitive paths protected
Without Disallow rules in robots.txt, compliant AI crawlers may fetch sensitive paths like /api/ and /admin/. This may expose internal endpoints, admin panels, or debug information in AI training data and search results.
Why This Matters
Without Disallow rules for sensitive paths, AI crawlers can access /api/ and /admin/ endpoints. Internal endpoints and admin interfaces could appear in AI training data or be exposed in AI-powered search results, creating security and privacy risks.
How to Fix
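Add Disallow rules in robots.txt for sensitive paths like /api/, /admin/, /internal/, and any other private directories that should not be crawled. Because robots.txt is advisory and publicly readable, keep authentication on these paths as well; the Disallow rules only keep compliant crawlers out.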
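A combination of Allow: / with narrower Disallow lines relies on the longest-match rule from RFC 9309: the most specific matching path wins, and Allow wins a tie. The sketch below illustrates that evaluation for plain path prefixes only (no * or $ wildcards); is_allowed and the Rule alias are hypothetical names. Some parsers apply the first matching rule instead (Python's urllib.robotparser does so in current CPython releases), in which case an Allow: / listed first would mask the later Disallow lines.

```python
Rule = tuple[str, str]  # ("allow" | "disallow", path prefix)


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Evaluate `path` with longest-match semantics; Allow wins ties."""
    best_len = -1
    allowed = True  # no matching rule means the path may be fetched
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_prefers_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_prefers_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


rules = [
    ("allow", "/"),
    ("disallow", "/api/"),
    ("disallow", "/admin/"),
    ("disallow", "/internal/"),
]
for path in ("/blog/post", "/api/users", "/admin/"):
    print(path, is_allowed(rules, path))
```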
Example
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
Disallow: /internal/
Crawl-delay is reasonable
Excessive Crawl-delay values (over 10 seconds) dramatically slow AI indexing, meaning your latest content may take days or weeks to appear in AI search results.
Why This Matters
Excessive Crawl-delay values (over 10 seconds) dramatically slow AI indexing, meaning your latest content may take days or weeks to appear in AI search results while competitors with lower delays get indexed faster.
How to Fix
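Reduce Crawl-delay to 5 seconds or less, or remove it entirely if your server can handle the crawl load. Most modern servers can handle AI crawler traffic without throttling. Crawl-delay is a non-standard directive that some crawlers, including Googlebot, ignore, so treat it as a hint rather than a guarantee.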
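The configured delay can be read back with Python's standard urllib.robotparser, whose crawl_delay method returns the value that applies to a given agent, or None when no delay is set. This is a small sketch; the helper names and the MAX_REASONABLE_DELAY constant are assumptions, with the 10-second threshold taken from the description of this check.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

MAX_REASONABLE_DELAY = 10  # seconds; threshold assumed from this check's text


def crawl_delay_for(robots_txt: str, agent: str) -> Optional[int]:
    """Return the Crawl-delay that applies to `agent`, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


def is_delay_reasonable(robots_txt: str, agent: str) -> bool:
    """True when no delay is set or the delay is within the threshold."""
    delay = crawl_delay_for(robots_txt, agent)
    return delay is None or delay <= MAX_REASONABLE_DELAY


sample = "User-agent: *\nCrawl-delay: 5\n"
print(crawl_delay_for(sample, "GPTBot"), is_delay_reasonable(sample, "GPTBot"))
```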
Example
User-agent: *
Crawl-delay: 5
<meta name="robots"> not blocking
Pages with <meta name="robots" content="noindex"> are hidden from all search engines, including AI-powered ones. Ensure your important content pages do not have this tag.
Why This Matters
Pages with <meta name="robots" content="noindex"> are hidden from all search engines, including AI-powered ones. These pages will not appear in AI search results or be referenced by AI assistants, making their content effectively invisible.
How to Fix
Remove the noindex directive from pages you want AI agents to discover. Replace with "index, follow" or remove the meta robots tag entirely.
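To audit a page for a blocking directive, its HTML can be scanned for the meta robots tag. Below is a minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and it assumes that the value "none" should be treated like "noindex", as major search engines document.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def blocks_indexing(html: str) -> bool:
    """True if the page carries a noindex (or none) robots directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return bool(finder.directives & {"noindex", "none"})


print(blocks_indexing('<meta name="robots" content="noindex">'))
```

The sketch only inspects the HTML; a noindex sent through an X-Robots-Tag response header would need a separate check.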
Example
<meta name="robots" content="index, follow">
No aggressive bot-detection blocking agents
Bot-detection services like Cloudflare Turnstile, DataDome, and reCAPTCHA can block legitimate AI agents from accessing your content. Configure your service to allowlist known AI user-agents.
Why This Matters
Bot-detection services like Cloudflare Turnstile, DataDome, and reCAPTCHA can block legitimate AI agents from accessing your content. When agents are challenged, they cannot complete page fetches, making your content inaccessible to AI-powered search and assistants.
How to Fix
Configure your bot-detection service to allowlist known AI agent user-agents (GPTBot, ChatGPT-User, Claude-User, PerplexityBot) so they bypass challenges while still protecting against malicious bots.
Example
// Allowlist these AI agent user-agents in your WAF/CDN config:
// GPTBot, ChatGPT-User, OAI-SearchBot (OpenAI)
// Claude-User, Claude-SearchBot, anthropic-ai (Anthropic)
// PerplexityBot, Bravebot, DuckAssistBot
// (Google-Extended is a robots.txt token only; it is not sent as a request user-agent)
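Where the bot-detection layer lets you run custom logic, the allowlist can be expressed as a token match on the User-Agent header. This is a hedged sketch rather than any vendor's API: match_ai_agent is a hypothetical helper, and the sample header strings are illustrative. User-agent strings are easy to spoof, so a production rule would normally also verify the source IP against the ranges each vendor publishes.

```python
from typing import Optional

# Tokens taken from the allowlist above.
AI_AGENT_TOKENS = (
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",        # OpenAI
    "Claude-User", "Claude-SearchBot", "anthropic-ai",  # Anthropic
    "PerplexityBot", "Bravebot", "DuckAssistBot",
)


def match_ai_agent(user_agent: str) -> Optional[str]:
    """Return the allowlisted token found in the User-Agent header,
    or None if the header matches no known AI agent."""
    lowered = user_agent.lower()
    for token in AI_AGENT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


# Illustrative header values, not captured traffic.
print(match_ai_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(match_ai_agent("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))
```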