Content Discoverability
23 checks · Weight: 15% of overall score
Checks in this category
llms.txt exists
llms.txt is the primary way AI agents discover your site content. Without it, LLMs must crawl your entire site to understand what you offer. Create this file at your site root.
Why This Matters
llms.txt is the primary entry point for AI agents discovering your site. Without it, LLMs like ChatGPT, Perplexity, and Claude must crawl your entire site blindly, often missing key pages and providing incomplete or inaccurate answers about your business.
How to Fix
Create a /llms.txt file at your site root in markdown format. Include an H1 heading with your site name, a blockquote summary, and organized sections with links to your key pages.
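As a rough way to verify these elements, the sketch below parses a llms.txt body and reports whether it has an H1, a summary line directly after it, H2 sections, and link lines. The function name check_llms_txt and the report keys are hypothetical, and the parsing is deliberately simple.

```python
import re

def check_llms_txt(text: str) -> dict:
    """Report which structural elements a llms.txt body contains."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    has_h1 = bool(lines) and lines[0].startswith("# ")
    # The summary is expected on the first non-empty line after the H1.
    has_summary = has_h1 and len(lines) > 1 and lines[1].startswith(">")
    sections = [line[3:].strip() for line in lines if line.startswith("## ")]
    links = [line for line in lines if re.match(r"^- \[[^\]]+\]\([^)]+\)", line)]
    return {
        "has_h1": has_h1,
        "has_summary": has_summary,
        "sections": sections,
        "link_count": len(links),
    }

sample = """# Your Site Name
> Brief description of your site for AI agents.
## Pages
- [Home](/): Main landing page
- [About](/about/): Company information
"""
report = check_llms_txt(sample)
```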
Example
# Your Site Name
> Brief description of your site for AI agents.
## Pages
- [Home](/): Main landing page
- [About](/about/): Company information
## Resources
- [Sitemap](/sitemap.xml): Full URL list
- [RSS](/rss.xml): Content feed
llms.txt has blockquote summary
The blockquote summary gives AI agents a one-sentence overview of your site without reading further.
Why This Matters
Without a blockquote summary, AI agents have no quick way to understand what your site offers. They must parse the entire llms.txt file before deciding if your content is relevant, leading to slower and less accurate AI responses about your business.
How to Fix
Add a blockquote line (starting with >) immediately after the H1 heading in your llms.txt file. Write a concise 1-2 sentence summary of what your site provides.
Example
# Your Site Name
> Your site provides X for Y. It covers topics including A, B, and C.
llms.txt has H2 sections
H2 sections help AI agents navigate your llms.txt by topic. Without them, agents must scan the entire file linearly.
Why This Matters
Without H2 sections, AI agents must scan your entire llms.txt linearly to find relevant content. Sections let agents jump directly to the topic they need, producing faster and more accurate responses.
How to Fix
Organize your llms.txt links under H2 headings (## Section Name) that group related pages. Use intuitive section names like ## Documentation, ## API, ## Blog, ## Company.
Example
## Documentation
- [Getting Started](/docs/start): Quick start guide
- [API Reference](/docs/api): Full API documentation
## Company
- [About](/about): Company information
- [Blog](/blog): Latest updates
llms.txt links include descriptions
Link descriptions help AI agents understand what each page covers without visiting it, reducing unnecessary crawling.
Why This Matters
Links without descriptions force AI agents to visit every page to understand its content, wasting crawl budget and slowing down response generation. Described links let agents filter relevant pages instantly.
How to Fix
Add a colon and brief description after each link URL in your llms.txt file. Describe what the page covers in a few words so agents can decide which pages to visit.
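One possible way to find links that still lack a description is to match each list item against the "- [label](url): description" pattern. The regular expression and the function name links_missing_descriptions are illustrative assumptions, not part of any llms.txt tooling.

```python
import re

# Matches "- [label](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<label>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)

def links_missing_descriptions(text: str) -> list[str]:
    """Return the URLs of link lines that have no description text."""
    missing = []
    for line in text.splitlines():
        match = LINK_RE.match(line.strip())
        if match and not (match.group("desc") or "").strip():
            missing.append(match.group("url"))
    return missing
```

Lines that are not link items (headings, the summary) are ignored, so the whole file can be passed in unchanged.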
Example
- [Getting Started](/docs/start): Step-by-step guide for new users
- [API Reference](/docs/api): Complete endpoint documentation
- [Pricing](/pricing): Plans and pricing information
llms.txt links are valid
Valid links in llms.txt ensure AI agents can navigate to your content without encountering dead ends.
Why This Matters
Broken links in llms.txt send AI agents to dead ends, wasting their context window and degrading the quality of answers about your site. Users asking AI about your products or services will get error messages instead of useful information.
How to Fix
Verify all links in your llms.txt resolve to HTTP 200. Remove links to deleted pages and update any URLs that have changed. Run this check after every site deployment.
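A post-deployment check along these lines could look like the sketch below. It extracts every link target, resolves it against a base URL, and records any that do not return 200. The helper names are hypothetical, and the status lookup is passed in as a parameter so the logic can be exercised without network access. Note that urlopen follows redirects, so a redirected link is reported by its final status.

```python
import re
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

def http_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the request fails outright."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return 0

def broken_llms_links(text, base_url, get_status=http_status):
    """Map each non-200 link in a llms.txt body to the status it returned."""
    results = {}
    for url in re.findall(r"\]\(([^)]+)\)", text):
        absolute = urljoin(base_url, url)
        status = get_status(absolute)
        if status != 200:
            results[absolute] = status
    return results
```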
Example
- [Page Name](/correct-path): Description of the page content
llms-full.txt present
llms-full.txt provides the complete content of your site in a single file, allowing AI agents to ingest everything in one request instead of crawling page by page.
Why This Matters
Without llms-full.txt, AI agents must crawl your site page by page, which is slow and often incomplete. This means AI assistants give shallow or outdated answers about your products and services.
How to Fix
Create a /llms-full.txt file at your site root containing the full text content of all important pages in markdown format. Include headings, descriptions, and key details for each page.
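If page content is already available as markdown strings, the file can be assembled mechanically. This is a minimal sketch; build_llms_full and the shape of the pages mapping (title to markdown body) are assumptions made for illustration.

```python
def build_llms_full(site_name: str, summary: str, pages: dict) -> str:
    """Join a title-to-markdown mapping into a single llms-full.txt body."""
    parts = [f"# {site_name}", f"> {summary}"]
    for title, body in pages.items():
        parts.append(f"## {title}")
        parts.append(body.strip())
    return "\n\n".join(parts) + "\n"

full_text = build_llms_full(
    "Your Site Name",
    "Full content version for AI agents.",
    {"Home": "Your homepage content here...", "About": "Your about page content here..."},
)
```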
Example
# Your Site Name
> Full content version for AI agents.
## Home
Your homepage content here...
## About
Your about page content here...
## Documentation
Your documentation content here...
sitemap.xml exists
AI crawlers use your sitemap to discover all pages without following links. Without it, pages may never be indexed by AI search engines.
Why This Matters
Without a sitemap, AI crawlers must discover your pages solely through link-following, which is slow and incomplete. Pages deep in your site hierarchy may never be found, meaning AI search engines like Perplexity and ChatGPT Browse cannot surface your full content.
How to Fix
Create a sitemap.xml at your site root containing all important pages. Use a <urlset> with <url> entries for each page, including <loc> and <lastmod>. Most frameworks (Next.js, WordPress, etc.) can auto-generate sitemaps.
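Where no framework plugin is available, a sitemap can be generated with the Python standard library. The sketch below assumes you already have a list of (absolute URL, last-modified date) pairs; build_sitemap is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries) -> str:
    """entries: iterable of (absolute_url, last_modified_date_string)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

sitemap_xml = build_sitemap([("https://yoursite.com/", "2026-01-01")])
```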
Example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://yoursite.com/</loc>
<lastmod>2026-01-01</lastmod>
<priority>1.0</priority>
</url>
</urlset>
Sitemap includes all key pages
The sitemap should include all scanned pages so AI crawlers can discover your full site content.
Why This Matters
Pages missing from your sitemap may not be discovered by AI crawlers, even if they exist on your site. This means important content -- product pages, documentation, blog posts -- could be absent from AI search results.
How to Fix
Ensure your sitemap.xml includes all important pages. Compare your sitemap against your actual site pages and add any missing URLs. Configure your CMS or build tool to auto-include new pages in the sitemap.
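The comparison itself can be a set difference between the URLs you know exist and the loc values in the sitemap. In this sketch the list of known pages is assumed to come from your CMS or build output, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_locs(sitemap_xml: str) -> set:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}

def missing_from_sitemap(sitemap_xml: str, known_pages) -> list:
    """Return known page URLs that the sitemap does not list."""
    return sorted(set(known_pages) - sitemap_locs(sitemap_xml))
```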
Example
<url>
<loc>https://yoursite.com/missing-page</loc>
<lastmod>2026-01-01</lastmod>
</url>
Sitemap uses absolute URLs
Sitemap URLs must be absolute (starting with https://) so AI crawlers can resolve them without ambiguity.
Why This Matters
The sitemap protocol requires fully qualified URLs. Crawlers may silently skip entries that use relative URLs, which leaves those pages effectively invisible to AI search engines.
How to Fix
Ensure every <loc> value in your sitemap.xml starts with the full protocol and domain (e.g., https://yoursite.com/page). Update your sitemap generator configuration to output absolute URLs.
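A simple test for whether a loc value is absolute is to parse it and require both a scheme and a host. The function name is_absolute_url is an illustrative assumption.

```python
from urllib.parse import urlparse

def is_absolute_url(value: str) -> bool:
    """True when value has an http(s) scheme and a host component."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Protocol-relative values such as //yoursite.com/page are rejected as well, since they carry no scheme.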
Example
<!-- Correct: absolute URL -->
<url>
<loc>https://yoursite.com/page</loc>
</url>
<!-- Wrong: relative URL -->
<url>
<loc>/page</loc>
</url>
Sitemap has lastmod dates
AI crawlers use <lastmod> to decide which pages to re-index and which to skip. Without these dates, crawlers must re-fetch every page on every visit.
Why This Matters
Without <lastmod> dates, AI crawlers must re-fetch every page on every visit because they cannot tell which pages have changed. This wastes crawl budget and delays indexing of your freshest content.
How to Fix
Add accurate <lastmod> dates to every <url> entry in your sitemap.xml. Update the date whenever the page content actually changes. Use ISO 8601 format (YYYY-MM-DD or full datetime).
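Both accepted shapes can be produced from a timezone-aware datetime. This sketch assumes the change time is tracked somewhere (for example in your CMS) and normalizes it to UTC; the helper names are hypothetical.

```python
from datetime import datetime, timezone

def lastmod_date(changed_at: datetime) -> str:
    """Date-only form, YYYY-MM-DD, in UTC."""
    return changed_at.astimezone(timezone.utc).strftime("%Y-%m-%d")

def lastmod_datetime(changed_at: datetime) -> str:
    """Full datetime form with an explicit UTC offset."""
    return changed_at.astimezone(timezone.utc).isoformat(timespec="seconds")

changed = datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc)
```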
Example
<url>
<loc>https://yoursite.com/page</loc>
<lastmod>2026-01-15</lastmod>
</url>
RSS/Atom feed link present
RSS/Atom feeds let AI agents track new and updated content without re-crawling your entire site.
Why This Matters
Without an RSS/Atom feed, AI agents have no efficient way to track new and updated content on your site. They must re-crawl your entire site to find changes, which means your latest posts and pages may take much longer to appear in AI search results.
How to Fix
Create an RSS or Atom feed and link to it in your HTML <head> with a <link rel="alternate"> tag. Most frameworks and CMS platforms can auto-generate feeds. Place the feed at a well-known path like /rss.xml or /feed.xml.
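To confirm that a page advertises its feed, you can scan the HTML for link elements with rel="alternate" and a feed MIME type. The sketch below uses the standard library HTML parser; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and (attr.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append(attr.get("href"))

def find_feed_links(html: str) -> list:
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```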
Example
<!-- Add to your HTML <head> -->
<link rel="alternate" type="application/rss+xml" title="Your Site Feed" href="/rss.xml" />
<!-- Example /rss.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Your Site</title>
<link>https://yoursite.com</link>
<description>Site description</description>
<item>
<title>Article Title</title>
<link>https://yoursite.com/article</link>
<description>Article summary...</description>
</item>
</channel>
</rss>
RSS feed content complete
Full-content feeds allow AI agents to index your articles without visiting each page, reducing crawl load and improving content quality in AI responses.
Why This Matters
Truncated RSS feed items force AI agents to visit each page individually, increasing crawl time and often resulting in incomplete indexing. Full-content feeds let agents ingest all your articles in a single request, producing richer AI-generated answers.
How to Fix
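Include full article content in each feed item using <content:encoded> (RSS) or <content> (Atom). When using <content:encoded>, declare the content namespace (xmlns:content="http://purl.org/rss/1.0/modules/content/") on the <rss> element. Aim for more than 500 characters per item. Most CMS platforms have a setting to switch from excerpt to full-content feeds.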
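One way to spot truncated items in an RSS feed is to measure the length of each content:encoded body. This sketch counts raw characters including HTML markup, which is an assumption about how the 500-character target is measured; short_feed_items is a hypothetical name.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

def short_feed_items(rss_xml: str, min_chars: int = 500) -> list:
    """Return the <link> of every item whose full content is min_chars or shorter."""
    root = ET.fromstring(rss_xml.encode("utf-8"))
    short = []
    for item in root.iter("item"):
        body = item.findtext("content:encoded", default="", namespaces=CONTENT_NS)
        if len(body.strip()) <= min_chars:
            short.append(item.findtext("link", default=""))
    return short
```

Items with no content:encoded element at all are reported too, because their body length is zero.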
Example
<item>
<title>Article Title</title>
<link>https://yoursite.com/article</link>
<content:encoded><![CDATA[
<p>Full article content goes here. Include all paragraphs,
headings, and relevant details.</p>
]]></content:encoded>
</item>
No noindex on homepage
A noindex directive on your homepage prevents AI crawlers and search engines from indexing your most important page.
Why This Matters
A noindex directive on your homepage completely blocks AI crawlers and search engines from indexing your most important page. Your site becomes invisible in AI search results, causing total loss of organic AI-driven traffic.
How to Fix
Remove the noindex directive from your homepage by updating the meta robots tag to "index, follow" and removing any X-Robots-Tag: noindex header from your server configuration.
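Because the directive can come from either the meta tag or the response header, a check should look at both. The sketch below merges the two sources into one set of directives; the names are hypothetical, and the response headers are assumed to be available as a plain dict.

```python
from html.parser import HTMLParser

def _split_directives(value: str) -> set:
    return {part.strip().lower() for part in value.split(",") if part.strip()}

class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives |= _split_directives(attr.get("content") or "")

def robots_directives(html: str, headers=None) -> set:
    """Union of robots directives from the HTML and any X-Robots-Tag header."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    directives = set(finder.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            directives |= _split_directives(value)
    return directives
```

A homepage passes when "noindex" is absent from the returned set.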
Example
<!-- Remove noindex from homepage -->
<meta name="robots" content="index, follow" />
No nofollow on important links
A site-wide nofollow directive prevents AI crawlers from following links to discover your content. Important internal links should be followable.
Why This Matters
A nofollow directive prevents AI crawlers from following links on your pages, effectively hiding all linked content from AI indexing. Your deeper pages become invisible to AI search engines, drastically reducing discoverability.
How to Fix
Remove nofollow from your meta robots tag and X-Robots-Tag header on important pages. Use "index, follow" to allow full crawling. Reserve nofollow only for untrusted external links.
Example
<!-- Allow crawlers to follow links -->
<meta name="robots" content="index, follow" />
Internal linking structure
A strong internal linking structure helps AI crawlers discover and understand the relationships between your pages.
Why This Matters
Without internal links, AI crawlers cannot discover related pages or understand how your content is organized. This limits the depth of your site that gets indexed and weakens topical authority in AI search results.
How to Fix
Add contextual internal links between related pages. Use descriptive anchor text that tells AI crawlers what the linked page is about. Aim for at least 3-5 internal links per page.
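To see how many internal links a page currently has, you can collect its anchors and keep those that resolve to the same host. This is a sketch with hypothetical names; it treats any link whose host matches the page's host as internal.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def internal_links(html: str, page_url: str) -> list:
    """Return the unique same-host link targets on a page, as absolute URLs."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    resolved = (urljoin(page_url, href) for href in collector.hrefs)
    return sorted({url for url in resolved if urlparse(url).netloc == host})
```

Comparing len(internal_links(...)) against the 3-5 target gives a quick per-page signal.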
Example
<a href="/related-page">Learn more about related topic</a>
No redirect chains
Redirect chains waste AI crawler budget and slow down content discovery. Each page should resolve in a single redirect at most.
Why This Matters
Redirect chains slow down AI crawlers and waste their limited crawl budget. Each extra redirect adds latency and increases the chance a crawler gives up before reaching the final page, leaving content unindexed.
How to Fix
Update all internal links and sitemap entries to point directly to the final destination URL. Eliminate intermediate redirects by configuring your server to redirect directly from the old URL to the final URL in a single hop.
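Given a table of redirects, the final destination of every source can be computed ahead of time so each rule points there in one hop. The sketch below assumes the redirects are available as a source-to-destination dict; the function names and the max_hops limit are hypothetical.

```python
def resolve_redirects(url: str, redirects: dict, max_hops: int = 10):
    """Follow a {source: destination} map and return (final_url, hop_count)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
    return url, hops

def flatten_redirects(redirects: dict) -> dict:
    """Rewrite every rule so it points straight at its final destination."""
    return {source: resolve_redirects(source, redirects)[0] for source in redirects}
```

Any source whose hop count is greater than 1 is part of a chain and is a candidate for flattening.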
Example
<!-- Update links to use final URLs directly -->
<a href="https://yoursite.com/final-page">Page</a>
<!-- In sitemap.xml, use the final URL -->
<url>
<loc>https://yoursite.com/final-page</loc>
</url>
Canonical links present
Canonical link tags tell AI crawlers which URL is the authoritative version of a page, preventing duplicate content issues.
Why This Matters
Without canonical tags, AI crawlers may index duplicate versions of your pages (www vs non-www, HTTP vs HTTPS, trailing slash variants), diluting your content authority and causing inconsistent answers in AI search results.
How to Fix
Add a <link rel="canonical"> tag to the <head> of every page pointing to the preferred URL. Ensure canonical URLs are absolute and consistent across all page variants.
Example
<link rel="canonical" href="https://yoursite.com/page" />
Mobile friendly
A viewport meta tag signals mobile-friendliness. AI crawlers may prioritize mobile-friendly content, and many AI-powered searches originate from mobile devices.
Why This Matters
AI crawlers and search engines may deprioritize pages without a viewport meta tag. Since many AI-powered searches originate from mobile devices, non-mobile-friendly pages rank lower and deliver a poor experience.
How to Fix
Add a viewport meta tag to the <head> of every page. This is typically a one-line addition to your HTML template or layout component.
Example
<meta name="viewport" content="width=device-width, initial-scale=1" />
Fast page load
AI crawlers have limited time budgets. Fast Time-to-First-Byte (TTFB) ensures crawlers can fetch more of your pages within their allotted time.
Why This Matters
AI crawlers operate on strict time budgets. Slow Time-to-First-Byte (TTFB) means crawlers index fewer of your pages per session, leaving content undiscovered and reducing your visibility in AI-powered search results.
How to Fix
Optimize server response times by enabling caching (Redis, Varnish), deploying a CDN (Cloudflare, Fastly), reducing server-side computation, and enabling HTTP/2 or HTTP/3. Target TTFB under 800ms for all pages.
Example
# Potential optimizations:
# - Enable server-side caching (Redis, Varnish)
# - Use a CDN (Cloudflare, Fastly, Vercel Edge)
# - Optimize database queries and indexes
# - Enable HTTP/2 or HTTP/3
# - Use static generation where possible
No broken internal links
Broken internal links create dead ends for AI crawlers and waste their limited crawl budget.
Why This Matters
Broken internal links waste AI crawlers' limited crawl budget by sending them to dead ends. This means fewer of your pages get indexed, and users asking AI about your site may encounter errors or missing information.
How to Fix
Audit all internal links and fix or remove any that return non-200 status codes. Update href values to point to the correct URLs, and set up redirects for pages that have moved.
Example
<!-- Fix broken links by updating the href -->
<a href="/correct-path">Page title</a>
<!-- Or set up a redirect for moved pages -->
<!-- In next.config.js -->
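module.exports = {
  // Next.js expects redirects to be an async function that returns the rules.
  async redirects() {
    return [{ source: "/old-path", destination: "/new-path", permanent: true }];
  },
};
navigation.json present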
A navigation.json file gives AI agents a machine-readable map of your site hierarchy, helping them navigate your site like a human would.
Why This Matters
Without a machine-readable navigation structure, AI agents must infer your site hierarchy from HTML parsing, which is error-prone and incomplete. A navigation.json gives agents a clear map of your site, enabling accurate multi-step browsing.
How to Fix
Create a /navigation.json file at your site root with a JSON structure representing your site menu hierarchy. Include labels, URLs, and nested children for submenus.
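To sanity-check such a file, it helps to walk the hierarchy and list every entry with its depth. The sketch below assumes the label / url / children keys described above; flatten_navigation is a hypothetical helper.

```python
import json

def flatten_navigation(items, depth=0):
    """Yield (depth, label, url) for every entry, parents before children."""
    for item in items:
        yield depth, item["label"], item["url"]
        yield from flatten_navigation(item.get("children", []), depth + 1)

navigation = json.loads("""
{
  "name": "Your Site",
  "items": [
    { "label": "Home", "url": "/" },
    { "label": "Products", "url": "/products", "children": [
      { "label": "Product A", "url": "/products/a" }
    ]}
  ]
}
""")
entries = list(flatten_navigation(navigation["items"]))
```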
Example
{
"name": "Your Site",
"items": [
{ "label": "Home", "url": "/" },
{ "label": "Products", "url": "/products", "children": [
{ "label": "Product A", "url": "/products/a" },
{ "label": "Product B", "url": "/products/b" }
]},
{ "label": "About", "url": "/about" }
]
}
No orphan pages
Orphan pages are not listed in your sitemap or llms.txt, so AI crawlers may never discover them.
Why This Matters
Orphan pages are not referenced in your sitemap or llms.txt, so AI crawlers may never discover them. This means potentially valuable content -- product pages, blog posts, documentation -- remains invisible to AI-powered search.
How to Fix
Add all important pages to your sitemap.xml and/or reference them in your llms.txt file. Run a crawl comparison periodically to catch pages that fall out of both indexes.
Example
<!-- Add to sitemap.xml -->
<url>
<loc>https://yoursite.com/orphan-page</loc>
<lastmod>2026-01-01</lastmod>
</url>
# Or add to llms.txt
- [Orphan Page](/orphan-page): Description of the page
Critical commerce links
AI shopping agents need to surface shipping and return terms so users can make buying decisions. Making these links easy to find and verify is essential for "Instant Checkout" readiness.
Why This Matters
AI shopping agents cannot complete purchase recommendations without verifiable return, shipping, and seller information. Missing these links disqualifies your store from "Instant Checkout" flows and erodes buyer trust.
How to Fix
Add clearly labeled links to your return policy, shipping information, and seller/contact pages in your site footer or navigation. Use descriptive anchor text like "Return Policy", "Shipping & Delivery", and "About Us".
Example
<footer>
<a href="/returns">Return Policy</a>
<a href="/shipping">Shipping & Delivery</a>
<a href="/about">About Us</a>
</footer>
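A rough check for these links is to collect the anchor text on a page and look for keywords for each required link type. The keyword lists and all names in this sketch are assumptions made for illustration, and real stores may label these links differently.

```python
from html.parser import HTMLParser

# Hypothetical keyword lists; adjust to match your store's wording.
REQUIRED = {
    "returns": ("return",),
    "shipping": ("shipping", "delivery"),
    "seller": ("about", "contact"),
}

class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor:
            self.texts[-1] += data

def missing_commerce_links(html: str) -> list:
    """Return the required link types with no matching anchor text."""
    collector = AnchorTextCollector()
    collector.feed(html)
    labels = [text.strip().lower() for text in collector.texts]
    return sorted(
        kind for kind, words in REQUIRED.items()
        if not any(word in label for label in labels for word in words)
    )
```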