Content Discoverability

23 checks · Weight: 15% of overall score

3 critical · 5 high · 15 medium

1.1

llms.txt exists

critical · Pass / Warn / Fail

llms.txt is the primary way AI agents discover your site content. Without it, agents must crawl your entire site to understand what you offer. Create this file at your site root.

Why This Matters

llms.txt is the primary entry point for AI agents discovering your site. Without it, AI assistants such as ChatGPT, Perplexity, and Claude must crawl your entire site blindly, often missing key pages and giving incomplete or inaccurate answers about your business.

How to Fix

Create a /llms.txt file at your site root in markdown format. Include an H1 heading with your site name, a blockquote summary, and organized sections with links to your key pages.

Example

# Your Site Name

> Brief description of your site for AI agents.

## Pages
- [Home](/): Main landing page
- [About](/about/): Company information

## Resources
- [Sitemap](/sitemap.xml): Full URL list
- [RSS](/rss.xml): Content feed
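
A file in this shape can also be generated from structured data rather than written by hand. The following is a minimal Python sketch, assuming a hypothetical `render_llms_txt` helper and an in-memory list of pages; neither is part of any official llms.txt tooling.

```python
def render_llms_txt(site_name, summary, sections):
    """Render llms.txt markdown: H1, blockquote summary, H2 sections of links.

    `sections` maps a section title to a list of (label, url, description).
    """
    lines = [f"# {site_name}", "", f"> {summary}"]
    for title, links in sections.items():
        lines += ["", f"## {title}"]
        lines += [f"- [{label}]({url}): {desc}" for label, url, desc in links]
    return "\n".join(lines) + "\n"


text = render_llms_txt(
    "Your Site Name",
    "Brief description of your site for AI agents.",
    {"Pages": [("Home", "/", "Main landing page")]},
)
print(text)
```

Writing the returned string to `/llms.txt` at build time keeps the file in step with the page list it was generated from.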

Effort: Easy (< 1 hour)

llms-txt · discoverability

1.2

llms.txt has blockquote summary

medium · Pass / Fail

The blockquote summary gives AI agents a one- to two-sentence overview of your site, so they do not have to read further to judge relevance.

Why This Matters

Without a blockquote summary, AI agents have no quick way to understand what your site offers. They must parse the entire llms.txt file before deciding if your content is relevant, leading to slower and less accurate AI responses about your business.

How to Fix

Add a blockquote line (starting with >) immediately after the H1 heading in your llms.txt file. Write a concise 1-2 sentence summary of what your site provides.

Example

# Your Site Name

> Your site provides X for Y. It covers topics including A, B, and C.

Effort: Trivial (minutes)

llms-txt · discoverability

1.3

llms.txt has H2 sections

medium · Pass / Fail

H2 sections help AI agents navigate your llms.txt by topic. Without them, agents must scan the entire file linearly.

Why This Matters

Without H2 sections, AI agents must scan your entire llms.txt linearly to find relevant content. Sections let agents jump directly to the topic they need, producing faster and more accurate responses.

How to Fix

Organize your llms.txt links under H2 headings (## Section Name) that group related pages. Use intuitive section names like ## Documentation, ## API, ## Blog, ## Company.

Example

## Documentation
- [Getting Started](/docs/start): Quick start guide
- [API Reference](/docs/api): Full API documentation

## Company
- [About](/about): Company information
- [Blog](/blog): Latest updates

Effort: Trivial (minutes)

llms-txt · discoverability

1.4

llms.txt links include descriptions

medium · Pass / Warn / Fail

Link descriptions help AI agents understand what each page covers without visiting it, reducing unnecessary crawling.

Why This Matters

Links without descriptions force AI agents to visit every page to understand its content, wasting crawl budget and slowing down response generation. Described links let agents filter relevant pages instantly.

How to Fix

Add a colon and brief description after each link URL in your llms.txt file. Describe what the page covers in a few words so agents can decide which pages to visit.

Example

- [Getting Started](/docs/start): Step-by-step guide for new users
- [API Reference](/docs/api): Complete endpoint documentation
- [Pricing](/pricing): Plans and pricing information
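
Missing descriptions are easy to detect mechanically. The sketch below assumes links are written as markdown list items in the form shown above; the regular expression and the `links_missing_descriptions` name are illustrative, not part of any specification.

```python
import re

# Matches a markdown list link, capturing label, URL, and optional ": description".
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def links_missing_descriptions(llms_txt):
    """Return URLs of list links that have no text after the colon."""
    missing = []
    for line in llms_txt.splitlines():
        match = LINK_RE.match(line.strip())
        if match and not (match.group(3) or "").strip():
            missing.append(match.group(2))
    return missing


sample = """## Docs
- [Getting Started](/docs/start): Step-by-step guide for new users
- [Pricing](/pricing)
- [API Reference](/docs/api):
"""
print(links_missing_descriptions(sample))
```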

Effort: Trivial (minutes)

llms-txt · discoverability

1.5

llms.txt links are valid

high · Pass / Warn / Fail

Valid links in llms.txt ensure AI agents can navigate to your content without encountering dead ends.

Why This Matters

Broken links in llms.txt send AI agents to dead ends, wasting their context window and degrading the quality of answers about your site. Users asking AI about your products or services will get error messages instead of useful information.

How to Fix

Verify all links in your llms.txt resolve to HTTP 200. Remove links to deleted pages and update any URLs that have changed. Run this check after every site deployment.

Example

- [Page Name](/correct-path): Description of the page content
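
One way to run this check after each deployment is a small script that extracts every link and asks for its status. This is a sketch under assumptions: `find_broken_links` is a hypothetical helper, and the HTTP lookup is passed in as `get_status` so the example can use stubbed statuses instead of real requests.

```python
import re
from urllib.parse import urljoin

# Captures the URL part of every markdown link: [label](url)
LINK_URL_RE = re.compile(r"\[[^\]]+\]\(([^)\s]+)\)")


def find_broken_links(llms_txt, base_url, get_status):
    """Return (url, status) pairs for links that do not resolve to HTTP 200.

    `get_status` is a caller-supplied function mapping an absolute URL to an
    HTTP status code (a real HTTP client in production, a stub in tests).
    """
    broken = []
    for path in LINK_URL_RE.findall(llms_txt):
        url = urljoin(base_url, path)  # resolve site-relative paths
        status = get_status(url)
        if status != 200:
            broken.append((url, status))
    return broken


# Stubbed statuses stand in for real HTTP requests in this example.
statuses = {"https://yoursite.com/about/": 200, "https://yoursite.com/old": 404}
sample = "- [About](/about/): Company information\n- [Old](/old): Removed page\n"
print(find_broken_links(sample, "https://yoursite.com", statuses.get))
```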

Effort: Easy (< 1 hour)

llms-txt · broken-links · discoverability

1.6

llms-full.txt present

high · Pass / Fail

llms-full.txt provides the complete content of your site in a single file, allowing AI agents to ingest everything in one request instead of crawling page by page.

Why This Matters

Without llms-full.txt, AI agents must crawl your site page by page, which is slow and often incomplete. This means AI assistants give shallow or outdated answers about your products and services.

How to Fix

Create a /llms-full.txt file at your site root containing the full text content of all important pages in markdown format. Include headings, descriptions, and key details for each page.

Example

# Your Site Name

> Full content version for AI agents.

## Home
Your homepage content here...

## About
Your about page content here...

## Documentation
Your documentation content here...

Effort: Moderate (hours)

llms-txt · discoverability

1.7

sitemap.xml exists

critical · Pass / Fail

AI crawlers use your sitemap to discover all pages without following links. Without it, pages may never be indexed by AI search engines.

Why This Matters

Without a sitemap, AI crawlers must discover your pages solely through link-following, which is slow and incomplete. Pages deep in your site hierarchy may never be found, meaning AI search engines like Perplexity and ChatGPT Browse cannot surface your full content.

How to Fix

Create a sitemap.xml at your site root containing all important pages. Use a <urlset> with <url> entries for each page, including <loc> and <lastmod>. Most frameworks (Next.js, WordPress, etc.) can auto-generate sitemaps.

Example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2026-01-01</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
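
If your framework does not generate a sitemap for you, the same structure can be produced with a short script. The sketch below uses Python's standard `xml.etree.ElementTree`; the `build_sitemap` helper and its input format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (absolute_url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_text = build_sitemap([("https://yoursite.com/", "2026-01-01")])
print(xml_text)
```

Building the tree with an XML library rather than string concatenation means special characters in URLs, such as `&`, are escaped automatically.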

Effort: Easy (< 1 hour)

sitemap · seo · discoverability

1.8

Sitemap includes all key pages

medium · Pass / Warn / Fail

The sitemap should include all scanned pages so AI crawlers can discover your full site content.

Why This Matters

Pages missing from your sitemap may not be discovered by AI crawlers, even if they exist on your site. This means important content -- product pages, documentation, blog posts -- could be absent from AI search results.

How to Fix

Ensure your sitemap.xml includes all important pages. Compare your sitemap against your actual site pages and add any missing URLs. Configure your CMS or build tool to auto-include new pages in the sitemap.

Example

<url>
  <loc>https://yoursite.com/missing-page</loc>
  <lastmod>2026-01-01</lastmod>
</url>

Effort: Easy (< 1 hour)

sitemap · seo · discoverability

1.9

Sitemap uses absolute URLs

high · Pass / Fail

Sitemap URLs must be absolute (starting with https://) so AI crawlers can resolve them without ambiguity.

Why This Matters

The sitemap protocol requires fully qualified URLs. AI crawlers may be unable to resolve relative URLs and can silently skip those entries, so any page listed with a relative URL is effectively invisible to AI search engines.

How to Fix

Ensure every <loc> value in your sitemap.xml starts with the full protocol and domain (e.g., https://yoursite.com/page). Update your sitemap generator configuration to output absolute URLs.

Example

<!-- Correct: absolute URL -->
<url>
  <loc>https://yoursite.com/page</loc>
</url>

<!-- Wrong: relative URL -->
<url>
  <loc>/page</loc>
</url>
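
A sitemap can be scanned for relative entries before it is published. The following sketch parses the file with the standard library and flags any `<loc>` that lacks a scheme or host; `relative_locs` is a hypothetical name, not an existing tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def relative_locs(sitemap_xml):
    """Return <loc> values that lack an http(s) scheme or a host."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    for loc in root.findall("sm:url/sm:loc", NS):
        value = (loc.text or "").strip()
        parts = urlparse(value)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append(value)
    return bad


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/page</loc></url>
  <url><loc>/page</loc></url>
</urlset>"""
print(relative_locs(sample))
```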

Effort: Trivial (minutes)

sitemap · seo · discoverability

1.10

Sitemap has lastmod dates

medium · Pass / Warn / Fail

AI crawlers use <lastmod> to decide which pages to re-index and which to skip. Without these dates, crawlers cannot tell from the sitemap which pages have changed and may re-fetch every page on every visit.

Why This Matters

Without <lastmod> dates, AI crawlers cannot tell from the sitemap which pages have changed, so they may re-fetch every page on every visit. This wastes crawl budget and delays indexing of your freshest content.

How to Fix

Add accurate <lastmod> dates to every <url> entry in your sitemap.xml. Update the date only when the page content actually changes. Use W3C Datetime format, a profile of ISO 8601 (YYYY-MM-DD or a full datetime with time zone, such as 2026-01-15T09:30:00+00:00).

Example

<url>
  <loc>https://yoursite.com/page</loc>
  <lastmod>2026-01-15</lastmod>
</url>
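
Where pages are built from files, one common source for an accurate date is the file's modification time. This sketch assumes a hypothetical `to_lastmod` helper that formats a POSIX timestamp (for example the value returned by `os.path.getmtime`) in UTC.

```python
from datetime import datetime, timezone


def to_lastmod(mtime_seconds, full_datetime=False):
    """Format a POSIX timestamp (e.g. a file's mtime) as a <lastmod> value in UTC."""
    moment = datetime.fromtimestamp(mtime_seconds, tz=timezone.utc)
    if full_datetime:
        return moment.strftime("%Y-%m-%dT%H:%M:%S+00:00")
    return moment.strftime("%Y-%m-%d")


print(to_lastmod(0))
print(to_lastmod(0, full_datetime=True))
```

File modification times can change on redeploy even when content has not, so a content hash or CMS revision date may be a better source if one is available.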

Effort: Easy (< 1 hour)

sitemap · seo · discoverability

1.11

RSS/Atom feed link present

medium · Pass / Fail

RSS/Atom feeds let AI agents track new and updated content without re-crawling your entire site.

Why This Matters

Without an RSS/Atom feed, AI agents have no efficient way to track new and updated content on your site. They must re-crawl your entire site to find changes, which means your latest posts and pages may take much longer to appear in AI search results.

How to Fix

Create an RSS or Atom feed and link to it in your HTML <head> with a <link rel="alternate"> tag. Most frameworks and CMS platforms can auto-generate feeds. Place the feed at a well-known path like /rss.xml or /feed.xml.

Example

<!-- Add to your HTML <head> -->
<link rel="alternate" type="application/rss+xml" title="Your Site Feed" href="/rss.xml" />

<!-- Example /rss.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Your Site</title>
    <link>https://yoursite.com</link>
    <description>Site description</description>
    <item>
      <title>Article Title</title>
      <link>https://yoursite.com/article</link>
      <description>Article summary...</description>
    </item>
  </channel>
</rss>

Effort: Moderate (hours)

rss · content-feed · discoverability

1.12

RSS feed content complete

medium · Pass / Warn / Fail

Full-content feeds allow AI agents to index your articles without visiting each page, reducing crawl load and improving content quality in AI responses.

Why This Matters

Truncated RSS feed items force AI agents to visit each page individually, increasing crawl time and often resulting in incomplete indexing. Full-content feeds let agents ingest all your articles in a single request, producing richer AI-generated answers.

How to Fix

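Include full article content in each feed item using <content:encoded> (RSS) or <content> (Atom). For RSS, declare the content namespace on the <rss> element (xmlns:content="http://purl.org/rss/1.0/modules/content/") so that <content:encoded> is valid XML. Aim for more than 500 characters per item. Most CMS platforms have a setting to switch from excerpt to full-content feeds.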

Example

<item>
  <title>Article Title</title>
  <link>https://yoursite.com/article</link>
  <content:encoded><![CDATA[
    <p>Full article content goes here. Include all paragraphs,
    headings, and relevant details.</p>
  ]]></content:encoded>
</item>
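
To see which items fall short of the 500-character guideline, a feed can be parsed and each item's body measured. This is a rough sketch for RSS only: `truncated_items` is a hypothetical helper, and it counts raw characters including markup, which may differ from how the audit itself measures length.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = {"content": "http://purl.org/rss/1.0/modules/content/"}
MIN_CHARS = 500  # threshold taken from the guidance above


def truncated_items(rss_xml, min_chars=MIN_CHARS):
    """Return titles of feed items whose <content:encoded> is missing or short."""
    root = ET.fromstring(rss_xml)
    short = []
    for item in root.iter("item"):
        body = item.findtext("content:encoded", default="", namespaces=CONTENT_NS)
        if len(body.strip()) <= min_chars:
            short.append(item.findtext("title", default="(untitled)"))
    return short


sample = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item><title>Full</title><content:encoded><![CDATA[%s]]></content:encoded></item>
    <item><title>Excerpt</title><description>Short teaser</description></item>
  </channel>
</rss>""" % ("<p>word</p> " * 60)
print(truncated_items(sample))
```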

Effort: Easy (< 1 hour)

rss · content-feed · discoverability

1.13

No noindex on homepage

critical · Pass / Fail

A noindex directive on your homepage prevents AI crawlers and search engines from indexing your most important page.

Why This Matters

A noindex directive on your homepage completely blocks AI crawlers and search engines from indexing your most important page. Your site becomes invisible in AI search results, causing total loss of organic AI-driven traffic.

How to Fix

Remove the noindex directive from your homepage by updating the meta robots tag to "index, follow" and removing any X-Robots-Tag: noindex header from your server configuration.

Example

<!-- Remove noindex from homepage -->
<meta name="robots" content="index, follow" />
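
Because the directive can come from either the HTML or a response header, it helps to check both places. The sketch below uses Python's standard `html.parser`; `RobotsMetaParser` and `has_noindex` are illustrative names, and the header handling is simplified (it ignores crawler-specific forms such as `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}


def has_noindex(html, x_robots_tag=""):
    """True if the meta robots tag or the X-Robots-Tag header value says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    return "noindex" in parser.directives or "noindex" in header


print(has_noindex('<meta name="robots" content="noindex, nofollow" />'))
print(has_noindex('<meta name="robots" content="index, follow" />'))
```

Swapping `"noindex"` for `"nofollow"` in the final comparison gives the equivalent check for a page-wide nofollow directive.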

Effort: Trivial (minutes)

robots · seo · discoverability

1.14

No nofollow on important links

high · Pass / Warn / Fail

A site-wide nofollow directive prevents AI crawlers from following links to discover your content. Important internal links should be followable.

Why This Matters

A nofollow directive prevents AI crawlers from following links on your pages, effectively hiding all linked content from AI indexing. Your deeper pages become invisible to AI search engines, drastically reducing discoverability.

How to Fix

Remove nofollow from your meta robots tag and X-Robots-Tag header on important pages. Use "index, follow" to allow full crawling. Reserve nofollow for individual untrusted external links, applied with rel="nofollow" on the link itself rather than page-wide.

Example

<!-- Allow crawlers to follow links -->
<meta name="robots" content="index, follow" />

Effort: Trivial (minutes)

robots · seo · discoverability

1.15

Internal linking structure

medium · Pass / Warn / Fail

A strong internal linking structure helps AI crawlers discover and understand the relationships between your pages.

Why This Matters

Without internal links, AI crawlers cannot discover related pages or understand how your content is organized. This limits the depth of your site that gets indexed and weakens topical authority in AI search results.

How to Fix

Add contextual internal links between related pages. Use descriptive anchor text that tells AI crawlers what the linked page is about. Aim for at least 3-5 internal links per page.

Example

<a href="/related-page">Learn more about related topic</a>

Effort: Easy (< 1 hour)

internal-links · seo · discoverability

1.16

No redirect chains

medium · Pass / Warn / Fail

Redirect chains waste AI crawler budget and slow down content discovery. Each page should resolve in a single redirect at most.

Why This Matters

Redirect chains slow down AI crawlers and waste their limited crawl budget. Each extra redirect adds latency and increases the chance a crawler gives up before reaching the final page, leaving content unindexed.

How to Fix

Update all internal links and sitemap entries to point directly to the final destination URL. Eliminate intermediate redirects by configuring your server to redirect directly from the old URL to the final URL in a single hop.

Example

<!-- Update links to use final URLs directly -->
<a href="https://yoursite.com/final-page">Page</a>

<!-- In sitemap.xml, use the final URL -->
<url>
  <loc>https://yoursite.com/final-page</loc>
</url>
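
Collapsing chains into single hops can be done by resolving each source to its final destination. The sketch below assumes your redirect rules are available as a simple source-to-destination mapping; `resolve_redirects` is a hypothetical helper, not a feature of any particular server.

```python
def resolve_redirects(url, redirects, max_hops=10):
    """Follow `redirects` (a source -> destination mapping) and return
    (final_url, hops). Raises ValueError on a loop or when max_hops is exceeded."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops


redirects = {"/old": "/interim", "/interim": "/final-page"}
# Collapse every chain so each source points straight at its final destination.
flattened = {src: resolve_redirects(src, redirects)[0] for src in redirects}
print(flattened)
```

Any source that reports more than one hop is a chain; replacing the original rules with the flattened mapping leaves each old URL a single redirect away from its destination.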

Effort: Easy (< 1 hour)

redirects · performance · discoverability

1.17

Canonical links present

medium · Pass / Warn / Fail

Canonical link tags tell AI crawlers which URL is the authoritative version of a page, preventing duplicate content issues.

Why This Matters

Without canonical tags, AI crawlers may index duplicate versions of your pages (www vs non-www, HTTP vs HTTPS, trailing slash variants), diluting your content authority and causing inconsistent answers in AI search results.

How to Fix

Add a <link rel="canonical"> tag to the <head> of every page pointing to the preferred URL. Ensure canonical URLs are absolute and consistent across all page variants.

Example

<link rel="canonical" href="https://yoursite.com/page" />

Effort: Easy (< 1 hour)

canonical · seo · discoverability

1.18

Mobile friendly

medium · Pass / Warn / Fail

A viewport meta tag signals mobile-friendliness. AI crawlers may prioritize mobile-friendly content, and many AI-powered searches originate from mobile devices.

Why This Matters

AI crawlers and search engines may deprioritize pages without a viewport meta tag. Since many AI-powered searches originate from mobile devices, non-mobile-friendly pages rank lower and deliver a poor experience.

How to Fix

Add a viewport meta tag to the <head> of every page. This is typically a one-line addition to your HTML template or layout component.

Example

<meta name="viewport" content="width=device-width, initial-scale=1" />

Effort: Trivial (minutes)

mobile · seo · discoverability

1.19

Fast page load

medium · Pass / Warn / Fail

AI crawlers have limited time budgets. Fast Time-to-First-Byte (TTFB) ensures crawlers can fetch more of your pages within their allotted time.

Why This Matters

AI crawlers operate on strict time budgets. Slow Time-to-First-Byte (TTFB) means crawlers index fewer of your pages per session, leaving content undiscovered and reducing your visibility in AI-powered search results.

How to Fix

Optimize server response times by enabling caching (Redis, Varnish), deploying a CDN (Cloudflare, Fastly), reducing server-side computation, and enabling HTTP/2 or HTTP/3. Target TTFB under 800 ms for all pages.

Example

# Potential optimizations:
# - Enable server-side caching (Redis, Varnish)
# - Use a CDN (Cloudflare, Fastly, Vercel Edge)
# - Optimize database queries and indexes
# - Enable HTTP/2 or HTTP/3
# - Use static generation where possible

Effort: Moderate (hours)

performance · crawl-budget · discoverability

1.20

No broken internal links

high · Pass / Warn / Fail

Broken internal links create dead ends for AI crawlers and waste their limited crawl budget.

Why This Matters

Broken internal links waste AI crawlers' limited crawl budget by sending them to dead ends. This means fewer of your pages get indexed, and users asking AI about your site may encounter errors or missing information.

How to Fix

Audit all internal links and fix or remove any that return non-200 status codes. Update href values to point to the correct URLs, and set up redirects for pages that have moved.

Example

<!-- Fix broken links by updating the href -->
<a href="/correct-path">Page title</a>

// Or set up a redirect for moved pages (next.config.js)
module.exports = {
  async redirects() {
    return [{ source: "/old-path", destination: "/new-path", permanent: true }];
  },
};

Effort: Easy (< 1 hour)

broken-links · crawl-budget · discoverability

1.21

navigation.json present

medium · Pass / Fail

A navigation.json file gives AI agents a machine-readable map of your site hierarchy, helping them navigate your site like a human would.

Why This Matters

Without a machine-readable navigation structure, AI agents must infer your site hierarchy from HTML parsing, which is error-prone and incomplete. A navigation.json gives agents a clear map of your site, enabling accurate multi-step browsing.

How to Fix

Create a /navigation.json file at your site root with a JSON structure representing your site menu hierarchy. Include labels, URLs, and nested children for submenus.

Example

{
  "name": "Your Site",
  "items": [
    { "label": "Home", "url": "/" },
    { "label": "Products", "url": "/products", "children": [
      { "label": "Product A", "url": "/products/a" },
      { "label": "Product B", "url": "/products/b" }
    ]},
    { "label": "About", "url": "/about" }
  ]
}
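
An agent or a validation script can walk this structure recursively. The sketch below assumes the `label`, `url`, and `children` field names used in the example above, which are this document's convention rather than a published standard; `flatten_navigation` is a hypothetical helper.

```python
import json


def flatten_navigation(items, depth=0):
    """Yield (depth, label, url) for every entry, walking nested children."""
    for item in items:
        yield depth, item["label"], item["url"]
        yield from flatten_navigation(item.get("children", []), depth + 1)


nav = json.loads("""
{
  "name": "Your Site",
  "items": [
    {"label": "Home", "url": "/"},
    {"label": "Products", "url": "/products", "children": [
      {"label": "Product A", "url": "/products/a"}
    ]}
  ]
}
""")
for depth, label, url in flatten_navigation(nav["items"]):
    print("  " * depth + f"{label} -> {url}")
```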

Effort: Easy (< 1 hour)

navigation · structured-data · discoverability

1.22

No orphan pages

medium · Pass / Warn / Fail

Orphan pages are not listed in your sitemap or llms.txt, so AI crawlers may never discover them.

Why This Matters

Orphan pages are not referenced in your sitemap or llms.txt, so AI crawlers may never discover them. This means potentially valuable content -- product pages, blog posts, documentation -- remains invisible to AI-powered search.

How to Fix

Add all important pages to your sitemap.xml and/or reference them in your llms.txt file. Run a crawl comparison periodically to catch pages that fall out of both indexes.

Example

<!-- Add to sitemap.xml -->
<url>
  <loc>https://yoursite.com/orphan-page</loc>
  <lastmod>2026-01-01</lastmod>
</url>

# Or add to llms.txt
- [Orphan Page](/orphan-page): Description of the page
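
The periodic crawl comparison amounts to a set difference between the pages that exist and the pages that are indexed somewhere. A minimal sketch, assuming the three URL lists have already been collected and normalized to the same form (the `find_orphans` name is hypothetical):

```python
def find_orphans(site_pages, sitemap_urls, llms_txt_urls):
    """Return pages that appear in neither the sitemap nor llms.txt, sorted."""
    indexed = set(sitemap_urls) | set(llms_txt_urls)
    return sorted(set(site_pages) - indexed)


site_pages = ["/", "/about", "/pricing", "/orphan-page"]
sitemap_urls = ["/", "/about"]
llms_txt_urls = ["/", "/pricing"]
print(find_orphans(site_pages, sitemap_urls, llms_txt_urls))
```

In practice the lists need consistent normalization first (absolute versus relative URLs, trailing slashes), otherwise matching pages will be reported as orphans.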

Effort: Easy (< 1 hour)

orphan-pages · sitemap · discoverability

1.23

Critical commerce links

medium · Pass / Warn / Fail

AI shopping agents need to surface shipping and return terms to help users make buying decisions. Links to these policies that agents can find and verify are essential for "Instant Checkout" readiness.

Why This Matters

AI shopping agents cannot complete purchase recommendations without verifiable return, shipping, and seller information. Missing these links disqualifies your store from "Instant Checkout" flows and erodes buyer trust.

How to Fix

Add clearly labeled links to your return policy, shipping information, and seller/contact pages in your site footer or navigation. Use descriptive anchor text like "Return Policy", "Shipping & Delivery", and "About Us".

Example

<footer>
  <a href="/returns">Return Policy</a>
  <a href="/shipping">Shipping &amp; Delivery</a>
  <a href="/about">About Us</a>
</footer>

Effort: Easy (< 1 hour)

commerce · trust · discoverability
