
How to run a GEO audit so AI search engines can actually find your content

Owen Steer 15 min read

How do I make my website technically ready for AI search engines?

To ensure your site is AI-ready, conduct a GEO audit that checks access for crawlers, JavaScript rendering, structured data quality, content independence, and technical foundations for AI search. Focus on these areas to enhance your visibility across AI platforms.

You make your website technically ready for AI search engines by running a GEO audit that covers five areas: whether AI crawlers can actually access your pages, whether your content renders without JavaScript, whether your structured data is attribute-rich (not just present), whether each section of content can stand alone if extracted, and whether the full technical foundation passes a checklist built for AI search, not just Google.

Most of this overlaps with good SEO hygiene. But the differences are where companies get caught out. A site that ranks well on Google can be completely invisible to ChatGPT, Claude, and Perplexity. Different crawlers, different rendering capabilities, different rules. If you’ve been building your AI search optimisation strategy around onsite content and offsite engagement, the technical infrastructure underneath is what makes both layers actually work. Without it, you’re publishing content that AI engines can’t read and building brand presence that AI engines can’t connect back to your site.

I’m Owen Steer, and at Fifty Five and Five I run these audits for clients building their AI search presence. This piece walks through what I check, what I find, and the specific technical issues that make the biggest difference to whether AI engines can find and cite your content.

AI indexing is not Google indexing: what actually changed

AI indexing works fundamentally differently from Google indexing, and that difference is why sites that rank well on Google might not exist as far as AI search engines are concerned. Google crawls your pages, renders JavaScript, follows links, and builds a searchable index. AI engines skip most of that.

The first thing to understand is that AI crawlers serve two distinct purposes. Training crawlers (GPTBot, ClaudeBot, Google-Extended) harvest content to build and update the model’s knowledge base. Retrieval crawlers (ChatGPT-User, Claude-SearchBot, PerplexityBot) fetch content in real time when a user asks a question. Training builds what the model knows. Retrieval is what happens when the model needs to cite a source right now. These are separate systems with separate access controls, which means your content can exist in the model’s training data without being accessible for real-time citation (or vice versa).

AI crawl volume is growing fast. JetOctopus data shows AI bot activity now sits at roughly 40-50% of Googlebot-level activity across the web (JetOctopus). But the traffic return is asymmetric. Anthropic’s crawlers generate approximately 38,000 crawl requests for every single referral back to your site. OpenAI’s ratio is roughly 400:1 (Am I Cited). That’s a lot of crawling for very little direct traffic. The value isn’t in click-throughs. It’s in citations.

Here’s the thing: Google has a massive technical advantage over every other AI engine when it comes to understanding your content. Google renders JavaScript through its Web Rendering Service. It processes dynamic content. It follows internal links methodically. AI engines don’t do any of that (with the partial exception of Gemini, which benefits from Google’s existing rendered index). They read the raw HTML they receive and move on.

This is why a GEO audit is different from a traditional SEO audit. A traditional audit asks: “Can Google find, render, and index this content?” A GEO audit asks: “Can an AI engine that doesn’t render JavaScript, doesn’t follow all your internal links, and reads only raw HTML still find and extract meaningful content from this page?” Those are different questions, and for most websites, they have different answers.

GPTBot, ClaudeBot, and the robots.txt settings that control your AI visibility

Your robots.txt file is the first thing that determines whether AI search engines can see your content at all. Get this wrong and nothing else in the audit matters, because the crawlers never make it past the front door.

When I audit a site for AI readiness, robots.txt is where I start. The issue isn’t complicated, but it’s surprisingly common: companies either block all AI crawlers without realising the consequences, or they have no AI crawler policy at all and don’t know what they’re allowing. Both are problems.

Here are the AI crawler user agents that matter in 2026:

Training crawlers (content enters the model’s knowledge base):

  • GPTBot (OpenAI)
  • ClaudeBot (Anthropic)
  • Google-Extended (Google/Gemini)
  • Applebot-Extended (Apple Intelligence)
  • CCBot (Common Crawl)

Retrieval crawlers (content fetched in real time to answer queries):

  • ChatGPT-User (OpenAI)
  • Claude-SearchBot (Anthropic)
  • PerplexityBot (Perplexity)

The nuance most people miss: these are separate systems. Blocking GPTBot prevents your content from entering OpenAI’s training data, but it doesn’t affect ChatGPT-User, which is what fetches your content when someone asks ChatGPT a question. You can block training and allow retrieval. For most companies in 2026, that’s the right call: prevent your content from being used to train models, while keeping it visible in AI search answers.
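As an illustration, a “block training, allow retrieval” policy looks something like this in robots.txt. Treat it as a sketch, not a recommendation: which crawlers you block is the business decision discussed below, and the directives here only cover the user agents listed above.

```txt
# Training crawlers: blocked (content stays out of model training data)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Retrieval crawlers: allowed (content stays citable in AI answers)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

Remember that any crawler not named in the file falls through to your `User-agent: *` rules, so an explicit list like this only controls the agents it mentions.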

HTTP Archive data from 12.15 million sites shows that 21% of the top 1,000 websites currently block GPTBot (Paul Calvano). That’s a conscious choice for most of those sites. The question is whether your organisation has made a conscious choice, or whether your robots.txt just happens to be whatever it was when someone last touched it three years ago.

What about llms.txt?

You’ve probably seen llms.txt mentioned as the next big standard for AI search. The idea is a structured file that tells AI engines what your site is about and how to interpret it. In theory, useful. In practice, not yet.

SE Ranking’s analysis of nearly 300,000 domains found that only 10.13% have implemented an llms.txt file. More importantly, their data (corroborated by ALLMO.ai’s study of 94,000+ cited URLs) shows no measurable correlation between having an llms.txt file and being cited by AI engines (SE Ranking). No correlation at all.

That doesn’t mean it’ll never matter. But spending time on llms.txt before fixing your robots.txt, JavaScript rendering, and schema markup is optimising the wrong things in the wrong order. Get the foundations right first. Add llms.txt as optional scaffolding later if you want to.

My recommendation: treat your robots.txt as a business decision, not just a technical one. Decide what you’re comfortable with, document it, and review it quarterly. New AI crawlers appear regularly, and your policy should keep pace.


JavaScript SEO and the content AI search engines can’t see

If your content loads via client-side JavaScript, AI search engines can’t see it. GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. They don’t run your scripts, they don’t wait for your API calls to return, and they don’t interact with your single-page application. What they see is the raw HTML your server sends back on the first request. If your content isn’t in that HTML, it doesn’t exist as far as AI engines are concerned.

seoClarity’s internal data puts specific numbers on this: on modern single-page applications relying on client-side rendering, 50% to 80% of the meaningful content fails to appear to AI bots (seoClarity). That’s not an edge case. That’s the majority of content on JavaScript-heavy sites being functionally invisible to every AI search engine.

The disconnect that catches companies off guard is Google. Google’s Web Rendering Service handles JavaScript through a two-wave indexing process: it captures the raw HTML first, then comes back later to render the JavaScript and process the dynamic content. This means a React or Angular site can rank perfectly well on Google while ChatGPT, Claude, and Perplexity can’t see a word of it. If your SEO dashboard shows green across the board but you’re getting zero AI citations, JavaScript rendering is the first thing to check.

How to test this yourself:

Right-click on any page of your site and select “View Page Source” (not “Inspect Element”, which shows the rendered DOM after JavaScript has run). What you see in View Page Source is what AI crawlers see. If your main content, headings, and text aren’t there, neither are your chances of being cited. For a more thorough test, run curl on your URL from the command line. The HTML that comes back is exactly what GPTBot receives.
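If you want to script this check across many pages, here’s a minimal sketch in Python. It works on an HTML string (the kind `curl` hands back), strips script/style blocks and tags, and tests whether a key phrase from your page survives in the raw markup. The function name and sample pages are illustrative, not from any real site.

```python
import re

def visible_in_raw_html(html: str, phrase: str) -> bool:
    """Check whether a phrase appears in the text of raw HTML,
    i.e. without any JavaScript having run."""
    # Drop script and style blocks entirely, then strip remaining tags.
    text = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text)
    return phrase.lower() in text.lower()

# A server-rendered page: the content is in the HTML itself.
ssr_page = "<html><body><h1>GEO audit guide</h1><p>Check robots.txt first.</p></body></html>"
# A typical SPA shell: an empty div plus a script bundle.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(visible_in_raw_html(ssr_page, "Check robots.txt first"))   # server-rendered: found
print(visible_in_raw_html(spa_shell, "Check robots.txt first"))  # SPA shell: not found
```

Run the same phrase check against the HTML from `curl https://yoursite.com/page` and you have a rough pass/fail for AI crawler visibility on that page.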

How to fix it:

The solution is server-side rendering (SSR) or static site generation (SSG). Both ensure your content is in the HTML before it reaches the browser or the bot. Frameworks like Next.js (React), Nuxt (Vue), and SvelteKit make this achievable without rebuilding your entire front end from scratch. If you’re starting fresh, static site generators like Hugo or Eleventy solve the problem by default.

When I built the Fifty Five and Five website, we went with Hugo partly because static site generation means every page is already server-rendered. AI crawlers get the full content on the first request, no JavaScript execution required. It wasn’t the only reason we chose it, but it’s one that keeps paying off as AI search becomes more important.

If you’re running a JavaScript SPA and migrating to SSR feels like a big lift, start with your highest-value pages: the blog, the resource hub, the pages you actually want AI engines to cite. Get those server-rendered first and work outward.

Schema markup for AI citations: which types matter and which don’t

Schema markup helps AI engines cite your content, but only if the implementation is thorough. Generic, half-populated schema is actually worse than having no schema at all. That’s not an opinion. That’s what the data shows.

Growth Marshal’s study (February 2026, 730 citations across 1,006 pages) found that attribute-rich schema earns a 61.7% citation rate. Pages with no schema at all earn 59.8%. And generic schema (the kind where you’ve got the basic Article or Organization type with the minimum required fields and nothing else) earns just 41.6% (Growth Marshal).

Generic, minimally populated schema actively hurts your AI citation rate compared to having no schema at all (41.6% vs 59.8%). Only attribute-rich schema with every relevant field populated outperforms the baseline, at 61.7%. If your schema isn’t thorough, it’s working against you.

The finding is counterintuitive, but it makes sense when you think about how AI engines process structured data. Generic schema signals that structured data exists but provides nothing useful to extract. The AI engine encounters a structured data block, parses it expecting useful information, and finds empty or minimal fields. That’s worse than encountering no schema at all, where the engine falls back to parsing the page content directly.

For lower-authority domains (DR ≤ 60, which covers most B2B company websites), the gap is even larger: attribute-rich schema achieves a 54.2% citation rate versus 31.8% for generic. If you’re not a household name, thorough schema implementation matters more, not less.

What “attribute-rich” means in practice:

It means populating every relevant field. For Article schema: author (with full name, URL, and sameAs links to LinkedIn and social profiles), datePublished, dateModified, publisher, headline, description, image, and wordCount. For Organization schema: name, url, logo, sameAs (linking to all official social profiles and any Wikipedia or Wikidata entries), contactPoint, and address. Not just the three required fields. All of them.
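To make “all of them” concrete, here’s what an attribute-rich Article block looks like in JSON-LD. Every URL, name, and value below is a placeholder; substitute your own.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to run a GEO audit",
  "description": "A technical checklist for making your site visible to AI search engines.",
  "image": "https://www.example.com/images/geo-audit.png",
  "datePublished": "2026-02-10",
  "dateModified": "2026-02-24",
  "wordCount": 2400,
  "author": {
    "@type": "Person",
    "name": "Jane Example",
    "url": "https://www.example.com/about/jane-example",
    "sameAs": [
      "https://www.linkedin.com/in/jane-example",
      "https://x.com/janeexample"
    ]
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.example.com/logo.png"
    }
  }
}
```

Note that every field carries real information an engine can extract: who wrote it, when, for whom, and where else that author exists on the web.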

Which schema types matter for B2B content:

  • Article (with full author details): the most important schema type for blog content and thought leadership
  • Organization (with sameAs links): connects your entity to profiles across the web
  • BreadcrumbList: helps AI engines understand your site hierarchy and content relationships
  • FAQ: for genuine FAQ content (SE Ranking data shows FAQ blocks yield roughly 11% citation lift)
  • HowTo: for process-oriented, step-by-step content

The sameAs gap:

SALT.agency’s analysis of 107,352 URLs found that fewer than 4% of schema-present pages include sameAs entity links (Whitehat SEO). sameAs links connect your entities to their profiles on other platforms: LinkedIn, Wikipedia, Wikidata, social accounts. They help AI engines understand who you are across the web and connect mentions of your brand to a single entity. This is a quick win that almost nobody implements.
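Closing the sameAs gap is only a few lines of JSON-LD on your Organization schema. The URLs here are placeholders; point them at your actual profiles and, if they exist, your Wikipedia and Wikidata entries.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://x.com/exampleco",
    "https://en.wikipedia.org/wiki/Example_Co",
    "https://www.wikidata.org/wiki/Q0000000"
  ]
}
```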

The schema audit is usually where I find the biggest gap between what companies think they have and what’s actually in place. My recommendation: if your existing schema is generic or minimal, audit every field and populate it fully before adding new schema types. Depth beats breadth. Makes sense, eh?

The seven-point GEO audit checklist

This is the checklist I run when I audit client sites at Fifty Five and Five. It’s also what I run on our own site. Seven items, each with a clear pass or fail. The goal isn’t perfection on day one. It’s knowing exactly where you stand and fixing the items that have the biggest impact first.

  1. Robots.txt audit: Pull up your robots.txt and check each AI crawler user agent individually. Is GPTBot blocked or allowed? What about ChatGPT-User? ClaudeBot? Claude-SearchBot? PerplexityBot? Document what’s currently allowed, what’s blocked, and whether that’s the policy you actually want. If a specific AI crawler isn’t mentioned in your robots.txt, it’s allowed by default.

  2. JavaScript rendering test: Open your key pages and view source. Is your main content in the raw HTML? If you see empty <div> tags where text should be, your content is loading via JavaScript and AI crawlers can’t see it. This is a pass/fail test with no grey area.

  3. Schema markup audit: Run your pages through Google’s Rich Results Test or Schema.org’s validator. Check whether schema is present, whether it’s in JSON-LD format, and whether all relevant attributes are populated (not just the required minimum). Is author information complete with sameAs links? Remember the Growth Marshal finding: generic schema performs worse than no schema.

  4. Content extractability test: Read each section of your content starting from the H2. Does each section answer its question in the first 1-2 sentences? Could an AI engine extract just that section and present it as a useful, self-contained answer? Are your headings descriptive (“Schema markup for AI citations: which types matter”) rather than vague (“The secret sauce”)?

  5. Crawl access and sitemap check: Is your XML sitemap submitted to Google Search Console and Bing Webmaster Tools? Are your key content pages included? Are there orphan pages that aren’t linked from anywhere and therefore harder for AI crawlers to discover? Check for noindex tags on pages you actually want AI engines to find.

  6. Page speed and Core Web Vitals: AI crawlers have crawl budgets, and slow pages either get partially crawled or skipped entirely. Check your Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift. If your pages take more than 3 seconds to respond with the initial HTML, you’re losing AI crawl coverage.

  7. Internal linking structure: Are your topic clusters properly connected? Can a crawler follow the path from your pillar page to each cluster and back? Internal linking is how AI engines understand the relationships between your content. If your content strategy for AI search doesn’t link pillar pages to cluster pages (and vice versa), you’re leaving topical authority on the table.

Run this checklist quarterly, not once. AI crawlers update their behaviour, new user agents appear, and your site changes over time. A quarterly audit catches drift before it turns into invisible content.
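Item 1 of the checklist lends itself to automation. Here’s a minimal sketch using Python’s standard-library robots.txt parser to report what each AI user agent can fetch; the sample robots.txt and the `/blog/` path are illustrative, so swap in your own file and key URLs.

```python
from urllib.robotparser import RobotFileParser

# The AI crawler user agents worth auditing (training and retrieval).
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended",
]

def audit_robots(robots_txt: str, path: str = "/blog/") -> dict:
    """Return {user_agent: allowed} for each AI crawler against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_CRAWLERS}

# Example policy: block GPTBot for training, allow everyone else by default.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, allowed in audit_robots(sample).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Fetch your live file (e.g. with `urllib.request`) and run this quarterly alongside the manual checks; a diff against last quarter’s output is exactly the “drift” the checklist is designed to catch.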

If you’ve already built your onsite content process and your offsite engagement layer, this technical audit is what ensures both layers are actually visible to the engines that matter.

How do you make your website technically ready for AI search engines?

The question was how to make your website technically ready for AI search engines. The answer is a GEO audit that covers five areas: crawler access (your robots.txt and the business decisions behind it), JavaScript rendering (is your content in the raw HTML or hidden behind client-side scripts?), schema markup (attribute-rich or not at all), content extractability (can each section stand alone?), and the full technical checklist that ties it all together.

The companies I work with that are getting this right treat it as infrastructure, not a one-off project. The technical foundation is what makes your onsite content and offsite engagement layers actually work. You can write brilliant content and build a strong presence on Reddit and LinkedIn, but if AI crawlers can’t access your pages, can’t render your content, or can’t parse your structured data, none of it reaches the engines that are answering your customers’ questions.

Three things to do this week:

  • Check your robots.txt: Find out which AI crawlers you’re currently blocking or allowing. Make a conscious decision about your policy.
  • View source on your most important pages: If the content isn’t in the raw HTML, that’s your biggest technical problem and the first thing to fix.
  • Audit your schema markup: If it’s generic or half-populated, either complete every field or remove it entirely. Partial schema hurts more than it helps.

This is the technical infrastructure layer of the AI search strategy I’ve been building across this series. Onsite content gets you the material worth citing. Offsite engagement gets you the brand mentions that AI engines trust. The GEO audit makes sure AI engines can actually find and process all of it.

If you’re ready to run a GEO audit on your site, get in touch. I’ll walk you through the process.
