
How to run a GEO audit so AI search engines can actually find your content

Owen Steer 15 min read

How do I make my website technically ready for AI search engines?

To ensure your site is AI-ready, conduct a GEO audit that checks access for crawlers, JavaScript rendering, structured data quality, content independence, and technical foundations for AI search. Focus on these areas to enhance your visibility across AI platforms.

You make your website technically ready for AI search engines by running a GEO audit that covers five areas: whether AI crawlers can actually access your pages, whether your content renders without JavaScript, whether your structured data is attribute-rich (not just present), whether each section of content can stand alone if extracted, and whether the full technical foundation passes a checklist built for AI search, not just Google.

Most of this overlaps with good SEO hygiene. But the differences are where companies get caught out. A site that ranks well on Google can be completely invisible to ChatGPT, Claude, and Perplexity. Different crawlers, different rendering capabilities, different rules. If you’ve been building your AI search optimisation strategy around onsite content and offsite engagement, the technical infrastructure underneath is what makes both layers actually work. Without it, you’re publishing content that AI engines can’t read and building brand presence that AI engines can’t connect back to your site.

I’m Owen Steer, and at Fifty Five and Five I run these audits for clients building their AI search presence. This piece walks through what I check, what I find, and the specific technical issues that make the biggest difference to whether AI engines can find and cite your content.

AI indexing is not Google indexing: what actually changed

AI indexing works fundamentally differently from Google indexing, and that difference is why sites that rank well on Google might not exist as far as AI search engines are concerned. Google crawls your pages, renders JavaScript, follows links, and builds a searchable index. AI engines skip most of that.

The first thing to understand is that AI crawlers serve two distinct purposes. Training crawlers (GPTBot, ClaudeBot, Google-Extended) harvest content to build and update the model’s knowledge base. Retrieval crawlers (ChatGPT-User, Claude-SearchBot, PerplexityBot) fetch content in real time when a user asks a question. Training builds what the model knows. Retrieval is what happens when the model needs to cite a source right now. These are separate systems with separate access controls, which means your content can exist in the model’s training data without being accessible for real-time citation (or vice versa).

AI crawl volume is growing fast. JetOctopus data shows AI bot activity now sits at roughly 40-50% of Googlebot-level activity across the web (JetOctopus). But the traffic return is asymmetric. Anthropic’s crawlers generate approximately 38,000 crawl requests for every single referral back to your site. OpenAI’s ratio is roughly 400:1 (Am I Cited). That’s a lot of crawling for very little direct traffic. The value isn’t in click-throughs. It’s in citations.

Here’s the thing: Google has a massive technical advantage over every other AI engine when it comes to understanding your content. Google renders JavaScript through its Web Rendering Service. It processes dynamic content. It follows internal links methodically. AI engines don’t do any of that (with the partial exception of Gemini, which benefits from Google’s existing rendered index). They read the raw HTML they receive and move on.

This is why a GEO audit is different from a traditional SEO audit. A traditional audit asks: “Can Google find, render, and index this content?” A GEO audit asks: “Can an AI engine that doesn’t render JavaScript, doesn’t follow all your internal links, and reads only raw HTML still find and extract meaningful content from this page?” Those are different questions, and for most websites, they have different answers.

GPTBot, ClaudeBot, and the robots.txt settings that control your AI visibility

Your robots.txt file is the first thing that determines whether AI search engines can see your content at all. Get this wrong and nothing else in the audit matters, because the crawlers never make it past the front door.

When I audit a site for AI readiness, robots.txt is where I start. The issue isn’t complicated, but it’s surprisingly common: companies either block all AI crawlers without realising the consequences, or they have no AI crawler policy at all and don’t know what they’re allowing. Both are problems.

Here are the AI crawler user agents that matter in 2026:

Training crawlers (content enters the model’s knowledge base):

  • GPTBot (OpenAI)
  • ClaudeBot (Anthropic)
  • Google-Extended (Google/Gemini)
  • Applebot-Extended (Apple Intelligence)
  • CCBot (Common Crawl)

Retrieval crawlers (content fetched in real time to answer queries):

  • ChatGPT-User (OpenAI)
  • Claude-SearchBot (Anthropic)
  • PerplexityBot (Perplexity)

The nuance most people miss: these are separate systems. Blocking GPTBot prevents your content from entering OpenAI’s training data, but it doesn’t affect ChatGPT-User, which is what fetches your content when someone asks ChatGPT a question. You can block training and allow retrieval. For most companies in 2026, that’s the right call: prevent your content from being used to train models, while keeping it visible in AI search answers.
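As an illustration, a “block training, allow retrieval” policy looks something like this in robots.txt. Treat it as a sketch, not a recommendation: which crawlers you block is the business decision discussed below, and the directives here only cover the user agents listed above.

```txt
# Training crawlers: blocked (content stays out of model training data)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Retrieval crawlers: allowed (content stays citable in AI answers)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

Remember that any crawler not named in the file falls through to your `User-agent: *` rules, so an explicit list like this only controls the agents it mentions.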

HTTP Archive data from 12.15 million sites shows that 21% of the top 1,000 websites currently block GPTBot (Paul Calvano). That’s a conscious choice for most of those sites. The question is whether your organisation has made a conscious choice, or whether your robots.txt just happens to be whatever it was when someone last touched it three years ago.

What about llms.txt?

You’ve probably seen llms.txt mentioned as the next big standard for AI search. The idea is a structured file that tells AI engines what your site is about and how to interpret it. In theory, useful. In practice, not yet.

SE Ranking’s analysis of nearly 300,000 domains found that only 10.13% have implemented an llms.txt file. More importantly, their data (corroborated by ALLMO.ai’s study of 94,000+ cited URLs) shows no measurable correlation between having an llms.txt file and being cited by AI engines (SE Ranking). No correlation at all.

That doesn’t mean it’ll never matter. But spending time on llms.txt before fixing your robots.txt, JavaScript rendering, and schema markup is optimising the wrong things in the wrong order. Get the foundations right first. Add llms.txt as optional scaffolding later if you want to.

My recommendation: treat your robots.txt as a business decision, not just a technical one. Decide what you’re comfortable with, document it, and review it quarterly. New AI crawlers appear regularly, and your policy should keep pace.


JavaScript SEO and the content AI search engines can’t see

If your content loads via client-side JavaScript, AI search engines can’t see it. GPTBot, ClaudeBot, and PerplexityBot do not execute JavaScript. They don’t run your scripts, they don’t wait for your API calls to return, and they don’t interact with your single-page application. What they see is the raw HTML your server sends back on the first request. If your content isn’t in that HTML, it doesn’t exist as far as AI engines are concerned.

seoClarity’s internal data puts specific numbers on this: on modern single-page applications relying on client-side rendering, 50% to 80% of the meaningful content fails to appear to AI bots (seoClarity). That’s not an edge case. That’s the majority of content on JavaScript-heavy sites being functionally invisible to every AI search engine.

The disconnect that catches companies off guard is Google. Google’s Web Rendering Service handles JavaScript through a two-wave indexing process: it captures the raw HTML first, then comes back later to render the JavaScript and process the dynamic content. This means a React or Angular site can rank perfectly well on Google while ChatGPT, Claude, and Perplexity can’t see a word of it. If your SEO dashboard shows green across the board but you’re getting zero AI citations, JavaScript rendering is the first thing to check.

How to test this yourself:

Right-click on any page of your site and select “View Page Source” (not “Inspect Element”, which shows the rendered DOM after JavaScript has run). What you see in View Page Source is what AI crawlers see. If your main content, headings, and text aren’t there, neither are your chances of being cited. For a more thorough test, run curl on your URL from the command line. The HTML that comes back is exactly what GPTBot receives.
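If you want to script this check across many pages, here’s a minimal sketch in Python. It works on an HTML string (the kind `curl` hands back), strips script/style blocks and tags, and tests whether a key phrase from your page survives in the raw markup. The function name and sample pages are illustrative, not from any real site.

```python
import re

def visible_in_raw_html(html: str, phrase: str) -> bool:
    """Check whether a phrase appears in the text of raw HTML,
    i.e. without any JavaScript having run."""
    # Drop script and style blocks entirely, then strip remaining tags.
    text = re.sub(r"(?is)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text)
    return phrase.lower() in text.lower()

# A server-rendered page: the content is in the HTML itself.
ssr_page = "<html><body><h1>GEO audit guide</h1><p>Check robots.txt first.</p></body></html>"
# A typical SPA shell: an empty div plus a script bundle.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(visible_in_raw_html(ssr_page, "Check robots.txt first"))   # server-rendered: found
print(visible_in_raw_html(spa_shell, "Check robots.txt first"))  # SPA shell: not found
```

Run the same phrase check against the HTML from `curl https://yoursite.com/page` and you have a rough pass/fail for AI crawler visibility on that page.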

How to fix it:

The solution is server-side rendering (SSR) or static site generation (SSG). Both ensure your content is in the HTML before it reaches the browser or the bot. Frameworks like Next.js (React), Nuxt (Vue), and SvelteKit make this achievable without rebuilding your entire front end from scratch. If you’re starting fresh, static site generators like Hugo or Eleventy solve the problem by default.

When I built the Fifty Five and Five website, we went with Hugo partly because static site generation means every page is already server-rendered. AI crawlers get the full content on the first request, no JavaScript execution required. It wasn’t the only reason we chose it, but it’s one that keeps paying off as AI search becomes more important.

If you’re running a JavaScript SPA and migrating to SSR feels like a big lift, start with your highest-value pages: the blog, the resource hub, the pages you actually want AI engines to cite. Get those server-rendered first and work outward.

Schema markup for AI citations: which types matter and which don’t

Schema markup helps AI engines cite your content, but only if the implementation is thorough. Generic, half-populated schema is actually worse than having no schema at all. That’s not an opinion. That’s what the data shows.

Growth Marshal’s study (February 2026, 730 citations across 1,006 pages) found that attribute-rich schema earns a 61.7% citation rate. Pages with no schema at all earn 59.8%. And generic schema (the kind where you’ve got the basic Article or Organization type with the minimum required fields and nothing else) earns just 41.6% (Growth Marshal).

Generic, minimally populated schema actively hurts your AI citation rate compared to having no schema at all (41.6% vs 59.8%). Only attribute-rich schema with every relevant field populated outperforms the baseline, at 61.7%. If your schema isn’t thorough, it’s working against you.

The finding is counterintuitive, but it makes sense when you think about how AI engines process structured data. Generic schema signals that structured data exists but provides nothing useful to extract. The AI engine encounters a structured data block, parses it expecting useful information, and finds empty or minimal fields. That’s worse than encountering no schema at all, where the engine falls back to parsing the page content directly.

For lower-authority domains (DR ≤ 60, which covers most B2B company websites), the gap is even larger: attribute-rich schema achieves a 54.2% citation rate versus 31.8% for generic. If you’re not a household name, thorough schema implementation matters more, not less.

What “attribute-rich” means in practice:

It means populating every relevant field. For Article schema: author (with full name, URL, and sameAs links to LinkedIn and social profiles), datePublished, dateModified, publisher, headline, description, image, and wordCount. For Organization schema: name, url, logo, sameAs (linking to all official social profiles and any Wikipedia or Wikidata entries), contactPoint, and address. Not just the three required fields. All of them.
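To make “all of them” concrete, here’s what an attribute-rich Article block looks like in JSON-LD. Every URL, name, and value below is a placeholder; substitute your own.

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to run a GEO audit",
  "description": "A technical checklist for making your site visible to AI search engines.",
  "image": "https://www.example.com/images/geo-audit.png",
  "datePublished": "2026-02-10",
  "dateModified": "2026-02-24",
  "wordCount": 2400,
  "author": {
    "@type": "Person",
    "name": "Jane Example",
    "url": "https://www.example.com/about/jane-example",
    "sameAs": [
      "https://www.linkedin.com/in/jane-example",
      "https://x.com/janeexample"
    ]
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "logo": {
      "@type": "ImageObject",
      "url": "https://www.example.com/logo.png"
    }
  }
}
```

Note that every field carries real information an engine can extract: who wrote it, when, for whom, and where else that author exists on the web.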

Which schema types matter for B2B content:

  • Article (with full author details): the most important schema type for blog content and thought leadership
  • Organization (with sameAs links): connects your entity to profiles across the web
  • BreadcrumbList: helps AI engines understand your site hierarchy and content relationships
  • FAQ: for genuine FAQ content (SE Ranking data shows FAQ blocks yield roughly 11% citation lift)
  • HowTo: for process-oriented, step-by-step content

The sameAs gap:

SALT.agency’s analysis of 107,352 URLs found that fewer than 4% of schema-present pages include sameAs entity links (Whitehat SEO). sameAs links connect your entities to their profiles on other platforms: LinkedIn, Wikipedia, Wikidata, social accounts. They help AI engines understand who you are across the web and connect mentions of your brand to a single entity. This is a quick win that almost nobody implements.
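Closing the sameAs gap is only a few lines of JSON-LD on your Organization schema. The URLs here are placeholders; point them at your actual profiles and, if they exist, your Wikipedia and Wikidata entries.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://x.com/exampleco",
    "https://en.wikipedia.org/wiki/Example_Co",
    "https://www.wikidata.org/wiki/Q0000000"
  ]
}
```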

The schema audit is usually where I find the biggest gap between what companies think they have and what’s actually in place. My recommendation: if your existing schema is generic or minimal, audit every field and populate it fully before adding new schema types. Depth beats breadth. Makes sense, eh?

The seven-point GEO audit checklist

This is the checklist I run when I audit client sites at Fifty Five and Five. It’s also what I run on our own site. Seven items, each with a clear pass or fail. The goal isn’t perfection on day one. It’s knowing exactly where you stand and fixing the items that have the biggest impact first.

  1. Robots.txt audit: Pull up your robots.txt and check each AI crawler user agent individually. Is GPTBot blocked or allowed? What about ChatGPT-User? ClaudeBot? Claude-SearchBot? PerplexityBot? Document what’s currently allowed, what’s blocked, and whether that’s the policy you actually want. If a specific AI crawler isn’t mentioned in your robots.txt, it’s allowed by default.

  2. JavaScript rendering test: Open your key pages and view source. Is your main content in the raw HTML? If you see empty <div> tags where text should be, your content is loading via JavaScript and AI crawlers can’t see it. This is a pass/fail test with no grey area.

  3. Schema markup audit: Run your pages through Google’s Rich Results Test or Schema.org’s validator. Check whether schema is present, whether it’s in JSON-LD format, and whether all relevant attributes are populated (not just the required minimum). Is author information complete with sameAs links? Remember the Growth Marshal finding: generic schema performs worse than no schema.

  4. Content extractability test: Read each section of your content starting from the H2. Does each section answer its question in the first 1-2 sentences? Could an AI engine extract just that section and present it as a useful, self-contained answer? Are your headings descriptive (“Schema markup for AI citations: which types matter”) rather than vague (“The secret sauce”)?

  5. Crawl access and sitemap check: Is your XML sitemap submitted to Google Search Console and Bing Webmaster Tools? Are your key content pages included? Are there orphan pages that aren’t linked from anywhere and therefore harder for AI crawlers to discover? Check for noindex tags on pages you actually want AI engines to find.

  6. Page speed and Core Web Vitals: AI crawlers have crawl budgets, and slow pages either get partially crawled or skipped entirely. Check your Largest Contentful Paint, Interaction to Next Paint, and Cumulative Layout Shift. If your pages take more than 3 seconds to respond with the initial HTML, you’re losing AI crawl coverage.

  7. Internal linking structure: Are your topic clusters properly connected? Can a crawler follow the path from your pillar page to each cluster and back? Internal linking is how AI engines understand the relationships between your content. If your content strategy for AI search doesn’t link pillar pages to cluster pages (and vice versa), you’re leaving topical authority on the table.

Run this checklist quarterly, not once. AI crawlers update their behaviour, new user agents appear, and your site changes over time. A quarterly audit catches drift before it turns into invisible content.
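Item 1 of the checklist lends itself to automation. Here’s a minimal sketch using Python’s standard-library robots.txt parser to report what each AI user agent can fetch; the sample robots.txt and the `/blog/` path are illustrative, so swap in your own file and key URLs.

```python
from urllib.robotparser import RobotFileParser

# The AI crawler user agents worth auditing (training and retrieval).
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended",
]

def audit_robots(robots_txt: str, path: str = "/blog/") -> dict:
    """Return {user_agent: allowed} for each AI crawler against a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_CRAWLERS}

# Example policy: block GPTBot for training, allow everyone else by default.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, allowed in audit_robots(sample).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Fetch your live file (e.g. with `urllib.request`) and run this quarterly alongside the manual checks; a diff against last quarter’s output is exactly the “drift” the checklist is designed to catch.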

If you’ve already built your onsite content process and your offsite engagement layer, this technical audit is what ensures both layers are actually visible to the engines that matter.

How do you make your website technically ready for AI search engines?

The question was how to make your website technically ready for AI search engines. The answer is a GEO audit that covers five areas: crawler access (your robots.txt and the business decisions behind it), JavaScript rendering (is your content in the raw HTML or hidden behind client-side scripts?), schema markup (attribute-rich or not at all), content extractability (can each section stand alone?), and the full technical checklist that ties it all together.

The companies I work with that are getting this right treat it as infrastructure, not a one-off project. The technical foundation is what makes your onsite content and offsite engagement layers actually work. You can write brilliant content and build a strong presence on Reddit and LinkedIn, but if AI crawlers can’t access your pages, can’t render your content, or can’t parse your structured data, none of it reaches the engines that are answering your customers’ questions.

Three things to do this week:

  • Check your robots.txt: Find out which AI crawlers you’re currently blocking or allowing. Make a conscious decision about your policy.
  • View source on your most important pages: If the content isn’t in the raw HTML, that’s your biggest technical problem and the first thing to fix.
  • Audit your schema markup: If it’s generic or half-populated, either complete every field or remove it entirely. Partial schema hurts more than it helps.

This is the technical infrastructure layer of the AI search strategy I’ve been building across this series. Onsite content gets you the material worth citing. Offsite engagement gets you the brand mentions that AI engines trust. The GEO audit makes sure AI engines can actually find and process all of it.

If you’re ready to run a GEO audit on your site, get in touch. I’ll walk you through the process.
