
The technical SEO audit now needs an AI crawler layer

seo, ai search, technical seo, crawlers

Technical SEO audits have long focused on Googlebot: crawl budget, renderability, canonical tags and Core Web Vitals. That still matters, but a new class of visitor is now arriving in server logs and the audit needs to account for it.

Cloudflare’s analysis of AI crawler traffic breaks the landscape into three groups: training crawlers (89.4%), search crawlers (8%) and user-triggered agents (2.2%). The training crawlers are the ones scraping content to build or update foundation models. Search crawlers include the new generation of AI-powered search engines. User-triggered agents arrive when a person explicitly asks an AI tool to visit a page on their behalf.

These bots do not behave like Googlebot. They request pages at different rates, respect robots.txt inconsistently, and can place material load on infrastructure. Ignoring them means missing visibility opportunities and risking performance problems.

Why this belongs in the audit

A traditional SEO audit asks whether a site can be discovered, understood and ranked. An AI crawler layer asks the same questions for a different audience: can an AI system find the page, extract accurate information, and present it as an answer or citation?

Training crawlers need clean, accessible content. If your most useful material is rendered only by client-side JavaScript, or sits behind login walls or fragmented pagination, it may never make it into a model’s training data. Search crawlers and user-triggered agents need fast, well-structured pages that can be summarised accurately. The signals are similar to conventional SEO, but the failure modes are different.

There is also a traffic management question. Some AI crawlers are polite; some are not. Without classification, they can consume bandwidth, distort analytics and trigger rate limits that affect real users.

What to add to the audit checklist

Start by identifying what is already happening. Parse server logs or use a reverse-proxy tool to classify AI crawler requests by provider and purpose. Look for spikes, repeated paths and requests that bypass standard bot detection.
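As a starting point, the classification step can be sketched in a few lines of Python. This is a minimal sketch, not a complete detector: it assumes access logs in the common "combined" format, and the token-to-purpose map below is an illustrative, non-exhaustive assumption that should be checked against each provider's current documentation. The sample log lines are invented for the example.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive map: user-agent token -> (provider, purpose).
# The purpose labels are assumptions; verify them against provider docs.
AI_CRAWLERS = {
    "GPTBot": ("OpenAI", "training"),
    "OAI-SearchBot": ("OpenAI", "search"),
    "ChatGPT-User": ("OpenAI", "user-triggered"),
    "ClaudeBot": ("Anthropic", "training"),
    "PerplexityBot": ("Perplexity", "search"),
    "Perplexity-User": ("Perplexity", "user-triggered"),
    "CCBot": ("Common Crawl", "training"),
    "Bytespider": ("ByteDance", "training"),
}

# Combined log format: "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify(user_agent):
    """Return (provider, purpose) for a known AI crawler token, else None."""
    ua = user_agent.lower()
    for token, label in AI_CRAWLERS.items():
        if token.lower() in ua:
            return label
    return None


def summarise(lines):
    """Count AI crawler requests by (provider, purpose) and by path."""
    by_crawler, by_path = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        label = classify(match.group("ua"))
        if label is None:
            continue  # not a crawler in our map
        by_crawler[label] += 1
        by_path[match.group("path")] += 1
    return by_crawler, by_path


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [10/Oct/2025:13:56:01 +0000] "GET /blog/ai-crawlers HTTP/1.1" 200 8042 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
    '198.51.100.7 - - [10/Oct/2025:13:56:10 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/126.0 Safari/537.36"',
]

crawler_counts, path_counts = summarise(SAMPLE_LOG)
for (provider, purpose), hits in crawler_counts.most_common():
    print(f"{provider:12} {purpose:15} {hits}")
```

User-agent strings can be spoofed, so a match here is a hint rather than proof. Where a provider publishes IP ranges, checking the source address against them is a sensible second step.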

Next, review your robots.txt and rate-limiting policy. Decide explicitly which AI crawlers are allowed, which are restricted, and on which parts of the site. This is a business decision as much as a technical one: some organisations want maximum training exposure, others want to protect proprietary content or reduce infrastructure costs.
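One way to make the policy explicit and testable is to run a draft robots.txt through Python's standard-library parser and print what each crawler would be allowed to fetch. The sketch below is illustrative only: the draft rules, crawler tokens and paths are assumptions chosen for the example, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft policy: block one training crawler site-wide,
# allow one search crawler everywhere, keep /account/ off-limits by default.
DRAFT_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /account/
"""


def audit_policy(robots_txt, agents, paths):
    """Return {(agent, path): allowed} for every agent and path combination."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, path)
        for agent in agents
        for path in paths
    }


decisions = audit_policy(
    DRAFT_ROBOTS_TXT,
    agents=["GPTBot", "OAI-SearchBot", "ClaudeBot"],
    paths=["/blog/post", "/account/settings"],
)

for (agent, path), allowed in decisions.items():
    print(f"{agent:14} {path:20} {'allow' if allowed else 'block'}")
```

Note that a crawler with its own group ignores the `*` group entirely, so in this draft OAI-SearchBot may fetch /account/ paths unless its group says otherwise. robots.txt is also advisory: crawlers that ignore it need to be handled with rate limiting or blocking at the edge.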

Then check content accessibility. AI crawlers vary in their ability to render JavaScript, parse tables, or understand structured data. Key pages should present critical information in clean HTML with sensible heading hierarchy and accurate schema markup.
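A rough way to approximate what a non-rendering crawler sees is to inspect the raw HTML response without executing any JavaScript. The sketch below uses only the Python standard library. It assumes the HTML has already been fetched, the required phrases are supplied by the auditor, and the sample page is invented. It flags missing phrases, skipped heading levels and the absence of JSON-LD markup, but it does not validate the schema itself.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class RawHtmlAudit(HTMLParser):
    """Collect headings, visible text and JSON-LD presence from raw HTML."""

    def __init__(self):
        super().__init__()
        self.headings = []        # list of (level, text)
        self.text = []            # visible text fragments
        self.has_json_ld = False
        self._heading_level = None
        self._heading_text = []
        self._hidden = False      # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden = True
            if dict(attrs).get("type") == "application/ld+json":
                self.has_json_ld = True
        elif tag in HEADING_TAGS:
            self._heading_level = int(tag[1])
            self._heading_text = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._hidden = False
        elif tag in HEADING_TAGS and self._heading_level is not None:
            text = "".join(self._heading_text).strip()
            self.headings.append((self._heading_level, text))
            self._heading_level = None

    def handle_data(self, data):
        if self._hidden:
            return  # script and style bodies are not visible content
        self.text.append(data)
        if self._heading_level is not None:
            self._heading_text.append(data)


def audit_html(html, required_phrases):
    """Report accessibility signals for HTML as served, with no JS executed."""
    parser = RawHtmlAudit()
    parser.feed(html)
    parser.close()
    visible = " ".join(" ".join(parser.text).split()).lower()
    levels = [level for level, _ in parser.headings]
    return {
        "h1_count": levels.count(1),
        # Pairs where the outline jumps down more than one level, e.g. h1 -> h3.
        "skipped_levels": [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1],
        "missing_phrases": [p for p in required_phrases if p.lower() not in visible],
        "has_json_ld": parser.has_json_ld,
    }


# Invented page: the price is only injected by JavaScript.
SAMPLE_PAGE = """<html><body>
<h1>Acme Widgets</h1>
<h3>Pricing</h3>
<div id="price"></div>
<script>document.getElementById("price").textContent = "From 49 EUR";</script>
</body></html>"""

report = audit_html(SAMPLE_PAGE, ["Pricing", "From 49 EUR"])
print(report)
```

In this sample the price exists only inside a script, so it is reported as missing from the served HTML, which is the kind of failure a crawler that does not render JavaScript would hit.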

Finally, measure AI search visibility separately. Track whether your brand, products and key topics appear in AI-generated answers and summaries. This is a better indicator of AI-era performance than rankings alone.
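A simple way to start tracking this is a mention rate: the share of collected AI answers that name a given brand or product. The sketch below assumes the answers have already been gathered by some other means (manual sampling or a monitoring tool), which is outside its scope. The brand names and answer texts are invented for illustration.

```python
import re


def mention_rate(answers, terms):
    """Return {term: share of answers mentioning it}, case-insensitive.

    Uses whole-word matching, which suits plain names; terms that start or
    end with punctuation would need a different pattern.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    total = len(answers)
    rates = {}
    for term, pattern in patterns.items():
        hits = sum(1 for answer in answers if pattern.search(answer))
        rates[term] = hits / total if total else 0.0
    return rates


# Invented answers to a hypothetical prompt such as "best widget suppliers".
SAMPLE_ANSWERS = [
    "Acme Widgets and Globex both offer enterprise plans.",
    "Globex is the usual choice for small teams.",
    "Options include acme widgets, which publishes its pricing openly.",
]

rates = mention_rate(SAMPLE_ANSWERS, ["Acme Widgets", "Globex"])
for term, rate in rates.items():
    print(f"{term:14} {rate:.0%}")
```

Tracked over time and across a fixed set of prompts, a figure like this gives a trend line to sit alongside conventional rankings.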

A practical next step

Most marketing and engineering teams already have an SEO audit process. The easiest improvement is to add one slide or section covering four items: AI crawler traffic, crawler policy, content accessibility and AI search visibility. This does not require replacing the existing audit, only extending it to cover the new audience that is already knocking on the door.

