Cloudflare’s 2025 Year in Review highlights major shifts in web crawling: Googlebot remains dominant while AI-driven crawlers are rising rapidly. Googlebot accounted for over 25% of verified bot traffic and generated 4.5% of all HTML request traffic, more than all AI crawlers combined. At the same time, AI “user action” crawling surged roughly 15x year-over-year, driven by platforms like Anthropic, OpenAI, and Perplexity.

Googlebot’s share of verified bot traffic and its outsized role in HTML requests confirm that it remains the most important crawler for search visibility. The rapid growth of AI crawlers, meanwhile, introduces new complexities. These AI bots simulate human interactions with web pages, which can increase server load, change content consumption patterns, and raise questions about how content is used for AI training.
Googlebot’s dominance makes optimizing for its crawling behavior essential: ensure fast load times, a clear site architecture, and accessible content so Google can index your most important pages effectively. AI crawlers that simulate user interactions, by contrast, can trigger unexpected server load and access content in ways that traditional bots do not.
As Search Engine Land’s Danny Goodwin observed: “The rise of AI crawlers introduces new challenges and opportunities for site owners, requiring a more nuanced approach to bot management and server resource allocation.” That means treating bot management as an ongoing operational task rather than a one-time configuration.
Start with targeted robots.txt rules and server-level controls. Use robots.txt to limit or direct AI crawlers where appropriate; for example, to slow Anthropic’s ClaudeBot and keep it out of a private directory:
User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /private/
On the server side, implement rate limiting for heavy crawlers. In nginx, limit_req cannot be placed inside an if block, so the idiomatic approach is to match the user agent with a map and use the result as the rate-limit key; requests with an empty key are not limited. A snippet that throttles OpenAI user agents to one request per second looks like this:
# Set a rate-limit key only for OpenAI user agents; everyone else gets
# an empty key and is not rate limited.
map $http_user_agent $openai_bot {
    default  "";
    ~*OpenAI $binary_remote_addr;
}

limit_req_zone $openai_bot zone=bot_limit:10m rate=1r/s;

server {
    location / {
        limit_req zone=bot_limit burst=5 nodelay;
    }
}
Combine these technical controls with ongoing monitoring: log user agents, track crawl frequency, and measure crawl-to-refer ratios where possible. Keep an eye on robots.txt disallow rates; Cloudflare found AI crawlers were the most frequently fully disallowed user agents in robots.txt files.
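To get that monitoring started, a short script can summarize bot activity straight from your access logs. The sketch below is a minimal example, assuming nginx’s combined log format and a log path of /var/log/nginx/access.log; the user-agent substrings listed are illustrative, not exhaustive, so adjust all of these to your own setup.
#!/usr/bin/env python3
"""Summarize crawler activity from an nginx access log (combined format)."""
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # assumed default nginx log location

# Illustrative substrings for common search and AI crawlers; extend as needed.
BOT_MARKERS = ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

# Combined format: ip - user [time] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

def summarize(path: str) -> Counter:
    """Count log lines per known bot, matched by user-agent substring."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE_RE.match(line)
            if not match:
                continue
            user_agent = match.group(3)
            for marker in BOT_MARKERS:
                if marker.lower() in user_agent.lower():
                    counts[marker] += 1
                    break
    return counts

if __name__ == "__main__":
    for bot, hits in summarize(LOG_PATH).most_common():
        print(f"{bot}: {hits} requests")
Run it on a schedule (from cron, for example) and compare the counts over time to spot changes in crawl frequency before they become a server-load problem.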
AI crawlers often access content for model training and analysis. Site owners should review terms of service and content licenses, especially for user-generated content or proprietary material. Where necessary, use explicit directives to restrict training data harvesting and be transparent with your users about how content may be used.
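For sites that want to opt out of training-related crawling specifically, the robots.txt entries below show the general pattern. The user-agent tokens listed are ones these providers have published for their crawlers, but verify each against the provider’s current documentation before relying on them:
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /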
SEOteric can help with technical audits, server configuration, and policy guidance. Visit https://www.seoteric.com to learn more about our SEO and site performance services.
Attribution: This post is based on reporting by Danny Goodwin at Search Engine Land and the Cloudflare Radar 2025 Year in Review. Read the original article on Search Engine Land: https://searchengineland.com/googlebot-crawling-ai-bots-2025-report-466402 and Cloudflare’s report: https://radar.cloudflare.com/year-in-review/2025.
Contact Us to Set Up A Discovery Call
Our clients love working with us, and we think you will too. Give us a call to see how we can work together - or fill out the contact form.