
How to Block AI Crawlers with Robots.txt (2026 Complete Guide)

By Brian Ho

AI crawlers from ChatGPT, Claude, Gemini, and Perplexity are visiting websites at record rates in 2026. These bots collect your content for different reasons: some use it to train AI models, while others fetch it to answer questions in real time. Whether you want to block them to protect your content or allow them for better AI visibility, your robots.txt file is the first line of control.

In this guide, you will learn exactly how to block or allow every major AI crawler. We will show you the exact code you need to add to your robots.txt file. We will also explain the trade-offs so you can make the best choice for your website. By the end, you will have a clear plan for your AI crawler strategy.

This guide is for website owners, SEO professionals, content creators, and anyone who wants to understand how AI bots interact with their website. You do not need any technical background to follow along. We will explain everything step by step.

Quick Check

Want to see which AI crawlers are currently blocked on your website? Run a free scan at AI Crawler Check. It checks 154+ bots in seconds and gives you an AI Visibility Score.

What Are AI Crawlers and Why Do They Matter?

AI crawlers are automated programs (also called bots) that visit websites and collect information. They work a lot like the bots that Google and Bing use to index web pages. But AI crawlers have a different goal. Instead of building a search index, they collect content that AI companies use to build and improve their language models.

In 2026, there are more than 150 known AI crawlers active on the web. Some of these bots are from big companies like OpenAI, Google, and Anthropic. Others come from smaller companies or research groups. Each bot has its own name, called a user-agent, and its own rules about what it collects.

AI crawlers matter because they directly affect two things. First, they decide whether your content is used to train AI models like ChatGPT and Claude. Second, they decide whether your content shows up in AI-powered search results. If you block all AI crawlers, your website becomes invisible to AI search tools. If you allow all of them, your content may be used for model training without your permission.

The good news is that you can control which bots access your site. The main tool for this is your robots.txt file. Let us start by understanding the two types of AI crawlers.

[Image: Training AI crawlers versus search AI crawlers comparison]

Two Types of AI Crawlers: Training vs. Search

Not all AI crawlers do the same thing. Understanding the difference between the two main types is important before you decide what to block. AI crawlers fall into two categories: training crawlers and search crawlers.

Training Crawlers

These bots collect content to train large language models. When a training crawler visits your website, it saves your text, images, and other content. Then it sends that data back to the AI company. The company uses your content to make their AI models better at writing, answering questions, and doing other tasks.

Examples: GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Google-Extended (Google AI training)

Search Crawlers

These bots visit your website in real time when a user asks a question to an AI tool. For example, when someone asks ChatGPT about a topic, the ChatGPT-User bot may visit your website right at that moment. It reads your page and uses the information to create an answer for the user. Your website may get credit with a link in the AI response.

Examples: ChatGPT-User (OpenAI Search), Perplexity-User (Perplexity Search), OAI-SearchBot, Claude-SearchBot

This difference is very important for your strategy. Many website owners want to block training crawlers (to protect their content from being used in AI models) but allow search crawlers (so their content shows up in AI search results and brings traffic). This is called selective blocking, and we will show you exactly how to do it later in this guide.

Some companies operate both types. For example, OpenAI has GPTBot for training and ChatGPT-User for search. These are separate bots with separate user-agent names. You can block one and allow the other. To learn more about OpenAI's crawlers, read our detailed guide on What is GPTBot.

The Tier 1 AI Crawlers You Must Know

Not all AI crawlers are equally important. We group them into tiers based on how much they affect your website's AI visibility. Tier 1 bots have the biggest impact on whether your content appears in AI-powered tools and search results. These are the bots you should focus on first.

Your decisions about these bots directly affect your AI Visibility Score. Here are the most important AI crawlers in 2026:

| Bot Name | Company | Type | User-Agent | Why It Matters |
|---|---|---|---|---|
| GPTBot | OpenAI | Training | GPTBot | Your content becomes part of future ChatGPT models |
| ChatGPT-User | OpenAI | Search | ChatGPT-User | Your content appears in ChatGPT search answers |
| ClaudeBot | Anthropic | Training | ClaudeBot | Your content is used to train Claude AI models |
| Google-Extended | Google | Training | Google-Extended | Controls your visibility in Google AI Overviews and Gemini |
| PerplexityBot | Perplexity | Both | PerplexityBot | Used for both training and real-time search answers |
| OAI-SearchBot | OpenAI | Search | OAI-SearchBot | OpenAI's dedicated search indexing bot |
| Claude-SearchBot | Anthropic | Search | Claude-SearchBot | Anthropic's search crawler for Claude web search |

You can find the full list of all 154+ bots, including their user-agent strings, safety ratings, and blocking instructions, in our Bot Directory. For now, let us focus on the practical steps to control these bots.

What is robots.txt and How Does It Work?

Before we get into the blocking rules, let us make sure you understand what robots.txt is and how it works. The robots.txt file is a plain text file that sits at the root of your website. For example, if your website is example.com, your robots.txt file lives at example.com/robots.txt.

When a bot wants to visit your website, it first checks your robots.txt file. The file tells the bot which pages it can visit and which pages it cannot visit. Think of it like a set of rules posted at the door of a building. The rules say who can come in and where they can go.

The robots.txt file uses a simple format. Each rule has two parts: a User-agent line (which says which bot the rule is for) and a Disallow or Allow line (which says what the bot can or cannot do). Here is a basic example:

# This rule tells GPTBot it cannot visit any page
User-agent: GPTBot
Disallow: /

# This rule tells Googlebot it can visit all pages
User-agent: Googlebot
Allow: /

The / symbol means "everything on the website." So Disallow: / means "block everything" and Allow: / means "allow everything." You can also be more specific. For example, Disallow: /private/ would only block the /private/ folder.
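If you want to see how a compliant parser reads these rules before you deploy them, Python's standard-library urllib.robotparser module can evaluate a robots.txt file locally. This is a minimal sketch, assuming bots honor the rules the same way the built-in parser does; example.com is a placeholder domain.

```python
from urllib import robotparser

# The same two rules as the example above.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# GPTBot is blocked everywhere; Googlebot is allowed everywhere.
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
```

Note that a bot not named in the file (and with no User-agent: * group) is allowed by default, which is why you must list each AI crawler explicitly.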

There is one very important thing to know: robots.txt rules are based on trust. The file asks bots to follow the rules, but it does not force them. Most big companies like OpenAI, Google, and Anthropic say their bots follow robots.txt rules. But some smaller or unknown scrapers may ignore the rules. For these, you may need extra protection at the server level.
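Because robots.txt is only a request, the way to see whether bots actually respect it is to look at your server access logs for their user-agent strings. Here is a minimal sketch of that idea; the log lines below are made-up examples (real log formats vary by server), and the bot list is a small illustrative subset.

```python
# Hypothetical access-log lines for illustration only; real formats vary.
AI_BOT_NAMES = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "PerplexityBot"]

sample_log = [
    '203.0.113.5 - - [10/Jan/2026] "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.9 - - [10/Jan/2026] "GET /blog/post HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.12 - - [10/Jan/2026] "GET /premium/x HTTP/1.1" 200 "Bytespider"',
]

# Keep only requests whose user-agent string mentions a known AI bot.
ai_hits = [line for line in sample_log
           if any(bot in line for bot in AI_BOT_NAMES)]

for hit in ai_hits:
    print(hit)
```

If a bot you disallowed keeps appearing in hits like these, that is your signal to add server-level blocking on top of robots.txt.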

For a complete guide to robots.txt settings and optimization, read our Robots.txt Best Practices for AI SEO in 2026 article.

Option 1: Block All AI Crawlers

If you want to completely stop all AI crawlers from visiting your website, you need to add a rule for each major bot. You cannot just use one rule for all AI bots because they all have different user-agent names. Here is the full list you should add to your robots.txt file:

robots.txt: Block All AI Crawlers
# Block OpenAI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

# Block Anthropic crawlers
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

# Block Google AI training
User-agent: Google-Extended
Disallow: /

# Block Perplexity crawlers
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

# Block other AI and data scrapers
User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: omgili
Disallow: /

User-agent: Timpibot
Disallow: /

Warning: Blocking all AI crawlers means your content will NOT show up in ChatGPT, Claude, Gemini, Perplexity, or any other AI search tool. This can reduce your website's total traffic, because AI search is growing very fast. In 2026, AI search traffic is up more than 400% compared to the year before. Think carefully before you block everything.

Do not want to write robots.txt by hand? Use our free Robots.txt Generator with one-click presets for all 154+ bots. It takes less than a minute.

[Image: Robots.txt file acting as a gateway filtering AI bot access]

Option 2: Allow All AI Crawlers for Maximum Visibility

If your main goal is to get as much traffic as possible from AI tools, you should allow all AI crawlers. This gives you the best AI Visibility Score and means your content can show up in ChatGPT, Claude, Gemini, Perplexity, and other AI search results.

To allow all AI crawlers, make sure your robots.txt file does not have any Disallow rules for AI user-agents. You can also add explicit Allow rules to make your settings clear:

# Allow all major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Also consider adding llms.txt for better AI discoverability
# See: /blog/llmstxt-guide-ai-discoverability

When you allow all AI crawlers, you get several benefits. First, your content can appear in AI search results from all major platforms. Second, your content may be cited and linked in AI-generated answers, which brings traffic back to your site. Third, if your content is in the training data, AI models may become more familiar with your brand and recommend it more often.

The main downside is that your content will be used for AI model training. This means AI companies will use your text, data, and other content to improve their systems. For some website owners, this is fine. For others, especially publishers and content creators who depend on their original content, this may not be acceptable.

For even better AI visibility, you should also create an llms.txt file. This file tells AI systems what your website is about and which pages are most important. It can add up to 35 points to your AI Visibility Score.

Option 3: Selective Blocking (The Smart Approach)

Most websites get the best results with a selective approach. This means you allow the search-facing AI bots (so your content shows up in AI answers) while blocking the training-only crawlers (so your content is not used to build AI models). This gives you the best of both worlds: AI search visibility without giving away your content for training.

Here is the recommended setup for selective blocking:

# ALLOW: AI search bots (your content shows up in AI answers)
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

# BLOCK: AI training crawlers (protect your original content)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# BLOCK: Aggressive scrapers (always block these)
User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: cohere-ai
Disallow: /

# Keep search engines working normally
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Default: allow everything else, block sensitive paths
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /login/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

This approach works well for most types of websites. News websites, blogs, e-commerce stores, and corporate websites can all benefit from selective blocking. You keep your content out of training data while still getting traffic from AI search tools.
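Before uploading a selective setup, it is worth confirming that each bot gets the verdict you intended. The sketch below tests an abbreviated version of the selective rules with Python's standard-library robotparser; it assumes bots interpret the file the way the built-in parser does.

```python
from urllib import robotparser

# Abbreviated version of the selective-blocking rules above.
ROBOTS = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Search bot and search engine allowed; training bot and scraper blocked.
for bot in ["ChatGPT-User", "GPTBot", "CCBot", "Googlebot"]:
    status = "allowed" if rp.can_fetch(bot, "/blog/post") else "blocked"
    print(f"{bot}: {status}")
```

A check like this catches the classic mistake of blocking ChatGPT-User when you only meant to block GPTBot.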

To create this setup without writing code by hand, use our Robots.txt Generator. It has a "Selective Blocking" preset that creates this exact setup with one click. You can also customize it for your specific needs.

[Image: Dashboard showing selective AI bot access control with toggle switches]

Option 4: Partial Blocking (Block Specific Paths)

Sometimes you do not want to block a bot from your entire website. Instead, you want to let AI crawlers see some pages but not others. For example, you might want to let GPTBot read your blog posts (to get AI search traffic) but block it from your premium content or member-only pages.

You can do this with path-specific rules in robots.txt:

# Allow GPTBot on public pages, block on premium content
User-agent: GPTBot
Disallow: /premium/
Disallow: /members/
Disallow: /courses/
Allow: /blog/
Allow: /guides/
Allow: /

# Same approach for ClaudeBot
User-agent: ClaudeBot
Disallow: /premium/
Disallow: /members/
Allow: /

This approach is great for websites that have both free and paid content. You can share your free content with AI systems to get more visibility while keeping your paid content protected. Many news websites and online course platforms use this strategy.

Remember that rule precedence matters. Major crawlers such as Googlebot and GPTBot follow the longest-match rule: the most specific (longest) matching path wins. So if you have Disallow: /premium/ and Allow: / for the same bot, the bot will be blocked from /premium/ but allowed everywhere else. To stay safe with simpler parsers that read rules in file order, list your specific Disallow paths before the catch-all Allow: /, as in the example above.
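You can verify path-specific rules locally with Python's standard-library robotparser. One caveat: urllib.robotparser applies rules in file order (first match wins), while Google-style parsers use longest-match; putting the specific Disallow paths before the catch-all Allow: /, as below, gives the same outcome under both interpretations. A minimal sketch:

```python
from urllib import robotparser

# Abbreviated path rules from the partial-blocking example above.
ROBOTS = """\
User-agent: GPTBot
Disallow: /premium/
Disallow: /members/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("GPTBot", "/premium/course-1"))  # False: matches Disallow: /premium/
print(rp.can_fetch("GPTBot", "/blog/how-to"))       # True: falls through to Allow: /
```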

Going Beyond robots.txt: Extra Protection Methods

While robots.txt is the most common and easiest way to control AI crawlers, it has one big weakness: it is based on trust. A bot can choose to ignore your robots.txt rules. The major AI companies say they follow the rules, but you have no way to force every bot to follow them.

For stronger protection, you can use additional methods alongside your robots.txt file:

Nginx Server Block

If you use Nginx as your web server, you can add rules that block bots at the server level. This is more powerful than robots.txt because the server will refuse to send any content to the blocked bot. Here is an example:

# Nginx: Block specific AI crawlers at server level
if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot|Bytespider)") {
    return 403;
}

Apache .htaccess Block

If you use Apache as your web server, you can add similar rules in your .htaccess file:

# Apache: Block specific AI crawlers
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot|Bytespider) [NC]
RewriteRule .* - [F,L]
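Both server rules hinge on the same case-insensitive regex over the User-Agent header, so it is worth sanity-checking the pattern itself before deploying. This sketch tests it in Python; the header strings are illustrative examples, not exact copies of what each bot sends.

```python
import re

# The same alternation used in the Nginx/Apache rules above,
# matched case-insensitively like the ~* and [NC] flags.
BLOCK_PATTERN = re.compile(r"(GPTBot|ClaudeBot|CCBot|Bytespider)", re.IGNORECASE)

# Illustrative User-Agent headers (real strings vary by version).
headers = [
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)",
]

for ua in headers:
    verdict = "403" if BLOCK_PATTERN.search(ua) else "200"
    print(verdict, ua)
```

Checking the pattern like this helps you confirm that Googlebot and other search engines never match by accident.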

Robots Meta Tags

You can also add robots meta tags to individual HTML pages to tell AI crawlers not to use your content. This gives you page-level control. Keep in mind that the noai and noimageai directives are an emerging convention and are not honored by every crawler:

<!-- Block GPTBot from using this page's content -->
<meta name="GPTBot" content="noindex, nofollow">

<!-- Block all AI crawlers from this page -->
<meta name="robots" content="noai, noimageai">

X-Robots-Tag HTTP Header

For non-HTML content like PDF files or images, you can use the X-Robots-Tag HTTP header. This tells bots not to use these files for AI training:

# In Nginx config
location /documents/ {
    add_header X-Robots-Tag "noai, noimageai" always;
}

For more details on all these methods, including how to set them up for specific bot categories, check the blocking instructions on any bot's page in our Bot Directory. Every bot profile page shows you the exact code for robots.txt, Nginx, Apache, meta tags, and HTTP headers.

Common Mistakes to Avoid

When setting up your robots.txt for AI crawlers, many website owners make mistakes that can hurt their SEO or leave their content unprotected. Here are the most common mistakes and how to avoid them:

Mistake 1: Using User-agent: * to Block AI Bots

Some people add User-agent: * Disallow: / and think it will block all AI crawlers. This is wrong because it also blocks Googlebot, Bingbot, and other search engines. Always use specific user-agent names for AI crawlers instead of the wildcard.
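You can see the damage directly with Python's standard-library robotparser: the wildcard group catches search engines too, not just AI bots. A minimal sketch:

```python
from urllib import robotparser

# The mistaken "block the AI bots" file:
ROBOTS = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Every bot, including Googlebot and Bingbot, is now blocked.
for bot in ["GPTBot", "Googlebot", "Bingbot"]:
    print(bot, rp.can_fetch(bot, "/"))  # False for all three
```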

Mistake 2: Forgetting About New Crawlers

New AI crawlers appear every few months. If you only block the bots you know about today, new bots will be able to access your site tomorrow. Check your robots.txt at least every three months and use AI Crawler Check to scan for new bots.

Mistake 3: Blocking Search Crawlers by Accident

Be careful not to confuse AI training crawlers with AI search crawlers. If you block ChatGPT-User (search), your content will not appear in ChatGPT search results. If you only wanted to block GPTBot (training), you need to check that your rules are correct. Use our Robots.txt Validator to double-check.

Mistake 4: Not Testing After Making Changes

Always test your robots.txt after making changes. A small typo can break your rules. Use the Robots.txt Validator and then run a scan at AI Crawler Check to make sure everything works as expected.

Mistake 5: Blocking CSS, JavaScript, or Image Files

Never block CSS, JavaScript, or image files in your robots.txt. Both Google and AI crawlers need these files to understand your page properly. Only block page URLs, not asset files.

Step-by-Step: Set Up Your AI Crawler Rules

Now let us put everything together. Follow these steps to set up your AI crawler rules from start to finish:

Step 1: Check Your Current Status

Go to AI Crawler Check and enter your website URL. The tool will scan your robots.txt and show you which of the 154+ bots are currently blocked, allowed, or not addressed. It also gives you an AI Visibility Score from 0 to 100.

Step 2: Decide on Your Strategy

Choose one of the four options we covered: block all, allow all, selective blocking, or partial blocking. For most websites, we recommend selective blocking. This gives you AI search visibility while protecting your training data.

Step 3: Generate Your robots.txt

Use our Robots.txt Generator to create the file. Select the bots you want to block or allow, and the tool will create the correct code for you. You can also use the preset buttons for common setups.

Step 4: Upload Your robots.txt

Upload the file to the root of your website. It must be at yourdomain.com/robots.txt (not in a subfolder). If you use WordPress, you can edit it through the Yoast SEO or Rank Math plugin. If you use a static site, add the file to your root directory.

Step 5: Validate Your Setup

Use the Robots.txt Validator to check for errors in your file. Then run another scan at AI Crawler Check to make sure the bots are blocked or allowed as you intended.

Step 6: Add Extra Protection (Optional)

If you want stronger protection, add server-level blocks using Nginx or Apache rules (see the section above). You can also add meta robot tags to individual pages for more precise control.

Step 7: Review Every Quarter

Set a reminder to check your robots.txt every 3 months. New AI crawlers appear regularly, and your strategy may need to change as AI search grows. Use the Batch Checker if you manage multiple websites.

AI Crawler Strategy by Website Type

Different types of websites need different AI crawler strategies. Here are our recommendations for the most common website types:

News and Media Websites

Recommendation: Selective blocking. Allow search bots (ChatGPT-User, OAI-SearchBot, Perplexity-User) to get traffic from AI search. Block training bots (GPTBot, ClaudeBot) to protect your original reporting. News content is very valuable for AI training, so protecting it makes sense.

E-commerce Stores

Recommendation: Allow most AI crawlers. AI search visibility can bring new customers to your products. Allow all Tier 1 bots for maximum visibility. Block aggressive scrapers (Bytespider, CCBot) that may steal product data for competitors. Consider partial blocking for pricing pages.

Blogs and Content Creators

Recommendation: Selective blocking or allow all. If your blog makes money from traffic, allow AI search bots to get more visitors. If you sell your content (courses, ebooks), block training bots to protect your paid material. Many bloggers choose to allow everything because AI search traffic is growing fast.

Corporate and SaaS Websites

Recommendation: Allow all Tier 1 bots. Corporate websites benefit a lot from AI visibility. When someone asks an AI tool about your industry, you want your company to be mentioned. Allow all major AI crawlers and add an llms.txt file for even better visibility.

Education and Government

Recommendation: Allow most crawlers for public information. Block training bots for protected research or student data. Government and education websites usually want their public information to be as easy to find as possible, including through AI tools.

The SEO Impact of Your AI Crawler Decisions

Let us be clear about one important fact: blocking AI crawlers does not affect your Google search ranking. Googlebot and AI crawlers are completely separate. You can block every AI crawler and your Google ranking will stay the same.

However, blocking AI crawlers does affect your overall traffic and visibility. In 2026, AI search is one of the fastest growing traffic sources on the internet. Here are some numbers to think about:

400%+: AI search traffic growth year over year
60%+: websites that accidentally block at least one AI bot
154+: known AI crawlers active on the web

If you block all AI crawlers, you are missing out on a rapidly growing traffic source. If you allow all of them, you may be giving away your content for free. The best approach depends on your business model and goals. That is why selective blocking is so popular. It gives you the traffic benefits without the training data costs.

To understand exactly how your current settings affect your visibility, check your AI Visibility Score using our free tool. The score tells you how well your website is set up for AI access and where you can improve.

You can also learn about a new standard called llms.txt that works together with robots.txt. While robots.txt controls bot access, llms.txt tells AI systems what your website is about. Together, they give you the most control over your AI visibility.

Quick Summary and Next Steps

Here is a quick summary of everything we covered in this guide:

AI crawlers come in two types: training crawlers (collect data for AI models) and search crawlers (fetch data for real-time AI answers)

Your robots.txt file is the main tool to control which AI bots can access your website

You can block all, allow all, use selective blocking, or do partial path blocking

Selective blocking (allow search bots, block training bots) is the best option for most websites

Blocking AI crawlers does not affect your Google search ranking

For extra security, add server-level blocks with Nginx or Apache in addition to robots.txt

Review your settings every 3 months as new bots appear regularly

Now that you know how to control AI crawlers, here are your next steps:

1. Scan your website to see your current AI Visibility Score and which bots are blocked
2. Generate your robots.txt with our free tool using the strategy that fits your website
3. Read our robots.txt best practices guide for more advanced tips
4. Learn about llms.txt to boost your AI visibility even further

Check Your AI Crawler Access Now

See which of the 154+ bots are blocked or allowed on your website. Free, instant results.

Free AI Bot Check

Frequently Asked Questions

How do I block ChatGPT from crawling my website?
Add User-agent: GPTBot followed by Disallow: / to your robots.txt file. This blocks OpenAI's training crawler. To also block ChatGPT Search, add rules for ChatGPT-User and OAI-SearchBot. Use our Robots.txt Generator to create the correct setup.
Does blocking AI crawlers affect my Google ranking?
No. AI crawlers like GPTBot, ClaudeBot, and Google-Extended are separate from Googlebot. Blocking AI crawlers does not change your Google search ranking. However, blocking Google-Extended will stop your content from showing up in Google AI Overviews and Gemini responses.
Which AI crawlers should I block?
It depends on your goals. If you want the most AI visibility, allow Tier 1 bots (GPTBot, ClaudeBot, Google-Extended, PerplexityBot). If you want to stop AI training but keep search visibility, block training crawlers (GPTBot, ClaudeBot, CCBot) and allow search bots (ChatGPT-User, Perplexity-User). Use AI Crawler Check to see which bots can reach your site right now.
How do I verify if my robots.txt is blocking AI bots correctly?
Use our free Robots.txt Validator to paste your robots.txt and check which bots are blocked, allowed, or only partly restricted. You can also run a full site check at aicrawlercheck.com.
Can I block some AI crawlers and allow others?
Yes. This is called selective blocking and it is the best approach for most websites. You write separate rules for each bot in your robots.txt file. For example, you can allow ChatGPT-User (for search visibility) while blocking GPTBot (to stop model training). Our Robots.txt Generator makes this easy with one-click presets.
Do AI crawlers actually follow robots.txt rules?
Major AI companies like OpenAI, Anthropic, Google, and Perplexity have all said their crawlers follow robots.txt rules. However, some smaller or less well-known scrapers may not follow the rules. That is why we recommend also using server-side blocking (like Nginx or Apache rules) for bots you really want to keep out.

Brian Ho
SEO & AI SEO Specialist at Brian Ho Marketing

Brian specializes in AI SEO and web crawler optimization. He built AI Crawler Check to help website owners navigate the rapidly evolving landscape of AI crawlers and search.
