Operated by Internet Archive
The crawler for the Internet Archive (Wayback Machine). It preserves the history of the web.
archive.org_bot crawls websites using the user-agent string archive.org_bot. The sections below cover how to identify it and how to allow or block it.
User-agent: archive.org_bot (matching is case-insensitive; robots.txt is fetched from the root of each subdomain separately).
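For example, directives in https://example.com/robots.txt do not apply to https://blog.example.com/; the crawler requests https://blog.example.com/robots.txt on its own (example.com here is a placeholder domain).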
Understanding archive.org_bot's purpose helps you decide whether to allow or block it.
The user-agent string is archive.org_bot. This is the exact string to use in robots.txt, Nginx, Apache, or Cloudflare firewall rules to target this bot. User-agent matching in robots.txt is case-insensitive, but the string must be spelled correctly. You can verify that a request genuinely comes from archive.org_bot by performing a reverse-DNS lookup on the source IP; legitimate bots resolve back to their operator's domain.
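A minimal shell sketch of that verification; the IP 203.0.113.10 and the hostname crawl-203.archive.org are placeholders, not confirmed Internet Archive infrastructure:

# Step 1: reverse (PTR) lookup on the source IP taken from your access log
host 203.0.113.10
# A legitimate crawler should map to a hostname under the operator's domain

# Step 2: forward-confirm that the returned hostname resolves back to the same IP
host crawl-203.archive.org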
To block archive.org_bot entirely, add the following to your /robots.txt file:

User-agent: archive.org_bot
Disallow: /

This instructs archive.org_bot not to crawl any path on your site. The Disallow: / directive covers the entire domain, including subfolders. To block only specific sections, replace / with the path (e.g., Disallow: /blog/). Note: robots.txt is publicly readable; any bot or human can inspect it at yourdomain.com/robots.txt.

To confirm archive.org_bot is actually visiting, search your server logs for its user-agent string (case-insensitive grep: grep -i "archive.org_bot" /var/log/nginx/access.log). You can also check Google Search Console → Coverage → Crawl Stats for Googlebot variants. For archive.org_bot specifically, filter by user-agent in your log analysis tool (GoAccess, AWStats, etc.).

Instead of a blanket Disallow: /, you can restrict archive.org_bot to specific paths:
User-agent: archive.org_bot
Disallow: /private/
Disallow: /staging/
Allow: /

This allows archive.org_bot everywhere except the listed paths. Path matching in robots.txt uses prefix matching: Disallow: /private/ blocks /private/page.html but NOT /public/private/.

If archive.org_bot is crawling too aggressively, you have several options:
1. Add Crawl-delay: 30 below the User-agent directive in robots.txt (snippet below).
2. Rate-limit the user-agent via Nginx's limit_req_zone or Apache's mod_ratelimit (Nginx sketch below).
3. Block it outright at Cloudflare WAF with rule: http.user_agent contains "archive.org_bot".
4. Use fail2ban to auto-block IPs exceeding request thresholds (filter sketch below).
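For option 1, the directive goes in the same robots.txt group as the User-agent line. Crawlers that support Crawl-delay commonly interpret the value as a minimum number of seconds between requests, but not every crawler honors it, so treat it as a request rather than an enforcement:

User-agent: archive.org_bot
Crawl-delay: 30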
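For option 2, a minimal Nginx sketch; the zone name archivebot, the 1r/s rate, and the burst of 5 are illustrative values to tune for your traffic:

# http{} context: flag requests whose User-Agent contains archive.org_bot
map $http_user_agent $limit_archive_bot {
    default               "";
    "~*archive\.org_bot"  $binary_remote_addr;
}

# Only flagged requests are counted; the empty key exempts everyone else
limit_req_zone $limit_archive_bot zone=archivebot:10m rate=1r/s;

server {
    listen 80;
    location / {
        # Allow a short burst, then throttle further requests from the bot
        limit_req zone=archivebot burst=5 nodelay;
    }
}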
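For option 4, a sketch of a fail2ban filter plus jail; the filter name archive-bot, the thresholds, and the log path are assumptions to adapt to your setup:

# /etc/fail2ban/filter.d/archive-bot.conf (hypothetical filter name)
[Definition]
# Match access-log lines whose user-agent field contains archive.org_bot
failregex = ^<HOST> .*"archive\.org_bot"
ignoreregex =

# /etc/fail2ban/jail.local: ban an IP making 300+ matching requests within 60s
[archive-bot]
enabled  = true
filter   = archive-bot
logpath  = /var/log/nginx/access.log
maxretry = 300
findtime = 60
bantime  = 3600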