Question 1

What is the official user-agent string for magpie-crawler?

Accepted Answer

The user-agent token for magpie-crawler is: magpie-crawler. Use this exact string in robots.txt, Nginx, Apache, or Cloudflare firewall rules to target the bot. User-agent matching in robots.txt is case-insensitive, but the token must be spelled correctly. Because any client can send this string, a User-Agent match alone does not prove that a request is genuine. To verify a request, perform a reverse-DNS lookup on the source IP and then a forward lookup on the returned hostname: a legitimate bot's IP typically resolves to a hostname under its operator's domain, and that hostname resolves back to the same IP.
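The reverse-DNS check can be sketched in Python as follows. This is a minimal illustration, not an official verification procedure: the function name `is_verified_bot` is made up for this example, and `operator.example` is a placeholder, since the hostname suffix that Brandwatch's crawler IPs resolve to (if they have reverse-DNS records at all) is not stated here and must be confirmed with the operator. The resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List


def _reverse(ip: str) -> str:
    """Reverse (PTR) lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    """Forward lookup: hostname -> all IPv4/IPv6 addresses it resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_verified_bot(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Forward-confirmed reverse DNS check.

    allowed_suffixes holds the operator's domain(s). The values are an
    assumption supplied by the caller; this sketch does not know them.
    """
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False

    suffixes = [s.lower().strip(".") for s in allowed_suffixes]
    # Require an exact match or a proper subdomain, so that
    # "notoperator.example" does not pass for "operator.example".
    if not any(host == s or host.endswith("." + s) for s in suffixes):
        return False

    try:
        # The hostname must resolve back to the original IP.
        return ip in forward(host)
    except OSError:
        return False


# Usage (placeholder suffix; replace with the operator's real domain):
# is_verified_bot("203.0.113.7", ["operator.example"])
```

The forward lookup matters because whoever controls an IP range can set its PTR record to any hostname; only the operator can make that hostname resolve back to the IP.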

Question 2

Is magpie-crawler safe?

Accepted Answer

⚠️ **Use Caution with magpie-crawler.** Although Brandwatch operates it for stated legitimate purposes, the bot collects your content for commercial data aggregation, a use you may not want to support. It generally respects robots.txt but may revisit pages more often than necessary. If you are concerned about your content being used this way, block it.

Question 3

Will blocking magpie-crawler hurt my SEO?

Accepted Answer

✅ **Minimal Impact.** Blocking magpie-crawler has no meaningful effect on your search engine rankings or organic traffic, because it gathers content for Brandwatch's data aggregation rather than for a search engine index.

Question 4

How do I block magpie-crawler in robots.txt?

Accepted Answer

Add the following lines to your /robots.txt file:
User-agent: magpie-crawler
Disallow: /
This instructs magpie-crawler not to crawl any path on your site. The Disallow: / directive covers every path on the host that serves the file, including subfolders; each subdomain needs its own robots.txt. To block only specific sections, replace / with the path (e.g., Disallow: /blog/). Note that robots.txt is publicly readable: any bot or human can inspect it at yourdomain.com/robots.txt.
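Before deploying the rules, you can sanity-check them with Python's standard-library `urllib.robotparser`. The sketch below is illustrative: the helper `is_allowed` and the constants are names made up for this example, and the standard-library parser only approximates how magpie-crawler itself interprets the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocks the whole site for magpie-crawler (the rules shown above).
BLOCK_ALL = """\
User-agent: magpie-crawler
Disallow: /
"""

# Blocks only one section, as in the Disallow: /blog/ variant.
BLOCK_BLOG = """\
User-agent: magpie-crawler
Disallow: /blog/
"""

if __name__ == "__main__":
    site = "https://yourdomain.com"
    print(is_allowed(BLOCK_ALL, "magpie-crawler", site + "/"))        # blocked
    print(is_allowed(BLOCK_ALL, "Googlebot", site + "/"))             # unaffected
    print(is_allowed(BLOCK_BLOG, "magpie-crawler", site + "/about"))  # outside /blog/
```

Other crawlers are unaffected because the rule group names only magpie-crawler and the file contains no `User-agent: *` group.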

Question 5

Does magpie-crawler respect robots.txt?

Accepted Answer

⚠️ magpie-crawler generally respects robots.txt, but compliance is voluntary and the file itself cannot enforce anything. To enforce a block, combine robots.txt with server-level rules (Nginx if/return 403, Apache SetEnvIf, or Cloudflare WAF) that match the User-Agent header.
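The answer above names Nginx, Apache, and Cloudflare rules. If your site is a Python application, the same idea can be applied one layer further in, as WSGI middleware. This is a swapped-in, application-level equivalent rather than one of the server configurations mentioned, and the class name `BlockUserAgents` is made up for this sketch. Like any User-Agent rule, it only stops clients that identify themselves honestly.

```python
from typing import Callable, Iterable, List, Tuple

BLOCKED_TOKENS: Tuple[str, ...] = ("magpie-crawler",)


class BlockUserAgents:
    """WSGI middleware that answers 403 when the User-Agent contains a blocked token."""

    def __init__(self, app: Callable, blocked: Iterable[str] = BLOCKED_TOKENS):
        self.app = app
        # Lowercase once so the per-request comparison is case-insensitive.
        self.blocked = tuple(token.lower() for token in blocked)

    def __call__(self, environ: dict, start_response: Callable) -> List[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.blocked):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else passes through to the wrapped application.
        return self.app(environ, start_response)


# Usage: wrap an existing WSGI application object.
# application = BlockUserAgents(application)
```

Blocking at the web server or CDN is usually cheaper, since the request never reaches the application; the middleware is mainly useful when you cannot change those layers.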

Question 6

How do I verify if magpie-crawler is crawling my site?

Accepted Answer

Search your web server access logs for the string magpie-crawler, for example with a case-insensitive grep: grep -i "magpie-crawler" /var/log/nginx/access.log. Google Search Console's Crawl Stats report covers only Google's own crawlers, so it will not show magpie-crawler activity. Instead, filter by user-agent in your log analysis tool (GoAccess, AWStats, etc.).
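If you want more than a list of matching lines, a short script can summarize which paths the bot requests most often. The sketch below assumes access-log lines in the common "combined" layout, where the request appears as a quoted `"METHOD /path PROTOCOL"` field; the function name `count_bot_hits` is made up for this example, and lines in other formats are counted under a catch-all key.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request field, e.g. "GET /blog/ HTTP/1.1",
# and captures the requested path.
REQUEST_RE = re.compile(r'"(?:[A-Z]+) (\S+) [^"]*"')


def count_bot_hits(lines: Iterable[str], token: str = "magpie-crawler") -> Counter:
    """Count requests per path for log lines that mention token (case-insensitive)."""
    token = token.lower()
    hits: Counter = Counter()
    for line in lines:
        if token not in line.lower():
            continue
        match = REQUEST_RE.search(line)
        hits[match.group(1) if match else "(unparsed)"] += 1
    return hits


# Usage, with the log path from the grep example above:
# with open("/var/log/nginx/access.log", errors="replace") as fh:
#     for path, n in count_bot_hits(fh).most_common(10):
#         print(n, path)
```

A high count for the same few paths would support the observation that the bot revisits pages more often than necessary.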
