The Kloser
BuSo Pro
- Joined
- Jan 3, 2016
- Messages
- 115
- Likes
- 67
- Degree
- 0
Is there a quick way to use a robots.txt file or anything else to block Ahrefs, Moz, Majestic, and a bunch of KW rank checker tools from crawling a site?
Use a robots.txt file with this - http://pastebin.com/dnkEeeEk
and
edit your .htaccess file with this - http://pastebin.com/wGwHLUcZ

Code:
BrowserMatchNoCase BOTNAMEHERE bad_bot
BrowserMatchNoCase BOTNAMEHERE bad_bot
Order Deny,Allow
Deny from env=bad_bot
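In case those pastebin links die, here's a minimal sketch of both files with a few of the well-known crawler user-agents filled in (AhrefsBot for Ahrefs, MJ12bot for Majestic, rogerbot and dotbot for Moz, SemrushBot for SEMrush). These strings are whatever the tools currently publish, so verify them against each tool's docs before relying on this.

Code:
# robots.txt - politely ask each crawler to stay out of the whole site
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: SemrushBot
Disallow: /

And the .htaccess version with the same names swapped in for BOTNAMEHERE, for the crawlers that ignore robots.txt. Note this is the old Apache 2.2 Order/Deny syntax; on Apache 2.4 it needs mod_access_compat enabled.

Code:
# .htaccess - flag matching user-agents, then deny flagged requests
BrowserMatchNoCase AhrefsBot bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase rogerbot bad_bot
BrowserMatchNoCase dotbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
Order Deny,Allow
Deny from env=bad_bot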
Or, an alternative .htaccess approach:

Code:
SetEnvIfNoCase User-Agent "BOTNAMEHERE" block_bot
SetEnvIfNoCase User-Agent "BOTNAMEHERE" block_bot
Order Allow,Deny
Allow from all
Deny from env=block_bot
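Same idea with real names plugged in. SetEnvIfNoCase matches the pattern as a case-insensitive regex against the User-Agent header, so a partial name is enough (again, assuming the user-agent strings these tools publish today):

Code:
# Set block_bot for any request whose User-Agent contains these names
SetEnvIfNoCase User-Agent "AhrefsBot" block_bot
SetEnvIfNoCase User-Agent "MJ12bot" block_bot
SetEnvIfNoCase User-Agent "SemrushBot" block_bot
Order Allow,Deny
Allow from all
Deny from env=block_bot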
Ahrefs, SEMRush, and the big guys all respect the robots.txt (SOMEWHAT)

@CCarter Not all link crawlers obey robots.txt. I've had cases where Ahrefs crawled a site even though there was a specific rule that disallowed it.

That's why I stated "SOMEWHAT".

I know, I just wanted to emphasize the fact that there are cases where they don't.
Glad you brought that up. It just shows how ineffective it would be to rely on the robots file alone, as these crawlers often disregard it.

I was in a Slack group where I confronted the Ahrefs CEO about the fact that they were using an "East European" country's ISP to piggyback into a private blog network which was specifically blocking Ahrefs in robots.txt AND within the .htaccess file. They were cloaking their user-agent and location, and indexing these sites which were specifically hidden FROM THEM. The CEO denied it, but evidence was evidence: hard to disprove when you're looking at the logs, the blocking files, and then Ahrefs indexing pages in clear violation of the robots.txt rules.