How to block bad bots for WordPress sites?

Bad bots (crawlers, spiders) consume a lot of a machine's resources (e.g., CPU and RAM), so they should be blocked. There are several ways to do this for WordPress websites.

Use an anti-bot WordPress plugin

This is the easiest way. You can set a plugin up to block bad bots based on their user-agent (UA) strings, IP addresses, and referrers.

Use the .htaccess file

If you search Google for how to block bots using .htaccess, you will find a lot of articles on the topic. However, simply copying the code and appending it to the end of the .htaccess file in the root directory of your website definitely won't work, because WordPress uses .htaccess for its own purposes. Take the following code as an example:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

# Blocking rules appended after the WordPress block
RewriteCond %{HTTP_USER_AGENT} ahrefs [NC]
RewriteRule .* - [F]


This code is intended to block Ahrefs, which is a notorious bad bot. But if you use a user-agent simulator to visit a page on the website as the Ahrefs bot, you will find the page is still accessible (just a little ugly, because the CSS files are actually blocked). This is because the request is rewritten to /index.php and handled by WordPress, so the blocking rules at the end are never reached. Moving the blocking code to the beginning of .htaccess will produce the following error message:


You don't have permission to access /ghg on this server.

Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.

This is not the user-specified 403 document. If you check the httpd error log file, you will find the following error message:

[Tue Oct 06 09:12:36 2015] [error] [client] Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.

This is caused by RewriteRule .* - [F], which forbids every URL and sends it to your 403 document, including the request for the 403 document itself, so an infinite loop of internal redirects is formed. To cure this problem, just exclude the error documents from the RewriteRule as follows:

RewriteRule !^_errorpages - [F]

The _errorpages directory under the document root is where the error documents such as 403.html, 410.html, etc. are kept. You can point to these error documents from the Apache VirtualHost section.
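Putting it together, the relevant parts might look roughly like the sketch below: the blocking rules sit above the WordPress block and skip anything under _errorpages, and the VirtualHost declares the error documents. The domain, port, and paths here are only placeholders.

# --- .htaccess in the document root ---
# Bot blocking rules placed before the WordPress rules
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ahrefs [NC]
RewriteRule !^_errorpages - [F]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

# --- VirtualHost section in httpd.conf ---
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com
    # Serve the custom error documents from the _errorpages directory
    ErrorDocument 403 /_errorpages/403.html
    ErrorDocument 410 /_errorpages/410.html
</VirtualHost>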

Everything now looks fine except for the home URL, which is redirected to the Apache HTTP Server test page (/var/www/error/noindex.html) instead of your 403 forbidden page. This is caused by the settings in /etc/httpd/conf.d/welcome.conf (ErrorDocument 403 /error/noindex.html). For the detailed reason, please read this post and this post. Commenting out all the lines in welcome.conf resolves the problem.
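For reference, on a stock CentOS/RHEL installation welcome.conf contains roughly the following (the exact contents may differ between versions); commenting out every line stops Apache from overriding the 403 document for the home URL:

# /etc/httpd/conf.d/welcome.conf -- comment out all the lines
#<LocationMatch "^/+$">
#    Options -Indexes
#    ErrorDocument 403 /error/noindex.html
#</LocationMatch>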

To make things perfect, you can leave 403.html empty so that bandwidth consumption is reduced to a minimum.
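Finally, to confirm that the blocking works, you can simulate the bot once more, for example with curl (the user-agent string and URL below are only placeholders); a 403 response with an empty body is what you want to see:

curl -I -A "AhrefsBot" http://example.com/some-page/
# Expected status line: HTTP/1.1 403 Forbidden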

Block bad bots globally in httpd.conf

Add the following lines to httpd.conf to block Ahrefs for all websites on your machine:

SetEnvIfNoCase User-Agent .*ahrefs.* bad_bot
<Location "/">
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Location>

Note that you may find articles using <Directory> instead, but I never got it to work.
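Also note that on Apache 2.4 and later the Order/Allow/Deny directives are deprecated in favor of the Require directives from mod_authz_core, so a roughly equivalent sketch would be the following (the bot names other than ahrefs are only examples):

# Tag requests from several bad bots by user agent
SetEnvIfNoCase User-Agent "(ahrefs|mj12bot|semrush)" bad_bot

<Location "/">
    <RequireAll>
        # Allow everyone except requests tagged as bad_bot
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Location>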
