How to block Semalt.com from crawling your WordPress site

Over the last few months I have noticed a new referrer, *.semalt.com, crawling all of my WordPress sites. Usually I don't bother with crawlers, since most of them are indexing your site for the benefit of the internet, right? Wrong!

It appears Semalt is indexing sites to collect backlinks and keywords, which it then uses to deliver more powerful, sneaky spam techniques against your sites. Unlike most crawlers on the internet, Semalt does not honor your robots.txt file. So even if you block it via robots.txt, it will ignore the rule and continue crawling your site. Well shit!

Clicky realtime web analytics announced on their Twitter they have now blocked Semalt.  Wordpress.com is also

OK, enough blabbering. So how do I block these bastards from crawling my site?

How to block Semalt.com from crawling your NGINX WordPress site

      1. You can try to opt out of the Semalt index, but I doubt this works (bottom of the page) - Opt Out Semalt
      2. Navigate to your NGINX sites-enabled directory, usually /etc/nginx/sites-enabled
      3. Open your domain's config file in this directory
      4. Add the code below to your domain's config file. If you want to add more crawlers to the list, separate them with a pipe (|)



if ($http_user_agent ~ "semalt.com") {
    # Refuse the request with 403 Forbidden; return ends processing, so no break is needed
    return 403;
}

  5. Once you have added the code above to your domain's config file, run nginx -t (tests your config) to make sure you made no typos and that NGINX is happy with the new config
  6. Once NGINX reports that everything is OK, reload the NGINX config files:

$ service nginx reload
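
Step 4 mentions adding more crawlers with a pipe, so here is a rough sketch of what that might look like. The names examplebot and anotherbot are made-up placeholders, not real crawlers, so swap in whatever you actually see in your logs. The second block is my own assumption rather than something tested against Semalt: since Semalt shows up as a referrer, matching the Referer header may catch requests that the user agent check misses. Both blocks are meant to sit inside the server { } section of your domain's config file.

```nginx
# Block several crawlers by user agent; ~* makes the regex match case-insensitive.
# "examplebot" and "anotherbot" are hypothetical placeholders.
if ($http_user_agent ~* "semalt\.com|examplebot|anotherbot") {
    return 403;
}

# Assumption: also refuse requests whose Referer header mentions semalt.com.
if ($http_referer ~* "semalt\.com") {
    return 403;
}
```

As before, run nginx -t and reload NGINX after making the change.
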
If you're running an Apache WordPress website, check out the WordPress Support Forum.

UPDATE:

To check the status of your site within the Semalt crawler, use this URL, replacing my URL with yours:

http://semalt.semalt.com/crawler.php?u=http://brianchristner.io