3

My site is being crawled by an anonymous bot hosted on Amazon EC2. The bot doesn't respect robots.txt and puts a high load on the web server, so I added a check: if the reverse DNS name for the request's IP ends with "amazonaws.com", the server returns a 403 page immediately.
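
A minimal sketch of the check described above, written in Python for illustration only. The function names `is_amazonaws_host` and `should_block` are hypothetical, and the real check may well live in the web server configuration rather than in application code. It matches on a domain boundary so that a name like "evilamazonaws.com" is not caught by accident.

```python
import socket


def is_amazonaws_host(hostname: str) -> bool:
    """Return True if hostname is amazonaws.com or a subdomain of it."""
    # Normalize: drop a trailing dot and compare case-insensitively.
    name = hostname.rstrip(".").lower()
    return name == "amazonaws.com" or name.endswith(".amazonaws.com")


def should_block(ip: str) -> bool:
    """Reverse-resolve ip and report whether it maps to amazonaws.com."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record or the lookup failed: this rule does not apply.
        return False
    return is_amazonaws_host(hostname)
```

A handler would return 403 whenever `should_block(client_ip)` is true. Note that a reverse lookup on every request adds latency, so caching the result per IP is probably worthwhile.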

This solved the problem, but could it cause other problems? EC2 may also be used by some "good" bots, and this would block their access. Can you give examples of such problems?

valodzka
  • 187

2 Answers

5

Amazon EC2 is a hosting platform, and Amazon doesn't directly control what its customers host. If you block the whole *.amazonaws.com domain, you will cut off every service hosted on EC2 that tries to reach your site, and quite a lot of services run there these days.

1

Check out this similar question, which shows how to block by user agent directly in the .htaccess file. This is useful for robots that don't follow your robots.txt rules:

Blocking by user-agent string in httpd.conf not effective

You can put those rules in either the httpd.conf file or a .htaccess file.
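
To illustrate the idea of blocking by user agent, here is a rough Python sketch of the matching logic. It is not the Apache configuration from the linked question. The pattern list and the name `is_blocked_agent` are made up for the example, so substitute whatever user-agent strings actually show up in your logs.

```python
import re

# Hypothetical blocklist: replace with substrings seen in your access logs.
BLOCKED_AGENT_PATTERNS = [r"badbot", r"evilscraper"]

# One case-insensitive regex that matches any of the patterns above.
_BLOCKED_RE = re.compile("|".join(BLOCKED_AGENT_PATTERNS), re.IGNORECASE)


def is_blocked_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header matches a blocked pattern."""
    return bool(_BLOCKED_RE.search(user_agent or ""))
```

Keep in mind that a bot can send any User-Agent it likes, so this only helps against crawlers that identify themselves consistently.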

Good luck.

U4iK_HaZe
  • 633