
Every day my access log looks something like this:

66.249.78.140 - - [21/Oct/2013:14:37:00 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.78.140 - - [21/Oct/2013:14:37:01 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.78.140 - - [21/Oct/2013:14:37:01 +0200] "GET /vuqffxiyupdh.html HTTP/1.1" 404 1189 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

or like this:

66.249.78.140 - - [20/Oct/2013:09:25:29 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.75.62 - - [20/Oct/2013:09:25:30 +0200] "GET /robots.txt HTTP/1.1" 200 112 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.78.140 - - [20/Oct/2013:09:25:30 +0200] "GET /zjtrtxnsh.html HTTP/1.1" 404 1186 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The bot requests robots.txt twice and then tries to access a file (zjtrtxnsh.html, vuqffxiyupdh.html, ...) that cannot exist and must return a 404 error. It is the same procedure every day; only the non-existent HTML filename changes.

The content of my robots.txt:

User-agent: *
Disallow: /backend
Sitemap: http://mysitesname.de/sitemap.xml

The sitemap.xml is readable and valid, so there seems to be no reason why the bot would want to force a 404 error.
How should I interpret this behaviour? Does it point to a mistake I've made, or should I ignore it?


UPDATE
@malware I scanned my website with several online tools and nothing was found.
I have none of the standard apps like WordPress or phpMyAdmin on the server.
I receive a logwatch report every day, and there has been no unauthorized SSH access or anything like that.
I have fail2ban set up.
I have restricted SSH access to public keys; no root login is allowed.
None of the sudo commands reported by logwatch were anything I could not recognize as things I had done that day.
There is no file in my web directory that is new, not created by me, or that looks weird (okay, I cannot guarantee that 100%, but everything looks okay).
I've run a full clamscan on the server without any result.
The software packages are up to date.

What else can I do?

1 Answer


In short: if my memory serves me correctly, it is to check the 404 handling on your site.

Longer answer: People create custom 404 pages and then forget to change the status code of the page. The result is that the server returns the custom 404 page with a 200 OK status when Googlebot requests an invalid URL, and the bot then has to decide whether the URL actually exists. To aid that decision, it hits your server with a randomly generated URL that has a high probability of not existing on your site and checks how the server responds to a request for a page that is not there.
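
You can reproduce that check yourself: request a path that almost certainly does not exist and look at the status code that comes back. A minimal sketch in Python (the host name is taken from the robots.txt in the question and is only a placeholder; use your own site):

    import random
    import string
    import urllib.error
    import urllib.request

    # Placeholder host name -- replace with your own site.
    host = "http://mysitesname.de"

    # Build a random path that almost certainly does not exist,
    # just like the bot's zjtrtxnsh.html / vuqffxiyupdh.html probes.
    path = "/" + "".join(random.choices(string.ascii_lowercase, k=12)) + ".html"

    try:
        response = urllib.request.urlopen(host + path)
        # Reaching this branch means a 2xx/3xx answer for a missing page: a "soft 404".
        print(path, "returned", response.status, "- looks like a soft 404")
    except urllib.error.HTTPError as e:
        # A real 404 (or 410) is what the crawler wants to see here.
        print(path, "returned", e.code, "- a proper error status")

If this prints a 200 for a random path, your server is serving soft 404s, and probing like the bot does would make sense.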

As I said, I am not 100% sure about it.
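
Beyond that, since the user-agent string alone can be spoofed, you can verify that the logged IP really belongs to Googlebot with a reverse DNS lookup followed by a forward lookup, which is the method Google recommends for identifying its crawler. A minimal sketch in Python, using the IP from the log lines in the question:

    import socket

    ip = "66.249.78.140"  # IP taken from the access-log lines in the question

    # Reverse lookup: genuine Googlebot IPs resolve to *.googlebot.com or *.google.com
    hostname = socket.gethostbyaddr(ip)[0]

    # Forward lookup: the name must resolve back to the original IP,
    # otherwise the PTR record could have been forged
    forward_ips = socket.gethostbyname_ex(hostname)[2]

    if hostname.endswith((".googlebot.com", ".google.com")) and ip in forward_ips:
        print(ip, "(" + hostname + ")", "looks like genuine Googlebot")
    else:
        print(ip, "(" + hostname + ")", "is probably not Googlebot")

If the reverse name is not under googlebot.com or google.com, or the forward lookup does not return the original IP, the requests come from something merely pretending to be Googlebot.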

kasperd