
Hullo,

Typically, if I type "oneofmysites.com/robots.txt" into my address bar, any browser displays the contents of robots.txt. This is pretty standard behaviour.

I have just one web server that does not. Instead, requesting robots.txt redirects to the default web page (i.e. "thesiteinquestion.com/"). This difference (only one of my seven sites behaves this way) worries me.
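One way to see exactly what the server sends back for /robots.txt, rather than what the browser shows after silently following any redirect, is to request it once without following redirects. The sketch below uses only Python's standard `http.client`, which does not follow redirects on its own. The function name `fetch_robots_status` is hypothetical, and the sketch assumes the site answers plain HTTP or HTTPS on its default port.

```python
import http.client


def fetch_robots_status(host, port=None, use_https=False, timeout=10.0):
    """GET /robots.txt once, WITHOUT following redirects (hypothetical helper).

    Returns (status_code, Location header or None).
    """
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", "/robots.txt")
        resp = conn.getresponse()
        resp.read()  # drain the body before closing the connection
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `fetch_robots_status("thesiteinquestion.com", use_https=True)` should return `(200, None)` if the file is served directly, or a 3xx status with a Location pointing at the home page if some rule is redirecting it. A Location that points at the same `/robots.txt` path on `https://` is only an ordinary HTTP-to-HTTPS redirect.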

Questions: Is this something to be concerned about? If so, what is the likely error that I am missing?

Notes:

  • This site is the only one hosted with a different service provider from the others.
  • CentOS release 6.10 (Final)
  • Webmin
  • robots.txt file permissions are 644
Parapluie

3 Answers


It depends on the server configuration: .txt files may not be allowed. There may be a rule somewhere in the server config, or in an .htaccess file, that redirects any URL not matching a certain pattern (say .html, .php, .htm) to the index page of the web root.
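To make that mechanism concrete, here is a minimal Python model of the kind of catch-all rule described above. It is not Apache code: the allowed-extension pattern, the `exempt` set and the function name `route` are all hypothetical, chosen only to show why robots.txt ends up at the home page and how an exemption fixes it.

```python
import re

# Hypothetical catch-all rule: paths that do not end in one of these
# extensions are redirected to the web root.
ALLOWED = re.compile(r"\.(html?|php)$", re.IGNORECASE)


def route(path, exempt=()):
    """Return (status, target) for a request path under the hypothetical rule."""
    if path == "/" or path in exempt or ALLOWED.search(path):
        return 200, path  # served as-is
    return 302, "/"       # everything else is sent to the index page
```

With no exemption, `route("/robots.txt")` gives `(302, "/")`, which is the behaviour in the question; `route("/robots.txt", exempt={"/robots.txt"})` gives `(200, "/robots.txt")`. In Apache mod_rewrite terms, such an exemption would typically be a condition like `RewriteCond %{REQUEST_URI} !^/robots\.txt$` placed before the catch-all `RewriteRule`, but the actual rules in the site's config or .htaccess need to be checked first.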


To add a bit of information: the web provider is not obliged to respect the robots.txt standard, so it can do whatever it wants with that file and, as Serge said, it can be redirected anywhere.

yagmoth555

A crawler should read robots.txt and follow its restrictions, but the web server cannot enforce this.
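What "following its restrictions" looks like on the crawler side can be sketched with Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the `/private/` path and the helper `allowed` are hypothetical. The second case shows what this particular parser concludes if a crawler follows the redirect and receives the HTML home page instead of robots.txt: it finds no directives, so it treats every URL as allowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory for all agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical home page returned when robots.txt is redirected to "/".
HOMEPAGE_HTML = """\
<html>
<head><title>Welcome</title></head>
<body>Hello</body>
</html>
"""


def allowed(robots_body, agent, url):
    """Parse a robots.txt body and ask whether `agent` may fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_body.splitlines())
    return rp.can_fetch(agent, url)
```

Compliance is entirely up to the crawler: nothing in this check stops a client that simply never calls it.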

.htaccess (or the server config file) can be used to keep out crawlers that don’t comply, if you know who they are.
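The same idea can be sketched outside Apache. Swapped in for an .htaccess rule, below is a Python WSGI middleware that refuses requests whose User-Agent contains a blocked name; the names in `BLOCKED_AGENTS` and the function `block_user_agents` are hypothetical. Like the .htaccess approach, it only catches crawlers that identify themselves honestly, since the User-Agent header is chosen by the client.

```python
# Hypothetical names of non-compliant crawlers to refuse.
BLOCKED_AGENTS = ("BadBot", "EvilScraper")


def block_user_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app, returning 403 to clients whose User-Agent matches."""
    blocked_lower = tuple(name.lower() for name in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in blocked_lower):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Not on the list: pass the request through unchanged.
        return app(environ, start_response)

    return middleware
```

Matching is a case-insensitive substring test, so `"Mozilla/5.0 (compatible; BadBot/1.0)"` is refused while an ordinary browser string passes through.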

WGroleau