Robots.txt file – Common Formatting Mistakes to Avoid

To read my other posts about the robots.txt file, please visit:

What is robots.txt file?

How do you create a robots.txt file?

If you incorrectly format your robots.txt file, your web site files may not get indexed by the search engines.


To prevent this from happening, make sure you follow these 7 points:

1. Don’t use comments in the robots.txt file

Although comments are allowed in a robots.txt file, they might confuse some search engine spiders.

“Disallow: /support # Don’t index the support directory” might be misinterpreted as “Disallow: /support#Don’t index the support directory”.
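If you keep a commented working copy of the file, one option is to strip the comments before uploading it. The following is only a minimal sketch in Python; the helper names strip_comment and without_comments are made up for this illustration, and it assumes that none of your paths contain a literal "#" character.

```python
def strip_comment(line: str) -> str:
    """Return the line without its '#' comment and without surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def without_comments(robots_txt: str) -> str:
    """Remove comments from every line of a robots.txt text.

    Lines that contained only a comment are dropped. Lines that were
    already blank are kept, because blank lines separate records.
    """
    cleaned = []
    for line in robots_txt.splitlines():
        had_comment = "#" in line
        stripped = strip_comment(line)
        if stripped or not had_comment:
            cleaned.append(stripped)
    return "\n".join(cleaned)


print(strip_comment("Disallow: /support # Don't index the support directory"))
```

Because the helper also trims surrounding whitespace, the cleaned lines start in the first column.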

2. Don’t use white space at the beginning of a line.

For example, don’t write

    User-agent: *
    Disallow: /support

but

User-agent: *
Disallow: /support
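Leading spaces and tabs are hard to spot by eye, so a small check can help. This is a rough Python sketch, not an official validator; the function name lines_with_leading_whitespace is invented for this example.

```python
def lines_with_leading_whitespace(robots_txt: str) -> list[int]:
    """Return the 1-based numbers of non-blank lines that start with a space or tab."""
    return [
        number
        for number, line in enumerate(robots_txt.splitlines(), start=1)
        if line.strip() and line[0] in " \t"
    ]


indented = "  User-agent: *\n\tDisallow: /support\n"
print(lines_with_leading_whitespace(indented))
```

An empty list means that every rule starts at the beginning of its line.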

3. Don’t change the order of the commands.

For your robots.txt file to work, keep the lines in the right order: the User-agent line comes first, followed by the Disallow lines that apply to it. Don’t write

Disallow: /support
User-agent: *

but

User-agent: *
Disallow: /support
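The ordering rule can be checked mechanically. Here is a short Python sketch under two assumptions: records are separated by blank lines, and a Disallow line is only valid after a User-agent line in the same record. The function name disallow_before_user_agent is hypothetical.

```python
def disallow_before_user_agent(robots_txt: str) -> list[int]:
    """Return the 1-based numbers of Disallow lines not preceded by a User-agent line."""
    in_record = False  # True once the current record has a User-agent line
    misplaced = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        text = line.strip().lower()
        if not text:
            in_record = False  # a blank line ends the current record
        elif text.startswith("user-agent:"):
            in_record = True
        elif text.startswith("disallow:") and not in_record:
            misplaced.append(number)
    return misplaced


print(disallow_before_user_agent("Disallow: /support\nUser-agent: *\n"))
```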

4. Don’t use more than one directory in a Disallow line.

Do not use the following

User-agent: *
Disallow: /support /cgi-bin/ /images/

Search engine spiders cannot understand that format. The correct syntax for this is

User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
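If you already have a file with several directories on one line, the conversion is easy to automate. The following Python sketch shows the idea; split_disallow is a name chosen for this example, and it assumes the line carries no comment (see point 1).

```python
def split_disallow(line: str) -> list[str]:
    """Turn a Disallow line that lists several paths into one Disallow line per path.

    Any other line is returned unchanged as a one-element list.
    """
    field, _, value = line.partition(":")
    if field.strip().lower() != "disallow":
        return [line]
    paths = value.split()
    if len(paths) <= 1:
        return [line]
    return [f"Disallow: {path}" for path in paths]


for fixed in split_disallow("Disallow: /support /cgi-bin/ /images/"):
    print(fixed)
```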

5. Be sure to use the right case.

The file names on your server are case sensitive. If the name of your directory is “Support”, don’t write “support” in the robots.txt file.
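You can see the effect of a case mismatch with the robots.txt parser in Python's standard library (urllib.robotparser). This is just an illustration of how one parser behaves, and other spiders may differ; the helper is_allowed and the sample rules are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Parse the given robots.txt text and report whether the agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


wrong_case = "User-agent: *\nDisallow: /support\n"
right_case = "User-agent: *\nDisallow: /Support\n"

# The directory on the server is called "Support".
print(is_allowed(wrong_case, "/Support/index.html"))  # rule does not match
print(is_allowed(right_case, "/Support/index.html"))  # rule matches
```

With the lowercase rule, the page in the “Support” directory is still treated as fetchable, which is probably not what you intended.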

6. Don’t list all files.

If you want a search engine spider to ignore all files in a particular directory, you don’t have to list all files. For example:

User-agent: *
Disallow: /support/orders.html
Disallow: /support/technical.html
Disallow: /support/helpdesk.html
Disallow: /support/index.html

You can replace this with

User-agent: *
Disallow: /support
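To find out whether a long list of files can be collapsed like this, you can check whether they all live in the same directory. Below is a small Python sketch; shared_directory is a hypothetical helper, and it only looks at the immediate parent directory of each path.

```python
import posixpath
from typing import Optional


def shared_directory(paths: list[str]) -> Optional[str]:
    """Return the directory shared by all paths, or None if they differ."""
    directories = {posixpath.dirname(path) for path in paths}
    if len(directories) == 1:
        return directories.pop()
    return None


files = [
    "/support/orders.html",
    "/support/technical.html",
    "/support/helpdesk.html",
    "/support/index.html",
]
directory = shared_directory(files)
if directory is not None:
    print(f"Disallow: {directory}")
```

Keep in mind that a Disallow value is matched as a prefix, so the single line also covers every other file in the directory, as well as any path that merely begins with the same characters. Writing the value with a trailing slash, as in “/support/”, limits the rule to the directory itself.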

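7. Don’t rely on the “Allow” command

The original robots.txt standard has no “Allow” command. Major search engines such as Google and Bing now support it, but other spiders may ignore it. To stay compatible, only mention files and directories that you don’t want to be indexed. All other files can be indexed automatically if they are linked on your site.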

Conclusion

Your web site should have a proper robots.txt file if you want to gain good rankings on the search engines. Only if search engines know what to do with your pages can they give you a good ranking.

Resources

http://www.free-seo-news.com/all-about-robots-txt.htm
http://www.outfront.net/tutorials_02/adv_tech/robots.htm
http://www.robotstxt.org/

Robots.txt syntax checker
http://www.sxw.org.uk/computing/robots/check.html
