How do you create a robots.txt file?

If you are new to robots.txt, please read “What is a robots.txt file?” first.

How to Create and Use a robots.txt file

1. Open Notepad (or any other plain-text editor) and save a file named robots.txt, all lowercase
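If you would rather create the file from a script than in Notepad, the Python sketch below does the same job. `save_robots_txt` is a hypothetical helper, not part of any library. It writes plain UTF-8 text to a file named exactly robots.txt.

```python
from pathlib import Path


def save_robots_txt(content: str, directory: str = ".") -> Path:
    """Write `content` to <directory>/robots.txt and return the file's path.

    save_robots_txt is a hypothetical helper, not part of any library.
    """
    path = Path(directory) / "robots.txt"  # the name must be exactly "robots.txt"
    path.write_text(content, encoding="utf-8")  # plain text, no formatting
    return path
```

For example, `save_robots_txt("User-agent: *\nDisallow:\n")` writes the file to the current directory.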

2. Create records

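The content of a robots.txt file consists of so-called “records”. Each record holds the instructions for one specific search engine robot, or for all robots at once if you use the wildcard (*). A record has two parts: a User-agent line naming the robot, followed by one or more Disallow lines listing what that robot should not retrieve.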

User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
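To make the record structure concrete, here is a small Python sketch that assembles one record. `build_record` is a hypothetical helper. It emits one User-agent line followed by one Disallow line per path, and it refuses to build a record with no Disallow lines.

```python
from typing import List


def build_record(user_agent: str, disallow_paths: List[str]) -> str:
    """Return one robots.txt record as text (hypothetical helper).

    The record is a User-agent line followed by one Disallow line per path.
    An empty path produces a bare "Disallow:" line.
    """
    if not disallow_paths:
        raise ValueError("a record needs at least one Disallow line")
    lines = [f"User-agent: {user_agent}"]
    for path in disallow_paths:
        # rstrip() turns "Disallow: " (empty path) into "Disallow:"
        lines.append(f"Disallow: {path}".rstrip())
    return "\n".join(lines) + "\n"
```

For instance, `build_record("googlebot", ["/cgi-bin/"])` returns `"User-agent: googlebot\nDisallow: /cgi-bin/\n"`, the same record as the example that follows.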

Here’s an example:

User-agent: googlebot
Disallow: /cgi-bin/

This robots.txt file allows Googlebot to retrieve every page on your site except the files in the /cgi-bin/ directory, which Googlebot will skip.
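You can test what a record does with Python's standard-library parser, `urllib.robotparser.RobotFileParser`: its `parse()` method accepts the file's lines, and its `can_fetch()` method answers whether a given bot may fetch a given URL. This is only a sketch, since real crawlers use their own parsers; the helper `is_allowed`, the paths, and the second bot name (`bingbot`) are illustrative. It also shows that a record naming only googlebot places no restrictions on other bots.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no download
    return parser.can_fetch(user_agent, url)


for path in ["/index.html", "/cgi-bin/form.pl"]:
    for bot in ["googlebot", "bingbot"]:
        print(bot, path, is_allowed(ROBOTS_TXT, bot, path))
```

Running it prints False only for googlebot and /cgi-bin/form.pl.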

3. Upload the robots.txt file to your server.

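Make sure you upload the robots.txt file to the root directory of your site (the same location as your home page). Crawlers only request robots.txt from the root of a site, so a copy placed in a subdirectory is ignored.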
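Because crawlers look for robots.txt only at the root of the host, you can work out where a site's file must live from any of its page URLs. Below is a minimal sketch using `urllib.parse.urljoin`; the helper `robots_txt_url` and the example.com URLs are placeholders. (The standard library's `RobotFileParser.set_url()` and `read()` could then download and parse that file; that step needs network access and is not shown.)

```python
from urllib.parse import urljoin, urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the URL where crawlers look for robots.txt for page_url's site."""
    if not urlsplit(page_url).netloc:
        raise ValueError("expected an absolute URL such as https://example.com/")
    # An absolute path ("/...") replaces the page's whole path and query string.
    return urljoin(page_url, "/robots.txt")
```

For example, a page at https://www.example.com/blog/post.html maps to https://www.example.com/robots.txt.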


Examples of using the robots.txt file

1. Allow all spiders to crawl everything on your website

Leave the Disallow value empty. An empty Disallow line blocks nothing.

User-agent: *
Disallow:
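A quick check with `urllib.robotparser` (a sketch; the bot names and paths are illustrative) confirms that an empty Disallow value blocks nothing for any bot:

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

for bot in ["googlebot", "bingbot"]:
    for path in ["/", "/cgi-bin/form.pl"]:
        # An empty Disallow value disallows nothing for the catch-all (*) record.
        print(bot, path, parser.can_fetch(bot, path))
```

Every line it prints ends in True.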

2. Allow no spiders to crawl any part of your website

Just add a forward slash after Disallow. Every URL path starts with a slash, so this blocks the entire site.

User-agent: *
Disallow: /
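The only difference from the previous example is the slash. This sketch parses both files with `urllib.robotparser` (the helper `parser_for` and the paths are illustrative) and shows that `Disallow: /` blocks even the home page, since every path starts with /:

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without downloading anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


allow_all = parser_for("User-agent: *\nDisallow:\n")
block_all = parser_for("User-agent: *\nDisallow: /\n")

for path in ["/", "/index.html", "/blog/post.html"]:
    # First value: allow-all file; second value: block-all file.
    print(path, allow_all.can_fetch("googlebot", path), block_all.can_fetch("googlebot", path))
```

For each path, the first result is True and the second is False.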

3. Exclude a specific file from an individual search engine

If you don’t want Google to crawl one specific file (filename.html) in a directory (/directoryname/), add these lines to your robots.txt file:

User-Agent: Googlebot
Disallow: /directoryname/filename.html

The spider that Google sends out is called ‘Googlebot’.
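This record blocks only the one file it names, and only for Googlebot. A sketch with `urllib.robotparser` makes that visible; other.html and bingbot are illustrative names, not part of the example above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /directoryname/filename.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot", "/directoryname/filename.html"),  # the blocked file
    ("Googlebot", "/directoryname/other.html"),     # same directory, not listed
    ("bingbot", "/directoryname/filename.html"),    # a bot the record does not name
]
for bot, path in checks:
    print(bot, path, parser.can_fetch(bot, path))
```

Only the first check prints False. To block the whole directory for Googlebot instead, the Disallow value would be /directoryname/.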

4. Exclude a specific section of your site from all spiders and bots

You may be adding a blog to a new section of your site, in a directory called ‘blog’, and not want it crawled before you are finished. You do not need to list each robot that you wish to exclude; simply use the wildcard character, ‘*’, to exclude all search engine robots.

User-Agent: *
Disallow: /blog/

The forward slashes at the beginning and end of the directory name indicate that you do not want any files in that directory crawled.
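Disallow values are matched as prefixes of the URL path, which is why the trailing slash matters. The sketch below uses `urllib.robotparser` to compare `Disallow: /blog/` with `Disallow: /blog`; the helper `blocked_paths`, the bot name `anybot`, and the paths are illustrative.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: List[str]) -> List[str]:
    """Return the paths that robots_txt disallows for a generic bot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "anybot" matches only the catch-all (*) record.
    return [p for p in paths if not parser.can_fetch("anybot", p)]


paths = ["/blog/", "/blog/first-post.html", "/blog", "/blogroll.html"]
print(blocked_paths("User-agent: *\nDisallow: /blog/\n", paths))
print(blocked_paths("User-agent: *\nDisallow: /blog\n", paths))
```

The first call returns ['/blog/', '/blog/first-post.html']; the second returns all four paths, including /blogroll.html, because /blog is a prefix of it.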
