Correct use of Robots.txt

Firstly, what is the robots.txt file?


This is a small text file called robots.txt that sits in the root directory
of your website. It is designed to give instructions to search engine bots, or
spiders ("bot" being short for "robot"), that visit your site. The file tells
the bots which files and pages they may look at and spider, and which ones they
may not.

When a search engine bot visits your site, it will FIRST look in your root
directory for a robots.txt file, i.e. at http://yourdomain.com/robots.txt. If
it doesn’t find one, it will go about your site freely; if it does find one, it
will read it for instructions on which files and pages of your website it may
spider (download) and index. This system is called the Robots Exclusion
Standard.
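To illustrate the check a well-behaved bot performs, here is a minimal sketch using Python's standard-library `urllib.robotparser` module. The robots.txt contents, the `/private/` path and the `ExampleBot` user agent are made-up assumptions; a real crawler would download the file from http://yourdomain.com/robots.txt with `set_url()` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real bot would download the file
# from http://yourdomain.com/robots.txt using set_url() and read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_crawl(robots_lines, url, user_agent="ExampleBot"):
    """Return True if user_agent is allowed to fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # load the rules without any network access
    return parser.can_fetch(user_agent, url)

rules = ROBOTS_TXT.splitlines()
print(may_crawl(rules, "http://yourdomain.com/index.html"))         # True
print(may_crawl(rules, "http://yourdomain.com/private/page.html"))  # False

# No robots.txt (simulated here as an empty file): nothing is disallowed.
print(may_crawl([], "http://yourdomain.com/private/page.html"))     # True
```

When `read()` gets a 404 for robots.txt, `RobotFileParser` likewise treats every URL as allowed, which matches the "go about your site freely" behaviour described above.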

The robots.txt file has a very specific format: it consists of one or more
records, separated by blank lines. Each record consists of a User-agent line
followed by one or more Disallow lines, each written in the format:
<Field> ":" <value>
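As a sketch of that record format, the hypothetical helper below (`build_record`, not part of any library) produces one record as `Field: value` lines: the User-agent line first, then one Disallow line per path. The agent names and paths in the usage lines are illustrative only.

```python
def build_record(user_agent, disallowed_paths):
    """Return one robots.txt record as a string (hypothetical helper).

    The record is a User-agent line followed by one Disallow line per path,
    each in "Field: value" form and each ending with a Unix newline.
    """
    lines = [f"User-agent: {user_agent}"]
    # An empty Disallow value means nothing is disallowed for this agent.
    if not disallowed_paths:
        disallowed_paths = [""]
    lines += [f"Disallow: {path}".rstrip() for path in disallowed_paths]
    return "\n".join(lines) + "\n"

# Records are separated from each other by a blank line.
robots_txt = "\n".join([
    build_record("*", ["/cgi-bin/", "/tmp/"]),
    build_record("ExampleBot", []),
])
print(robots_txt, end="")
```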

The robots.txt file should be saved with Unix line endings! Most good text
editors have a Unix mode, or your FTP client *should* do the conversion for
you. Do not attempt to create a robots.txt file with an HTML editor unless it
specifically has a plain-text mode.
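If you generate the file with a script rather than an editor, you can force Unix line endings yourself. A minimal Python sketch, assuming the file is written to the current directory before you upload it: passing `newline="\n"` to `open()` stops Python from translating `\n` into the platform's own line separator (for example `\r\n` on Windows), and the check afterwards reads the raw bytes to confirm that no carriage returns slipped in.

```python
from pathlib import Path

ROBOTS_TXT = "User-agent: *\nDisallow: /cgi-bin/\n"

def write_robots_txt(path, text):
    """Write text with Unix (LF) line endings regardless of the OS."""
    # Normalise any Windows (CRLF) or old Mac (CR) endings to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # newline="\n" disables newline translation when writing.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)

def has_unix_line_endings(path):
    """Return True if the file contains no carriage-return bytes."""
    return b"\r" not in Path(path).read_bytes()

write_robots_txt("robots.txt", ROBOTS_TXT)
print(has_unix_line_endings("robots.txt"))  # True
```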

Some websites do not want to be spidered and indexed, so it is also possible
to instruct the bots to ignore individual pages or your whole site altogether.
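For example, a record with `Disallow: /` asks every compliant bot to stay away from the whole site, because `/` is a prefix of every path, while a narrower path blocks only that page. The short sketch below checks both cases with Python's `urllib.robotparser`; `/private.html` and `ExampleBot` are placeholder names, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

def parser_for(robots_txt):
    """Build a RobotFileParser from robots.txt text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# Ignore the whole site: "/" matches the start of every path.
block_all = parser_for("User-agent: *\nDisallow: /\n")
# Ignore a single page only.
block_one = parser_for("User-agent: *\nDisallow: /private.html\n")

home = "http://yourdomain.com/"
page = "http://yourdomain.com/private.html"
print(block_all.can_fetch("ExampleBot", home))  # False
print(block_one.can_fetch("ExampleBot", home))  # True
print(block_one.can_fetch("ExampleBot", page))  # False
```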

It is, of course, vitally important that you know what you’re doing when
you’re using a robots.txt file: if you accidentally block the spiders from some
or all of your pages, those pages will not be crawled, and you may never be
indexed by the major search engines. If in doubt, don’t use one at all, seek
professional help or contact us. Alternatively, you can visit
www.robotstxt.org for further information.
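One way to avoid blocking pages by accident is to test a draft robots.txt against the URLs you definitely want indexed before uploading it. This is a minimal sketch with Python's `urllib.robotparser`; the `MUST_CRAWL` list, the draft rules and the bot names are hypothetical examples. It only checks the rules the way this parser interprets them, and individual search engines may support extra syntax (such as wildcards) that it does not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft: the author meant to block only /admin/ but wrote "/".
DRAFT = """\
User-agent: *
Disallow: /
"""

# Hypothetical pages that must stay crawlable, and bot names to test as.
MUST_CRAWL = ["http://yourdomain.com/", "http://yourdomain.com/products.html"]
BOTS = ["Googlebot", "Bingbot"]

def blocked_pages(robots_txt, urls, bots):
    """Return the (bot, url) pairs that the draft robots.txt would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(bot, url) for bot in bots for url in urls
            if not parser.can_fetch(bot, url)]

problems = blocked_pages(DRAFT, MUST_CRAWL, BOTS)
for bot, url in problems:
    print(f"WARNING: {bot} would be blocked from {url}")
```

An empty result means none of the listed pages would be blocked for the listed bots.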

Posted by admin on Saturday, January 9th, 2010