Robots.txt is a file stored in the root directory of a website. Its main purpose is to restrict access to some or all of the content on your website by the search engine bots that crawl it. A robots.txt file is not mandatory. In fact, you only need one if your website has content that you do not want search engines to crawl.
WP White Security Security Tip: If you want to protect confidential data, password-protect it. Remember that robots.txt is readable by all of your website visitors, so it should never be used to hide sensitive paths.
Create a robots.txt file for your website
The simplest form of robots.txt file contains two mandatory directives:
User-agent: In this directive, you specify the name of the search engine robot (bot) that the rule applies to. If you want the rule to apply to all automated search engine robots, use User-agent: *. If you want the rule to apply to a specific bot, such as Google's crawler, use User-agent: Googlebot.
Disallow: In this directive, you specify the path of the directory or file that you don't want the bot to crawl. For example, to stop search engine bots from crawling a subdirectory called images, use the rule Disallow: /images/. To stop them from crawling a file called hello.php, use the rule Disallow: /hello.php.
Together, these two directives (User-agent and Disallow) form a single entry in the robots.txt file. One directive cannot be specified without the other. There is no limit on the number of entries you can specify in a robots.txt file, nor on the number of User-agent or Disallow lines within an entry.
Apart from the above two directives, there is also the Allow directive. The Allow directive is optional and is supported by the majority of search engines. It is useful for instructing search engine bots to still crawl a file, or a number of files or subdirectories, inside a disallowed directory. For example, if you disallow crawling of the subdirectory /images/ but would still like search engine bots to crawl the image file me.jpg in that subdirectory, you can use the following rule in the robots.txt file: Allow: /images/me.jpg.
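As a rough sketch of how the three directives combine, the entry below blocks the /images/ subdirectory for all bots while still permitting the single file me.jpg. The paths are the illustrative ones used above, and since Allow is not honored by every bot, treat this as something to verify for the search engines you care about.

```text
# Applies to every bot
User-agent: *
# Allow is listed first so that bots which stop at the first matching rule also honor it
Allow: /images/me.jpg
Disallow: /images/
```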
Refer to the WPWhiteSecurity.com robots.txt file to get an idea of what a typical robots.txt file looks like.
WP White Security.com Blogger Tip: If you checked that robots.txt file, you most probably noticed that it also specifies the sitemap URL. This tells search engines where to find your sitemap.
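A sitemap line might look like the sketch below. The URL is a hypothetical placeholder, not the actual sitemap of any site mentioned here; the Sitemap directive expects a full absolute URL and is independent of any User-agent entry.

```text
# Hypothetical sitemap location; replace with your own absolute URL
Sitemap: https://www.example.com/sitemap.xml
```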
Example robots.txt rules
All search engine bots can visit all files on your website:
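A minimal sketch of such an entry, relying on the convention that an empty Disallow value blocks nothing:

```text
User-agent: *
# An empty value means no path is disallowed
Disallow:
```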
Block the entire website for all search engine bots:
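This would typically be written as follows, since a lone slash matches every path on the site:

```text
User-agent: *
# "/" is a prefix of every URL path, so everything is blocked
Disallow: /
```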
Prevent all search engine bots from crawling the cgi-bin, exec and admin directories:
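One way to express this, assuming the three directories sit directly under the website root, is one Disallow line per directory:

```text
User-agent: *
# Each path is assumed to be relative to the website root
Disallow: /cgi-bin/
Disallow: /exec/
Disallow: /admin/
```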
Prevent all search engine bots from crawling a single file called me.jpg in the pictures subdirectory:
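Assuming the pictures subdirectory is located directly under the website root, the rule could look like this:

```text
User-agent: *
# Blocks only this one file; the rest of /pictures/ stays crawlable
Disallow: /pictures/me.jpg
```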
Prevent all search engine bots from crawling all PHP files on your website:
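A possible rule is sketched below. It relies on the * and $ pattern-matching characters, which were not part of the original robots.txt convention. Major crawlers such as Googlebot and Bingbot understand them, but other bots may ignore or misread the line, so test it against the bots that matter to you.

```text
User-agent: *
# "*" matches any sequence of characters and "$" anchors the end of the URL,
# so this targets URLs ending in .php (but not, e.g., /page.php?id=1)
Disallow: /*.php$
```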
Now that you have created the new robots.txt file for your website or blog, you can upload it to the root directory of your website using FTP.