ROBOTS.TXT FILE


The robots.txt file, which implements the Robots Exclusion Protocol, tells search-engine crawlers which parts of a site they should not crawl or index. It is commonly used to keep admin panels and other private areas of a site out of search results.
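The file is always served from the root of the host (for example, https://example.com/robots.txt). As a small illustrative sketch (the site URL and the function name robots_url are just examples, not part of any standard library), this is how a crawler derives the robots.txt location for any page URL:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post.html"))
# https://example.com/robots.txt
```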

A robots.txt file can be written in several ways, depending on the requirement:

1) Here * refers to all robots. The syntax below allows the bots to crawl all the files, since an empty Disallow blocks nothing.
User-agent: *
Disallow:

2) This syntax restricts all the robots from indexing the site.
User-agent: *
Disallow: /

3) The example below restricts the bots from indexing the three directories mentioned.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

4) This example restricts the bots from indexing only the specific file mentioned.
User-agent: *
Disallow: /directory/file.html

5) We can also exclude a specific bot from indexing the site.
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /

6) We can also restrict a specific bot from indexing specific files.
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /junk/
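Python's standard library includes a parser for these rules (urllib.robotparser). As a minimal sketch reusing the hypothetical 'BadBot' agent from the examples above, this shows how a crawler would check whether the rules permit fetching a given URL:

```python
from urllib import robotparser

# Rules from example 6: block only 'BadBot' from /junk/, allow everyone else.
rules = """\
User-agent: BadBot
Disallow: /junk/

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# 'BadBot' may not fetch anything under /junk/ ...
print(rp.can_fetch("BadBot", "https://example.com/junk/page.html"))    # False
# ... but other bots are unrestricted.
print(rp.can_fetch("OtherBot", "https://example.com/junk/page.html"))  # True
```

In practice a crawler would load the live file with RobotFileParser.set_url() and read() instead of parsing an inline string.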

Some well-known bots are googlebot (all Google services), googlebot-news (Google's bot for news), bingbot (Bing), baiduspider (Baidu), yandexbot (Yandex), etc.