Friday, January 17, 2014

What is Robots.txt?

The robots.txt file, a product of the Robots Exclusion Protocol, is a file stored in a website's root directory (e.g., www.google.com/robots.txt). The robots.txt file gives instructions to automated web crawlers visiting your site, including search engine spiders.

Using robots.txt, webmasters can indicate to search engines which areas of a website they would like to prohibit bots from crawling, as well as point out the locations of sitemap files and crawl-delay parameters. You can read more details about this on the robots.txt Knowledge Center page.

The following commands are available:
Disallow

Prevents compliant robots from accessing specific pages or folders.
Sitemap

Indicates the location of a website’s sitemap or sitemaps.
Crawl-delay

Indicates the delay (in seconds) that a robot should wait between successive requests to a server.
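
Crawl-delay is only honored by some engines (Bing and Yandex, for example; Google ignores it). A minimal entry might look like the following, with the 10-second value purely illustrative:

# Ask compliant bots to wait 10 seconds between requests
User-agent: *
Crawl-delay: 10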
      
An Example of Robots.txt
#Robots.txt www.seo-freelancer.com/robots.txt
User-agent: *
Disallow:

# Don’t permit spambot to crawl any pages
User-agent: spambot
Disallow: /

Sitemap: http://www.seo-freelancer.com/sitemap.xml
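
As a rough sketch of how a compliant crawler consumes these rules, Python's standard urllib.robotparser module can fetch and apply the file above (the URL is just this post's example domain):

from urllib import robotparser

# Point the parser at the example file and fetch it over the network
rp = robotparser.RobotFileParser()
rp.set_url("http://www.seo-freelancer.com/robots.txt")
rp.read()

# The wildcard agent may crawl everything; "spambot" is blocked from "/"
print(rp.can_fetch("*", "http://www.seo-freelancer.com/page.html"))        # True
print(rp.can_fetch("spambot", "http://www.seo-freelancer.com/page.html"))  # False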
Warning: Not all web robots follow robots.txt. People with bad intentions (e.g., e-mail address scrapers) build bots that don't follow this protocol, and in extreme cases they can use it to identify the location of private information. For this reason, it is suggested that the location of administrative sections and other private sections of publicly accessible websites not be included in robots.txt. Instead, these pages can use the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.
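
As a preview of that tag, a sensitive page can ask compliant engines not to index it or follow its links by placing this standard directive in its HTML head:

<meta name="robots" content="noindex, nofollow">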



