By using robots.txt, webmasters can indicate to search engines which areas of a website they'd like to prohibit bots from crawling, as well as the locations of sitemap files and crawl-delay parameters. You can read more details about this on the robots.txt Knowledge Center page.
The following commands are available:
Disallow
Prevents compliant robots from accessing specific pages or folders.
Sitemap
Indicates the location of a website’s sitemap or sitemaps.
Crawl-delay
Indicates the delay (in seconds) that a robot should wait between successive requests to a server.
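The directives above can be checked programmatically. As a minimal sketch, Python's standard-library `urllib.robotparser` module can parse robots.txt rules and answer "may this agent fetch this path?" questions; the rules and the `MyBot` agent name below are hypothetical examples, not from the article:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content illustrating the three directives.
rules = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MyBot", "/private/page.html"))  # False
print(rp.can_fetch("MyBot", "/public/page.html"))   # True
print(rp.crawl_delay("MyBot"))                      # 10
```

In a real crawler you would normally point the parser at a live file with `rp.set_url("http://example.com/robots.txt")` followed by `rp.read()` instead of parsing a string.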
An Example of Robots.txt
# Robots.txt www.seo-freelancer.com/robots.txt
User-agent: *
Disallow:

# Don’t allow spambot to crawl any pages
User-agent: spambot
Disallow: /

Sitemap: http://www.seo-freelancer.com/sitemap.xml
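To see the per-agent rules in this example in action, here is a short sketch using Python's standard-library robots.txt parser (the `Googlebot` check is just an illustrative stand-in for any well-behaved crawler):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, as a string.
example = """\
User-agent: *
Disallow:

User-agent: spambot
Disallow: /

Sitemap: http://www.seo-freelancer.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(example.splitlines())

# spambot is barred from the whole site; everyone else may crawl freely.
print(rp.can_fetch("spambot", "/any/page.html"))    # False
print(rp.can_fetch("Googlebot", "/any/page.html"))  # True
```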
Warning: Not all web robots follow robots.txt. People with bad intentions (e.g. e-mail address scrapers) build bots that don’t follow this protocol, and in extreme cases can use it to identify the location of private information. For this reason, it is recommended that the locations of administrative sections and other private sections of publicly accessible websites not be included in robots.txt. Instead, these pages can use the meta robots tag (discussed next) to keep the major search engines from indexing their high-risk content.