If a webmaster wishes to restrict the information on their site that is available to Googlebot, or to another well-behaved spider, they can do so with the appropriate directives in a robots.txt file, or by adding the meta tag <meta name="Googlebot" content="nofollow" /> to the web page, which tells Googlebot not to follow the links on that page. Googlebot requests to web servers are identifiable by a user-agent string containing "Googlebot" and by a host address (the reverse-DNS hostname of the requesting IP) ending in "googlebot.com".
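Both mechanisms can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the robots.txt content, the `/private/` path, and the helper names `googlebot_may_fetch` and `is_verified_googlebot` are hypothetical. The verification does more than match the user-agent, which is trivially spoofed. It performs a reverse DNS lookup on the requesting IP, checks that the hostname ends in googlebot.com, then resolves that hostname forward and confirms it maps back to the same IP. The DNS resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List
from urllib import robotparser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /
"""


def googlebot_may_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt lets Googlebot fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


def is_verified_googlebot(
    ip: str,
    user_agent: str,
    # Default resolvers use the system DNS; gethostbyname_ex is IPv4-only.
    reverse_lookup: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward_lookup: Callable[[str], List[str]] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Hypothetical helper: user-agent check plus reverse/forward DNS check."""
    if "googlebot" not in user_agent.lower():
        return False
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    # This sketch accepts only the googlebot.com domain named in the text.
    if not host.endswith(".googlebot.com"):
        return False
    try:
        # The hostname must resolve back to the same IP, or the PTR may be forged.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Google's own crawler documentation may list additional domains or publish IP ranges; the single-domain check here simply mirrors the description above.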
A problem that webmasters have often noted with Googlebot is that it can consume a large amount of bandwidth. This can push websites past their bandwidth limits and cause them to be taken down temporarily, which is especially troublesome for mirror sites that host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate.
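How often Googlebot crawls a site depends on its crawl budget. In Google's usage, crawl budget is the number of URLs Googlebot can and wants to crawl on a site. It combines a crawl rate limit, the fastest Googlebot will fetch without degrading the server, with crawl demand, which depends largely on how popular the site is (for example, how many incoming links it has) and how frequently its content is updated.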
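To make the interplay concrete, here is a toy crawl planner in Python. It is purely illustrative and hypothetical, not Google's algorithm: the `Page` fields, the rule that a page is "stale" once its assumed change interval has elapsed, and ranking by incoming links are all assumptions chosen to mirror the two factors named above, incoming links and update frequency, plus a cap on how many fetches the server can absorb.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    url: str
    incoming_links: int       # popularity signal (hypothetical stand-in)
    change_interval_h: float  # assumed typical time between updates, in hours
    hours_since_crawl: float  # time since the page was last fetched


def plan_crawl(pages: List[Page], max_fetches_per_hour: float, window_h: float) -> List[str]:
    """Toy crawl planner: an illustration, NOT Google's actual algorithm."""
    # "Can crawl": how many fetches the server can absorb in this window.
    capacity = int(max_fetches_per_hour * window_h)
    # "Wants to crawl": pages that have probably changed since the last visit.
    stale = [p for p in pages if p.hours_since_crawl >= p.change_interval_h]
    # Spend the limited budget on the most-linked stale pages first.
    stale.sort(key=lambda p: p.incoming_links, reverse=True)
    return [p.url for p in stale[:capacity]]


pages = [
    Page("/news", 50, 24, 30),
    Page("/archive", 5, 24, 48),
    Page("/about", 100, 168, 10),
    Page("/live", 10, 1, 2),
]
print(plan_crawl(pages, max_fetches_per_hour=1, window_h=2))  # prints ['/news', '/live']
```

In this run `/about` is skipped even though it has the most incoming links, because it is not yet stale, while `/archive` is stale but loses out once the two-fetch capacity is spent.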
- "Webmaster Tools".
- Exact Googlebot client info can be found in Google-cached copies of pages which display such data to visitors. For example, see
- "Googlebot makes POST requests via AJAX".
- "Google, the Jig is Up! Googlebot is actually a browser..."
- "Googlebot is Chrome".
- "Google - Webmasters". Google.com. Retrieved 2012-12-15.
- "What Crawl Budget Means for Googlebot". Official Google Webmaster Central Blog. Retrieved 2018-07-04.