Googlebot: Tips On How To Direct The Crawler
Matt Cutts offers a few tips on how to handle Googlebot. These tips will help ensure your site is indexed correctly.
- At a site or directory level, I recommend an .htaccess file to add password protection to part of a domain.
- At a site or directory level, I also recommend a robots.txt file.
- At a page level, use meta tags at the top of your HTML page.
- At a link level, you can add the nofollow attribute to individual links to prevent Googlebot from crawling them (you could also make the link redirect through a page that is forbidden by robots.txt).
- If the content has already been crawled, you can use our URL removal tool.
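To make the robots.txt tip concrete, here is a minimal sketch that checks which paths a set of rules would block for Googlebot. It uses Python's standard-library robots.txt parser as a stand-in, which may not match Googlebot's own matching in every detail. The rules, the paths and the example.com host are hypothetical, and the /redirect/ entry illustrates the trick of routing a link through a forbidden page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; every path here is illustrative only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /redirect/

User-agent: *
Disallow: /tmp/
"""

# Parse the rules from a string instead of fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the rules above."""
    # example.com is a placeholder host, not a real site configuration.
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html", "/redirect/out"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Running a check like this before publishing a robots.txt change is a cheap way to catch a rule that blocks more (or less) than intended.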
There are also a few curious pieces of information in the comments, namely that Google has “gotten better” at crawling JavaScript links. I’ve also noticed that Google has “gotten better” at crawling scripted CGI links, which unfortunately can earn you a duplicate content penalty.
