Help! My Site Is Not Being Crawled By Google

I was chatting with John, who was telling me about an ex-client of his.

The client had a site re-design, and had done a lot of link building, but wasn’t ranking anywhere. His site wasn’t even listed in Google. Was the site banned? On inspection, John found the designer had put a robots.txt file on the site blocking all robots. Simple mistake, but difficult to spot if you don’t know what you’re looking for.

If your site is not being crawled by Google, or other search engines, here’s a simple checklist to follow:

  • Robots.txt may exclude spiders: Check whether you have a file called robots.txt. Search engines only look for it in the root of the site (for example, /robots.txt), so a copy in a subdirectory has no effect. Either remove the file, or make sure it conforms to the robots.txt standard and does not disallow pages you want crawled.
  • No inbound links: Search engines crawl the web, following links from page to page. If you don’t have a link pointing to your page from a page that is already included in Google, it is less likely that Google will find your site. Submit your site to a directory, ask a friend for a link, or beg, borrow or buy. It pays to get links from reliable sources, as opposed to link farms, which Google may discount.
  • Site may have technical issues: The server may be set up incorrectly, your site may contain code that makes crawling difficult, etc. Luckily, Google offers a reporting tool in the form of Webmaster Central. Use Sitemaps and the Site Status Wizard to help determine potential problems.
  • No deep crawl: Google crawls the site, but doesn’t find many pages. Check your linking structure to ensure that important pages are well linked. You may wish to use a pyramid site structure to help organize your site thematically. Remove, or alter, duplicate content. Increase the quality of inbound links, and avoid poor-quality outbound links. See Matt Cutts’ comments, roughly three-quarters of the way down.
  • Flash, scripting: Google can have problems following animated and scripted links. If you use Flash, it is safest to provide an all-HTML version of your site as well. Google is getting a lot better at following scripted links, but check Webmaster Central if problems persist.
  • Site ban: It’s unlikely, but possible, that your site has been banned. Check with Webmaster Central, and if a ban is in place, try submitting a reinclusion request. Here’s the definitive guide on submitting a reinclusion request, straight from the horse’s mouth. Essentially, Google wants to know that the problem has been corrected and that it won’t happen again.
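The robots.txt check at the top of the list can be automated. Below is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_blocked, the sample robots.txt contents, and the www.example.com URLs are assumptions made for illustration. Real crawlers may interpret edge cases differently from this parser, so treat the result as a first check rather than a final answer.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt text disallows user_agent from fetching url.

    is_blocked is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A robots.txt that blocks every robot from the whole site,
# like the one in the story at the top of this post.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow line means nothing is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    print(is_blocked(BLOCK_ALL, "http://www.example.com/"))
    print(is_blocked(ALLOW_ALL, "http://www.example.com/"))
```

To test a live site instead of a string, the same module can fetch the file for you: call set_url() with the address of the site's robots.txt, then read(), then can_fetch().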

  1. SophieW, 02-18-2007

    The same thing happened to a client of mine. The developer (not me) put a meta robots tag with “index, nofollow” on the homepage. Not long after I fixed it, hundreds of pages were crawled and indexed. Sometimes small things make a huge difference, eh?

  2. Peter Da Vanzo, 02-18-2007

    Easy to miss, certainly.

  3. TechDuke, 05-27-2007

    When I checked the source code of a Blogspot blog, it had a noindex tag added by default by Blogspot, but the search engines had indexed all the pages of that blog. I’m confused!
