
Stopping Search Engines Indexing Website Maintenance Pages

There is no schedule for when a search engine will or won’t turn up on your website’s doorstep and start indexing it, so what happens when one turns up unannounced during scheduled maintenance? Under normal conditions, the search engine spiders will notice the differences from what they crawled last time and fold your changes into their index. Of course, you really don’t want your ‘we are currently performing scheduled maintenance and expect to be back in 1 hour’ message showing up in search engine results when your users enter an appropriate query.

To stop search engines indexing your site while it is in maintenance mode, there are two simple solutions available:

HTTP 404 response code
Search engines don’t immediately remove your web pages from their index because they cannot access them on a given request, just as they won’t remove them if your site is returning an internal server error. Instead, they will note that they attempted to crawl a given web page at a particular time and try again later. Only after repeatedly failing to retrieve the document will they mark that particular page as non-existent and remove it from their index. Strictly speaking, HTTP defines 503 (Service Unavailable), optionally with a Retry-After header, for temporary outages such as maintenance, so it states the intent more explicitly than a 404.
META noindex tag
When a search engine spider encounters a noindex meta tag, it should immediately abort indexing that particular page. After the scheduled maintenance is over and the spiders return, the noindex flag is no longer present, so the spiders will proceed with the crawl as normal.
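
As a rough illustration of both options together, here is a minimal sketch using Python's standard `http.server` module. It is not taken from any particular site's setup: the names `MAINTENANCE_HTML`, `maintenance_response` and `MaintenanceHandler` are made up for this example, and the page text simply reuses the maintenance message quoted earlier. The status defaults to 404 as described above; passing `HTTPStatus.SERVICE_UNAVAILABLE` instead would send a 503.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical maintenance page; the robots meta tag asks spiders not to index it.
MAINTENANCE_HTML = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="robots" content="noindex">
  <title>Scheduled maintenance</title>
</head>
<body>
  <p>We are currently performing scheduled maintenance and expect to be back in 1 hour.</p>
</body>
</html>
"""


def maintenance_response(status=HTTPStatus.NOT_FOUND):
    """Build (status, headers, body) for the maintenance page.

    The default status follows the 404 approach described in the post;
    callers may pass a different error status if they prefer.
    """
    body = MAINTENANCE_HTML.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return status, headers, body


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the maintenance page and an error status."""

    def do_GET(self):
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), MaintenanceHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) should serve the page on port 8000. In practice most sites would do the equivalent in their web server or framework configuration rather than run a separate script.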

Next time your site is under maintenance, make sure you’ve implemented one of these points, or you could be very surprised by what shows up in the search engine results the following day!

Internet Scale

After launching ifdebug on the 3rd of November, it has taken Googlebot and Yahoo! Slurp an amazing four days to crawl and index the site. Many moons ago, people would report having their sites online for literally months before being crawled by search engines, let alone having the content show up in their index.

In August, Matt Cutts pointed out that the Google index is becoming minty fresh. What used to take months back in the year 2000 is now happening in days, and what was taking days in 2005 is now regularly happening in hours or minutes. While the majority of the world doesn’t care about this sort of stuff and it never even enters their consciousness, I find this nothing short of a technical marvel.

All major search engines currently report that they index literally billions of objects, reaching into the farthest corners of the internet. This is where the amazing aspect comes in: ifdebug is but one of hundreds of millions of sites online, and somehow the major search engines manage to find the time to crawl and index it only a matter of days after it was created!

The fast crawl rate is surely due to the link from my personal blog pointing here, as it is already well indexed and receives attention from the major search engines on a daily basis. As for managing the ongoing freshness of the site, sitemaps and online services such as pingomatic must play a reasonably substantial role in helping the search engines keep their indexes fresh.

I’ll be keeping an eye on the major search engines over the coming weeks and months to see how they are performing; I’ll report back with any findings worth mentioning.