Category Archives: Search

You Know You’re Popular When

Today my personal site was pinged by Live Business Radio. As a matter of course, I checked out the Live Business Radio web site and was disappointed to find that it’s nothing more than your average run-of-the-mill site riddled with advertising, spam and ‘buy this crap product now’ pitches.

I regularly get pinged by web sites that don’t have anything to do with me, and when I saw the site I was about to abandon it immediately. Just before I did, though, I scanned the article and noticed that I’d been featured in a list of sites with a high Google PageRank that offer links which are ‘followed’. It wouldn’t be a good filthy spammer’s site if they didn’t offer you software (for a fee) which you could use to spam, or rather ‘take advantage of’, the followed links.

If you’re not quite sure what I’m referring to regarding the ‘followed’ remark, you can read about it on my personal site:

I should feel so honoured.

Google Alerts Getting Smarter

Google Alerts lets a user define lists of keywords and phrases. When Google finds a match while crawling and indexing web sites, it sends the user a notification about that particular occurrence.

Historically, it always appeared as though the technology behind the alerting system was quite simple – literally matching the keyword and phrases that the user had nominated. Recently alerts have been generated that don’t strictly meet the keyword list and phrase requirements for a given page. It seems as though Google are using all of the additional meta data about a web site and the content to infer certain pieces of information.

As an example, I was recently notified about my name being used within the post about extending the Nintendo Wii. If you view that particular item, you will not find the phrase “alistair lattimore” anywhere within it. Just to be sure, I have also checked that my name doesn’t appear within the RSS feeds generated by the site.

Putting the tinfoil hat on for a second, there is a raft of information that Google know about me already:

  • I have a Gmail account
  • The same Gmail account is associated with Google Analytics, Google Reader, Google Webmasters, Google AdWords and Google AdSense.
  • Within Google Webmasters, I monitor my personal blog and this site.
  • Within Google Analytics, I monitor my personal site and this site.
  • Within Google Reader, I subscribe to the feed of both sites.
  • Google are a domain registrar, which means they could theoretically see that I purchased both domains.
  • I have linked in both directions between the two sites in the past.

When you start to see how all of that information is inter-linked, it becomes quite easy to see how Google can provide insightful results through their various services. Of course, if you take the tinfoil hat off and look at the more standard items such as web site content, my name is listed in the title on the front page and also on the about page. Those two bits of information might have been all it took; who knows.

If the technology behind that flexibility has a high level of accuracy in determining or inferring that information, it really is an excellent service. In the above example, if Google hadn’t inferred that my name was associated with that document, I would never have found out about it via the alerting system. Granted, in this particular example it makes no difference as I know I wrote it; however, for all other content on the internet it really lifts the product’s capability.

Changing The HTML Source Order Can Damage Search Engine Referrals

What follows is a quick digest of the impacts a web site owner might expect from reordering the HTML source on a web site, in particular what effects it can have on the search engine performance of a web site.

During the month of June, I decided to freshen up the layout of my personal site. When it was complete, and without a whole lot of consideration, I published the new design onto my site. I sat back and admired my work for a little while, until a few days later I started to notice a decline in the number of natural search engine referrals. At the time, I didn’t bother to look into it and put it down to random flux on the internet. A few days later I checked the statistics again to confirm that it had recovered, only to find that not only had it not recovered but that it had dropped further.

After investigating the problem, I immediately realised that I had changed the HTML source order within my WordPress template. After the change, the primary content of the site was placed at the bottom of the HTML document with a large amount of less important content above it. Worse yet, the information listed first within the HTML was identical for the entire site, as it was related to the sidebar, which is largely static.

The table below shows the number of Google search engine referrals per month to the site. As you can see, the monthly referrals rose from the start of the year and held steady through to May, then started to drop in June. Realising what had happened, I took the hit on the search engine referrals to see just how far they would drop if the change were left in place for a complete month. The ordering of the HTML was not restored until the beginning of August, so July represents a complete month with the suboptimal ordering of the HTML.

Month      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov
Referrals  9899   10395  13206  13281  13200  10942  8426   10580  12648  12084  12361
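
To put a number on the drop, the figures in the table can be compared directly. The short Python sketch below is only an illustration of that arithmetic; the function and variable names are made up for the example. It compares May, the last full month with the original source order, against July, the only full month with the reordered HTML, which works out to a fall of roughly 36%.

```python
# Monthly Google referrals, copied from the table above.
referrals = {
    "Jan": 9899, "Feb": 10395, "Mar": 13206, "Apr": 13281,
    "May": 13200, "Jun": 10942, "Jul": 8426, "Aug": 10580,
    "Sep": 12648, "Oct": 12084, "Nov": 12361,
}


def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


# May: last full month with the original HTML source order.
# Jul: the only full month with the reordered HTML.
drop = percent_change(referrals["May"], referrals["Jul"])
print(f"May -> Jul: {drop:.1f}%")
```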

The change in the number of natural search engine referrals was caused by what was being listed within Google as the snippet for each page within the site. Ordinarily, the primary content is listed toward the top of the HTML document and as such, it is featured heavily within the snippet. After reorganising the order of the HTML, the snippet within the search engine results was displaying information about the current list of months in the sidebar of the site. As a by-product of the snippet not being contextually relevant to the title of the page, the click-through rate plummeted.

In an ideal world, a webmaster should be able to change their site layout as frequently as they choose without it impacting their search engine ranking and associated click-through rate. In this particular case, changing the layout, and unknowingly the HTML source order, had a significant knock-on effect because I wasn’t using a meta description tag to control what was displayed within the search engines. By not specifying it directly, I was relying on the search engines to automatically generate or choose one on my behalf, and after changing the HTML ordering the choice was suboptimal.
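
As a rough illustration of what specifying the description directly could look like, here is a minimal Python sketch that builds a meta description tag from a page’s primary content. It is not the WordPress template code used on the site; the function name `meta_description` and the 155 character cut-off are assumptions made for the example, not limits published by any search engine.

```python
import html
import re


def meta_description(primary_content, max_length=155):
    """Build a <meta name="description"> tag from a page's main content.

    max_length is an illustrative cut-off chosen for this sketch.
    """
    # Drop any HTML tags, then collapse runs of whitespace.
    text = re.sub(r"<[^>]+>", " ", primary_content)
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) > max_length:
        # Cut at the last word boundary that fits and mark the truncation.
        text = text[:max_length].rsplit(" ", 1)[0] + "..."
    # Escape quotes and angle brackets so the attribute stays valid HTML.
    escaped = html.escape(text, quote=True)
    return '<meta name="description" content="%s">' % escaped


print(meta_description(
    "<p>Changing the HTML source order can damage search engine referrals.</p>"
))
```

Because the tag is generated from the primary content rather than from whatever happens to come first in the HTML, the description would stay the same even if the sidebar moved above the content.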

Measuring The Impact Of Page Titles

Following on from my simple experiment regarding the performance of different AdSense themes, it seemed like as good a time as any to start another experiment surrounding the impact of web page titles on search engine performance.

Out of the box, WordPress doesn’t ship with particularly search engine friendly page titles:

  • {blog title} » {blog archive} » {page title}

The biggest problem with the default title format, from a search engine optimisation point of view, is that it places relatively useless information in the highest visibility location of the title. The useless information I’m referring to is, of course, the blog title and the fact that a given page is within your blog archive. Some people might argue that your blog title is very important, and in some cases it is; however, in my opinion that isn’t the norm. As for the fact that a web page is within your archive, that has little significance, as publishing content online essentially places it into an archive immediately. It’s called the internet.

To move the highest importance keywords and phrases into the highest visibility location, I have opted to place the page title at the start of the title tag. Since I don’t think users care about a web page being archived, I’ve also opted to drop that from the page title too. These changes have resulted in the page titles that you’re seeing currently, which take the form:

  • {page title} | {blog title}
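
The new format can be sketched in a few lines. The snippet below is a Python illustration only, not the WordPress (PHP) template change itself, and the function name `page_title` is hypothetical.

```python
def page_title(post_title, blog_title, separator=" | "):
    """Put the post title first, followed by the blog title."""
    # Skip empty parts so a missing title never leaves a dangling separator.
    parts = [p.strip() for p in (post_title, blog_title) if p and p.strip()]
    return separator.join(parts)


print(page_title("Internet Scale", "#if debug"))
```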

Given that #if debug is taking a very small amount of traffic currently, any changes to the site at the moment are highly visible within the web statistics. I’m hoping that with the changes to the page title format, the search engine referrals will increase by a figure of over 10%. It is a modest figure, however given that I have so little content at the moment – it is something that I hope is attainable.

Stopping Search Engines Indexing Website Maintenance Pages

There is no schedule for when a search engine will or won’t turn up on your web site’s doorstep and start indexing it; what happens when they turn up unannounced during scheduled maintenance?

Under normal conditions, search engine spiders will notice the difference between what they crawled last time and will include your changes into their index. Of course, you really don’t want your ‘we are currently performing scheduled maintenance and expect to be back in 1 hour’ message showing up in search engine results when your users enter an appropriate query.

To stop search engines indexing your site while it is in maintenance mode, there are two simple solutions available:

  • HTTP 404 response code
    Return an HTTP 404 (Not Found) response code for each URL on your site affected by the maintenance outage. Search engines don’t immediately remove your web pages from their index because they cannot access them on a given request; instead they try again later. Only after repeatedly failing to retrieve a document will they mark that particular page as non-existent and remove it from their index.
  • HTTP 503 response code
    Return an HTTP 503 (Service Unavailable) response code for each URL on your site affected by your downtime. A 503 response code signals to search engine spiders that they should come back later. As with the 404 response code, search engines won’t immediately remove the URLs from the index because your site is experiencing an outage.
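
As a minimal sketch of the 503 option, the Python snippet below uses the standard library’s `http.server` to answer every request with a 503 and a `Retry-After` header, which is the standard HTTP header for hinting when a client should try again. It is an illustration rather than a production setup; the handler name, the port and the one hour retry value are assumptions made for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

RETRY_AFTER_SECONDS = 3600  # assumption: "expect to be back in 1 hour"


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every request with 503 while maintenance is in progress."""

    def _send_maintenance(self, include_body):
        body = b"We are currently performing scheduled maintenance.\n"
        self.send_response(503)
        # Retry-After hints to clients and crawlers when to come back.
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._send_maintenance(include_body=True)

    def do_HEAD(self):
        # HEAD gets the same status and headers, without the body.
        self._send_maintenance(include_body=False)


def run(port=8080):
    """Serve the maintenance response until interrupted."""
    HTTPServer(("", port), MaintenanceHandler).serve_forever()
```

Calling `run()` starts the server. In practice the same response would more likely be configured in the web server in front of the site, but the status code and header are the important parts.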

What you should avoid doing during the scheduled maintenance or outage:

  • HTTP 200 (OK)
    Returning an HTTP 200 response code, which is the normal response code when everything is working as expected. It is quite common to see error handlers for a 404 or 503 mistakenly return a 200. In this case, Google might index the content of your error page against each of those unavailable URLs, which will impact your rankings and visits.
  • HTTP 301 (Permanent Redirect)
    Returning an HTTP 301 on each of the unavailable URLs to some other URL. For example, maybe a particular directory on the site is offline and you redirect those URLs to the home page. Like the 200 example above, this might cause Google to drop all of the URLs returning the 301 response code and replace them with the home page, effectively deleting those URLs from the site.

Each of these behaviours is simple to check today. Confirm that your site is configured properly, and fix the configuration if it isn’t, to avoid unnecessary impacts from scheduled maintenance or unplanned outages.
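
One way to run that check is to request a URL while the maintenance page is up and look at the status code that comes back. The Python sketch below is a hedged example using only the standard library. The function names are hypothetical, and the classification simply mirrors the two lists above rather than any search engine’s documented rules. It makes a single request without following redirects, so a 301 is reported as a 301.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return the status code of one request, without following redirects."""
    parts = urlsplit(url)
    connection_class = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    connection = connection_class(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("GET", path)
        return connection.getresponse().status
    finally:
        connection.close()


def maintenance_verdict(status):
    """Classify a status code seen during an outage, per the lists above."""
    if status in (404, 503):
        return "ok"
    if status in (200, 301):
        return "avoid"
    return "unlisted"


# Example usage (placeholder URL, run while the maintenance page is up):
# print(maintenance_verdict(fetch_status("http://example.com/some/page")))
```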

WordPress Drop Technorati For Incoming Links

WordPress has a feature named “Incoming Links”, which shows activity surrounding your particular blog. For a long time, WordPress has been using the services of blog search engine and aggregator Technorati to deliver this feature. Using Technorati was an excellent decision for quite some time, especially when blogging was still relatively new and Technorati were blazing their own trail in that space. It made even more sense when Automattic released Pingomatic, as virtually all blogging platforms sent activity notifications to it and Technorati subscribed to that stream of data.

Things started to change and the usefulness of Technorati started to fade as the big guns entered the blog search space, namely Google. Google Blog Search was a great service on its own, using the incredible infrastructure behind Google to keep the blog search index fresh. Not content with great, Google set out to make the Google Blog Search index exceptionally fresh by accepting ping notifications. Of course, as soon as that happened, Pingomatic started sending notifications to Google, which has yielded an index that is minty fresh, usually showing only minutes of delay.
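
The post doesn’t say how these ping notifications are formatted, so the following is an assumption: blog ping services conventionally accept an XML-RPC call named `weblogUpdates.ping` that carries the blog’s name and URL. The Python sketch below only builds the request body with the standard library and doesn’t send anything; the function name and the example URL are placeholders.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Serialise a weblogUpdates.ping call as an XML-RPC request body."""
    # Assumption: the ping service accepts the conventional
    # weblogUpdates.ping(name, url) call described above.
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


print(build_ping("#if debug", "http://example.com/"))
```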

With the recent release of WordPress 2.3, the WordPress team have now switched from Technorati to Google Blog Search for their “Incoming Links” feature. With literally hundreds of thousands of blogs running WordPress, Technorati were getting traffic for free, so this single link change could have a fairly profound impact on them. With the link from WordPress gone and the superior firepower of Google to contend with, tongues have to be wagging about the future of blog search engine Technorati.

Internet Scale

After launching ifdebug on the 3rd November, it’s only taken Googlebot and Yahoo! Slurp an amazing four days to crawl and index the site. Many moons ago, people would report having their sites online for literally months before being crawled by search engines, let alone have the content showing up in their index.

In August, Matt Cutts pointed out that the Google index is becoming minty fresh. What used to take months back in the year 2000, is now happening in days and what was taking days in 2005 is now regularly happening in hours or minutes. While the majority of the world don’t care about this sort of stuff and it never even enters into their consciousness, I find this nothing short of a technical marvel.

All major search engines currently report that they index literally billions of objects, reaching into the farthest corners of the internet. This is where the amazing aspect comes in: ifdebug is but one of hundreds of millions of sites online, and somehow the major search engines manage to find the time to crawl and index it only a matter of days after it was created!

The fast crawl rate is surely due to the link from my personal blog pointing here, as that blog is already well indexed and receives attention from the major search engines on a daily basis. As for managing the ongoing freshness of the site, sitemaps and online services such as Pingomatic must play a reasonably substantial role in helping the search engines keep their indexes fresh.

I’ll be keeping an eye on the major search engines over the coming weeks and months to see how they are performing; I’ll report back with the findings if there are any worth mentioning.