Archive for the ‘Search’ Category

Changing Temporary (302) To Permanent (301) Redirects

Monday, May 26th, 2008

It’s commonplace to register multiple variations of a domain to protect the brand or product that the domain relates to. At some point, a webmaster must choose what he or she is going to do with the variations. The normal choices are:

  • Do nothing, simply owning them is sufficient
  • Set them up, alias them so the site content is accessible via any of the variations
  • Set them up and redirect the variations to the primary domain

This post is going to discuss the third option, as I have recently seen what I’d consider strange results in that space.

Setting The Scene

Imagine you sell Product A and you have a web site at http://producta.com. For three years http://producta.com has been used as the main web site; however, in an exercise in brand consistency, you opt to move the web site to http://brandproducta.com.

The change of domain is handled using a temporary redirect and is successful. Soon after the move, http://producta.com is no longer visible in the search engines and has been replaced with http://brandproducta.com.

Weirdness

As a clean up exercise, I recently went through and updated the redirects on the domain variations (including http://producta.com) to use permanent (301) redirects. At the time, I didn’t think I’d see any changes in the search engine result pages, as http://producta.com hasn’t been in use for quite some time and all that was changing was a temporary (302) redirect into a permanent (301) redirect.

What has happened is that a brand+producta search, which previously returned http://brandproducta.com as the first listing, now shows http://producta.com alongside it. Since that domain hasn’t been in use for such a long time, Google are using the DMOZ listing for its title and snippet.

Explanation

I’ve read through the information that Matt Cutts provided when he discussed 302 redirects back in January 2006. There is a lot of good information on that page and also the previously linked article about URL canonicalisation - however nothing that I felt described what I have outlined above.

What I think has happened is that the temporariness of the 302 redirect has kicked in. Google have been seeing the 302 redirect from http://producta.com to http://brandproducta.com for quite some time and, since it was temporary, have been checking it periodically. When something finally changed (which is exactly what a temporary redirect implies will happen), Google kicked back into gear and displayed results for http://producta.com.

Since it is now showing a 301 permanently moved redirect, I suspect that within a short amount of time Google will remove the listing for http://producta.com and it’ll be replaced by http://brandproducta.com.
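To make the difference concrete, the two redirects differ only in the status code sent with the Location header. The following is a minimal sketch in Python, not the configuration used on the sites discussed here; the build_redirect and RedirectHandler names are made up for illustration, and Python’s built-in http.server stands in for whatever web server actually hosts the domain variations.

```python
from http.server import BaseHTTPRequestHandler

# Primary domain from the example scenario in this post.
PRIMARY = "http://brandproducta.com"


def build_redirect(path, permanent=True):
    """Return (status, location) for a request made to a domain variation."""
    # 301 = Moved Permanently, 302 = Found (temporary redirect).
    status = 301 if permanent else 302
    return status, PRIMARY + path


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that redirects every request to the primary domain."""

    def do_GET(self):
        status, location = build_redirect(self.path, permanent=True)
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()

    # Answer HEAD requests the same way as GET requests.
    do_HEAD = do_GET


# To try it locally (port 8080 is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

Switching a variation from temporary to permanent is then a matter of changing permanent=False to permanent=True; the destination stays the same.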

I’d love to hear from someone if they have a more comprehensive answer on the results I’ve seen.

Non-English Languages & Whacky Domain Names

Friday, May 2nd, 2008

Asian language glyphs from the web sites of James Holderness

While doing a little research for an upcoming article tonight, I revisited the web site of James Holderness. If the name looks familiar, it’s because I linked to him in January regarding detecting duplicate items within RSS feeds.

When I stumbled onto his site, I couldn’t believe the domain that he was using:

  • http://www.xn--8ws00zhy3a.com

as it seemed completely unmanageable for a normal person. At the time, I thought James must have been participating in some obscure SEO challenge; today I realise that isn’t the case at all.

The image shown above is displayed on James’ site beside his name. It turns out that those three glyphs somehow translate into the obscure domain listed earlier, as can be seen in the following screenshot from Google Search:

Non-English written characters or glyphs displayed within a Google Search result as the domain name

For those that are interested, Yahoo!, MSN and Live search all showed the English translation of the foreign language in the domain name and not the glyph based version - though were more than happy to display the glyphs within the title of the web site.

Does anyone know how a glyph is translated into the standard English alphabet and more so, what within the domain name delineates one glyph from the next?
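For what it’s worth, the xn-- form is not a translation at all: it is the Punycode (ASCII-compatible) encoding that internationalised domain names use. Nothing in the string separates one glyph from the next; the letters and digits are variable-length base-36 numbers, and each number encodes both the next character and the position at which it is inserted. Here is a minimal sketch using Python’s built-in punycode codec (the decode_label and decode_domain helper names are my own, and the sketch skips the normalisation steps a full IDNA implementation performs):

```python
ACE_PREFIX = "xn--"  # marks a domain label as Punycode-encoded


def decode_label(label):
    """Decode one domain label; plain ASCII labels pass through unchanged."""
    if label.lower().startswith(ACE_PREFIX):
        encoded = label[len(ACE_PREFIX):]
        return encoded.encode("ascii").decode("punycode")
    return label


def decode_domain(domain):
    """Decode every dot-separated label of a host name."""
    return ".".join(decode_label(part) for part in domain.split("."))


glyphs = decode_label("xn--8ws00zhy3a")
# Print code points rather than the glyphs so the output is terminal-safe.
print([f"U+{ord(ch):04X}" for ch in glyphs])
```

Running this prints ['U+8A79', 'U+59C6', 'U+65AF'], which are the three glyphs 詹姆斯. Encoding works the same way in reverse: glyphs.encode("punycode") gives back the bytes 8ws00zhy3a.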

You Know You’re Popular When

Saturday, December 29th, 2007

Today my personal site was pinged by Live Business Radio. As I do as a matter of course, I checked out the Live Business Radio web site and was disappointed to find that it’s nothing more than your average run-of-the-mill site riddled with advertising, spam and ‘buy this crap product now’ pitches.

I regularly get pinged by web sites that don’t have anything to do with me, and when I saw the site I was about to abandon it immediately. Just before I did though, I scanned over the article and noticed that I’d been featured in a list of sites with a high Google PageRank which offer links that are ‘followed’. It wouldn’t be a good filthy spammer’s site if they didn’t offer you software (for a fee) which you could use to spam, or rather ‘take advantage of’, the followed links.

If you’re not quite sure what I’m referring to regarding the ‘followed’ remark, you can read about it on my personal site:

I should feel so honoured.

Google Alerts Getting Smarter

Monday, December 24th, 2007

Google Alerts lets a user define keywords and phrases; when Google finds them while crawling and indexing web sites, it sends the user a notification about that particular occurrence.

Historically, it always appeared as though the technology behind the alerting system was quite simple - literally matching the keyword and phrases that the user had nominated. Recently alerts have been generated that don’t strictly meet the keyword list and phrase requirements for a given page. It seems as though Google are using all of the additional meta data about a web site and the content to infer certain pieces of information.

As an example, I was recently notified about my name being used within the post about extending the Nintendo Wii. If you view that particular item, you will not find the phrase “alistair lattimore” anywhere within it. Just to be sure, I have also ruled out my name appearing within the RSS feeds generated by the site.

Putting the tinfoil hat on for a second, there is a raft of information that Google know about me already:

  • I have a Gmail account
  • The same Gmail account is associated with Google Analytics, Google Reader, Google Webmasters, Google AdWords and Google AdSense.
  • Within Google Webmasters, I monitor my personal blog and this site.
  • Within Google Analytics, I monitor my personal site and this site.
  • Within Google Reader, I subscribe to the feed of both sites.
  • Google are a domain registrar, which means they could theoretically see that I purchased both domains.
  • I have linked in both directions between the two sites in the past.

When you start to see how all of that information is inter-linked, it becomes quite easy to see how Google can provide insightful results through their various services. Of course, if you take the tinfoil hat off and look at the more standard items such as web site content, my name is listed in the title on the front page and also on the about page. Those two bits of information might have been all it took, who knows.

If the technology behind that flexibility has a high level of accuracy in determining or inferring that information, it really is an excellent service. In the above example, if Google hadn’t inferred that my name was associated with that document, I would never have found out about it via the alerting system. Granted, in this particular example it makes no difference as I know I wrote it; however, for all other content on the internet it really lifts the product’s capability.

Changing The HTML Source Order Can Damage Search Engine Referrals

Saturday, December 8th, 2007

What follows is a quick digest of the impacts a web site owner might expect from reordering the HTML source on a web site, in particular what effects it can have on the search engine performance of a web site.

During the month of June, I decided to freshen up the layout of my personal site. When it was complete, and without a whole lot of consideration, I published the new design onto my site. I sat back and admired my work for a little while, until a few days later I started to notice a decline in the number of natural search engine referrals. At the time, I didn’t bother to look into it and attributed it to random flux on the internet. A few days later I checked the statistics again to confirm that it had recovered, only to find that not only had it not recovered but that it had dropped further.

After investigating the problem, I immediately realised that I had changed the HTML source order within my WordPress template. After the change, the primary content of the site was placed at the bottom of the HTML document with a large amount of less important content above it. Worse yet, the information listed first within the HTML was identical for the entire site, as it was related to the sidebar which is largely static.

The table below shows the number of Google search engine referrals per month to the site. As you can see, the monthly referrals have been steadily increasing from the start of the year until they started to drop in June. Realising what had happened, I took the hit on the search engine referrals to see just how far it would drop down if it were left for a complete month. The ordering of the HTML was not restored until the beginning of August, as such July represents a complete month with the suboptimal ordering of the HTML.

Month      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov
Referrals  9899   10395  13206  13281  13200  10942  8426   10580  12648  12084  12361
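To put a number on the size of the dip, the percentage change between months can be worked out directly from the figures in the table. This is just a small Python sketch of the arithmetic; the percent_change helper is a name introduced here for illustration.

```python
# Monthly Google referrals, copied from the table above.
referrals = {
    "Jan": 9899, "Feb": 10395, "Mar": 13206, "Apr": 13281,
    "May": 13200, "Jun": 10942, "Jul": 8426, "Aug": 10580,
    "Sep": 12648, "Oct": 12084, "Nov": 12361,
}


def percent_change(before, after):
    """Relative change from 'before' to 'after', as a percentage."""
    return (after - before) / before * 100


# May was the last full month with the old ordering, July the only full
# month with the suboptimal ordering, September a full month after the fix.
may_to_jul = percent_change(referrals["May"], referrals["Jul"])
jul_to_sep = percent_change(referrals["Jul"], referrals["Sep"])

print(f"May -> Jul: {may_to_jul:.1f}%")
print(f"Jul -> Sep: {jul_to_sep:.1f}%")
```

This prints a change of -36.2% from May to July and 50.1% from July to September, so roughly a third of the monthly referrals disappeared while the sidebar content came first in the HTML.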

The change in the number of natural search engine referrals was caused by what was being listed within Google as the snippet for each page within the site. Ordinarily, the primary content is listed toward the top of the HTML document and as such, it is featured heavily within the snippet. After reorganising the order of the HTML, the snippet within the search engine results was displaying the list of months from the sidebar of the site. As a by-product of the snippet not being contextually relevant to the title of the page, the click-through rate plummeted.

In an ideal world, a webmaster should be able to change their site layout as frequently as they choose without it impacting their search engine ranking and associated click-through rate. In this particular case, changing the layout (and, unknowingly, the HTML source order) had a significant knock-on effect because I wasn’t explicitly controlling what was displayed within the search engines via a meta description tag. By not specifying one directly, I was relying on the search engines to generate or choose a snippet on my behalf, and after changing the HTML ordering their choice was suboptimal.
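One way to take that choice out of the search engine’s hands is to generate a meta description from the primary content of each page. What follows is a rough Python sketch rather than the WordPress template code used on my site; the meta_description function name and the 155 character limit are assumptions made for the example, and the tag stripping is deliberately crude.

```python
import html
import re


def meta_description(content_html, limit=155):
    """Build a <meta name="description"> tag from a page's primary content."""
    # Crude tag stripping; a real template would use a proper HTML parser.
    text = re.sub(r"<[^>]+>", " ", content_html)
    text = html.unescape(text)
    # Collapse runs of whitespace left behind by the removed tags.
    text = " ".join(text.split())
    if len(text) > limit:
        # Cut at the last word boundary inside the limit (assumed length).
        text = text[:limit].rsplit(" ", 1)[0] + "..."
    escaped = html.escape(text, quote=True)
    return '<meta name="description" content="%s">' % escaped


print(meta_description("<p>Reordering the <b>HTML source</b> changed my snippets.</p>"))
```

Because the description is built from the post content itself, it stays relevant to the page title regardless of where the sidebar ends up in the source order.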

Measuring The Impact Of Page Titles

Monday, December 3rd, 2007

Following on from my simple experiment regarding the performance of different Adsense themes, it seemed like as good a time as any to start another experiment surrounding the impact of web page titles on search engine performance.

Out of the box, WordPress doesn’t ship with particularly search engine friendly page titles:

  • {blog title} » {blog archive} » {page title}

The biggest problem with the default title format, from a search engine optimisation point of view, is that it puts relatively useless information in the highest visibility location. The useless information I’m referring to is of course the blog title and the fact that a given page is within your blog archive. Some people might argue that your blog title is very important, and in some cases it is; however, in my opinion that isn’t the norm. As for the fact that a web page is within your archive, that has little significance, as publishing content online essentially places it into an archive immediately: it’s called the internet.

To move the highest importance keywords and phrases into the highest visibility location, I have opted to place the page title at the start of the title tag. Since I don’t think users care about a web page being archived, I’ve also opted to drop that from the page title too. These changes have resulted in the page titles that you’re seeing currently, which take the form:

  • {page title} | {blog title}
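The new format is simple enough to express as a tiny function. This is an illustration in Python rather than the actual WordPress (PHP) template change, and the page_title function and its separator parameter are hypothetical names.

```python
def page_title(post_title, blog_title, separator=" | "):
    """Build a title with the page title first, followed by the blog title."""
    # Skip empty parts so pages without their own title (such as the
    # front page) fall back to the blog title alone.
    parts = [part.strip() for part in (post_title, blog_title) if part and part.strip()]
    return separator.join(parts)


print(page_title("Measuring The Impact Of Page Titles", "#if debug"))
# prints: Measuring The Impact Of Page Titles | #if debug
```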

Given that #if debug is taking a very small amount of traffic currently, any changes to the site at the moment are highly visible within the web statistics. I’m hoping that with the changes to the page title format, the search engine referrals will increase by a figure of over 10%. It is a modest figure, however given that I have so little content at the moment - it is something that I hope is attainable.

Stopping Search Engines Indexing Website Maintenance Pages

Sunday, December 2nd, 2007

There is no schedule for when a search engine will or won’t turn up on your web site’s doorstep and start indexing it; what happens when they turn up unannounced during scheduled maintenance? Under normal conditions, the search engine spiders will notice the difference between what they crawled last time and what is there now, and will include your changes in their index. Of course, you really don’t want your ‘we are currently performing scheduled maintenance and expect to be back in 1 hour’ message showing up in search engine results when your users enter an appropriate query.

To stop search engines indexing your site while it is in maintenance mode, there are two simple solutions available:

HTTP 404 response code
Search engines don’t immediately remove your web pages from their index because they cannot access them on a given request, just as they won’t remove them if your site is returning an internal server error. Instead, they will note that they attempted to crawl a given web page at a particular time and try again later. Only after repeatedly failing to retrieve the document will they mark that page as non-existent and remove it from their index. Note that HTTP also defines 503 Service Unavailable, optionally with a Retry-After header, specifically for temporary outages such as maintenance, which makes it a safer choice than a 404.
META no-index tag
When a search engine spider encounters a no-index meta tag, it should immediately abort indexing that particular page. After the scheduled maintenance is over and the spiders return, the no-index flag is no longer present, so the spiders will proceed with the crawl as normal.
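The two approaches can also be combined. Here is a minimal sketch in Python, with a few loud assumptions: it answers with 503 Service Unavailable and a Retry-After header instead of the 404 mentioned above (503 is the status HTTP defines for temporary outages), the one hour retry period is arbitrary, and the maintenance_response and MaintenanceHandler names are invented for the example. Python’s built-in http.server stands in for your real web server.

```python
from http.server import BaseHTTPRequestHandler

# Maintenance page carrying a robots meta tag that asks not to be indexed.
MAINTENANCE_HTML = (
    "<html><head>"
    '<meta name="robots" content="noindex">'
    "<title>Scheduled maintenance</title></head>"
    "<body><p>We are currently performing scheduled maintenance.</p></body></html>"
)


def maintenance_response(retry_after_seconds=3600):
    """Return (status, headers, body) for a request made during maintenance."""
    body = MAINTENANCE_HTML.encode("utf-8")
    headers = {
        # Hint to crawlers about when to come back (assumed: one hour).
        "Retry-After": str(retry_after_seconds),
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 503, headers, body


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves the maintenance page for every URL."""

    def do_GET(self):
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

The important part is that the error status and the no-index tag only exist while the maintenance page is being served; once the real site is back, neither is present and crawling continues as normal.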

Next time your site is under maintenance, make sure you’ve implemented one of these points or you could be very surprised by what shows up in the search engine results the following day!

WordPress Drop Technorati For Incoming Links

Wednesday, November 28th, 2007

WordPress has a feature named “Incoming Links” which shows activity surrounding your particular blog. For a long time, WordPress has been using the services of blog search engine and aggregator Technorati to deliver this feature. Using Technorati was an excellent decision for quite some time, especially when blogging was still relatively new and Technorati were blazing their own trail in that space. It made even more sense when Automattic released Pingomatic, as virtually all blogging platforms sent activity notifications to it and Technorati subscribed to that stream of data.

Things started to change and the usefulness of Technorati started to fade as the big guns entered the blog search space, namely Google. Google Blog Search was a great service on its own, using the incredible infrastructure behind Google to keep their blog search index fresh. Not content with great, Google set out to make their Google Blog Search index exceptionally fresh by accepting ping notifications. Of course, as soon as that happened Pingomatic started sending notifications to Google, which has yielded an index that is minty fresh, usually showing only minutes of delay.

With the recent release of WordPress 2.3, the WordPress team have now switched from Technorati to Google Blog Search for their “Incoming Links” feature. This single link change could have a fairly profound impact on Technorati; with literally hundreds of thousands of blogs running WordPress, they were getting traffic for free. With the loss of the link from WordPress, coupled with the superior firepower of Google, tongues have to be wagging about the future of blog search engine Technorati.

Internet Scale

Wednesday, November 7th, 2007

After launching ifdebug on the 3rd November, it’s only taken Googlebot and Yahoo! Slurp an amazing four days to crawl and index the site. Many moons ago, people would report having their sites online for literally months before being crawled by search engines, let alone have the content showing up in their index.

In August, Matt Cutts pointed out that the Google index is becoming minty fresh. What used to take months back in the year 2000, is now happening in days and what was taking days in 2005 is now regularly happening in hours or minutes. While the majority of the world don’t care about this sort of stuff and it never even enters into their consciousness, I find this nothing short of a technical marvel.

All major search engines currently report that they index literally billions of objects, reaching into the farthest corners of the internet. This is where the amazing aspect comes in: ifdebug is but one of hundreds of millions of sites online, and somehow the major search engines manage to find the time to crawl and index it only a matter of days after it was created!

The fast crawl rate is surely due to the link from my personal blog pointing here, as it is already well indexed and receives constant attention from the major search engines on a daily basis. As for managing the ongoing freshness of the site, sitemaps and online services such as Pingomatic must play a reasonably substantial role in helping the search engines keep their indexes fresh.
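For the curious, a ping is nothing more than a small XML-RPC call, conventionally named weblogUpdates.ping, carrying the blog name and URL. The sketch below only builds the request body using Python’s standard xmlrpc.client module and does not send anything; the build_ping helper is a made-up name and the example.com address is a placeholder rather than this site’s real URL.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Serialise a weblogUpdates.ping call as an XML-RPC request body."""
    params = (blog_name, blog_url)
    return xmlrpc.client.dumps(params, methodname="weblogUpdates.ping")


# Placeholder values; a blogging platform fills these in automatically.
payload = build_ping("#if debug", "http://example.com/")
print(payload)
```

A blogging platform would POST a body like this to the ping service’s XML-RPC endpoint each time a post is published, which is how the services learn about new content within minutes.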

I’ll be keeping an eye on the major search engines over the coming weeks and months to see how they are performing; I’ll report back with the findings if there are any worth mentioning.