Category Archives: Services

Live Search Webmaster Centre Drastically Improved

In the last few days, the Live Search Webmaster blog has posted about two significant improvements to the Webmaster Center: how Live Search crawls your site, and more detailed backlink information.

Live Search Webmaster Center now supports the following four items, which are a great help in identifying problems with your site and how Live Search is spidering your content:

  • File not found (404) errors, a straightforward date-stamped account of the HTTP “404 File Not Found” errors that Live Search encountered when crawling the site. Conveniently, this includes broken links within your own site and in sites that you are linking to.
  • Pages Blocked by Robots Exclusion Protocol (REP), reported when Live Search has been prevented from indexing or displaying a cached copy of the page because of a policy in your robots exclusion protocol (REP).
  • Long Dynamic URLs, reported when Live Search encounters a URL with an exceptionally long query string. These URLs have the potential to create an infinite loop for search engines due to the number of combinations of their parameters, and are often not crawled. I haven’t come across one of these yet, and so far I haven’t seen any documentation of what ‘exceptionally long’ means, so clarification on that point would be handy.
  • Unsupported Content-Types, reported when a page either specifies a content-type that is not supported by Live Search, or simply doesn’t specify any content type. Examples of supported content-types are: text/html, text/xml, and application/PowerPoint.
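As a rough illustration, three of the four checks above can be sketched as a small classifier. This is purely hypothetical – the thresholds and labels are invented, not Live Search’s actual logic, and the REP check is omitted since it depends on the site’s robots.txt:

```python
# Hypothetical sketch of three of the crawl checks described above; the
# thresholds and labels are invented, not Live Search's implementation.
SUPPORTED_CONTENT_TYPES = {"text/html", "text/xml", "application/PowerPoint"}
MAX_QUERY_LENGTH = 200  # assumed; 'exceptionally long' is undocumented

def classify_crawl_issue(status_code, content_type, url):
    """Return the issue labels a crawler might report for one URL."""
    issues = []
    if status_code == 404:
        issues.append("File not found (404)")
    # Everything after the '?' is the query string.
    if len(url.partition("?")[2]) > MAX_QUERY_LENGTH:
        issues.append("Long dynamic URL")
    if not content_type or content_type not in SUPPORTED_CONTENT_TYPES:
        issues.append("Unsupported content-type")
    return issues
```

A URL can, of course, trip more than one check at once, which matches how the reports are presented as separate lists.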

In 2007, Microsoft removed the ability for users to drill into backlink data within Live Search. It took a long time, but that functionality has now been restored within Live Search Webmaster Center and is looking quite promising.

Functionality common to both the crawl information and the backlink data is that the Webmaster Center allows you to download the information in CSV format. Possibly the best feature for a large, complex site, though, is that each of the above options can be filtered further, search style, by entering a subdomain and/or directory to restrict the results. The backlink interface additionally supports a top-level domain in the search box, allowing you to isolate only backlinks originating from an Australian site by entering .au.
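That search-style restriction is easy to picture in code. A minimal sketch – the function name and simple suffix match are my own, not how Live Search implements it:

```python
from urllib.parse import urlparse

def filter_backlinks(backlinks, suffix):
    """Keep only backlinks whose host ends with the given suffix,
    e.g. '.au' to isolate links from Australian sites.
    A hypothetical sketch of the filtering described above."""
    # urlparse lower-cases the hostname for us.
    return [url for url in backlinks if urlparse(url).hostname.endswith(suffix)]
```

The same idea extends to subdomain and directory restrictions by matching on the hostname and path instead of the suffix.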

Future Improvements

The interface doesn’t support paging of results, should you want to step through a few pages without exporting the information as CSV. If you do want to download more information, there isn’t an option to export it all in one hit – you can only retrieve 1,000 lines of data. I can appreciate that they don’t want to provide an ‘all’ option, or that they want to limit how much can be fetched at once, however there isn’t a way to set 1,000 items per page, download them, then move to the next page and download those. The other issue with the 1,000-line limit is that there is no information on how those lines are selected. As an example, the backlink section uses the language ‘Download up to 1000 results’ – without any indication of how the 1,000 are chosen.


While there is still room for improvement – and really, when isn’t there? – I’m personally encouraged by the changes that Microsoft are making to Live Search Webmaster Center. The sooner services from Microsoft catch up to those offered by the leaders, the sooner more businesses and webmasters will invest time into the Live Search product.

Bitbucket, Hosted Mercurial Source Control

Bitbucket is the latest project by Jesper Nøhr. If the name looks familiar, it’s because I wrote about Jesper in March, when he used Django and Python as a rapid development environment for an indie advertising product named Indiego Connection.

This time around, Jesper has shifted gears to provide hosting for a popular distributed version control system named Mercurial. I haven’t started drinking the distributed version control kool-aid just yet, however it has been gaining a lot of attention lately via another open source product named Git, developed by Linus Torvalds – the creator of the Linux kernel.

The Mercurial hosting provided by Bitbucket comes in a few different flavours, one of which is free and allows up to 150MB of storage. I really like the fact that they are not attempting to offer a completely free service; if they were, I suspect it’d be under enormous pressure. The cost of using Bitbucket to host your Mercurial repositories is very reasonable, starting from $5/month and stepping up to $100/month, which includes 25GB of storage.

Bitbucket provides a very convenient interface for interacting with the Mercurial repositories. As with most web interfaces to source control management packages, you can browse through different repositories, see all of the changes flowing through them and compare them if you like. A couple of features I like that simpler products don’t support: you can ‘follow’ a repository, create queues for patches related to a repository, download the repository at time x in zip, gz or bz2 formats, and there is an easy-to-understand visual linking between changesets.

If you are looking for Mercurial hosting, I would definitely investigate whether Bitbucket is a suitable candidate to store whatever you need versioned. The service certainly looks the goods, and from what I’m reading online it is getting really solid reviews already.

Google News Algorithms Get It Wrong

Google News is a great service; probably its single best feature is that it aggregates news stories from numerous sources into one place and then condenses them, so as a user you don’t need to be bothered by or read the same story more than once. As with everything else Google related, it’s driven by clever algorithms that decide what to collapse/consolidate, which snippets to show and which images to associate with a given topic or news item.

When viewing the Australian Google News page today, I stumbled across something that I thought was quite funny. In a moment of algorithms behaving badly, they had managed to associate an image of Pamela Anderson with a collapsed set of news items related to regional council mergers in the Northern Territory. Clicking on the Pamela Anderson photo took you to the appropriate story, so that part of the system was behaving correctly – it was just her being associated with Northern Territory council mergers that wasn’t!

Automattic Account Management

Automattic, the fine folk behind the WordPress blogging engine and Akismet, have started merging accounts between WordPress.com and Gravatar.

Toward the start of 2006, I signed up for a WordPress.com account and, for obvious reasons, I’ve never needed to use it. Not long afterward, I used the API key that was provided with it to fight spam using Akismet.

Today I signed into Gravatar, a web service acquired by Automattic late in 2007, to check some settings and was presented with some information about upcoming changes to my existing account. Not having used my account actively, I had to go sifting through signup emails from two years ago; not surprisingly, my account still had its randomly generated password!

Within two minutes of finding my account information, I’d followed the prompts and merged/associated it with my existing Gravatar account. The way in which this is being handled is great: it’s a passive change that happens when you next sign in, and if you do have an existing account, you can associate the two together.

The best thing is now I have one less login to worry about and I can see all of my information for all Automattic assets in one place!

Google Analytics Benchmarking Verticals

In March, Google announced a new feature for Google Analytics named Benchmarking. One of the most compelling reasons to opt-in to the benchmarking component of Google Analytics is to compare how your sites perform against other sites.

Once the data from your sites has been analysed by Google Analytics, it is then possible to compare the following metrics against other sites:

  • Visits
  • Pageviews
  • Pages/visit
  • Average Time on Site
  • Bounce Rate
  • Percentage New Visits

Google Analytics allows the user to choose which one of a number of industry verticals to place their site into for comparison; telecommunications, travel, business and news are just a few. This industry-specific targeting allows for comparison against sites which are similar in theme – vitally important, as you wouldn’t want to compare the statistics of a heavily ecommerce-driven site against those of a social networking site.

To make sure that the first two metrics above make sense for each site, Google Analytics automatically places a site into one of three categories based on the number of visits – small, medium or large. When viewing benchmarking data about a site, only the data from other sites within your size category is visible. As such, if you have a small but up-and-coming site, it isn’t possible to see what the market leader may be doing.
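To make the size-category idea concrete, here is a purely illustrative sketch – Google doesn’t publish the visit boundaries, so both the cutoffs and the function name below are invented:

```python
# Purely illustrative: Google does not publish the visit boundaries for
# small, medium and large, so the cutoffs here are invented.
def size_category(monthly_visits, small_cutoff=10_000, large_cutoff=1_000_000):
    """Bucket a site by traffic volume, as the benchmarking feature does."""
    if monthly_visits < small_cutoff:
        return "small"
    if monthly_visits < large_cutoff:
        return "medium"
    return "large"
```

The questions below about where those boundaries actually sit, and how a site moves between buckets, remain open.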

So far, we can compare six simple but very useful metrics against similarly sized web sites within the same industry vertical, though specifying a vertical for comparison is completely optional. While very useful, having a better understanding of exactly what you’re comparing against would be handy. I’d personally like a little clarification on the following points:

  • What is the boundary in visits per time period for small, medium & large?
  • How long does a site need to sustain the number of visits per time period to officially be moved between size categories?
  • If a site does move between categories, as a user – am I notified that it has happened?
  • If I use a country specific domain, am I comparing only against sites of a similar size within the country specific domain name space or is it a global comparison? I find this point quite important, as users from different countries have different usage patterns.
  • Does placing your site within a country via Google Webmasters have an impact on the previous point – in case you use a top level domain such as a .com/.net?
  • How are sites placed into an industry vertical, and is it possible to see which vertical a given site has been placed in? The latter part of that question is important: if your site has been placed into the wrong category while you are nominating a different one (which you feel is correct), the comparison could be giving you a misleading skew of the results.

The benchmarking service from Google Analytics has only just been launched and is still marked beta. I expect as more people start sharing their information with Google, more and more questions will get raised, more will be answered and the product will continue to evolve as do most Google products.

Google Analytics Ecommerce Outage

Six weeks ago, colleagues from my workplace and I implemented Google Analytics Ecommerce functionality within a handful of sites.*

The statistics had been pouring into Google Analytics, and then around April 25 – the same time that Australia has a long weekend to celebrate ANZAC Day – the transactions going through the site started to drop. At first I didn’t think much of it; in the tourism industry it is commonplace to see lower periods of activity over a long weekend.

I continued to keep an eye on the transactions being reported and expected them to resume the next work day, however that didn’t happen. At this stage, I investigated the issue further to see what the actual figures were, and my suspicion was confirmed – the transactions going through the site had dropped, however nowhere near the levels that the ecommerce functionality within Google Analytics was suggesting.

A fortnight passed without my seeing any mention of it online, and then today, when I logged into Google Analytics, the dashboard included a notice stating that analytics was delayed in processing data from 30 April to 5 May, and that ecommerce data across that period was unable to be recovered. I’m pleased that the Google Analytics team have posted a notice about it; at least that confirms it wasn’t something we had done which inadvertently stopped the transactions being reported into Google.

Two things:

  1. The image above suggests that the outage began on 27 April, not 30 April as Google suggested. Either the sudden drop was the lull of the long weekend, or Google have reported the wrong date.
  2. Why did it take a fortnight to post a notice about the unplanned outage? While I appreciate it wasn’t going to change anything, if I had known that there was an outage in place, I wouldn’t have spent any time investigating the lull and would have just moved on.

* For those that have an ecommerce site and aren’t utilising the ecommerce functionality within Google Analytics, I cannot impress on you enough how valuable this feature is; the insight it provides into the revenue that your site(s) generate is amazing.

Gaming Google Reader For Higher Click Through Rates

Everyone looking to promote their web sites is always searching for ways to get more traffic, higher click-through rates and better conversion rates (whatever a conversion might represent).

For a long time, publishers around the world were looking at ways of exploiting small omissions in how the search engines crawled, indexed and subsequently displayed a result within the search engine listings. One of the most popular methods was adding in non-standard characters into the <title> element for a page, in an attempt to make it stand out within the search engine results.

It didn’t take long for the search engines to cotton onto this tactic and it was shut down – however I’ve recently noticed that a handful have slipped through into Google Reader.

Based on the image above, does the additional star at the start of the title catch your attention? It immediately grabbed mine, as it’s similar to the star used by Google Reader to remember a feed item for later.

For comparison’s sake, you can see that Google search is filtering that same non-standard character out of the search results; it’s only a matter of time before the Google Reader team plug that hole.
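That kind of filtering can be sketched by dropping Unicode symbol-category characters (stars, dingbats and the like) from a title before display. This is an assumption about the approach, not Google’s actual code:

```python
import unicodedata

def strip_decorative_chars(title):
    """Drop symbol-category characters (Unicode general category 'S*')
    from a feed title, keeping letters, digits, punctuation and spaces.
    A guess at the sort of filtering Google search appears to apply."""
    return "".join(
        ch for ch in title
        if not unicodedata.category(ch).startswith("S")
    ).strip()
```

A real implementation would no doubt be more nuanced (some symbols are legitimate title content), but this captures the idea of normalising attention-grabbing decoration out of results.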

Detecting Duplicates Within XML Feeds

The same web page, shown within Google Reader multiple times

At the end of January, I commented on and offered a suggestion to the Google Reader team about how to improve their product by removing duplicate feed items.

At the time, I didn’t think to post a screenshot to aid in my explanation but remembered to grab one recently and felt it would help explain just how annoying this can be within Google Reader.

From the screenshot, you can see that I have highlighted eight different references to an article by Simon Willison about jQuery style chaining with the Django ORM. When a human looks at that image, it is abundantly clear that each of the eight highlighted references are ultimately going to link through to the same page.

The Google Reader team could turn this to their advantage by collapsing the duplicates and offering a visual clue that an item is hot/popular, based on the number of references found to the same article. Google search already has a notion of the date/time when content is published, so that information, combined with the number of inbound references they discover and the number of duplicates collapsed within your RSS streams, could be quite useful.
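The collapsing step might look something like the following sketch. The normalisation rules are my own guesses at what would make trivially different links compare equal – Google Reader would need something far more robust:

```python
from collections import Counter
from urllib.parse import urlparse, urlunparse

def canonicalise(url):
    """Normalise a feed item's link: lower-case the host, drop the query
    string and fragment, and trim any trailing slash. A hypothetical
    normalisation, not Google Reader's."""
    parts = urlparse(url)
    return urlunparse((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), "", "", ""))

def collapse_duplicates(item_links):
    """Count how many feed items resolve to each distinct link; the
    count could drive the hot/popular hint suggested above."""
    return Counter(canonicalise(url) for url in item_links)
```

Dropping the whole query string is too aggressive for sites that key pages off parameters, which hints at why duplicate detection is harder than it first appears.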

I know I would really love better facilities within Google Reader for detecting duplicates within RSS; it’d remove so much noise from the information stream when you’re trying to keep an eye on what is happening within the community.

Google Analytics Benchmarking

Google have announced a new feature for Google Analytics named Benchmarking. The Google Analytics Benchmarking service is still in its beta phase, however aims to allow analytics users to compare or benchmark their web sites against other web sites.

The benchmarking service from Google is opt-in, not enabled by default. If users would like to view benchmarking data for their sites, they must first opt in to allow Google to use their own web statistics. Of interest, opting in is on a per-account basis – not per web profile. As such, if you have 50 web profiles set up within your account, opting in will share the data of all of your web profiles with Google.

After opting into the benchmarking service, Google proceed to anonymise the user’s web statistics. What this means is that any identifiable information within the web statistics is removed and only aggregate information is held; as such, it isn’t possible to spy on your competitors directly or vice versa.

At this early stage, the benchmarking data is fairly high level but provides you comparative metrics on:

  • Visits
  • Pageviews
  • Pages/visit
  • Average Time on Site
  • Bounce Rate
  • Percentage New Visits

The usefulness, and ultimately the success, of the benchmarking service relies on how many Google Analytics users opt in to sharing their web statistics with Google. If the greater user base doesn’t feel inclined to share their web statistics in this manner, then the comparative nature of what Google is offering is hamstrung to some degree.