Favikon, Favicon Generator For The Masses

Stumbled onto a simple but useful free online service last week named Favikon.

As the name suggests, it is related to the small but often memorable favicon or favourite icon. For those who don’t know what a favicon is, it’s the small icon shown in the address bar to the left of the domain name and in your bookmarks, and it serves as a visual way to remember a domain or web site.

The Favikon generator service is dead simple:

  1. choose an existing image already on the internet by URL or upload one from your computer in PNG, GIF or JPG file formats
  2. use the web based cropping tool to highlight all or part of the image
  3. download it and be impressed with your masterful graphic ability

I realise that it isn’t mind bending, however I found it so simple that I actually bothered to create a favicon for my personal site – which I haven’t bothered to do in 5 years!

Google Account Canonical URL Failure

Search engine optimisation consultants world wide have been pushing the URL canonicalisation wagon for quite some time. URL canonicalisation ensures that a given internet resource can only be reached by a single URL.

That might seem like a relatively straightforward task, however poorly configured web servers and the wide ranging quality of modern content management systems have meant that it isn’t as simple as first thought.
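To make the idea concrete, here is a minimal sketch of canonicalisation for the www versus non-www case. The function name canonicalise and its canonical_host parameter are made up for illustration. In practice this job is usually done by the web server issuing a permanent redirect rather than by application code like this.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalise(url, canonical_host="www.google.com"):
    """Map the www and non-www variants of a URL onto one canonical host.

    Hypothetical helper: assumes canonical_host starts with "www." and
    drops any user:password@ prefix for simplicity.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare_host = canonical_host[len("www."):]

    # Both spellings of the host collapse to the canonical one.
    if host in (bare_host, canonical_host):
        host = canonical_host

    # Keep an explicit port if the URL had one.
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path or "/",
                       parts.query, parts.fragment))


print(canonicalise("https://google.com/accounts/"))
print(canonicalise("https://www.google.com/accounts/"))
```

Both calls should print the same www address, which is the whole point: one resource, one URL.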

If there was ever going to be a company that you’d think would nail URL canonicalisation right across the board, it’d be Google. However while searching for the signup URL for a new Google Account [google account], I found something rather interesting – the first search result was https://google.com/accounts/.

There are two things wrong with that result:

  1. as a general rule of thumb, all Google products live under the www sub-domain and not the root domain
  2. more importantly, the SSL certificate is valid for www.google.com and not google.com

A couple of other slightly interesting bits about that result:

  • a Google cache check for the www and non-www versions of that URL shows the exact same crawl time
  • a Google link check for the www and non-www versions shows the same number of links into both URLs
  • based on the first point, it would appear that Googlebot is happy to crawl a secure page with a broken SSL certificate

Wikirank, Visualising Wikipedia Usage Data


Wikirank: Whats popular on Wikipedia

I came across a clever web site named Wikirank, which provides visualisation tools to explore and compare the usage data from wikipedia.org. 

If you’re wondering how Wikirank could manage that, wikipedia.org provides access to its web server traffic logs as a service to the community for free. Wikirank consumes that public data, analyses it and provides a convenient way to see what topics on wikipedia.org are popular at the moment.

Wikirank isn’t just a tool to find out what is popular at the moment though, it also lets you view the usage data on a nominated page over time, up to the last 90 days. That sort of functionality is great, as it lets you see how a particular topic is being received among the community. Not wanting to stop there, Wikirank also lets you compare different topics. The example on the Wikirank home page at the moment is who is more popular out of John, Paul, George or Ringo from The Beatles and according to Wikirank, John Lennon is nearly twice as popular as Paul McCartney.

I think Wikirank is going to be a fantastic companion to the primary wikipedia.org web site. It’d be fascinating if they spun off a wikianalytics.com and broke down the usage data from wikipedia.org and allowed people to explore that data in a similar but cut-down fashion to what Google Analytics provides.

Enhancing Dopplr “Add Trip” Functionality

I recently signed up to the fabulous travel service Dopplr, which lets you share your travel plans with friends, family and colleagues. While adding in a trip from the Gold Coast to my home town of Chinchilla, Dopplr got a little confused about my destination and suggested that the Chinchilla I was referring to was Chinchilla de Monte-Aragón in Spain.

When creating your account with Dopplr, you’ve got the ability to provide the service with a certain amount of information about yourself. Among the information is a setting for your home town, which I have set as the Gold Coast in Australia. Given that my country and home town are set, I think it is possible for the Dopplr service to make slightly smarter choices when a user isn’t explicit about a destination.

For this particular trip, I left on the 20th of February and I’m returning on the 22nd of February. I didn’t specify that this trip was not originating from my home town, so Dopplr should assume that I’m leaving from the Gold Coast. Given that Dopplr knows where you’re originating from (even if it isn’t your home town), it’d be possible for them to calculate the distance between it and each candidate destination. If they cycled through each of the 12 possible matches for Chinchilla that they provided, they would have found that one of the Chinchillas listed was in the same country and state as my home town and was approximately 350km away. To a human reading that sort of information, the sensible choice is immediately apparent: I’m only on the road for three days, I’m leaving from the Gold Coast and there is a Chinchilla approximately 350km away.
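The distance check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Dopplr works internally: the function names are made up, the coordinates are rough approximations, and only two of the twelve candidates are shown. It uses the haversine formula, which gives a straight-line distance over the globe, so expect a figure somewhat shorter than the distance by road.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def closest_match(origin, candidates):
    """Return (name, distance_km) for the candidate nearest to the origin."""
    distances = ((name, haversine_km(origin, coords))
                 for name, coords in candidates.items())
    return min(distances, key=lambda pair: pair[1])


# Approximate coordinates, for illustration only.
GOLD_COAST = (-28.02, 153.40)
CANDIDATES = {
    "Chinchilla, Queensland, Australia": (-26.74, 150.63),
    "Chinchilla de Monte-Aragón, Spain": (38.92, -1.72),
}

name, distance = closest_match(GOLD_COAST, CANDIDATES)
print("%s (about %d km away)" % (name, distance))
```

A real implementation would probably weigh the distance against the length of the trip as well, so a three day trip favours the nearby Chinchilla even more strongly.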

I think that small improvements such as the above are one of the key types of enhancement that really set a service apart from its competition.

CommBank Keeping Warm By Burning Money

While viewing http://www.news.com.au yesterday, I noticed an advertisement on the home page for Commonwealth Bank of Australia.

The ad was for instant approvals and same day funds (working now), however clicking the ad presented me with a “Page Not Found” error on the CommBank web site.

Everyone makes mistakes; it is unavoidable. However, when you’re paying the sort of money it costs to advertise on a high visibility web site like news.com.au, you’d think that someone would have gone through and checked everything was in place before approving the creative to go live on the site.

I figure the air conditioning isn’t working in the CommBank offices and they are just burning money to keep warm.

Twitter Performance Problems, The Root Cause

The performance and scalability problems of Twitter have been covered to death, so I won’t wax lyrical about the different reasons that the micro-blogging service has had performance and uptime problems over the last year.

With the advent of cloud computing and inter-connected web services, the requirement to have a good quality API has just about become a must have. One of the things that an API allows is new and creative mechanisms for users to consume and repurpose your service – which by and large is fantastic. Every now and then though, people will find a way to exploit a service to their advantage – usually financially driven.

In the case of Twitter, clever folk are using the service to ‘watch’ what discussions are happening on and around the internet about a given topic. Case in point this afternoon, I mentioned the phrase “WordPress” in a tweet and I suddenly received 10 new emails notifying me that random people I don’t know are now following me.

The fact that random people are following me isn’t the concern, it is that they automated that based on what I was discussing in a Twitter conversation. The knock on effect is that those users will no doubt be following hundreds or thousands of other Twitter users.

From an architectural point of view, this problem quickly spirals out of control, as every message that I write now generates a notification to be sent to those users. If they had a legitimate interest in following me, that would be no problem at all. More than likely, though, my messages will go completely unnoticed and the only thing the automated follow has really achieved is increasing the load on the Twitter infrastructure.
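A toy model shows how quickly the deliveries add up. It assumes the simple arrangement described above, where each message is delivered once per follower. The class and account names are made up, and none of this reflects how Twitter is actually built.

```python
from collections import defaultdict


class FanOutModel:
    """Toy model: every tweet costs one delivery per follower of its author."""

    def __init__(self, bots, keyword):
        self.bots = set(bots)
        self.keyword = keyword.lower()
        self.followers = defaultdict(set)  # author -> accounts following them
        self.deliveries = 0

    def tweet(self, author, text):
        # Deliver to the existing followers first ...
        self.deliveries += len(self.followers[author])
        # ... then let the keyword-watching bots follow the author.
        if self.keyword in text.lower():
            self.followers[author] |= self.bots


model = FanOutModel(bots=["bot%d" % n for n in range(10)], keyword="WordPress")
model.followers["me"] = {"friend"}

model.tweet("me", "Upgrading WordPress this afternoon")  # 1 delivery
model.tweet("me", "Lunch time")                          # 11 deliveries
print(model.deliveries)
```

One mention of the keyword and every later message costs eleven deliveries instead of one, ten of which nobody is likely to read.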

If users continue to abuse this type of functionality, inevitably the Twitter folk will further tighten the screws on how many people you can follow per account. Of course, then the users abusing the service will start creating multiple accounts so they can get what they want – always looking for a way to side step the restrictions.

Search Engine Optimisation Via Dead Trees

I thought I’d undertake some professional development surrounding search engine optimisation, ironically in the form of a paperback book named Get to the top on Google, written by David Viney.

As I work my way through the book, I thought I might share some thoughts on the content covered – see what ideas I like about his search engine optimising techniques compared to what I already do or potentially what I don’t do.

If nothing else, having a competing train of thought surrounding optimising for search engines has to be healthy. It could reinforce solid ideas that I already had, dispel what I considered good advice as nothing more than a myth or offer completely new optimisation strategies and techniques.

We’ll find out how that all pans out in the next week or two as I complete Get to the top on Google.

Project Euler, Problem #1

The first problem within Project Euler isn’t meant to twist the brain in knots and is a gentle introduction to what is coming further down the road. The problem states:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

As I noted in my initial post about exploring Project Euler, I often write to the lowest common denominator of whatever language I’m using at the time. The net effect of that decision is often a working but poorly optimised solution or a solution that doesn’t fit well with the spirit of the programming language.

The solution to problem #1 took three distinct evolutions, the first of which is below and looks like a solution you’d likely see in any programming language:

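i = 1
total = 0  # running total; avoids shadowing the built-in sum()

# check every natural number below 1000
while i < 1000:
  if i % 3 == 0 or i % 5 == 0:
    total += i
  i += 1

print(total)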

I wasn’t a fan of needing to declare the variables to start with for such a simple problem, so I removed the need for i by using the Python range function to generate a series of numbers. The range function allows you to generate any length series of numbers, starting and ending where you want with any stepping. By default, it starts at 0 with a stepping of 1, which means range(1000) is going to generate a list of numbers from 0 to 999.

total = 0  # running total; avoids shadowing the built-in sum()

# range(1000) yields 0 through 999
for x in range(1000):
  if x % 3 == 0 or x % 5 == 0:
    total += x

print(total)
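For anyone who hasn’t met range before, here is a quick sketch of its three calling forms. The variable names are only for illustration. The list() wrapper is there because newer versions of Python return a lazy range object rather than a list.

```python
# range(stop): counts from 0 up to, but not including, stop
first_five = list(range(5))

# range(start, stop): counts from start instead of 0
three_to_seven = list(range(3, 8))

# range(start, stop, step): jumps by step each time
every_fifth = list(range(0, 20, 5))

print(first_five)       # [0, 1, 2, 3, 4]
print(three_to_seven)   # [3, 4, 5, 6, 7]
print(every_fifth)      # [0, 5, 10, 15]
```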

Python provides functionality for its lists called list comprehensions. The idea behind a list comprehension is that you can perform the same action, whatever that may be, against each item in a sequence. In the final solution below, I have applied a filter (the if clause), which returns only those items from the for x in range that match, and I’m not applying any expression to the resulting matches (hence the solitary x before the for). Once collected, the new list is passed to the sum function to yield our result.

print(sum([x for x in range(1000) if x % 3 == 0 or x % 5 == 0]))
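One easy sanity check is to wrap the same idea in a function and run it against the worked example from the problem statement, where the multiples below 10 sum to 23. The function name sum_of_multiples and its divisors parameter are my own invention for this sketch. It uses a generator expression rather than a list comprehension, which avoids building the intermediate list.

```python
def sum_of_multiples(limit, divisors=(3, 5)):
    """Sum the natural numbers below limit divisible by any of the divisors."""
    return sum(x for x in range(limit)
               if any(x % d == 0 for d in divisors))


# The worked example from the problem statement: 3 + 5 + 6 + 9
print(sum_of_multiples(10))    # 23

# The actual problem
print(sum_of_multiples(1000))
```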

Project Euler, An Exercise In Exploration

Some time ago I stumbled onto a math based problem solving site for programmers named Project Euler. The Project Euler site describes itself as:

Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

The motivation for starting Project Euler, and its continuation, is to provide a platform for the inquiring mind to delve into unfamiliar areas and learn new concepts in a fun and recreational context.

Possible solutions to the problems are verified through the Project Euler web site, where your successes are recorded if you sign up for an account. You don’t need to work through the problems in order, however doing so may be beneficial as work completed in previous questions is often reused.

I thought Project Euler would be a great exercise to explore more of the Python programming language. While I’ve read quite a bit about the Python language, I’ve not written anything substantial in it and, as a by-product, I often find myself using the lowest common denominator in my approaches to solving problems.

My intention while working through the problems presented within Project Euler is to find a neater, cleaner method of solving the problem which is hopefully more pythonic. I’ll present my solution to the problems and with a little luck, I’ll receive some helpful improvements from the great programming community as well.

Django Internationalisation (i18n) Statistics

The Python web framework Django supports internationalisation (i18n) for nearly 30 different languages already.

While reviewing the changesets flowing through the Django source repository, I often notice amendments to the internationalisation code and it got me thinking about how ‘complete’ the i18n status is for the languages that Django is attempting to support.

Enter a visually simple but very informative web site built using Google App Engine, which polls the Django Subversion repository periodically and compiles a table showing the percentage completion for each of the different languages.
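How the site arrives at its numbers isn’t documented here, so the following is only a guess at the general approach. Django keeps its translations in gettext .po files, and a completion percentage can be estimated by counting how many msgid entries have a non-empty msgstr that isn’t flagged as fuzzy. The function name and the sample data are made up, and the parser is deliberately minimal: it assumes entries are separated by blank lines and treats a plural entry as translated if any of its forms is filled in.

```python
import re

FIELD = re.compile(r'^(msgctxt|msgid_plural|msgid|msgstr)(?:\[\d+\])?\s+"(.*)"$')
CONTINUATION = re.compile(r'^"(.*)"$')


def po_completion(po_text):
    """Return the percentage of entries with a non-empty, non-fuzzy msgstr."""
    total = translated = 0
    for block in re.split(r"\n\s*\n", po_text.strip()):
        fields = {"msgid": "", "msgstr": ""}
        fuzzy, current = False, None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#"):
                # "#, fuzzy" marks a translation that needs review
                fuzzy = fuzzy or (line.startswith("#,") and "fuzzy" in line)
                continue
            match = FIELD.match(line)
            if match:
                current = match.group(1)
                fields[current] = fields.get(current, "") + match.group(2)
                continue
            match = CONTINUATION.match(line)
            if match and current:
                fields[current] += match.group(1)
        if not fields["msgid"]:
            continue  # header entry, obsolete entry or comment-only block
        total += 1
        if fields["msgstr"] and not fuzzy:
            translated += 1
    return 100.0 * translated / total if total else 0.0


# Made-up sample: four real entries, two of them properly translated.
SAMPLE = r'''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr "Hallo"

msgid "Goodbye"
msgstr ""

#, fuzzy
msgid "Welcome"
msgstr "Willkommen"

msgid ""
"A long "
"message"
msgstr ""
"Eine lange "
"Nachricht"
'''

print("%.1f%% translated" % po_completion(SAMPLE))
```

Run over each language’s .po file in the repository, a function like this would be enough to build a simple completion table.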

I’m impressed that with nearly 30 different languages under their belt, the majority of them are reporting very solid percentage completion numbers. No wonder so many non-English speaking developers are using Django.