Category Archives: Services

Django Friendly Hosting

If you’re about to purchase hosting for your Django application, everything you’ll need to make a good decision is in one place at Django Friendly.

Ryan Berg is the man behind Django Friendly and put the site together as a way to consolidate the plethora of hosting options available for Django. As with some other scripting languages such as Ruby, Python has special hosting requirements which make it inconvenient to host within standard hosting configurations. The development of mod_wsgi for Apache aims to provide a simple, high-performance option for hosting Python applications within shared hosting environments.
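For readers who haven't come across mod_wsgi, it works by pointing Apache at a Python script that exposes a module-level callable named application. The snippet below is only a minimal, generic sketch of that shape; a real Django deployment would hand this role to Django's own WSGI handler rather than writing it by hand.

```python
# hello.wsgi -- a minimal WSGI script of the kind mod_wsgi can serve.
# mod_wsgi imports this file and calls the module-level "application" callable.
def application(environ, start_response):
    body = b"Hello from a WSGI application running under mod_wsgi"
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```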

Allowing users to filter web hosts by price and by shared/dedicated hosting type is a great step forward. My suggested improvement for the site would be the ability for users to dynamically build a search query; as an example, being able to filter by server location, hosting type, price range, ratings and so on. Maybe an interface in a similar fashion to the custom ticket search within the popular Trac software could be used as a starting point.
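To make the idea of a dynamically built search query concrete, here is a rough sketch of how it might look with the Django ORM. The Host model and its field names (hosting_type, price_per_month, rating, server_location) are entirely hypothetical, not anything from the Django Friendly site.

```python
# A hypothetical helper that narrows a Host queryset using only the criteria
# the visitor actually supplied, so the query is built up dynamically.
from django.db.models import Q

def filter_hosts(queryset, params):
    if params.get("hosting_type"):
        queryset = queryset.filter(hosting_type=params["hosting_type"])
    if params.get("max_price"):
        queryset = queryset.filter(price_per_month__lte=params["max_price"])
    if params.get("min_rating"):
        queryset = queryset.filter(rating__gte=params["min_rating"])
    if params.get("locations"):
        # Match any of the selected server locations.
        location_q = Q()
        for location in params["locations"]:
            location_q |= Q(server_location=location)
        queryset = queryset.filter(location_q)
    return queryset
```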

Either way, it’s another fantastic-looking, Django-specific site which has been offered up to the greater community – you’ve got to love it.

Google Account Signin With CAPTCHA

[Screenshot: Google Account login featuring CAPTCHA for additional security]

Tonight I was presented with a Google login page which was different in a few ways:

  • size and shape of the control were different
  • instead of using an in page control, it took me to a completely new page
  • required additional CAPTCHA validation

I suspect this may have been triggered by logging in and out of various Google products tonight: I closed a tab without closing the browser, opened new tabs and logged in and out again, which might have left conflicting session information.
Does anyone know what causes this type of login prompt to be thrown up by Google?

Google Reader Duplicate Item Improvement

One of the features that I love about RSS is that it allows users to keep their finger on the pulse of certain topics very easily. As some people may know, I quite like the Python web framework Django and I use Google Reader to help me keep up to date about what is happening within the greater Django community. I do this by subscribing to the RSS feeds of people who I know regularly write about the framework, but also by utilising online social bookmarking sites such as del.icio.us and ma.gnolia.

I recently read an article by James Holderness about detecting duplicate content within an RSS feed (via). For those not bothered with the jump, James outlines the different techniques that the top feed reading products use to detect duplicate RSS content, which range from using the id field within the RSS down to comparing title and description information.

Back to the improvement, which is related to the information that James provided. When I subscribe to the social bookmarking sites, they end up providing back a huge range of content matching certain criteria. The ones I’m subscribing to at the moment for Django are the Django feeds from the bookmarking services mentioned above, del.icio.us and ma.gnolia.

As you can imagine, each of these services has a different and overlapping user base, and users of each will find common items throughout the internet and bookmark them every day. When that stream of information is received by Google Reader, it will display half a dozen copies of the same unique resource, masked by different user accounts within their bookmarking tool of choice.

A great optional feature to add to Google Reader would be the ability to detect duplicate items, even when they are sourced via the same domain or different domains.

The trick to something like this would be identifying the pattern, so as to allow Google to use an algorithm to flag it. For the sake of this concept, I think it’d be reasonable to consider items posted into social bookmarking sites and an aside or link drop in a blog to be reasonably similar.

My initial concept would involve comparing the amount of content within an item. If there are fewer than a predefined limit of words and a small number of links, then that item might be considered a link drop. You could apply that logic not only to social bookmarking sites but also to the standard blog, where an author might find something cool they want to link to.
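As a rough illustration of that heuristic, the snippet below classifies a feed item as a link drop when its stripped text falls under a word limit and it contains only a handful of links. The thresholds and the function name are my own inventions, purely for the sake of the example.

```python
import re

# Hypothetical thresholds -- tune to taste.
MAX_WORDS = 50
MAX_LINKS = 3

ANCHOR_RE = re.compile(r"<a\s", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")

def looks_like_link_drop(item_html):
    """Guess whether a feed item is a short 'link drop' rather than a full post."""
    link_count = len(ANCHOR_RE.findall(item_html))
    word_count = len(TAG_RE.sub(" ", item_html).split())
    return word_count <= MAX_WORDS and link_count <= MAX_LINKS
```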

The next thing up for consideration might be which items to remove as duplicates and which to include. News of any kind on the internet tends to reverberate throughout it quite quickly, so it’s common to find the same information posted many times. As the vibrations are felt, people will tend to link back to where they found that information (as I did above). Google Reader could leverage the minty fresh search engine index to help with this by using the number of attribution links passed around. As a quick and simple example, imagine the following scenario:

  • Site A has the unique Django content that I’m interested in
  • Sites B through Z all link to site A directly
  • Some sites from C through Z also link back to B, which is where they found that information

I don’t subscribe to site A directly; however, some of the sites B through Z have been picked up by the social networks. Using the link graph throughout those sites, it’d be possible to find out which one(s) among that list are considered authoritative (based on attribution or back links) and start filtering based on that. It might then be possible to use other features of the Google Search index, such as theme, quality and trust, to filter it further.
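A toy version of that attribution idea is sketched below: among a group of items that appear to cover the same story, keep the candidate whose domain the other candidates link to most often. The function name and the shape of the outbound_links mapping are assumptions for illustration; Google would obviously be working from a vastly larger link graph.

```python
from collections import Counter
from urllib.parse import urlparse

def pick_authoritative(candidate_urls, outbound_links):
    """Return the candidate whose domain receives the most links from the
    other candidates; outbound_links maps each candidate URL to the URLs
    it links out to."""
    scores = Counter()
    domains = {url: urlparse(url).netloc for url in candidate_urls}
    for url in candidate_urls:
        for target in outbound_links.get(url, []):
            target_domain = urlparse(target).netloc
            for other, other_domain in domains.items():
                if other != url and other_domain == target_domain:
                    scores[other] += 1
    # Fall back to the first candidate if nothing links to anything else.
    if scores:
        return scores.most_common(1)[0][0]
    return next(iter(candidate_urls))
```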

I think a feature like that within Google Reader would be fantastic, especially if I could apply those sorts of options on a per feed or folder basis. That way, I could group all of the common information together (Django) and have Google Reader automatically filter out the duplicates that matched the above criteria.

I’m sure the development team from Google Reader will hear my call; who knows, in a few months maybe a feature like this could work its way into the product.

Akismet Losing Its Mojo?

I have long praised the free spam fighting service Akismet, but yesterday a horribly obvious spam comment wasn’t filtered, which is very unusual. I’ll include the comment here so people can see what I’m referring to:

Name: Armond
Site: http://groups.google.com/group/otekal/web/free-bestiality-sex-stories
Message: free bestiality sex stories…
accessories distributed at most major retailers for such
…

Automattic have never disclosed with any specificity how the internals of Akismet work as a service; however, it is more than reasonable to assume that Bayesian filtering is in their spam fighting tool belt. For those who aren’t aware, Bayesian filtering works by learning, or being told, which messages are spam and then comparing how often each word appears in spam versus non-spam messages. If a given message contains enough spam-associated words to push its score over a threshold, the message is considered spam.
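Since Akismet’s internals are undisclosed, the class below is only a generic, toy illustration of the Bayesian approach described above: count word frequencies in known spam and non-spam, then classify a new message by its combined log-odds.

```python
import math
from collections import Counter

class ToyBayesianFilter:
    """A toy Bayesian spam filter, not a description of how Akismet works."""

    def __init__(self):
        self.spam_words = Counter()
        self.ham_words = Counter()
        self.spam_total = 0
        self.ham_total = 0

    def train(self, message, is_spam):
        words = message.lower().split()
        if is_spam:
            self.spam_words.update(words)
            self.spam_total += len(words)
        else:
            self.ham_words.update(words)
            self.ham_total += len(words)

    def is_spam(self, message, threshold=0.9):
        # Sum the log-odds of each word, with add-one smoothing so that
        # unseen words don't cause a division by zero.
        log_odds = 0.0
        for word in message.lower().split():
            p_spam = (self.spam_words[word] + 1) / (self.spam_total + 2)
            p_ham = (self.ham_words[word] + 1) / (self.ham_total + 2)
            log_odds += math.log(p_spam / p_ham)
        # Compare against the threshold expressed as log-odds.
        return log_odds > math.log(threshold / (1 - threshold))
```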

Given that it is a learning based system, so to speak, I find it hard to believe that the words bestiality and sex in the URL and within the body of the comment aren’t throwing up great big red flags. I’m going to put this slip-up down to one of two things:

  1. I was one of the first people to receive and register that particular spam signature
  2. When the comment was submitted, the Akismet service wasn’t able to be contacted

I’m heavily leaning towards the latter, for no other reason than there are literally hundreds of thousands of blogs on the internet – it’s highly unlikely that I was among the first to receive that particular spam signature.

Long live Akismet!

You Know You’re Popular When

Today my personal site was pinged by Live Business Radio. As I do as a matter of course, I checked out the Live Business Radio web site and was disappointed to find that it’s nothing more than your average run-of-the-mill site riddled with advertising, spam and ‘buy this crap product now’ pitches.

I regularly get pinged by web sites that don’t have anything to do with me, and when I saw the site I was about to abandon it immediately. Just before I did though, I scanned over the article and noticed that I’d been featured in a list of sites with a high Google PageRank which offer ‘followed’ links. It wouldn’t be a good filthy spammer’s site if they didn’t offer you software (for a fee) which you could use to take advantage of those followed links.

If you’re not quite sure what I’m referring to regarding the ‘followed’ remark, you can read about it on my personal site.

I should feel so honoured.

Google Alerts Getting Smarter

Google Alerts lets the user define keyword lists and phrases; when Google finds them while crawling and indexing web sites, it sends the user a notification about that particular occurrence.

Historically, it always appeared as though the technology behind the alerting system was quite simple – literally matching the keywords and phrases that the user had nominated. Recently, alerts have been generated that don’t strictly meet the keyword list and phrase requirements for a given page. It seems as though Google are using all of the additional metadata about a web site and its content to infer certain pieces of information.

As an example, I was recently notified about my name being used within a post about extending the Nintendo Wii. If you view that particular item, you will not find the phrase “alistair lattimore” anywhere within it. Just to be sure, I have also ruled out my name appearing within the RSS feeds generated by the site.

Putting the tinfoil hat on for a second, there is a raft of information that Google know about me already:

  • I have a Gmail account
  • The same Gmail account is associated with Google Analytics, Google Reader, Google Webmasters, Google Adwords and Google Adsense.
  • Within Google Webmasters, I monitor my personal blog and this site.
  • Within Google Analytics, I monitor my personal site and this site.
  • Within Google Reader, I subscribe to the feed of both sites.
  • Google are a domain registrar, which means they could theoretically see that I purchased both domains.
  • I have linked in both directions between the two sites in the past.

When you start to see how all of that information is interlinked, it becomes quite easy to see how Google can provide insightful results through their various services. Of course, if you take the tinfoil hat off and look at the more standard signals such as web site content, my name is listed in the title on the front page and also on the about page. Those two bits of information might have been all it took, who knows.

If the technology behind that flexibility has a high level of accuracy in determining or inferring that information, it really is an excellent service. In the above example, if Google hadn’t inferred my name as being associated with that document, I would never have found out about it via the alerting system. Granted, in this particular example it makes no difference as I know I wrote it; however, for all other content on the internet it really lifts the product’s capability.

Google Analytics & URL Rewriting Caveats

As the internet has matured and web sites have aged and expanded over the years, it has become commonplace for web site owners to restructure their sites to increase their accessibility and search engine effectiveness.

During the restructuring process, less savvy web masters reorganise their web sites without any concern for the impact it might have on their search engine rankings, referrals and user experience, while more savvy web masters understand that cool URLs don’t change. That isn’t to say that the content originally published against a URL must remain there, just that the URL continues to exist so that anyone linking into it doesn’t receive a missing document or HTTP 404 error.

When restructuring web sites, the savvy web master mentioned earlier requires a way to make an existing URL redirect to its new URL after the restructure. The two common methods to handle the redirection are:

  • It is perfectly acceptable to use a standard HTML web page with the tracking code installed and a meta refresh to redirect a user from the old URL to the new one. This method does have the downside that the redirections for the web site end up scattered throughout it.
  • Another solution is offloading the redirection into a utility such as the Apache mod_rewrite module or the equivalent ISAPI_Rewrite for IIS. Using this method allows the web master to place all of the URL redirection in one place for easy management.

Under normal conditions, such as option one above where Google Analytics is installed on every web page within a site, it’s possible for the service to collect a complete click stream for the site. Google Analytics is also capable of handling standard HTTP redirects, so long as the tracking code is installed on both the referring and destination pages.

While it is convenient to use URL rewriting, there is a caveat which reduces the amount of information that Google Analytics can collect. The redirection will happen before any content is returned to the user, which means there is no opportunity for Google Analytics tracking code to fire. This results in Google Analytics reporting zero activity against the redirecting URL.
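To make the caveat concrete, here is a rough Django-flavoured sketch contrasting the two approaches. The view names, destination URL and template are hypothetical; the point is simply that the server-side redirect never returns a page for the tracking JavaScript to run on, while the meta refresh page can carry the Analytics snippet.

```python
# views.py -- two hypothetical ways of handling an old URL after a restructure.
from django.http import HttpResponsePermanentRedirect
from django.shortcuts import render

def old_article_rewrite_style(request):
    # A plain HTTP 301, equivalent to a mod_rewrite rule: no HTML is returned,
    # so the Google Analytics tracking code never fires for the old URL and
    # the report shows zero activity against it.
    return HttpResponsePermanentRedirect("/articles/new-location/")

def old_article_meta_refresh(request):
    # Render a stub page that contains the Analytics snippet plus a
    # <meta http-equiv="refresh"> tag pointing at the new URL, so the visit
    # to the old URL is recorded before the browser moves on.
    return render(request, "redirects/old_article.html",
                  {"destination": "/articles/new-location/"})
```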

WordPress Drop Technorati For Incoming Links

WordPress has a feature which shows activity surrounding your particular blog, named “Incoming Links”. For a long time, WordPress has been using the services of blog search engine and aggregator Technorati to deliver this feature. Using Technorati was an excellent decision for quite some time, especially when blogging was still relatively new and Technorati were blazing their own trail in that space. It made even more sense when Automattic released Pingomatic, as virtually all blogging platforms sent activity notifications to it and Technorati subscribed to that stream of data.

Things started to change and the usefulness of Technorati started to fade as the big guns entered the blog search space, namely Google. Google Blog Search was a great service on its own, using the incredible infrastructure behind Google to keep its blog search index fresh. Not content with great, Google set out to make the Google Blog Search index exceptionally fresh by accepting ping notifications. Of course, as soon as that happened, Pingomatic started sending notifications into Google, which has yielded an index that is minty fresh – usually showing only minutes of delay.

With the recent release of WordPress 2.3, the WordPress team have now switched from Technorati to Google Blog Search for the “Incoming Links” feature. This single link change could have a fairly profound impact on Technorati: with literally hundreds of thousands of blogs running WordPress, they were getting traffic for free. With that link from WordPress gone and the superior fire power of Google in play, tongues have to be wagging about the future of blog search engine Technorati.

Free Facebook Application Hosting Provided by Joyent

As many people would be aware, Facebook has become somewhat of a phenomenon of late. Over the last 12 months, the site has seen growth that most internet companies could only dream of. If that wasn’t enough, in May 2007 Facebook announced that it was going to open up the service with what they called the Facebook Platform.

The Facebook Platform allows third parties to develop plugins or ‘applications’ which work in concert with Facebook. When your application is loaded by a Facebook user, Facebook passes information into your application, which in turn communicates with your own services. Of course, the dependence of your Facebook application on your own hosting means that the more popular your application becomes, the more web hosting capacity you must have. There are some great tales of the struggles that companies like iLike faced when they launched their product and tried to keep up with the incredible demand. The insane popularity of Facebook means that if even a small percentage of their users load your application, it can translate into literally millions of web server requests and hundreds of gigabytes of data transferred; far more than any normal person could afford.

To help combat the problem, Joyent have teamed up with Dell and Facebook to offer over USD$3 million worth of their accelerator hosting for free. We’re not talking about cheap Facebook application hosting; this offer from Joyent is absolutely free. The free Facebook application hosting offer from Joyent includes a complete virtualised environment with everything you’d need to get your Facebook application up and running using popular programming languages like PHP, Python and Ruby. Joyent are pushing the product as an on-demand, scalable architecture which is built on top of their very successful Accelerator product. The beautiful thing about the Joyent Accelerators is that the free hosting offers a seamless upgrade path into something more substantial if your Facebook application takes off.

To make sure that the offer isn’t abused, Joyent have some pretty straightforward terms. Your application must be active and in use on Facebook for you to be eligible. The offer provides one year’s worth of free application hosting for your Facebook application; after that point you’ll be required to pay for a normal plan. Joyent are offering 3,500 accounts with their free Facebook hosting offer. That might not seem like a lot; however, if your application is dormant for more than 60 days your account will be reclaimed, freeing that slot up again.

If you’re looking into building a Facebook application, or already have one and the hosting costs keep going up, check out the Joyent offer – it might just be your saving grace.

Twitter Lost My Tweets

Over the weekend, I signed up for a Twitter account and that process had a few hiccups which were entirely my fault. After sorting that out, I started posting items into Twitter using the web site and everything seemed to be going swimmingly.

Unfortunately, the site went down with the familiar offline message informing Twitter users that they were reorganising some stuff. To my surprise, when the service came back online it had lost my last tweet. Assuming this was a bit of a one-off, I let it go. Later that same night, Twitter lost another tweet. I haven’t posted about it since Friday night as I figured that they were doing maintenance and it wasn’t worth raising; however, yesterday it lost yet another tweet and this time I didn’t notice the site going offline.

I think we’re up to a total of three or four tweets which Twitter have lost in the few days since I joined the service. I realise that I might just be a little unlucky, so I’ll be keeping an eye on it over the next week or two to see whether this is normal or the exception.