Justkez

Trying to be a consistent blog 

Why are there so many NoSQL options?

Having recently posted a question asking Why are there so many NoSQL databases? over at Hacker News, I thought it would be useful to summarise the responses, and to draw any common thoughts out.

Background

There is a "standard set" of traditional databases if you are developing a web (or non-web, of course) application: MySQL, PostgreSQL, SQLite, and perhaps Oracle and SQL Server if you're in an enterprise environment.

However, with the NoSQL anti-database movement gaining in momentum and becoming more widespread, people are starting to look towards the new schema-free, key-value and document store databases that are hitting the market.

The problem is the proliferation of NoSQL options, and trying to boil everything down to understand which option suits your needs most closely.

Why are there so many NoSQL options?

Without delving into too much detail, there have been many recent innovations ^1 in the field (Facebook created Cassandra, Google created BigTable and MapReduce, Amazon created SimpleDB and LinkedIn created Project Voldemort). These innovations came about to solve the relatively new challenge of scaling web applications ^2 for millions of users.

The general view is that things will settle down in the future, with a few clear front-runners emerging next year ^3. In the meantime, a useful - and related - analogy is that of SQL. There were many different ways of communicating with relational databases, and a common syntax was needed ^4; SQL was the result of compromise and common ground between all those different query languages.

Whilst I don't think that NoSQL projects will merge to create a common system, it does seem likely that some will lag behind in their development, and be superseded by the better-engineered solutions.

How to decide on a NoSQL option

In the meantime, all you can do is read, read, read. No one is going to tell you which path you should take; you need to research it yourself and fit it with your requirements before committing.

Remember that there are three general camps for NoSQL systems:

  • Key-value stores
  • Column stores
  • Document stores

And there is now redis, which straddles these camps.

My advice...

A good jumping-off point is NOSQL: scaling to size and scaling to complexity (which gives a good high-level overview of the concepts); after that, browse some of the posts over on MyNoSQL to see which projects are active and what new technologies are being added.

Good luck, and thanks to all the respondents on Hacker News.

Filed under  //   blog   curiousity   databases   nosql  

Comments [0]

Adding a little redis to Nginx

I have been doing some reading into key-value databases (schema-free/NOSQL), and am particularly taken with redis, which has had some limelight of late.

Many people have been advocating it as an alternative to memcached as a web-app caching system, so I thought I'd dip my toe in the ocean of caching with redis.

Note that I don't actually need to cache anything; it's just that the performance angle appeals to me.

After moving from the standard apt-get install nginx to a from-source version of Nginx (compiled with the Nginx redis module, of course), I was able to do a few basic tests with ab (ApacheBench).

Using the following command:

ab -n 1000 -c 50 localhost/

I get some results:

  • Without redis caching: 2,864 requests per second (mean)
  • With redis caching: 3,354 requests per second (mean)
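To put the two figures in perspective, the relative gain can be worked out in a couple of lines. This is only a sketch of the arithmetic on the numbers quoted above; the constant and method names are my own and not part of any tool.

```ruby
# Mean requests per second reported by ab in the two runs above.
WITHOUT_REDIS = 2864.0
WITH_REDIS    = 3354.0

# Relative improvement of the new figure over the baseline, as a percentage.
def percent_gain(baseline, improved)
  ((improved - baseline) / baseline * 100).round(1)
end

puts "redis caching: +#{percent_gain(WITHOUT_REDIS, WITH_REDIS)}% requests per second"
```

That works out at roughly a 17% improvement on this one small page.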

Conclusive?

Not really, just indicative. We all know Nginx is very accomplished at serving static files, and this was a very simple "Hello" index page, 15 bytes long.

However, this would certainly make me think hard about deploying redis alongside Nginx for any low-write/high-read web applications or sites.

The results aren't particularly astounding, but the simplicity of integrating redis with Nginx and the fact that it is so transparent should make any Nginx user think about going down this route.

Filed under  //   blog   nginx   performance   redis   server  

Comments [0]

Generating a Tag Cloud in Jekyll

The terms category and tag are used interchangeably throughout this posting; they are assumed to be the same thing.

Having recently adopted Jekyll to power this website, I have been doing a bit of hacking/extending to get some added features in. A few days ago it was integrating Twitter with Jekyll, and now it's generating a tag cloud.

Back in July, Alex Young blogged about his Jekyll migration, and thoughtfully included a link to some code he wrote to list all posts broken out by category/tag.

I wanted to take this a bit further, and generate a per-category page which listed all the postings for that category, but also to generate a tag cloud.

Generating tag pages

After making a few changes to Alex's code, I ended up with a tag stub in a Rakefile which loops through all the categories used on the site and generates a static HTML page with a list of all the postings in that category.

Remember that this code snippet requires you to define your per-post categories in the YAML header of each post, e.g.

    categories:
      - jekyll
      - blog
      - ruby
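If you want to check which categories a given post declares before running anything through Jekyll, the header can be read with Ruby's standard YAML library. This is a minimal sketch, not how Jekyll itself does it: the sample post text and the front_matter_categories helper are made up for illustration.

```ruby
require 'yaml'

# Hypothetical post source; only the header between the --- lines matters here.
POST_SOURCE = <<~POST
  ---
  title: Generating a Tag Cloud in Jekyll
  categories:
    - jekyll
    - blog
    - ruby
  ---
  Post body goes here.
POST

# Return the categories listed in a post's YAML header (empty if there are none).
def front_matter_categories(source)
  match = source.match(/\A---\s*\n(.*?)\n---\s*\n/m)
  return [] unless match

  data = YAML.safe_load(match[1]) || {}
  Array(data['categories'])
end

puts front_matter_categories(POST_SOURCE).inspect
```

A post with no header simply yields an empty list, so it would not show up on any tag page.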
  

(You need to mkdir tags in your Jekyll directory before executing the code below)

Now, the Rakefile segment:

    desc 'Generate tags page'
    task :tags do
      puts "Generating tags..."
      require 'rubygems'
      require 'jekyll'
      include Jekyll::Filters

      # Load the site configuration and read in every post
      options = Jekyll.configuration({})
      site = Jekyll::Site.new(options)
      site.read_posts('')

      # Build one static page per category
      site.categories.sort.each do |category, posts|
        html = ''
        # The YAML header has to start in the first column, so the heredoc body is not indented
        html << <<-HTML
    ---
    layout: default
    title: Postings tagged "#{category}"
    ---
        <h1 id="#{category}">Postings tagged "#{category}"</h1>
        HTML

        html << '<ul class="posts">'
        posts.each do |post|
          post_data = post.to_liquid
          html << <<-HTML
            <li>#{post_data['title']}</li>
          HTML
        end
        html << '</ul>'

        # Write the page where Jekyll will pick it up on the next build
        File.open("tags/#{category}.html", 'w+') do |file|
          file.puts html
        end
      end
      puts 'Done.'
    end
  

There is also a gist here

Now you can run rake tags and it will generate a number of HTML files in the tags/ subdirectory; regenerating through Jekyll will then copy these files over to your site. Navigating to /tags/jekyll.html should list all your Jekyll related posts.

Generating your tag cloud

The snippet below does something similar, but just loops through each category and counts the number of tagged postings. It then does some very rudimentary font-size scaling to make the more popular tags bigger.

    puts 'Generating tag cloud...'
    require 'rubygems'
    require 'jekyll'
    include Jekyll::Filters

    options = Jekyll.configuration({})
    site = Jekyll::Site.new(options)
    site.read_posts('')

    html = <<-HTML
    ---
    layout: default
    title: Tag cloud
    ---

    <h1>Tag cloud</h1>

    HTML

    site.categories.sort.each do |category, posts|
      html << <<-HTML
      HTML

      # Scale the font size with the number of postings in the category
      s = posts.count
      font_size = 12 + (s * 1.5)
      html << "[…]

[…] twitter?) with any queries or improvements, or post a comment below.
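For reference, the font-size scaling can be tried on its own without Jekyll. This is a self-contained sketch under stated assumptions: the tag counts are invented, the method names are hypothetical, and the link markup and /tags/name.html targets are my guess at a sensible output rather than the original code.

```ruby
# Hypothetical tag counts standing in for site.categories (category => number of posts).
TAG_COUNTS = { 'blog' => 6, 'jekyll' => 3, 'redis' => 1 }.freeze

# Same rudimentary scaling as the snippet: a 12px base plus 1.5px per post.
def font_size_for(post_count)
  12 + (post_count * 1.5)
end

# Build one link per tag, sized by popularity. The markup and the
# /tags/<name>.html target are assumptions, chosen to match the rake tags pages.
def tag_cloud_html(counts)
  counts.sort.map do |category, post_count|
    size = font_size_for(post_count)
    %(<a href="/tags/#{category}.html" style="font-size: #{size}px">#{category}</a>)
  end.join("\n")
end

puts tag_cloud_html(TAG_COUNTS)
```

Because the scaling is linear, a very popular tag will keep growing without limit; capping the size would be an easy refinement.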

Filed under  //   blog   geekery   jekyll   ruby  

Comments [2]

Integrating Twitter with Jekyll

Having migrated Justkez.com to be based on Jekyll, I was pondering how I might include my recent twitterings on the front page of the site. In the WordPress world, this would have been done via a plugin which may or may not have hung the loading of the page, might have employed caching, but would certainly have had some overheads.

Not in Jekyll.

Integrating Jekyll and Twitter

It is rather simple to create a Ruby script to pull down your most recent Twitter updates and dump them into a file.

It is also simple to wrap each update in some rudimentary HTML.

We then use the Liquid include tag to insert the updates where desired.

The Ruby script

I have the following sat in my ~/bin/ directory, which can be executed by a cron job at whatever interval you see fit:

    require 'date'
    require 'twitter'

    twitter_user = 'JohnDoe' # TODO: Change to your Twitter username

    puts '<ul id="twitter_list">'
    Twitter::Search.new.from(twitter_user).each do |r|
      # Format the timestamp as day and abbreviated month
      d = DateTime.parse(r.created_at).strftime('%d %b')
      puts "<li><span class=\"gentle\">#{d}</span> #{r.text}</li>"
    end
    puts '</ul>'

All this does is fetch the latest updates for twitter_user, and wrap each one in an HTML <li> tag.

Including the HTML

This one is pretty straightforward, as Liquid (Jekyll's templating engine) supports the inclusion of partials/fragments.

  • Create an _includes directory in your Jekyll directory (not in _site)

  • Add the liquid include line {% include twitter.html %} where the latest updates will go (for me this was in index.html)

  • Now you can populate _includes/twitter.html by running the above Ruby script and dumping the output to file, like so: ruby ~/bin/script.rb > _includes/twitter.html.
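If you would rather test the markup without hitting Twitter, the wrapping step can be pulled out into a small method and fed canned data. This is only a sketch: the Update struct, the render_updates name and the sample update are all made up, and the commented-out File.write line assumes you run it from the Jekyll directory.

```ruby
require 'date'

# Hypothetical stand-in for the results returned by the Twitter search.
Update = Struct.new(:created_at, :text)

# Wrap each update in an <li>, using the same markup as the script.
def render_updates(updates)
  items = updates.map do |u|
    d = DateTime.parse(u.created_at).strftime('%d %b')
    "<li><span class=\"gentle\">#{d}</span> #{u.text}</li>"
  end
  (['<ul id="twitter_list">'] + items + ['</ul>']).join("\n")
end

updates = [Update.new('2009-11-05 10:15:00', 'Trying out Jekyll')]
html = render_updates(updates)
puts html

# To write the include directly instead of redirecting stdout:
# File.write('_includes/twitter.html', html)
```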

Finishing up

You will need to regenerate the site, and you would ideally run it in auto mode. Now, whenever the cron job updates the HTML file, Jekyll will regenerate the relevant files for you.

There you have it, seamless Twitter and Jekyll integration.

Filed under  //   blog   geekery   jekyll   ruby   twitter  

Comments [0]

Getting to grips with Jekyll


Having recently heard about Jekyll from a Hacker News posting, I felt the urge of website change stir within me.

Justkez.com has been run through WordPress for many years now (since 2006), but it has got to a point where I just wasn't comfortable with it - it felt too clunky.

Why Jekyll?

Jekyll is a static site generator written in Ruby, with support for Markdown documents for blog posts. I have been writing postings in Markdown for the last year, so it seemed a logical move. With the working world taking over a lot of my time, I have become less bothered about the latest plugins, or publishing widgets.

As such, a few of the old posts didn't make the cut, and some of the more popular ones have been polished up a bit; there shouldn't be too many lost links.

Speed boost

In a very rudimentary test (ab -n 20), the previous WordPress version could dish out 3.5 requests per second, while the statically generated Jekyll version serves 24 requests per second; something you would call statistically significant.

Living with Jekyll

As a result of migrating...

  • Writing posts is much more straightforward, and I can punt the markdown files around on whichever computer is to hand

  • I have learned to --love-- use the Liquid templating engine; not quite my ideal, but it works

  • I am a lot more inclined to put postings together; everything just feels cleaner

So, if you're tempted by any of the above, give Jekyll a go.


Justkez.com is produced by Kester Dobson.
There is an Atom feed available for syndication, and some archives to browse.

Filed under  //   blog   geekery   jekyll   ruby   server  

Comments [0]

Migrating Justkez to Linode

There has been some intermittent downtime of the site over the last couple of weeks; apologies.

The first bout was moving from Slicehost over to Linode. I toyed with the idea some months ago, but was finally swayed by Eivind Uggedal's excellent comparison of VPS performance. Whilst I'm sure there are other performance metrics that would put Slicehost in the lead, there was more bang for less buck at Linode.

The second bout was migrating the baby-fresh Linode image to the recently announced London server, reducing average ping latency from 86ms to 4ms - over 21x lower.

Filed under  //   blog   geekery   server  

Comments [0]