Simple Hierarchical Clustering

Current version

hierclust 0.1.5

What

Hierclust takes a set of points and organizes them into a hierarchy of clusters. You can either let it keep merging clusters until everything has been combined into a single top-level cluster, or tell it to stop once every cluster is separated from the others by at least a given minimum distance.

This is useful when you have a large set of points to plot on a map: Hierclust reduces them to a smaller number of clusters that are far enough apart for the map to stay legible.
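The behavior described above can be sketched in plain Ruby as bottom-up (agglomerative) clustering: every point starts in its own cluster, the two closest clusters are merged repeatedly, and merging stops either when one cluster is left or when the closest pair is already at least `separation` apart. This is a minimal sketch, not hierclust's actual implementation; measuring distance between cluster centers (centroid linkage) is an assumption, and `Cluster`, `merge_clusters` and `agglomerate` are hypothetical names.

```ruby
# Hypothetical stand-in for a cluster: its center plus the [x, y] points in it.
Cluster = Struct.new(:x, :y, :points)

# Distance between two cluster centers (centroid linkage, an assumption).
def center_distance(a, b)
  Math.hypot(a.x - b.x, a.y - b.y)
end

# Combine two clusters; the new center is the mean of all member points.
def merge_clusters(a, b)
  pts = a.points + b.points
  Cluster.new(pts.sum { |p| p[0] } / pts.size.to_f,
              pts.sum { |p| p[1] } / pts.size.to_f,
              pts)
end

# Bottom-up clustering. With separation = nil it merges until one cluster is
# left; otherwise it stops as soon as the closest pair of cluster centers is
# at least `separation` apart.
def agglomerate(points, separation = nil)
  clusters = points.map { |x, y| Cluster.new(x.to_f, y.to_f, [[x, y]]) }
  while clusters.size > 1
    # Closest pair by brute force: O(n^2) per step, fine for a sketch.
    pairs = (0...clusters.size).to_a.combination(2)
    i, j = pairs.min_by { |a, b| center_distance(clusters[a], clusters[b]) }
    break if separation && center_distance(clusters[i], clusters[j]) >= separation

    merged = merge_clusters(clusters[i], clusters[j])
    clusters.delete_at(j) # j > i, so deleting j first keeps index i valid
    clusters.delete_at(i)
    clusters << merged
  end
  clusters
end

pts = [[0, 0], [1, 0], [20, 20], [22, 20]]
agglomerate(pts, 5).map(&:points) # => [[[0, 0], [1, 0]], [[20, 20], [22, 20]]]
agglomerate(pts).size             # => 1
```

The brute-force pair search keeps the sketch short; it is quadratic per merge step, so it only suits small inputs.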

Installing

sudo gem install hierclust

The basics

require 'hierclust'

points = (1..6).map { Hierclust::Point.new(rand(10), rand(10)) }
clusterer = Hierclust::Clusterer.new(points)
# The points are random, so the output differs on every run, for example:
puts clusterer.clusters # => [[[(4, 9), (4, 8)], (9, 6)], [[(1, 4), (3, 1)], (6, 3)]]
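The nesting in that output can be read as a merge tree: each bracket pair appears to hold the two items that were merged into one cluster, and the innermost items are the original points. The plain-Ruby sketch below walks such a tree. It assumes the tree is written as nested arrays with each point as an `[x, y]` pair, a stand-in for hierclust's own objects (which print as `(x, y)`); `point?`, `leaves` and `depth` are hypothetical helpers.

```ruby
# A point is an [x, y] pair of numbers; anything else is a cluster of items.
def point?(node)
  node.size == 2 && node.all? { |v| v.is_a?(Numeric) }
end

# Collect the original points (the leaves of the merge tree), left to right.
def leaves(node)
  point?(node) ? [node] : node.flat_map { |child| leaves(child) }
end

# Number of merge levels above the deepest point.
def depth(node)
  point?(node) ? 0 : 1 + node.map { |child| depth(child) }.max
end

# The example output from above, rewritten with [x, y] pairs.
tree = [[[[4, 9], [4, 8]], [9, 6]], [[[1, 4], [3, 1]], [6, 3]]]
leaves(tree).size # => 6
depth(tree)       # => 3
```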

Demonstration of usage

Let’s say you have an existing set of objects with latitudes and longitudes, and you want to organize them into clusters that are separated by at least 5 degrees (for simplicity’s sake, we’ll pretend that latitude and longitude form a rectangular grid).
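Treating longitude and latitude as a flat rectangular grid means the distance between two locations is simply the Euclidean distance in degrees. A minimal sketch of that simplification, with a hypothetical `grid_distance` helper; like the example in the text, it deliberately ignores the Earth's curvature.

```ruby
# Euclidean distance in degrees, treating (lon, lat) as a flat rectangular grid.
# This ignores the Earth's curvature, just as the example in the text does.
def grid_distance(lon1, lat1, lon2, lat2)
  Math.hypot(lon2 - lon1, lat2 - lat1)
end

# 3 degrees apart in longitude and 4 in latitude gives 5 degrees on this grid.
grid_distance(10, 50, 13, 54) # => 5.0
```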

require 'hierclust'

Start by reopening Hierclust’s Point class so that each point can hold a reference to your own data:

class Hierclust::Point
  attr_accessor :data
end

Then turn your data into a set of points:

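# MyGeocodedThing stands for your own model; each record needs lon and lat
dataset = MyGeocodedThing.find(:all)
points = dataset.map do |thing|
  point = Hierclust::Point.new(thing.lon, thing.lat) # x = longitude, y = latitude
  point.data = thing # keep a reference back to the original record
  point
end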

Then tell Hierclust to cluster those points until the clusters are at least 5 degrees apart:

clusterer = Hierclust::Clusterer.new(points, 5)
clusters = clusterer.clusters
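If the separation argument behaves as described, no two resulting clusters should be closer than 5 degrees. A small checker like the one below can confirm that for a result. It assumes separation is measured between cluster centers (`x`, `y`), which is an assumption about hierclust's internals, and uses a hypothetical `Center` struct so it runs without the gem.

```ruby
# Hypothetical stand-in for anything with x and y, such as a cluster center.
Center = Struct.new(:x, :y)

# Smallest center-to-center distance among the clusters
# (nil when there are fewer than two clusters).
def min_separation(clusters)
  clusters.combination(2).map { |a, b| Math.hypot(a.x - b.x, a.y - b.y) }.min
end

# True when every pair of clusters is at least `separation` apart.
def separated?(clusters, separation)
  gap = min_separation(clusters)
  gap.nil? || gap >= separation
end

centers = [Center.new(0, 0), Center.new(6, 0), Center.new(0, 8)]
separated?(centers, 5) # => true  (closest pair is 6 apart)
separated?(centers, 7) # => false
```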

Then do what you will with your clusters:

# MapThing stands for whatever mapping library you use
map = MapThing.new
clusters.each do |cluster|
  map.add_point(
    :x => cluster.x,
    :y => cluster.y,
    :label => "#{cluster.points.size} Things"
  )
end
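The map code depends on a hypothetical `MapThing`. The sketch below shows the same step as plain data: it turns each cluster into a hash with `:x`, `:y` and a label, and collects the original records through the `data` accessor added earlier. `Rec`, `DataPoint` and `FakeCluster` are hypothetical stand-ins so the snippet runs without the gem; with hierclust you would pass the real clusters, assuming each responds to `x`, `y` and `points` as the example above uses them.

```ruby
# Hypothetical stand-ins for a geocoded record, a point carrying it, and a cluster.
Rec = Struct.new(:name, :lon, :lat)
DataPoint = Struct.new(:x, :y, :data)
FakeCluster = Struct.new(:x, :y, :points)

# Build a plain marker description for one cluster.
def marker_for(cluster)
  count = cluster.points.size
  {
    x: cluster.x,
    y: cluster.y,
    label: "#{count} #{count == 1 ? 'Thing' : 'Things'}",
    records: cluster.points.map(&:data) # back to your own objects
  }
end

a = Rec.new("Alpha", 2.0, 48.0)
b = Rec.new("Beta", 3.0, 49.0)
cluster = FakeCluster.new(2.5, 48.5,
                          [DataPoint.new(a.lon, a.lat, a), DataPoint.new(b.lon, b.lat, b)])
marker_for(cluster)[:label] # => "2 Things"
```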

Documentation

API documentation: RDoc

Forum

http://groups.google.com/group/hierclust

Source code

You can browse the source at http://hierclust.rubyforge.org/svn/trunk/

How to submit patches

Read the “8 steps for fixing other people’s code”; for step 8b, submitting a patch to a Google Group, use the Google Group listed above.

License

This code is free to use under the terms of the MIT license.

Contact

Comments are welcome. Send an email to Brandt Kurowski, or get in touch via the forum.

Brandt Kurowski, 6th February 2008
Theme extended from Paul Battley