Simple Hierarchical Clustering
→ ‘hierclust’
Version 0.1.5
What
Given a set of points, organizes them into clusters. You can either have it continue clustering until all the clusters are organized into larger clusters, or tell it to stop once a certain minimum level of separation between clusters has been reached.
Useful for taking a large set of points to be plotted on a map, and reducing them to a smaller number of clusters, separated enough so that the map remains legible.
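To make the two stopping rules concrete, here is a minimal sketch of bottom-up clustering in plain Ruby. It is not hierclust's implementation: SketchPoint, centroid, distance and sketch_cluster are hypothetical names invented for this illustration, clusters are compared by the distance between their centroids (an assumption), and the result is a flat list of clusters rather than a nested hierarchy.

```ruby
# Hypothetical stand-in for a 2D point; NOT the Hierclust::Point class.
SketchPoint = Struct.new(:x, :y)

# A cluster is just an array of SketchPoint. Its centroid is the mean position.
def centroid(cluster)
  n = cluster.size.to_f
  [cluster.sum(&:x) / n, cluster.sum(&:y) / n]
end

# Distance between two clusters, measured centroid to centroid (an assumption).
def distance(a, b)
  ax, ay = centroid(a)
  bx, by = centroid(b)
  Math.hypot(ax - bx, ay - by)
end

# Repeatedly merge the two closest clusters. Stop when one cluster is left or,
# if a separation is given, when the closest pair is at least that far apart.
def sketch_cluster(points, separation = nil)
  clusters = points.map { |p| [p] }
  while clusters.size > 1
    a, b = clusters.combination(2).min_by { |x, y| distance(x, y) }
    break if separation && distance(a, b) >= separation
    clusters = clusters.reject { |c| c.equal?(a) || c.equal?(b) } << (a + b)
  end
  clusters
end

points = [[0, 0], [1, 0], [10, 10], [11, 10]].map { |x, y| SketchPoint.new(x, y) }
puts sketch_cluster(points, 5).size  # => 2 (two well-separated groups remain)
puts sketch_cluster(points).size     # => 1 (everything merged)
```

With a separation of 5 the two pairs stay apart because their centroids are about 14 units from each other; with no separation the merging continues until a single cluster holds every point.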
Installing
sudo gem install hierclust
The basics
    require 'hierclust'

    # Build six random points on a 10x10 grid and cluster them.
    points = (1..6).map { Hierclust::Point.new(rand(10), rand(10)) }
    clusterer = Hierclust::Clusterer.new(points)
    puts clusterer.clusters
    # => [[[(4, 9), (4, 8)], (9, 6)], [[(1, 4), (3, 1)], (6, 3)]]
Demonstration of usage
Let’s say you have an existing set of objects with latitudes and longitudes, and you want to organize them into clusters that are separated by at least 5 degrees (for simplicity’s sake we’ll pretend that latitude and longitude form a rectangular grid).
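As a rough illustration of what the rectangular-grid simplification ignores, the sketch below estimates how many kilometres one degree of longitude covers at a given latitude. Nothing here is part of hierclust: the helper name and the constant are assumptions made for this example, and the figure of about 111.32 km per degree at the equator treats the Earth as a sphere.

```ruby
# Approximate length of one degree of longitude at the equator, in kilometres.
# (Assumption for this sketch: spherical Earth, 40075 km equatorial circumference.)
KM_PER_DEGREE_AT_EQUATOR = 111.32

# Hypothetical helper: east-west kilometres covered by one degree of longitude
# at the given latitude. The span shrinks with the cosine of the latitude.
def km_per_degree_longitude(lat_degrees)
  KM_PER_DEGREE_AT_EQUATOR * Math.cos(lat_degrees * Math::PI / 180)
end

[0, 30, 60].each do |lat|
  puts format("latitude %2d: %.1f km per degree of longitude", lat, km_per_degree_longitude(lat))
end
```

So a separation of 5 degrees covers roughly half as much east-west ground at 60 degrees latitude as it does at the equator, which is usually acceptable when the goal is only a legible map.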
require 'hierclust'
Start by extending the built-in Point class so that it can maintain a reference to your data:
    class Hierclust::Point
      attr_accessor :data
    end
Then turn your data into a set of points:
    dataset = MyGeocodedThing.find(:all)
    points = dataset.map do |thing|
      # Longitude is the x coordinate, latitude the y coordinate.
      point = Hierclust::Point.new(thing.lon, thing.lat)
      point.data = thing
      point
    end
Then tell Hierclust to cluster those points to at least 5 degrees separation:
    clusterer = Hierclust::Clusterer.new(points, 5)
    clusters = clusterer.clusters
Then do what you will with your clusters:
    # MapThing stands in for whatever mapping library you use.
    map = MapThing.new
    clusters.each do |cluster|
      map.add_point(
        :x => cluster.x,
        :y => cluster.y,
        :label => "#{cluster.points.size} Things"
      )
    end
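If you want to try the labelling step without a mapping library, here is a small self-contained sketch. SketchCluster and marker_for are hypothetical names for this illustration only; SketchCluster merely exposes the same three readers the example above calls on a cluster (x, y and points) and is not the hierclust class.

```ruby
# Hypothetical stand-in with the readers used above: x, y and points.
SketchCluster = Struct.new(:x, :y, :points)

# Build a plain hash describing one map marker, with a label that counts
# the points in the cluster and handles the singular case.
def marker_for(cluster)
  count = cluster.points.size
  noun = count == 1 ? "Thing" : "Things"
  { :x => cluster.x, :y => cluster.y, :label => "#{count} #{noun}" }
end

markers = [
  SketchCluster.new(4.0, 8.5, [:a, :b]),
  SketchCluster.new(9.0, 6.0, [:c])
].map { |cluster| marker_for(cluster) }

markers.each { |m| puts "#{m[:label]} at (#{m[:x]}, #{m[:y]})" }
```

Keeping the marker as a plain hash means the same data can be handed to whichever mapping API you end up using.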
Documentation
API documentation: RDoc
Forum
http://groups.google.com/group/hierclust
Source code
You can browse the source at http://hierclust.rubyforge.org/svn/trunk/
How to submit patches
Read the 8 steps for fixing other people’s code. For section 8b (Submit patch to Google Groups), use the Google Group listed above.
License
This code is free to use under the terms of the MIT license.
Contact
Comments are welcome. Send an email to Brandt Kurowski via the forum.
Brandt Kurowski, 6th February 2008
Theme extended from Paul Battley