Disjoint Sets

DisjointSets provides a purely functional implementation for the union-find collection. An Union-Find (aka Disjoint Sets) structure is a set of sets where the intersection of any two sets is empty.

This constraint opens the way to fast unions (O(1) average). That makes of Disjoint Sets the perfect tool for clustering algorithms such as calculating the connected components in a graph.

Initially, it is a flat collection where each element forms its own, size 1, disjoint set. New elements are added as new disjoint sets and union operations can be used to fusion these sets. The joined sets are represented by one of its elements known as label or root.

Fast fusion of disjoints sets is key and its provided through parenthood relations. Sets labels are always the top-level ancestor.

The following example shows 3 disjoint sets of size 1.

Whereas, in the next one, C is the parent of the {A, B, C} set which is the only set in the DisjointSets structure.

Supported operations

Example usage

import DisjointSets._

val operations = for {
  _ <- union(1,2)
  oneAndTwo <- find(2)
  _ <- union(3,4)
  threeAndFour <- find(3)
  _ <- union(2,3)
  sets <- toSets
} yield sets

val label2disjointset: Map[Int, Set[Int]] = operations.runA(DisjointSets(1,2,3,4)).value

Structure and performance

The main idea is that each element starts as a disjoint set itself. A set with two or more elements is always the result of one or several union operations. Thus, a multi-element set is composed of sets of just one element, call them components.

Each component has 3 fields:

Let’s assume that the next operations are executed:

dsets.union(B,A) //1
dsets.union(B,C) //2

From a mathematical point of view, the result should be similar to the one shown below:

However, in order to improve lookups performance, some optimizations need to be applied. Therefore, with optimized operations, the resulting structure after (1) and (2) is:

Each set is a tree represented by its root. Therefore, looking for the set an element belongs to is not more than following its parental relations until the root of the tree is reached. So the shallower the tree is, the fewer the operations to be performed in the lookup are.

On the other hand, the operation where two sets are merged (union) consist on making one of the two trees to become a branch of the other:

Heuristics

These are two optimizations whose goal is to reduce the depth of each tree representing a set. The lower the tree height is, the fewer operations are needed to find one element set label.

Considering DisjointSets structure as a forest of tree-sets, let’s compute as the maximum tree height of the forest. The maximum number of operations to complete before finding one element in one set is proportional to the number of jumps from the deepest leaf to its tree root. That is .

There are two heuristics designed to keep these tress low: