Triplequote are sponsoring Typelevel, click here to learn how you can help them help us (and speed up your builds into the bargain)

*This is a guest post by Tomas Mikula. It was initially published as a document in the hasheq. It has been slightly edited and is being republished here with the permission of the original author.*

This article describes what we mean when we say that the data structures in this library are *equivalence-aware* in a *type-safe* fashion.

*Set* is a data structure that doesn’t contain *duplicate* elements. An implementation of *Set* must therefore have a way to compare elements for *“sameness”*.
A useful notion of sameness is *equivalence*, i.e. a binary relation that is *reflexive*, *symmetric* and *transitive*.
Any reasonable implementation of *Set* is equipped with *some* equivalence relation on its element type.

Here’s the catch: For any type with more than one inhabitant there are *multiple* valid equivalence relations.
We cannot (in general) pick one that is suitable in all contexts.
For example, are these two binary trees *same*?

```
+ +
/ \ / \
1 + + 3
/ \ / \
2 3 1 2
```

It depends on the context. They clearly have different structure, but they are both binary search trees containing the same elements.
For a balancing algorithm, they are different trees, but as an implementation of *Set*, they represent the same set of integers.

Despite the non-uniqueness, there is one equivalence relation that stands out: *equality*.
Two objects are considered *equal* when they are *indistinguishable* to an observer.
Formally, equality is required to have the *substitution property:*

\[ \forall a,b \in A, \forall f \in (A \to B): a=_A b \implies f(a)=_B f(b) \]

(Here, $=_A$ denotes equality on $A$, $=_B$ denotes equality on $B$.)

Equality is the finest equivalence: whenever two elements are *equal*, they are necessarily *equivalent* with respect to every equivalence.

Popular Scala libraries take one of these two approaches when dealing with comparing elements for *“sameness”*.

The current approach of cats is *equality*.
Instances of the `cats.Eq[A]`

typeclass are required to have all the properties of equality, including the substitution property above.
The problem with this approach is that for some types, such as `Set[Int]`

, equality is too strict to be useful:
Are values `Set(1, 2)`

and `Set(2, 1)`

*equal*?
For that to be true, they have to be indistinguishable by any function.
Let’s try `(_.toList)`

:

```
scala> Set(1, 2).toList == Set(2, 1).toList
res0: Boolean = false
```

So, `Set(1, 2)`

and `Set(2, 1)`

are clearly *not* equal.
As a result, we cannot use `Set[Int]`

in a context where equality is required (without cheating).

On the other hand, scalaz uses unspecified *equivalence*.
Although the name `scalaz.Equal[A]`

might suggest *equality*, instances of this typeclass are only tested for properties of *equivalence*.
As mentioned above, there are multiple *valid* equivalence relations for virtually any type.
When there are also multiple *useful* equivalences for a type, we are at risk of mixing them up (and the fact that they are usually resolved as implicit arguments only makes things worse).

Let’s look at how *we* deal with this issue. We define typeclass `Equiv`

with an extra type parameter that serves as a *“tag”* identifying the meaning of the equivalence.

```
trait Equiv[A, Eq] {
def equiv(a: A, b: A): Boolean
}
// defined trait Equiv
```

For the compiler, the “tag” is an opaque type. It only has specific meaning for humans. The only meaning it has for the compiler is that different tags represent (intensionally) different equivalence relations.

An *equivalence-aware* data structure then carries in its *type* the tag of the equivalence it uses.

```
import hasheq._
// import hasheq._
import hasheq.immutable._
// import hasheq.immutable._
import hasheq.std.int._
// import hasheq.std.int._
```

```
scala> HashSet(1, 2, 3, 4, 5)
res0: hasheq.immutable.HashSet[Int] = HashSetoid(5, 1, 2, 3, 4)
```

What on earth is `HashSetoid`

?
A *setoid* is an *equivalence-aware set*.
`HashSetoid`

is then just a setoid implementated using hash-table.
Let’s look at the definition of `HashSet`

:

```
type HashSet[A] = HashSetoid[A, Equality.type]
```

So `HashSet`

is just a `HashSetoid`

whose equivalence is *equality*.
To create an instance of `HashSet[Int]`

above, we needed to have an implicit instance of `Equiv[Int, Equality.type]`

in scope.

```
implicitly[Equiv[Int, Equality.type]]
```

For the compiler, `Equality`

is just a rather arbitrary singleton object.
It only has the meaning of mathematical *equality* for us, humans.

There is a convenient type alias provided for *equality* relation:

```
type Equal[A] = Equiv[A, Equality.type]
```

```
implicitly[Equal[Int]]
```

So how do we deal with the problem of set equality mentioned above, i.e. that `HashSet(1, 2)`

and `HashSet(2, 1)`

are not truly *equal*?
We just don’t provide a definition of equality for `HashSet[Int]`

.

```
scala> implicitly[Equal[HashSet[Int]]]
<console>:22: error: could not find implicit value for parameter e: hasheq.Equal[hasheq.immutable.HashSet[Int]]
implicitly[Equal[HashSet[Int]]]
^
```

But that means we cannot have a `HashSet[HashSet[Int]]`

!
(Remember, for a `HashSet[A]`

, we need an instance of `Equal[A]`

, and we just showed we don’t have an instance of `Equal[HashSet[Int]]`

.)

```
scala> HashSet(HashSet(1, 2, 3, 4, 5))
<console>:22: error: could not find implicit value for parameter A: hasheq.Hash[hasheq.immutable.HashSet[Int]]
HashSet(HashSet(1, 2, 3, 4, 5))
^
```

But we can have a `HashSetoid[HashSet[Int], E]`

, where `E`

is *some* equivalence on `HashSet[Int]`

.

```
scala> HashSet.of(HashSet(1, 2, 3, 4, 5))
res5: hasheq.immutable.HashSetoid[hasheq.immutable.HashSet[Int],hasheq.immutable.Setoid.ContentEquiv[Int,hasheq.Equality.type]] = HashSetoid(HashSetoid(5, 1, 2, 3, 4))
```

`HashSet.of(elems)`

is like `HashSet(elems)`

, except it tries to infer the equivalence on the element type, instead of requiring it to be equality.

Notice the *equivalence tag*: `Setoid.ContentEquiv[Int, Equality.type]`

.
Its meaning is (again, for humans only) that two setoids are equivalent when they contain the same elements (here, of type `Int`

), as compared by the given equivalence of elements (here, `Equality`

).

The remaining question is: How does this work in the presence of *multiple useful equivalences?*

Let’s define another equivalence on `Int`

(in addition to the provided equality).

```
// Our "tag" for equivalence modulo 10.
// This trait will never be instantiated.
sealed trait Mod10
// Provide equivalence tagged by Mod10.
implicit object EqMod10 extends Equiv[Int, Mod10] {
def mod10(i: Int): Int = {
val r = i % 10
if (r < 0) r + 10
else r
}
def equiv(a: Int, b: Int): Boolean = mod10(a) == mod10(b)
}
// Provide hash function compatible with equivalence modulo 10.
// Note that the HashEq typeclass is also tagged by Mod10.
implicit object HashMod10 extends HashEq[Int, Mod10] {
def hash(a: Int): Int = EqMod10.mod10(a)
}
```

Now let’s create a “setoid of sets of integers”, as before.

```
scala> HashSet.of(HashSet(1, 2, 3, 4, 5))
res13: hasheq.immutable.HashSetoid[hasheq.immutable.HashSet[Int],hasheq.immutable.Setoid.ContentEquiv[Int,hasheq.Equality.type]] = HashSetoid(HashSetoid(5, 1, 2, 3, 4))
```

This still works, because `HashSet`

requires an *equality* on `Int`

, and there is only one in the implicit scope (the newly defined equivalence `EqMod10`

is *not* equality).
Let’s try to create a “setoid of setoids of integers”:

```
scala> HashSet.of(HashSet.of(1, 2, 3, 4, 5))
<console>:24: error: ambiguous implicit values:
both method hashInstance in object int of type => hasheq.Hash[Int]
and object HashMod10 of type HashMod10.type
match expected type hasheq.HashEq[Int,Eq]
HashSet.of(HashSet.of(1, 2, 3, 4, 5))
^
```

This fails, because there are now more equivalences on `Int`

in scope.
(There are now also multiple hash functions, which is what the error message actually says.)
We need to be more specific:

```
scala> HashSet.of(HashSet.of[Int, Mod10](1, 2, 3, 4, 5))
res15: hasheq.immutable.HashSetoid[hasheq.immutable.HashSetoid[Int,Mod10],hasheq.immutable.Setoid.ContentEquiv[Int,Mod10]] = HashSetoid(HashSetoid(5, 1, 2, 3, 4))
```

Finally, does it **prevent mixing up equivalences**? Let’s see:

```
scala> val s1 = HashSet(1, 2, 3, 11, 12, 13 )
s1: hasheq.immutable.HashSet[Int] = HashSetoid(1, 13, 2, 12, 3, 11)
scala> val s2 = HashSet( 2, 3, 4, 5, 13, 14)
s2: hasheq.immutable.HashSet[Int] = HashSetoid(5, 14, 13, 2, 3, 4)
scala> val t1 = HashSet.of[Int, Mod10](1, 2, 3, 11, 12, 13 )
t1: hasheq.immutable.HashSetoid[Int,Mod10] = HashSetoid(1, 2, 3)
scala> val t2 = HashSet.of[Int, Mod10]( 2, 3, 4, 5, 13, 14)
t2: hasheq.immutable.HashSetoid[Int,Mod10] = HashSetoid(5, 2, 3, 4)
```

Combining compatible setoids:

```
scala> s1 union s2
res16: hasheq.immutable.HashSetoid[Int,hasheq.Equality.type] = HashSetoid(5, 14, 1, 13, 2, 12, 3, 11, 4)
scala> t1 union t2
res17: hasheq.immutable.HashSetoid[Int,Mod10] = HashSetoid(5, 1, 2, 3, 4)
```

Combining incompatible setoids:

```
scala> s1 union t2
<console>:26: error: type mismatch;
found : hasheq.immutable.HashSetoid[Int,Mod10]
required: hasheq.immutable.HashSetoid[Int,hasheq.Equality.type]
s1 union t2
^
scala> t1 union s2
<console>:26: error: type mismatch;
found : hasheq.immutable.HashSet[Int]
(which expands to) hasheq.immutable.HashSetoid[Int,hasheq.Equality.type]
required: hasheq.immutable.HashSetoid[Int,Mod10]
t1 union s2
^
```

We went one step further in the direction of type-safe equivalence in Scala compared to what is typically seen out in the wild today.
There is nothing very sophisticated about this encoding.
I think the major win is that we can design APIs so that the extra type parameter (the “equivalence tag”) stays unnoticed by the user of the API as long as they only deal with *equalities*.
As soon as the equivalence tag starts requesting our attention (via an ambiguous implicit or a type error), it is likely that the attention is justified.

*This article was tested with Scala 2.11.8 and hasheq version 0.3.*

Unless otherwise noted, all content is licensed under a Creative Commons Attribution 3.0 Unported License.

Back to blog