by Adelbert Chang on Mar 13, 2016

technical

Code should be reusable. An expression traversing a data structure shouldn’t be written multiple times, it should be pulled out into a generic traversal function. At a larger scale, a random number generator shouldn’t be written multiple times, but rather pulled out into a module that can be used by others.

It is important that such abstractions must be done carefully. Often times a type is visible to the caller, and if the type is not handled carefully the abstraction can leak.

For example, a set with fast random indexing (useful for random
walks on a graph) can be implemented with a sorted `Vector`

.
However, if the `Vector`

type is
leaked, the user can use this knowledge to violate the invariant.

```
import scala.annotation.tailrec
/** (i in repr, position of i in repr) */
def binarySearch(i: Int, repr: Vector[Int]): (Boolean, Int) = /* elided */
object IntSet {
type Repr = Vector[Int]
def empty: Repr = Vector.empty
def add(i: Int, repr: Repr): Repr = {
val (isMember, indexOf) = binarySearch(i, repr)
if (isMember) repr
else {
val (prefix, suffix) = repr.splitAt(indexOf)
prefix ++ Vector(i) ++ suffix
}
}
def contains(i: Int, repr: Repr): Boolean =
binarySearch(i, repr)._1
}
```

```
import IntSet._
// import IntSet._
val good = add(1, add(10, add(5, empty)))
// good: IntSet.Repr = Vector(1, 5, 10)
val goodResult = contains(10, good)
// goodResult: Boolean = true
val bad = good.reverse // We know it's a Vector!
// bad: scala.collection.immutable.Vector[Int] = Vector(10, 5, 1)
val badResult = contains(10, bad)
// badResult: Boolean = false
val bad2 = Vector(10, 5, 1) // Alternatively..
// bad2: scala.collection.immutable.Vector[Int] = Vector(10, 5, 1)
val badResult2 = contains(10, bad2)
// badResult2: Boolean = false
```

The issue here is the user knows more about the representation than they
should. The function `add`

enforces the sorted invariant on each insert,
and the function `contains`

leverages this to do an efficient look-up.
Because the `Vector`

definition of `Repr`

is exposed, the user is
free to create any `Vector`

they wish which may violate the invariant,
thus breaking `contains`

.

In general, the **name** of the representation type is needed but the
**definition** is not. If the definition is hidden, the user is only able to
work with the type to the extent the module allows. This is precisely
the notion of information hiding. If this can be enforced by the type
system, modules can be swapped in and out without worrying about breaking
client code.

It turns out there is a well understood principle
behind this idea called *existential quantification*. Contrast with
universal quantification which says “for all”, existential quantification
says “there is a.”

Below is an encoding of universal quantification via parametric polymorphism.

```
trait Universal {
def apply[A]: A => A
}
```

Here `Universal#apply`

says *for all* choices of `A`

, a function `A => A`

can be
written. In the Curry-Howard Isomorphism, a profound
relationship between logic and computation, this translates to “for all propositions
`A`

, `A`

implies `A`

.” It is therefore acceptable to write the following, which picks
`A`

to be `Int`

.

```
def intInstantiatedU(u: Universal): Int => Int =
(i: Int) => u.apply(i)
// intInstantiatedU: (u: Universal)Int => Int
```

Existential quantification can also be written in Scala.

```
trait Existential {
type A
def apply: A => A
}
```

Note that this is just one way of encoding existentials - for a deeper discussion, refer to the excellent Type Parameters and Type Members blog series.

The type parameter on `apply`

has been moved up to a type member of the trait.
Practically, this means every instance of `Existential`

must pick **one** choice of
`A`

, whereas in `Universal`

the `A`

was parameterized and therefore free. In the
language of logic, `Existential#apply`

says “there is a” or “there exists some `A`

such that
`A`

implies `A`

.” This “there is a” is the crux of the error when trying
to write a corresponding `intExistential`

function.

```
def intInstantiatedE(e: Existential): Int => Int =
(i: Int) => e.apply(i)
// <console>:19: error: type mismatch;
// found : i.type (with underlying type Int)
// required: e.A
// (i: Int) => e.apply(i)
// ^
```

In code, the type in `Existential`

is chosen per-instance, so there is no way
of knowing what the actual type chosen is. In logical terms, the only guarantee is
that there exists some proposition that satisfies the implication, but it is not
necessarily the case (and often is not) it holds for all propositions.

In the ML family of languages (e.g. Standard ML, OCaml), existential quantification and thus information hiding, is achieved through type members. Programs are organized into modules which are what contain these types.

In Scala, this translates to organizing code with the object system, using the same
type member feature to hide representation. The earlier example of `IntSet`

can then
be written:

```
/** Abstract signature */
trait IntSet {
type Repr
def empty: Repr
def add(i: Int, repr: Repr): Repr
def contains(i: Int, repr: Repr): Boolean
}
/** Concrete implementation */
object VectorIntSet extends IntSet {
type Repr = Vector[Int]
def empty: Repr = Vector.empty
def add(i: Int, repr: Repr): Repr = {
val (isMember, indexOf) = binarySearch(i, repr)
if (isMember) repr
else {
val (prefix, suffix) = repr.splitAt(indexOf)
prefix ++ Vector(i) ++ suffix
}
}
def contains(i: Int, repr: Repr): Boolean =
binarySearch(i, repr)._1
}
```

As long as client code is written against the signature, the representation cannot be leaked.

```
def goodUsage(set: IntSet) = {
import set._
val s = add(1, add(10, add(5, empty)))
contains(5, s)
}
// goodUsage: (set: IntSet)Boolean
```

If the user tries to assert the representation type, the type
checker prevents it **at compile time**.

```
def badUsage(set: IntSet) = {
import set._
val s = add(10, add(1, empty))
// Maybe it's a Vector
s.reverse
contains(10, Vector(10, 5, 1))
}
// <console>:23: error: value reverse is not a member of set.Repr
// s.reverse
// ^
// <console>:24: error: type mismatch;
// found : scala.collection.immutable.Vector[Int]
// required: set.Repr
// contains(10, Vector(10, 5, 1))
// ^
```

Abstract types enforce information hiding at the definition site (the definition
of `IntSet`

is what hides `Repr`

). There is another mechanism that enforces information
hiding, which pushes the constraint to the use site.

Consider implementing the following function.

```
def foo[A](a: A): A = ???
```

Given nothing is known about `a`

, the only possible thing `foo`

can do is return `a`

. If
instead of a type parameter the function was given more information..

```
def bar(a: String): String = "not even going to use `a`"
```

..that information can be leveraged to do unexpected things. This is similar to
the first `IntSet`

example when knowledge of the underlying `Vector`

allowed unintended
behavior to occur.

From the outside looking in, `foo`

is universally quantified - the caller gets to
pick any `A`

they want. From the inside looking out, it is
existentially quantified - the implementation knows only as much
about `A`

as there are constraints on `A`

(in this case, nothing).

Consider another function `listReplace`

.

```
def listReplace[A, B](as: List[A], b: B): List[B] = ???
```

Given the type parameters, `listReplace`

looks fairly constrained. The name and signature
suggests it takes each element of `as`

and replaces it with `b`

, returning a new list.
However, even knowledge of `List`

can lead to type checking implementations with strange behavior.

```
// Completely ignores the input parameters
def listReplace[A, B](as: List[A], b: B): List[B] = List.empty[B]
```

Here, knowledge of `List`

allows the implementation
to create a list out of thin air and use that in the implementation. If instead `listReplace`

only knew about some `F[_]`

where `F`

is a `Functor`

, the implementation becomes much more
constrained.

```
trait Functor[F[_]] {
def map[A, B](fa: F[A])(f: A => B): F[B]
}
implicit val listFunctor: Functor[List] =
new Functor[List] {
def map[A, B](fa: List[A])(f: A => B): List[B] =
fa.map(f)
}
def replace[F[_]: Functor, A, B](fa: F[A], b: B): F[B] =
implicitly[Functor[F]].map(fa)(_ => b)
```

```
replace(List(1, 2, 3), "typelevel")
// res8: List[String] = List(typelevel, typelevel, typelevel)
```

Absent any knowledge of `F`

other than the ability to `map`

over it, `replace`

is
forced to do the correct thing. Put differently, irrelevant information about `F`

is hidden.

The fundamental idea behind this is known as parametricity, made popular by Philip Wadler’s seminal Theorems for free! paper. The technique is best summarized by the following excerpt from the paper:

Write down the definition of a polymorphic function on a piece of paper. Tell me its type, but be careful not to let me see the function’s definition. I will tell you a theorem that the function satisfies.

Information hiding is a core tenet of good program design, and it is important to make
sure it is enforced. Underlying information hiding is existential quantification,
which can manifest itself in computation through abstract types and
parametricity. Few languages support defining abstract type members, and fewer
yet support higher-kinded types used in the `replace`

example. It is therefore
to the extent that a language’s type system is expressive that
abstraction can be enforced.

*This blog post was tested with Scala 2.11.7 using tut.*