Information hiding, enforced

by Adelbert Chang on Mar 13, 2016

technical

Code should be reusable. An expression traversing a data structure shouldn’t be written multiple times, it should be pulled out into a generic traversal function. At a larger scale, a random number generator shouldn’t be written multiple times, but rather pulled out into a module that can be used by others.

It is important that such abstractions must be done carefully. Often times a type is visible to the caller, and if the type is not handled carefully the abstraction can leak.

For example, a set with fast random indexing (useful for random walks on a graph) can be implemented with a sorted Vector. However, if the Vector type is leaked, the user can use this knowledge to violate the invariant.

import scala.annotation.tailrec

/** (i in repr, position of i in repr) */
def binarySearch(i: Int, repr: Vector[Int]): (Boolean, Int) = /* elided */

object IntSet {
  type Repr = Vector[Int]

  def empty: Repr = Vector.empty

  def add(i: Int, repr: Repr): Repr = {
    val (isMember, indexOf) = binarySearch(i, repr)
    if (isMember) repr
    else {
      val (prefix, suffix) = repr.splitAt(indexOf)
      prefix ++ Vector(i) ++ suffix
    }
  }

  def contains(i: Int, repr: Repr): Boolean =
    binarySearch(i, repr)._1
}
import IntSet._
// import IntSet._

val good = add(1, add(10, add(5, empty)))
// good: IntSet.Repr = Vector(1, 5, 10)

val goodResult = contains(10, good)
// goodResult: Boolean = true

val bad = good.reverse // We know it's a Vector!
// bad: scala.collection.immutable.Vector[Int] = Vector(10, 5, 1)

val badResult = contains(10, bad)
// badResult: Boolean = false

val bad2 = Vector(10, 5, 1) // Alternatively..
// bad2: scala.collection.immutable.Vector[Int] = Vector(10, 5, 1)

val badResult2 = contains(10, bad2)
// badResult2: Boolean = false

The issue here is the user knows more about the representation than they should. The function add enforces the sorted invariant on each insert, and the function contains leverages this to do an efficient look-up. Because the Vector definition of Repr is exposed, the user is free to create any Vector they wish which may violate the invariant, thus breaking contains.

In general, the name of the representation type is needed but the definition is not. If the definition is hidden, the user is only able to work with the type to the extent the module allows. This is precisely the notion of information hiding. If this can be enforced by the type system, modules can be swapped in and out without worrying about breaking client code.

Quantification

It turns out there is a well understood principle behind this idea called existential quantification. Contrast with universal quantification which says “for all”, existential quantification says “there is a.”

Below is an encoding of universal quantification via parametric polymorphism.

trait Universal {
  def apply[A]: A => A
}

Here Universal#apply says for all choices of A, a function A => A can be written. In the Curry-Howard Isomorphism, a profound relationship between logic and computation, this translates to “for all propositions A, A implies A.” It is therefore acceptable to write the following, which picks A to be Int.

def intInstantiatedU(u: Universal): Int => Int =
  (i: Int) => u.apply(i)
// intInstantiatedU: (u: Universal)Int => Int

Existential quantification can also be written in Scala.

trait Existential {
  type A

  def apply: A => A
}

Note that this is just one way of encoding existentials - for a deeper discussion, refer to the excellent Type Parameters and Type Members blog series.

The type parameter on apply has been moved up to a type member of the trait. Practically, this means every instance of Existential must pick one choice of A, whereas in Universal the A was parameterized and therefore free. In the language of logic, Existential#apply says “there is a” or “there exists some A such that A implies A.” This “there is a” is the crux of the error when trying to write a corresponding intExistential function.

def intInstantiatedE(e: Existential): Int => Int =
  (i: Int) => e.apply(i)
// <console>:19: error: type mismatch;
//  found   : i.type (with underlying type Int)
//  required: e.A
//          (i: Int) => e.apply(i)
//                              ^

In code, the type in Existential is chosen per-instance, so there is no way of knowing what the actual type chosen is. In logical terms, the only guarantee is that there exists some proposition that satisfies the implication, but it is not necessarily the case (and often is not) it holds for all propositions.

Abstract types

In the ML family of languages (e.g. Standard ML, OCaml), existential quantification and thus information hiding, is achieved through type members. Programs are organized into modules which are what contain these types.

In Scala, this translates to organizing code with the object system, using the same type member feature to hide representation. The earlier example of IntSet can then be written:

/** Abstract signature */
trait IntSet {
  type Repr

  def empty: Repr
  def add(i: Int, repr: Repr): Repr
  def contains(i: Int, repr: Repr): Boolean
}

/** Concrete implementation */
object VectorIntSet extends IntSet {
  type Repr = Vector[Int]

  def empty: Repr = Vector.empty

  def add(i: Int, repr: Repr): Repr = {
    val (isMember, indexOf) = binarySearch(i, repr)
    if (isMember) repr
    else {
      val (prefix, suffix) = repr.splitAt(indexOf)
      prefix ++ Vector(i) ++ suffix
    }
  }

  def contains(i: Int, repr: Repr): Boolean =
    binarySearch(i, repr)._1
}

As long as client code is written against the signature, the representation cannot be leaked.

def goodUsage(set: IntSet) = {
  import set._
  val s = add(1, add(10, add(5, empty)))
  contains(5, s)
}
// goodUsage: (set: IntSet)Boolean

If the user tries to assert the representation type, the type checker prevents it at compile time.

def badUsage(set: IntSet) = {
  import set._
  val s = add(10, add(1, empty))

  // Maybe it's a Vector
  s.reverse
  contains(10, Vector(10, 5, 1))
}
// <console>:23: error: value reverse is not a member of set.Repr
//          s.reverse
//            ^
// <console>:24: error: type mismatch;
//  found   : scala.collection.immutable.Vector[Int]
//  required: set.Repr
//          contains(10, Vector(10, 5, 1))
//                             ^

Parametricity

Abstract types enforce information hiding at the definition site (the definition of IntSet is what hides Repr). There is another mechanism that enforces information hiding, which pushes the constraint to the use site.

Consider implementing the following function.

def foo[A](a: A): A = ???

Given nothing is known about a, the only possible thing foo can do is return a. If instead of a type parameter the function was given more information..

def bar(a: String): String = "not even going to use `a`"

..that information can be leveraged to do unexpected things. This is similar to the first IntSet example when knowledge of the underlying Vector allowed unintended behavior to occur.

From the outside looking in, foo is universally quantified - the caller gets to pick any A they want. From the inside looking out, it is existentially quantified - the implementation knows only as much about A as there are constraints on A (in this case, nothing).

Consider another function listReplace.

def listReplace[A, B](as: List[A], b: B): List[B] = ???

Given the type parameters, listReplace looks fairly constrained. The name and signature suggests it takes each element of as and replaces it with b, returning a new list. However, even knowledge of List can lead to type checking implementations with strange behavior.

// Completely ignores the input parameters
def listReplace[A, B](as: List[A], b: B): List[B] = List.empty[B]

Here, knowledge of List allows the implementation to create a list out of thin air and use that in the implementation. If instead listReplace only knew about some F[_] where F is a Functor, the implementation becomes much more constrained.

trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

implicit val listFunctor: Functor[List] =
  new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] =
      fa.map(f)
  }

def replace[F[_]: Functor, A, B](fa: F[A], b: B): F[B] =
  implicitly[Functor[F]].map(fa)(_ => b)
replace(List(1, 2, 3), "typelevel")
// res8: List[String] = List(typelevel, typelevel, typelevel)

Absent any knowledge of F other than the ability to map over it, replace is forced to do the correct thing. Put differently, irrelevant information about F is hidden.

The fundamental idea behind this is known as parametricity, made popular by Philip Wadler’s seminal Theorems for free! paper. The technique is best summarized by the following excerpt from the paper:

Write down the definition of a polymorphic function on a piece of paper. Tell me its type, but be careful not to let me see the function’s definition. I will tell you a theorem that the function satisfies.

Why types matter

Information hiding is a core tenet of good program design, and it is important to make sure it is enforced. Underlying information hiding is existential quantification, which can manifest itself in computation through abstract types and parametricity. Few languages support defining abstract type members, and fewer yet support higher-kinded types used in the replace example. It is therefore to the extent that a language’s type system is expressive that abstraction can be enforced.

This blog post was tested with Scala 2.11.7 using tut.