by Stephen Compall on Sep 21, 2015
technical
This is the seventh of a series of articles on “Type Parameters and Type Members”. You may wish to start at the beginning; more specifically, this post is meant as a followup to the previous entry. However, in a first for this series, it stands on its own, as introductory matter.
A program is a system for converting data from one format to another, which we have endowed with the color of magic. In typed programming, we use a constellation of types to mediate this transformation; a function’s result can only be passed as another function’s argument to the extent to which those parts of the functions’ types unify.
We rely on the richness of our types in these descriptions. So it is
natural to want the types to change as you move to different parts of
the process; each change reflects the reality of what has just
happened. For example, when you parse a string into an AST, your
program’s state has changed types, from String
to MyAST
.
But, as we have just seen, due to decisions we have made to simplify our lives, values cannot change types, no matter how important it is to the sanity of our code. At the same time, we don’t want to give up the richness of using more than one type to describe our data.
Fortunately, there is a solution that satisfies these competing concerns: to change types, change values. You can’t do anything about the values you have, but you can create new ones of the right type, and use those instead.
In values with complex construction semantics, it is common to write
imperative programs that leave “holes” in the data structures using
the terrible null
misfeature of Java, Scala, and many other
languages. This looks something like this.
class Document(filename: Path) {
// this structure has three parts:
var text: String = null // ← a body of text,
var wordIndex: Map[String, List[Int]] = null
// ↑ an index of words to every
// occurrence in the text,
var mostPopular: List[(String, Int)] = null
// ↑ the most frequently used words
// in the text, and their number of
// occurrences
...
Now, we must fill in these variables, by computing and assigning to each in turn. First, we compute the corpus text.
initText()
Then, we compute and fill in the word index. If we didn’t fill in
text
first, this compiles, but crashes at runtime.
initWordIndex()
Finally, we figure out which words are most popular. If we didn’t
fill in wordIndex
first, this compiles, but crashes.
initMostPopular()
How do I know that? Well, I have to inspect the definitions of these three methods.
def initText(): Unit =
text = Source.fromFile(filename.toFile).mkString
def initWordIndex(): Unit = {
val words = """\w+""".r findAllMatchIn text
wordIndex = words.foldLeft(Map[String, List[Int]]()){
(m, mtch) =>
val word = mtch.matched
val idx = mtch.start
m + (word -> (idx :: m.getOrElse(word, Nil)))
}
}
def initMostPopular(): Unit =
mostPopular = wordIndex.mapValues(_.size).toList
.sortBy(p => 0 - p._2).take(10)
This method of organizing object initialization is popular because, among other properties:
However! It has the tremendous drawback of preventing the compiler from helping you get the order of initialization correct. Go, look; see if you can spot why I said the latter two calls would crash if you don’t get the order exactly right. Now, I have four questions for you.
init
functions?Ironically, as your initialization becomes more complex, the compiler becomes less able to help you with uninitialized-variable warnings and the like. But, this is not the natural order of things; it is a consequence of using imperative variable initialization but not representing this variable refinement in the type system. By initializing in a different way, we can recover type safety.
Document
If we consider Document
as the simple product of its three state
variables, with some special functions associated with them as
whatever Document
methods we intend to support, we have a simple
3-tuple.
(String, Map[String, List[Int]],
List[(String, Int)])
Let us no longer pretend that it is any more complicated than that.
But this cannot be mutated to fill these in as they are initialized, you say! Yes, that’s right, we want a type-changing transformation. By changing values, this is easy. There are three phases of initialization, so four states, including uninitialized.
Path
String
(String, Map[String, List[Int]])
(String, Map[String, List[Int]], List[(String, Int)])
For interesting phases, such as the final one, we might create a case
class
to hold its contents, instead. Let us call that class, for
this example, Doc
.
final case class Doc
(text: String, wordIndex: Map[String, List[Int]],
mostPopular: List[(String, Int)])
Finally, we can build 3 functions to take us through these steps. Each begins by taking one as an argument, and produces the next state as a return type.
def initText(filename: Path): String =
Source.fromFile(filename.toFile).mkString
def initWordIndex(text: String): (String, Map[String, List[Int]]) = {
val words = """\w+""".r findAllMatchIn text
(text, words.foldLeft(Map[String, List[Int]]()){
(m, mtch) =>
val word = mtch.matched
val idx = mtch.start
m + (word -> (idx :: m.getOrElse(word, Nil)))
})
}
def initMostPopular(twi: (String, Map[String, List[Int]])): Doc = {
val (text, wordIndex) = twi
Doc(text, wordIndex,
wordIndex.mapValues(_.size).toList
.sortBy(p => 0 - p._2).take(10))
}
If we have a Path
, we can get a Doc
by (initText _) andThen
initWordIndex andThen initMostPopular: Path => Doc
. But that hardly
replicates the rich runtime behavior of our imperative version, does
it? That is, we can do reordering of operations in a larger context
with Document
, but not Doc
. Let us see what that means.
Dealing with one document in isolation is one thing, but suppose we
have a structure of Document
s.
sealed abstract class DocumentTree
final case class SingleDocument(id: Int, doc: Document)
extends DocumentTree
final case class DocumentCategory
(name: String, members: List[DocumentTree])
extends DocumentTree
In the imperative mode, we can batch and reorder initialization. Say,
for example, we don’t initialize Document
when we create it. This
tree then contains Document
s that contain only Path
s. We can walk
the tree, doing step 1 for every Document
.
// add this to DocumentTree
def foreach(f: Document => Unit): Unit =
this match {
case SingleDocument(_, d) => f(d)
case DocumentCategory(_, dts) => dts foreach (_ foreach f)
}
// now we can initialize the text everywhere,
// given some dtree: DocumentTree
dtree foreach (_.initText())
The way software does, it got more complex. And we can be ever less sure that we’re doing things right, under this arrangement.
Our tree only supports one type of document. We could choose the
final one, Doc
, but there is no way to replicate more exotic
document tree initializations like the one above.
Instead, we want the type of the tree to adapt along with the document
changes. If we have four states, Foo, Bar, Baz, and Quux, we
want four different kinds of DocumentTree
to go along with them. In
a language with type parameters, this is easy: we can model those four
as DocTree[Foo]
, DocTree[Bar]
, DocTree[Baz]
, and
DocTree[Quux]
, respectively, by adding a type parameter.
sealed abstract class DocTree[D]
final case class SingleDoc[D](id: Int, doc: D)
extends DocTree[D]
final case class DocCategory[D]
(name: String, members: List[DocTree[D]])
extends DocTree[D]
Now we need a replacement for the foreach
that we used with the
unparameterized DocumentTree
to perform each initialization step on
every Document
therein. Now that DocTree
is agnostic with respect
to the specific document type, this is a little more abstract, but
quite idiomatic.
// add this to DocTree
def map[D2](f: D => D2): DocTree[D2] =
this match {
case SingleDoc(id, d) => SingleDoc(id, f(d))
case DocCategory(c, dts) =>
DocCategory(c, dts map (_ map f))
}
It’s worth comparing these side by side. Now we should be able to
step through initialization of DocTree
with map
, just as with
DocumentTree
and foreach
.
scala> val dtp: DocTree[Path] = DocCategory("rt", List(SingleDoc(42, Paths.get("hello.md"))))
dtp: tmtp7.DocTree[java.nio.file.Path] = DocCategory(rt,List(SingleDoc(42,hello.md)))
scala> dtp map Doc.initText
res3: tmtp7.DocTree[String] =
DocCategory(rt,List(SingleDoc(42,contents of the hello.md file!)))
There is nothing magical about DocTree
that makes it especially
amenable to the introduction of a type parameter. This is not a
feature whose proper use is limited to highly abstract or
general-purpose data structures; with its String
s and Int
s strewn
about, it is utterly domain-specific, “business” code.
In fact, if we were likely to annotate Doc
s with more data, Doc
would be a perfect place to add a type parameter!
// suppose we add some "extra" data
final case class Doc[A]
(text: String, wordIndex: Map[String, List[Int]],
mostPopular: List[(String, Int)],
extra: A)
You can use a type parameter to represent one simple slot in an otherwise concretely specified structure, as above. You can use one to represent 10 slots.
Parameterized types are the type system’s version of functions. They aren’t just for collections, abstract code, or highly general-purpose libraries: they’re for your code!
Unless you are going to suggest that functions are “too academic”. Or that functions have no place in “business logic”. Or perhaps that, while it would be nice to define functions to solve this, that, and sundry, you’ll just do the quick no-defining-functions hack for now and maybe come back to add some functions later when “paying off technical debt”. Then, I’m not sure what to say.
Now we are doing something very close to functional programming. Moreover, we were led here not by a desire for referential transparency, nor for purity, but merely for a way to represent the states of our program in a more well-typed way.
In this series of posts, I have deliberately avoided discussion of
functional programming until this section; my chosen subject is types,
not functional programming. But the features we have been considering
unavoidably coalesce here into an empirical argument for functional
programming. Type parameters let us elegantly lift transformations
from one part of our program to another; the intractable complexities
of imperative type-changing direct us to program more functionally, by
computing new values instead of changing old ones, if we want access
to these features. This, in turn, encourages ever more of our program
to be written in a functional style, just as the switch to different
Doc
representations induced a switch to different document tree
representations, map
instead of foreach
.
Likewise, the use of functional programming style feeds back, in the aforementioned virtuous circle, to encourage the use of stronger types.
When we wanted stronger guarantees about the initialization of our documents, and thereby also of the larger structures incorporating them, we turned to the most powerful tool we have at our disposal for describing and ensuring such guarantees: the type system. In so doing, we induced an explosion of explicit data representations; where we had two, we now have eight, whose connections to each other are mediated by the types of functions involved.
With the increase in the number of explicit concepts in the code comes a greater need for an automatic method of keeping track of all these connections. The type system is ideally suited to this role.
Document
has
four stages of initialization, at each of which it exhibits
different behavior. All we have done is expose this fact to the
type system level, at which our usage can be checked.
As it is declared, the type-changing DocTree#map
has another
wonderful advantage over DocumentTree#foreach
.
Let us say that each category should also have a document of its own,
not just a list of subtrees. In refactoring, we adjust the
definitions of DocumentCategory
or DocCategory
.
// imperative version
final case class DocumentCategory
(name: String, doc: Document,
members: List[DocumentTree])
extends DocumentTree
// functional version
final case class DocCategory[D]
(name: String, doc: D,
members: DocTree[D])
extends DocTree[D]
So far, so good. Next, neither foreach
nor map
compile anymore.
TmTp7.scala:70: wrong number of arguments for pattern
⤹ tmtp7.DocumentCategory(name: String,doc: tmtp7.Document,
⤹ members: List[tmtp7.DocumentTree])
case DocumentCategory(_, dts) =>
^
TmTp7.scala:71: not found: value d
f(d)
^
TmTp7.scala:91: wrong number of arguments for pattern
⤹ tmtp7.DocCategory[D](name: String,doc: D,members: List[tmtp7.DocTree[D]])
case DocCategory(c, dts) =>
^
TmTp7.scala:92: not enough arguments for method
⤹ apply: (name: String, doc: D, members: List[tmtp7.DocTree[D]]
⤹ )tmtp7.DocCategory[D] in object DocCategory.
Unspecified value parameter members.
DocCategory(c, dts map (_ map f))
^
So let us fix foreach
in the simplest way possible.
// added ↓
case DocumentCategory(_, _, dts) => ...
This compiles. It is wrong, and we can figure out exactly why by
trying the same shortcut with map
.
case DocCategory(c, d, dts) =>
DocCategory(c, d, dts map (_ map f))
We are treating the d: D
like the name: String
, just passing it
through. It is “ignored” in precisely the same way as the foreach
ignores the new data. But this version does not compile!
TmTp7.scala:90: type mismatch;
found : d.type (with underlying type D)
required: D2
DocCategory(c, d, dts map (_ map f))
^
More broadly, map
must return a DocTree[D2]
. By implication, the
second argument must be a D2
, not a D
. We can fix it by using
f
.
DocCategory(c, f(d), dts map (_ map f))
Likewise, we should make a similar fix to DocumentTree#foreach
.
case DocumentCategory(_, d, dts) =>
f(d)
dts foreach (_ foreach f)
But only in the case of map
did we get help from the compiler.
That’s because DocumentTree
is not the only thing to gain a type
parameter in this new design. When we made DocTree
take one, it was
only natural to define map
with one, too.
We can see how this works out by looking at both foreach
and map
as the agents of our practical goal: transformation of the tree by
transforming the documents therein. foreach
works like this.
document transformer
(Document => Unit)
DocumentTree ~~~~~~~~~~~~~~~~~~~> DocumentTree
------------ -----------------
initial state final state
(old tree) (same type, “new”
but same tree)
The way map
looks at DocTree
is very similar, and we give it the
responsibilities that foreach
had, so it is unsurprising that the
“shape” we imagine for transformation is similar.
document transformer
(D => D2)
DocTree[D] ~~~~~~~~~~~~~~~~~~~~> DocTree[D2]
------------- ----------------
initial state final state
(old tree) (changed type,
changed value!)
The replacement of D
with D2
also means that values of type D
cannot occur anywhere in the result, as D
is abstract, so only
appears as doc
by virtue of being the type parameter passed to
DocTree
and its data constructors (er, “subclasses”).
As our result type is DocTree[D2]
, we have two options, considering
only the result type:
DocTree
with no D2
s in its representation, one role of
None
and Nil
in Option
and List
respectively, orD2
s from the D
s in the DocTree[D]
we have in hand, by
passing them to the ‘document transformer’ D => D2
.Similarly, no DocTree[D]
values can appear anywhere in the result.
As with the D
s, they must all be transformed or dropped, with a
different ‘empty’ DocTree
chosen.
Suppose we instead defined map
as follows.
def map(f: D => D): DocTree[D]
If you subscribe to the idea of type parameters being for wonky
academics, this is “simpler”. And it’s fine, I suppose, if you only
have one D
in mind, one document type in mind. Setting aside that
we have four, there is another problem. Let’s take a look at the
“shape” of this transformation.
document transformer
(D => D)
DocTree[D] ~~~~~~~~~~~~~~~~~~~~> DocTree[D]
------------- ----------------
initial state final state
(old tree) (but no promise,
same type!)
The problem with a D => D
transformer is that we can’t make promises
that all our data passed through it. After all, a source d
has the
same type as f(d)
. We could even get away with
def map(f: D => D): DocTree[D] = this
map[D2]
is strictly more general. Even if we have only one D
in
mind for DocTree
, it still pays to type-parameterize it and to add
‘type-changing’ extra type parameters like D2
.
Have you ever started getting a new bill, then missed a payment because you thought you were covered for the month?
Have you ever gone on vacation and, in your relief at having not left anything important at home, left something behind when packing for your return trip?
This kind of thing cannot be characterized in the manner of “well, I would just get a runtime error if I didn’t have a type checker, so it’s fine.” Yet it simply falls out of the system we have chosen; moreover, we have barely begun to consider the possibilities.
In this series, I have focused on existential types, which we can in
one sense consider merely abstract types that the compiler checks that
we treat as independent, like D
and D2
. Existential types are
only one natural outcome of the system of type abstraction brought to
us by type parameters; there are many more interesting conclusions,
like the ones described above.
Next, in “It’s existential on the inside”, we will see how deeply intertwined universal and existential types really are.