by Adelbert Chang on Dec 15, 2013
technical
A lot of people see Scalaz as a hard fringe, ivory tower, not suited for real-world applications library, which is unfortunate. The goal of this blog post series is to introduce various components of Scalaz, and hopefully through this allow folks to gain an understanding towards the power of Scalaz.
As a prerequisite, I assume knowledge of type classes as they
are implemented and used in Scala, higher kinded types,
and sum types (e.g. Option/Some/None
, Either/Left/Right
).
For a tutorial/review on (higher) kinds, I recommend the following resources:
Last time we left off after
writing our own generic sum
function:
import scalaz.Monoid
def sumGeneric[A](l: List[A])(implicit A: Monoid[A]): A =
l.foldLeft(A.zero)((x, y) => A.append(x, y))
This allowed us to sum a list not only of numeric types like
Int
, but also others that could be added and had a “zero” such as
String
via string concatenation and the empty string, as well as
List[A]
via list concatenation and the empty list.
But, we can do better! Why limit ourselves to List
? What if we want
to sum over a Vector
, or even a tree? We could use Seq
and that
would allow us to pass in List
or Vector
, but it still brings up
the problem of trees, and any other data structure that may not fit
the Seq
bill.
Recall when we “came up with” Semigroup
and Monoid
last time -
what did we do? We simply looked at what operations we needed
(add/append
and zero
) and factored it out into a type class.
Let’s try doing the same this time.
So what are we doing with List
in our implementation? Nothing much
really, we’re just folding over it. If we think about it, we could
“fold” over say, a tree as well. Let’s take this operation out into
a type class, and aptly name it Foldable
.
trait Foldable[F[_]] {
// Instead of requiring the contents to be monoidal, let's
// make it flexible by allowing a fold as long as we can convert
// the contents to a type that has a `Monoid`.
def foldMap[A, B](fa: F[A])(f: A => B)(implicit B: Monoid[B]): B
}
And let’s implement instances of this type class for List
and our own
Tree
.
Our tree definition:
sealed trait Tree[A]
case class Node[A](value: A, left: Tree[A], right: Tree[A]) extends Tree[A]
case class Leaf[A]() extends Tree[A]
and our instances:
object Foldable {
implicit val listIsFoldable: Foldable[List] =
new Foldable[List] {
def foldMap[A, B](fa: List[A])(f: A => B)(implicit B: Monoid[B]): B =
fa.foldLeft(B.zero)((acc, elem) => B.append(acc, f(elem)))
}
implicit val treeIsFoldable: Foldable[Tree] =
new Foldable[Tree] {
def foldMap[A, B](fa: Tree[A])(f: A => B)(implicit B: Monoid[B]): B =
fa match {
case Leaf() =>
B.zero
case Node(value, left, right) =>
B.append(f(value), B.append(foldMap(left)(f), foldMap(right)(f)))
}
}
}
and finally, our new summing function:
def sumGeneric[F[_], A](fa: F[A])(implicit F: Foldable[F], A: Monoid[A]): A =
fa.foldMap(identity)
As with last time, Scalaz defines the Foldable
type class for us. However,
to really be “foldable”, not only should you define foldMap
, but foldRight
as well. Some of you may be wondering why foldRight
and not foldLeft
, or both?
The reasons for this decision are that
foldLeft
can be defined in terms of foldRight
(a fun exercise is to try this for yourself)foldLeft
fails on infinite lists (think Stream
in Scala)That being said, Scalaz defines instances of Foldable
for many of the standard
Scala types (List
, Vector
, Stream
, Option
), as well as its own (Tree
, EphemeralStream
).
The methods available on the type class not only include foldMap
and foldRight
which
are required to be implemented, but several derived ones as well including fold
(foldMap
with
identity
), foldLeft
, toList/IndexedSeq/Stream
, among others.
So our code with scalaz.Foldable
now looks like:
import scalaz.{ Foldable, Monoid }
// Note that this is equivalent to scalaz.Foldable#fold
def sumGeneric[F[_], A](fa: F[A])(implicit F: Foldable[F], A: Monoid[A]): A =
fa.fold
Note that the implementation of the function is rather plain, but that’s a good thing!
This shows the level of genericity type classes, folds, and Scalaz is capable of. If you ever
find yourself needing to fold something down, look at the methods available on
scalaz.Foldable
. By simply adding an instance of Foldable
to your F[_]
by implementing
the two methods above, you get “for free” a bunch of
derived ones!
In recent days, the word “Hadoop” has become synonymous with “big data.” The MapReduce system made popular by Google has made it’s way into several companies looking to glean information from their data.
Why am I mentioning this in a typelevel.scala blog post? Well, think about the reduce phase –
what is really happening? For a particular key, we’re given a list of values emitted
for that key, and we want to reduce those values into a single value. Sound familiar?
Sounds a bit like fold
, doesn’t it? Note that not all reductions in MapReduce have to follow
monoid laws, but a surprising amount do as demonstrated by Twitter’s
Algebird project.
Going back to fold
, recall that in order to just fold
we need to have something
Foldable
that contains something that already has a Monoid
instance. A more general
approach, as taken by scalaz.Foldable
, is to also provide a foldMap
function which
lets us also pass in a function mapping each element of the Foldable
to something
that is a Monoid
, and reduce over that instead.
So. Given something, say a List[A]
, we want to Map each element of the list to
an element of a type that has a Monoid
instance, and then we want to Reduce the
list down to a single value. What is this? All together now: MapReduce!
Unfortunately, Hadoop MapReduce by itself does not give you anything like a List
.
Fortunately, our good friends at NICTA have developed
and open sourced the wonderful Scoobi project,
which abstracts over Hadoop MapReduce by providing a List
-like interface, called a
DList
(distributed list). Users treat the DList
very similarly to how they would
a regular Scala List
, and perform operations on it that get compiled down into
MapReduce jobs. Such operations include not only the familiar (and expected) map
and reduce
combinators, but also our friends foldMap
and fold
. While DList
’s
do not have a proper Foldable
instance due to the difficulty of implementing foldRight
for the MapReduce, I find it to be a great example of the power of abstractions and
genericity abstract algebra and Scalaz provides to us as programmers.
If you have any questions/comments/concerns, feel free to hop onto the IRC channel on
Freenode at #scalaz
.