by Lars Hupel on Oct 17, 2016
technical
ScalaCheck is a well-known property-based testing library, based on ideas from Haskell’s QuickCheck. It is also a Typelevel project. In this post, I’d like to show some of the underlying mechanisms, stripped down to the bare minimum.
Testing with properties is well-understood in academia and widely used in parts of the industry – namely the parts which embrace functional programming. However, the design space of property-testing libraries is rather large. I think it is high time to talk about various tradeoffs done in libraries. Here, I’d like to contribute by implementing a ScalaCheck clone from scratch using a very similar design and explaining the design choices along the way.
This is not an introduction to property testing. However, it can be read as a guide to implementation ideas. QuickCheck, ScalaCheck and the like are nice examples of functional library design, but their internals are often obscured by technicalities. I hope that by clearing up some of the concepts it will become easier to read their code and perhaps designing your own property-testing library.
The basic point of a property testing library is providing an interface looking roughly like this:
class Prop {
def check(): Unit = ???
}
object Prop {
def forAll[A](prop: A => Boolean): Prop = ???
}
Now, you can use that in your test code:
Prop.forAll { (x: Int) =>
x == x
}
This expresses that you have a property which is parameterized on a single integer number. Hence, the library must somehow provide these integer numbers. The original Haskell QuickCheck, ScalaCheck and many other libraries use a random generator for this. This comes with a number of advantages:
But it is also not without problems:
Of course, there are other possible design choices:
For this post, we’re assuming that random generation is a given.
Do we want to do this purely or poorly?
Of course, this motto is tongue-in-cheek. Just because something isn’t pure doesn’t mean that it is poor.
To understand the design space here, let’s focus on the smallest building block: A primitive random generator. There are two possible ways to model this. The mutable way is what Java, Scala and many other languages offer in their libraries:
trait Random {
def nextInt(min: Int, max: Int): Int
def nextFloat(): Float
def nextItem[A](pool: List[A]): A
}
By looking at the types alone, we can already see that two subsequent calls of nextInt
will produce different results; the interface is thus impure.
The pure way is to make the internal state (also known as “seed” in the context of random generators) explicit:
trait Seed {
def nextInt(min: Int, max: Int): (Int, Seed)
def nextFloat: (Float, Seed)
def nextItem[A](pool: List[A]): (A, Seed)
}
object Seed {
def init(): Seed = ???
}
Because this is difficult to actually use (don’t mix up the Seed
instances and use them twice!), one would wrap this into a state monad:
class Random[A](private val op: Seed => (A, Seed)) { self =>
def run(): A = op(Seed.init())._1
def map[B](f: A => B): Random[B] =
new Random[B]({ seed0 =>
val (a, seed1) = self.op(seed0)
(f(a), seed1)
})
def flatMap[B](f: A => Random[B]): Random[B] =
new Random[B]({ seed0 =>
val (a, seed1) = self.op(seed0)
f(a).op(seed1)
})
override def toString: String = "<random>"
}
object Random {
def int(min: Int, max: Int): Random[Int] = new Random(_.nextInt(min, max))
val float: Random[Float] = new Random(_.nextFloat)
}
Now we can use Scala’s for
comprehensions:
for {
x <- Random.int(-5, 5)
y <- Random.int(-3, 3)
} yield (x, y)
// res2: Random[(Int, Int)] = <random>
The tradeoffs here are the usual when we’re talking about functional programming in Scala: Reasoning ability, convenience, performance, … In the pure case, there are also multiple other possible encodings, including free monads. Luckily, this blog covers that topic in another post.
How do other libraries fare here?
scala.util.Random
.Seed
trait from above, and there is also an additional state-monadic layer on top of it.Seed
, but they don’t use the updated seed.
Instead, their approach is via an additional primitive split
of type Seed => (Seed, Seed)
, which gets used to “distribute” randomness during composition (see the paper by Claessen & Pałka about the theory behind that).
It is worth noting that Java 8 introduced a SplittableRandom
class.For this post, we’re assuming that mutable state is a given.
We’ll use scala.util.Random
(because it’s readily available) in a similar fashion to ScalaCheck 1.12.x.
Asynchronous programming is all the rage these days.
This means that many functions will not return plain values of type A
, but rather Future[A]
, Task[A]
or some other similar type.
For our testing framework, this poses a challenge:
If our properties call such asynchronous functions, the framework needs to know how to deal with a lot of Future[Boolean]
values.
On the JVM, although not ideal, we could fall back to blocking on the result and proceed as usual.
On Scala.js, this won’t fly, because you just can’t block in JavaScript.
Most general-purpose testing frameworks, like Specs2, have a story about this, enabling asynchronous checking of assertions.
In theory, it’s not a problem to support this in a property testing library. But in practice, there are some complications:
Future
s in ScalaTest.Future
values?
We can easily imagine wanting to draw from a pool of inputs stemming from a database, or possibly to get better randomness from random.org.
(The latter is a joke.)Task
?
fs2’s Task
?
All of them?If in the first design decisions we had chosen exhaustive generators, this problem would be even tougher, because designing a correct effectful stream type (of all possible inputs) is not trivial.
For this post, we’re assuming that we’re only interested in synchronous properties, or can always block. However, I’d like to add, I’d probably try to incorporate async properties right from the start if I were to implement a testing library from scratch.
What about the existing libraries?
morallyDubiosIOProperty
(nowadays just ioProperty
).
But there is also more advanced support for monadic testing.Let’s summarize what we have so far:
Now, I’d like to talk about how to “package” random generators.
Earlier, we’ve only seen random integer and floating-point numbers, but of course, we want something more complex, including custom data structures.
It is convenient to abstract over this and specify the concept of a generator for type A
.
The idea is to make a generator for a type as “general” as possible and then provide combinators to compose them.
import scala.util.Random
trait Gen[T] {
def generate(rnd: Random): T
}
object Gen {
val int: Gen[Int] = new Gen[Int] {
def generate(rnd: Random): Int =
rnd.nextInt()
}
}
An obvious combinator is a generator for tuples:
def zip[T, U](genT: Gen[T], genU: Gen[U]): Gen[(T, U)] = new Gen[(T, U)] {
def generate(rnd: Random): (T, U) =
(genT.generate(rnd), genU.generate(rnd))
}
But we still have a problem: There is currently no way to talk about the size of the generated inputs. Let’s say we want to check an expensive algorithm over lists, for example with a complexity of $\mathcal O(n^3)$ over lists. A naive implemenation of a list generator would take a a random size, and then give you some generator for lists. The problem arises at the use site: Whenver you want to change the size of the generated inputs, you need to change the expression constructing the generator.
But we’d like to do better here:
For this post, there should be a way to specify a maximum size of generated values, together with a way to influence that size in the tests without having to modify the generators.
Here’s how we can do that:
import scala.util.Random
trait Gen[T] {
def generate(size: Int, rnd: Random): T
override def toString: String = "<gen>"
}
object Gen {
val int: Gen[Int] = new Gen[Int] {
def generate(size: Int, rnd: Random): Int = {
val range = size * 2 + 1
rnd.nextInt(range) - size
}
}
def list[A](genA: Gen[A]): Gen[List[A]] = new Gen[List[A]] {
def generate(size: Int, rnd: Random): List[A] = {
val length = rnd.nextInt(size + 1)
List.fill(length)(genA.generate(size, rnd))
}
}
}
We can now check this (note that for the purpose of this post we’ll be using fixed seeds):
def printSample[T](genT: Gen[T], size: Int, count: Int = 10): Unit = {
val rnd = new Random(0)
for (i <- 0 until size)
println(genT.generate(size, rnd))
}
scala> printSample(Gen.int, 10)
2
6
-6
-8
1
4
-8
5
-4
-8
scala> printSample(Gen.int, 3)
2
-1
1
scala> printSample(Gen.list(Gen.int), 10)
List()
List(-6, -8, 1, 4, -8, 5)
List(-8, -2)
List(4, 6)
List(4, 0)
List(10, 4, -9, -7, 7)
List(-8, 6, -4, 9, -1, 10, 4, 7, -8)
List(1, 7, -7, 4, 0, 5, 4, 9, 7, 4)
List(5, 9, -3, 3, -10)
List()
scala> printSample(Gen.list(Gen.int), 3)
List(-1, 1)
List(1, -3)
List(-2, 3)
That’s already pretty cool.
But there’s another hidden design decision here:
We’re using the same size on all sub-elements in the generated thing.
For example, in Gen.list
, we’re just passing the size through to the child generator.
SmallCheck does that differently: The “size” is defined to be the total number of constructors in the generated value.
For integer numbers, the “number of constructors” is basically the number itself.
For example, the value List(1, 2)
has size $2$ in our framework (length of the list), but size $1 + 2 + 2 = 5$ in SmallCheck (roughly: size of all elements plus length of list).
Of course, our design decision might mean that stuff grows too fast. The explicit size parameter can be used to alleviate that, especially for writing recursive generators:
def recList[T](genT: Gen[T]): Gen[List[T]] = new Gen[List[T]] {
// extremely stupid implementation, don't use it
def generate(size: Int, rnd: Random): List[T] =
if (rnd.nextInt(size + 1) > 0)
genT.generate(size, rnd) :: recList(genT).generate(size - 1, rnd)
else
Nil
}
// recList: [T](genT: Gen[T])Gen[List[T]]
printSample(recList(Gen.int), 10)
// List()
// List(-6, 1, -6)
// List(-8, 8, 4, 7, 0, 5, -4)
// List(-8)
// List(9, -3, 4)
// List(1, 3, -7, -2, -3, 0, -3, 3)
// List()
// List(10, 5, -8, -4, -5, 4, -1)
// List(-5, 9, 7)
// List(-8)
We can also provide a combinator for this:
def resize[T](genT: Gen[T], newSize: Int): Gen[T] = new Gen[T] {
def generate(size: Int, rnd: Random): T =
genT.generate(newSize, rnd)
}
That one is useful because in reality ScalaCheck’s generate
method takes some more parameters than just the size.
Some readers might be reminded that this is just the reader monad and its local
combinator in disguise.
In order to make these generators nicely composable, we can leverage for
comprehensions.
We just need to implement map
, flatMap
and withFilter
:
import scala.util.Random
trait Gen[T] { self =>
def generate(size: Int, rnd: Random): T
// Generate a value and then apply a function to it
def map[U](f: T => U): Gen[U] = new Gen[U] {
def generate(size: Int, rnd: Random): U =
f(self.generate(size, rnd))
}
// Generate a value and then use it to produce a new generator
def flatMap[U](f: T => Gen[U]): Gen[U] = new Gen[U] {
def generate(size: Int, rnd: Random): U =
f(self.generate(size, rnd)).generate(size, rnd)
}
// Repeatedly generate values until one passes the check
// (We would usually call this `filter`, but Scala requires us to
// call it `withFilter` in order to be used in `for` comprehensions)
def withFilter(p: T => Boolean): Gen[T] = new Gen[T] {
def generate(size: Int, rnd: Random): T = {
val candidate = self.generate(size, rnd)
if (p(candidate))
candidate
else // try again
generate(size, rnd)
}
}
override def toString: String = "<gen>"
}
object Gen {
// unchanged from above
val int: Gen[Int] = new Gen[Int] {
def generate(size: Int, rnd: Random): Int = {
val range = size * 2 + 1
rnd.nextInt(range) - size
}
}
def list[A](genA: Gen[A]): Gen[List[A]] = new Gen[List[A]] {
def generate(size: Int, rnd: Random): List[A] = {
val length = rnd.nextInt(size + 1)
List.fill(length)(genA.generate(size, rnd))
}
}
}
Look how simple composition is now:
case class Frac(numerator: Int, denominator: Int)
// defined class Frac
val fracGen: Gen[Frac] =
for {
num <- Gen.int
den <- Gen.int
if den != 0
} yield Frac(num, den)
// fracGen: Gen[Frac] = <gen>
printSample(fracGen, 10)
// Frac(2,6)
// Frac(-6,-8)
// Frac(1,4)
// Frac(-8,5)
// Frac(-4,-8)
// Frac(-2,-8)
// Frac(4,6)
// Frac(1,4)
// Frac(0,-10)
// Frac(10,4)
And we can even read the construction nicely: “First draw a numerator, then draw a denominator, then check that the denominator is not zero, then construct a fraction.”
However, we need to be cautious with the filtering.
If you look closely at the implementation of withFilter
, you can see that there is potential for an infinite loop.
For example, when you pass in the filter _ => false
.
It will just keep generating values and then discard them.
How do existing frameworks alleviate this?
Gen[A]
as above, and one that return Gen[Option[A]]
.
The latter uses a number of tries and if they all fail, terminates and returns None
.
The former uses the latter, but keeps increasing the size parameter.
Of course, this might not terminate.filter
method returns Gen[A]
, but the possibility of failure is encoded in the return type of its equivalent of the generate
method, which always returns Option[T]
.
But there is also a combinator which retries until it finds a valid input, called retryUntil
.As a side note: Gen
as it is right now is definitely not a valid monad, because it internally relies on mutable state.
But in my opinion, it is still justified to offer the map
and flatMap
methods, but don’t give a Monad
instance.
This prevents you from shoving Gen
into functions which expect lawful monads.
It’s still tedious to having to construct these generators by hand.
Both QuickCheck and ScalaCheck introduce a thin layer atop generators, called Arbitrary
.
This is just a type class which contains a generator, nothing more.
Here’s how it would look like in Scala:
trait Arbitrary[T] {
def gen: Gen[T]
}
// in practice we would put that into the companion object
//object Arbitrary {
implicit val arbitraryInt: Arbitrary[Int] = new Arbitrary[Int] {
def gen = Gen.int
}
//}
Based on this definition, ScalaCheck provides a lot of pre-defined instances for all sorts of types.
For your custom types, the idea is that you define a low-level generator and wrap it into an implicit Arbitrary
.
Then, in your tests, you just use the implicitly provided generator, and avoid to drop down to constructing them manually.
The purpose of the additional layer is explained easily: It is common to have multiple Gen[T]
for the same T
depending on which context it is needed in.
But there should only be one Arbitrary[T]
for each T
.
For example, you might have Gen[Int]
for positive and negative integers, but you only have a single Arbitrary[Int]
which covers all integers.
You use the latter when you actually need to supply an integer to your property, and the former to construct more complex generators, like for Frac
above.
This is where everything really comes together.
We’re now looking at how to use Gen
to implement the desired forAll
function we’ve seen early in the introduction of the post, and how that is related to the Prop
type I didn’t define.
I’ll readily admit that the following isn’t really a design decision per se, because we’ll be guided by the presence of type classes in Scala.
Still, one could reasonably structure this differently, and in fact, the design of the Prop
type in e.g. QuickCheck is much more complex than what you’ll see.
The rest of this post will now depart from the way it’s done in ScalaCheck, although the ideas are still similar. Instead, I’ll try to show a simplified version without introducing complications required to make it work nicely.
Let’s start with the concept of a property. A property is something that we can run and which returns a result. The result should ideally be something like a boolean: Either the property holds or it doesn’t. But one of the main features of any property testing library is that it will return a counterexample for the inputs where the property doesn’t hold. Hence, we need to store this counterexample in the failure case. In practice, the result type would be much richer, with attached labels, reasons, expectations, counters, … and more diagnostic fields.
sealed trait Result
case object Success extends Result
final case class Failure(counterexample: List[String]) extends Result
object Result {
def fromBoolean(b: Boolean): Result =
if (b)
Success
else
// if it's false, it's false; no input has been produced,
// so the counterexample is empty
Failure(Nil)
}
You’ll note that I’ve used List[String]
here, because in the end we only want to print the counterexample on the console.
ScalaCheck has a dedicated Pretty
type for that.
We could do even more fancy things here if we wanted to, but let’s keep it simple.
Now we define the Prop
type:
trait Prop {
def run(size: Int, rnd: Random): Result
override def toString: String = "<prop>"
}
What’s missing is a way to construct properties.
Sure, we could implement the trait manually in our tests, but that would be tedious.
Type classes to the rescue!
We call something testable if it can be converted to a Prop
:
trait Testable[T] {
def asProp(t: T): Prop
}
// in practice we would put these into the companion object
//object Testable {
// Booleans can be trivially converted to a property:
// They are already basically a `Result`, so no need
// to run anything!
implicit val booleanIsTestable: Testable[Boolean] = new Testable[Boolean] {
def asProp(t: Boolean): Prop = new Prop {
def run(size: Int, rnd: Random): Result =
Result.fromBoolean(t)
}
}
// Props are already `Prop`s.
implicit val propIsTestable: Testable[Prop] = new Testable[Prop] {
def asProp(t: Prop): Prop = t
}
//}
Now we’re all set:
def forAll[I, O](prop: I => O)(implicit arbI: Arbitrary[I], testO: Testable[O]): Prop =
new Prop {
def run(size: Int, rnd: Random): Result = {
val input = arbI.gen.generate(size, rnd)
val subprop = testO.asProp(prop(input))
subprop.run(size, rnd) match {
case Success =>
Success
case Failure(counterexample) =>
Failure(input.toString :: counterexample)
}
}
}
Let’s unpack this step by step.
I => O
.
This is supposed to be our parameterized property, for example { (x: Int) => x == x }
.
Because we abstracted over values that can be generated (Arbitrary
) and things that can be tested (Testable
), the input and output types are completely generic.
In the implicit
block, we’re taking the instructions of how to fit everything together.Prop
; that is, a thing that we can run and that produces a boolean-ish Result
.Gen[I]
which we get from the Arbitrary[I]
.I
into the parameterized property.
To stick with the example, we evaluate the anonymous function { (x: Int) => x == x }
at input 5
, and obtain true
.Prop
again.
This allows us to recursively nest forAll
s, for example when we need two inputs.At this point we should look at an example.
val propReflexivity =
forAll { (x: Int) =>
x == x
}
// propReflexivity: Prop = <prop>
Cool, but how do we run this?
Remember that our tool is supposed to evaluate a property on multiple inputs.
All these evaluations will produce a Result
.
Hence, we need to merge those together into a single result.
We’ll also define a convenient function that runs a property multiple times on different sizes:
def merge(rs: List[Result]): Result =
rs.foldLeft(Success: Result) {
case (Failure(cs), _) => Failure(cs)
case (Success, Success) => Success
case (Success, Failure(cs)) => Failure(cs)
}
def check[P](prop: P)(implicit testP: Testable[P]): Unit = {
val rnd = new Random(0)
val rs =
for (size <- 0 to 100)
yield testP.asProp(prop).run(size, rnd)
merge(rs.toList) match {
case Success =>
println("✓ Property successfully checked")
case Failure(counterexample) =>
val pretty = counterexample.mkString("(", ", ", ")")
println(s"✗ Property failed with counterexample: $pretty")
}
}
What is happening here?
merge
function takes a list of Result
s and returns the first Failure
, if it exists.
Otherwise it returns Success
.
In case there are multiple Failure
s, it doesn’t care and just discards the later ones.check
function initializes a fresh random generator.Let’s check our property!
scala> check(propReflexivity)
✓ Property successfully checked
… and how about something wrong?
scala> check(forAll { (x: Int) =>
| x > x
| })
✗ Property failed with counterexample: (0)
Okay, we’re almost done.
The only tedious thing that remains is that we have to use the forAll
combinator, especially in the nested case.
It would be great if we could just use check
and pass it a function.
But since we’ve used type classes for everything, we’re in luck!
implicit def funTestable[I : Arbitrary, O : Testable]: Testable[I => O] = new Testable[I => O] {
def asProp(f: I => O): Prop =
// wait for it ...
// ...
// ...
// it's really simple ...
forAll(f)
}
Now we can check our functions even easier!
scala> check { (x: Int) =>
| x == x
| }
✓ Property successfully checked
scala> check { (x: Int) =>
| x > x
| }
✗ Property failed with counterexample: (0)
scala> check { (x: Int) => (y: Int) =>
| x + y == y + x
| }
✓ Property successfully checked
scala> check { (x: Int) => (y: Int) =>
| x + y == x * y
| }
✗ Property failed with counterexample: (0, 1)
Now, if you look closely, you can basically get rid of the Prop
class and define it as
type Prop = Gen[Result]
If you think about this for a moment, it makes sense: A “property” is really just a thing which feeds on randomness and produces a result.
The only thing left is to define a driver which runs a couple of iterations and gathers the results; in our implementation, that’s the check
function.
I encourage you to spell out the other functions (e.g. forAll
), and you will notice that our Prop
trait is indeed isomorphic to Gen[Result]
.
In practice, QuickCheck uses such a representation (although with some more contraptions).
It turns out that it’s not that hard to write a small property-testing library. I’m going to stop here with the implementation, although there are still some things to explore:
Finally, I’d like to note that there are many more libraries out there than I’ve mentioned here, some of which depart more, some less, from the original Haskell implementation. They even exist for not-functional languages, e.g. Javaslang for Java or Hypothesis for Python.
Correction: In a previous version of this post, I incorrectly stated that ScalaCheck uses a mutable random generator. This is only true up to ScalaCheck 1.12.x. I have updated that section in the post.