State

API Documentation: IndexedStateT

State is a structure that provides a functional approach to handling application state. State[S, A] is basically a function S => (S, A), where S is the type that represents your state and A is the result the function produces. In addition to returning the result of type A, the function returns a new S value, which is the updated state.

Robots

Let's try to make this more concrete with an example. We have this Robot model:

final case class Robot(
  id: Long,
  sentient: Boolean,
  name: String,
  model: String)

We would like to generate some random Robot instances for test data.

Pseudorandom values

Scala's standard library has a built-in Random class that provides a (pseudo)random number generator (RNG). Let's use it to write a method that creates robots.

val rng = new scala.util.Random(0L)

def createRobot(): Robot = {
  val id = rng.nextLong()
  val sentient = rng.nextBoolean()
  val isCatherine = rng.nextBoolean()
  val name = if (isCatherine) "Catherine" else "Carlos"
  val isReplicant = rng.nextBoolean()
  val model = if (isReplicant) "replicant" else "borg"
  Robot(id, sentient, name, model)
}
val robot = createRobot()
// robot: Robot = Robot(
//   id = -4962768465676381896L,
//   sentient = false,
//   name = "Catherine",
//   model = "replicant"
// )

We create a single Random instance, which is mutated as a side-effect each time that we call nextLong or nextBoolean on it. This mutation makes it more difficult to reason about our code. Someone might come along and see that we have rng.nextBoolean repeated three times within a single method. They might cleverly avoid repeated code and method invocations by extracting the common code into a variable:

val rng = new scala.util.Random(0L)

def createRobot(): Robot = {
  val id = rng.nextLong()
  val b = rng.nextBoolean()
  val sentient = b
  val isCatherine = b
  val name = if (isCatherine) "Catherine" else "Carlos"
  val isReplicant = b
  val model = if (isReplicant) "replicant" else "borg"
  Robot(id, sentient, name, model)
}
val robot = createRobot()
// robot: Robot = Robot(
//   id = -4962768465676381896L,
//   sentient = false,
//   name = "Carlos",
//   model = "borg"
// )

But now the output of our program has changed! We used to have a replicant robot named Catherine, but now we have a borg robot named Carlos. It might not have been obvious, but the nextBoolean calls we were making had the side effect of mutating internal RNG state, and we were depending on that behavior.

When we can't freely refactor identical code into a common variable, the code becomes harder to reason about. In functional programming lingo, one might say that such code lacks referential transparency.

Purely functional pseudorandom values

Since mutating state caused us trouble, let's create an RNG that is immutable.

We'll use a simple RNG that can generate pseudorandom Long values based only on the previous "seed" value and some carefully chosen constants. You don't need to understand the details of this implementation for the purposes of this example, but if you'd like to know more, this is Knuth's 64-bit linear congruential generator.

final case class Seed(long: Long) {
  def next = Seed(long * 6364136223846793005L + 1442695040888963407L)
}

Instead of mutating the existing long value, calling next returns a new Seed instance with an updated long value.

Since the RNG isn't updating state internally, we will need to keep track of state outside of the RNG. When we call nextBoolean we will want it to return a Boolean as it did before, but we will also want it to return an updated Seed that we can use to generate our next random value.

def nextBoolean(seed: Seed): (Seed, Boolean) =
  (seed.next, seed.long >= 0L)

Similarly, nextLong will return an updated Seed along with a Long value.

def nextLong(seed: Seed): (Seed, Long) =
  (seed.next, seed.long)

Now we need to explicitly pass in the updated state as we generate each new value.

def createRobot(seed: Seed): Robot = {
  val (seed1, id) = nextLong(seed)
  val (seed2, sentient) = nextBoolean(seed1)
  val (seed3, isCatherine) = nextBoolean(seed2)
  val name = if (isCatherine) "Catherine" else "Carlos"
  val (seed4, isReplicant) = nextBoolean(seed3)
  val model = if (isReplicant) "replicant" else "borg"
  Robot(id, sentient, name, model)
}

val initialSeed = Seed(13L)
val robot = createRobot(initialSeed)
// robot: Robot = Robot(
//   id = 13L,
//   sentient = false,
//   name = "Catherine",
//   model = "replicant"
// )

Now it is a bit more obvious that we can't extract the three nextBoolean calls into a single variable, because we are passing each one a different seed value.

However, it is a bit cumbersome to explicitly pass around all of this intermediate state. It's also a bit error-prone. It would have been easy to accidentally call nextBoolean(seed2) for both the name generation and the model generation, instead of remembering to use nextBoolean(seed3) the second time.

Cleaning it up with State

State's special power is keeping track of state and passing it along. Recall the description of State at the beginning of this document. It is basically a function S => (S, A), where S is a type representing state.

Our nextLong function takes a Seed and returns an updated Seed and a Long. It can be represented as Seed => (Seed, Long), and therefore matches the pattern S => (S, A) where S is Seed and A is Long.

Let's write a new version of nextLong using State:

import cats.data.State

val nextLong: State[Seed, Long] = State(seed =>
  (seed.next, seed.long))

The map method on State allows us to transform the A value without affecting the S (state) value. This is perfect for implementing nextBoolean in terms of nextLong.

val nextBoolean: State[Seed, Boolean] = nextLong.map(long =>
  long >= 0)

The flatMap method on State[S, A] lets you use the result of one State in a subsequent State. The updated state (S) after the first call is passed into the second call. These flatMap and map methods allow us to use State in for-comprehensions:

val createRobot: State[Seed, Robot] =
  for {
    id <- nextLong
    sentient <- nextBoolean
    isCatherine <- nextBoolean
    name = if (isCatherine) "Catherine" else "Carlos"
    isReplicant <- nextBoolean
    model = if (isReplicant) "replicant" else "borg"
  } yield Robot(id, sentient, name, model)

At this point, we have not yet created a robot; we have written instructions for creating a robot. We need to pass in an initial seed value, and then we can call value to actually create the robot:

val (finalState, robot) = createRobot.run(initialSeed).value
// finalState: Seed = Seed(long = 2999987205171331217L)
// robot: Robot = Robot(
//   id = 13L,
//   sentient = false,
//   name = "Catherine",
//   model = "replicant"
// )

If we only care about the robot and not the final state, then we can use runA:

val robot = createRobot.runA(initialSeed).value
// robot: Robot = Robot(
//   id = 13L,
//   sentient = false,
//   name = "Catherine",
//   model = "replicant"
// )

The createRobot implementation reads much like the imperative code we initially wrote for the mutable RNG. However, this implementation is free of mutation and side-effects. Since this code is referentially transparent, we can perform the refactoring that we tried earlier without affecting the result:

val createRobot: State[Seed, Robot] = {
  val b = nextBoolean

  for {
    id <- nextLong
    sentient <- b
    isCatherine <- b
    name = if (isCatherine) "Catherine" else "Carlos"
    isReplicant <- b
    model = if (isReplicant) "replicant" else "borg"
  } yield Robot(id, sentient, name, model)
}
val robot = createRobot.runA(initialSeed).value
// robot: Robot = Robot(
//   id = 13L,
//   sentient = false,
//   name = "Catherine",
//   model = "replicant"
// )

This may seem surprising, but keep in mind that b isn't simply a Boolean. It is a function that takes a seed and returns a Boolean, threading state along the way. Since the seed that is being passed into b changes from line to line, so do the returned Boolean values.

Interleaving effects

Let's expand on the above example; assume that our random number generator works asynchronously by fetching results from a remote server:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

final case class AsyncSeed(long: Long) {
  def next = Future(AsyncSeed(long * 6364136223846793005L + 1442695040888963407L))
}

Using a proper IO data type would be ideal, but Future serves our purpose here.

We now need to redefine our nextLong action:

val nextLong: State[AsyncSeed, Future[Long]] = State { seed =>
  (seed.next, seed.long)
}
// error: type mismatch;
//  found   : scala.concurrent.Future[AsyncSeed]
//  required: AsyncSeed
//   (seed.next, seed.long)
//    ^^^^^^^^^
// error: type mismatch;
//  found   : Long
//  required: scala.concurrent.Future[Long]
//   (seed.next, seed.long)
//               ^^^^^^^^^

Oops! That doesn't work: State[S, A] requires that we return a pure next S in the State.apply constructor. We could define nextLong as State[Future[AsyncSeed], Future[Long]], but that would get us into even more trouble:

val nextLong: State[Future[AsyncSeed], Future[Long]] = State { seedF =>
  seedF.map { seed =>
    (seed.next, seed.long)
  }
}
// error: type mismatch;
//  found   : scala.concurrent.Future[(scala.concurrent.Future[AsyncSeed], Long)]
//  required: (scala.concurrent.Future[AsyncSeed], scala.concurrent.Future[Long])
//   seedF.map { seed =>
//   ^

The seed that State.apply passes in is now a Future, so we must map it. But we can't return a Future[(Future[S], [A])]; so what do we do?

Luckily, State[S, A] is an alias for StateT[Eval, S, A] - a monad transformer defined as StateT[F[_], S, A]. This data type represents computations of the form S => F[(S, A)].

If we plug in our concrete types, we get AsyncSeed => Future[(AsyncSeed, A)], which is something we can work with:

import cats.data.StateT
import cats.instances.future._
import scala.concurrent.ExecutionContext.Implicits.global

val nextLong: StateT[Future, AsyncSeed, Long] = StateT { seed =>
  seed.next zip Future.successful(seed.long)
}

Now, what do we get back if we invoke run on our nextLong action?

nextLong.run(AsyncSeed(0))
// res3: Future[(AsyncSeed, Long)] = Future(Success((AsyncSeed(1442695040888963407),0)))

Since every intermediate computation returns a Future, the composite computation returns a Future as well. To summarize, StateT[F[_], S, A] allows us to interleave effects of type F[_] in the computations wrapped by it.

It should be noted that different combinators on StateT impose different constraints on F; for example, map only requires that F has a Functor instance, but flatMap naturally requires F to have a FlatMap instance. Have a look at the method signatures for the details.

Changing States

More complex, stateful computations cannot be easily modeled by a single type. For example, let's try to model a door's state:

sealed trait DoorState
case object Open extends DoorState
case object Closed extends DoorState

case class Door(state: DoorState)

def open: State[DoorState, Unit] = ???
def close: State[DoorState, Unit] = ???

We would naturally expect that open would only operate on doors that are Closed, and vice-versa. However, to implement these methods currently, we have to pattern-match on the state:

val openImpl: State[DoorState, Unit] = State { doorState =>
  doorState match {
    case Closed => (Open, ())
    case Open   => ??? // What now?
  }
}

This puts us in the awkward situation of deciding how to handle invalid states; what's the appropriate behavior? Should we just leave the door open? Should we return a failure? That would mean that our return type needs to be modified.

The most elegant solution would be to model this requirement statically using types, and luckily, StateT is an alias for another type: IndexedStateT[F[_], SA, SB, A].

This data type models a stateful computation of the form SA => F[(SB, A)]; that's a function that receives an initial state of type SA and results in a state of type SB and a result of type A, using an effect of F.

So, let's model our typed door:

import cats.Eval
import cats.data.IndexedStateT

def open: IndexedStateT[Eval, Closed.type, Open.type, Unit] = IndexedStateT.set(Open)
def close: IndexedStateT[Eval, Open.type, Closed.type, Unit] = IndexedStateT.set(Closed)

We can now reject, at compile time, sequences of open and close that are invalid:

val invalid = for {
  _ <- open
  _ <- close
  _ <- close
} yield ()
// error: type mismatch;
//  found   : cats.data.IndexedStateT[cats.Eval,Open.type,Closed.type,Unit]
//  required: cats.data.IndexedStateT[cats.Eval,Closed.type,?,?]
//   _ <- close
//   ^
// error: type mismatch;
//  found   : Unit => cats.data.IndexedStateT[cats.Eval,Open.type,Nothing,Nothing]
//  required: Unit => cats.data.IndexedStateT[cats.Eval,Open.type,SC,B]
//   _ <- open
//   ^

While valid sequences will be accepted:

val valid = for {
  _ <- open
  _ <- close
  _ <- open
} yield ()
// valid: IndexedStateT[Eval, Closed.type, Open.type, Unit] = cats.data.IndexedStateT@4998e199

Note that the inferred type of valid correctly models that this computation can be executed only with an initial Closed state.

valid.run(Open)
// error: type mismatch;
//  found   : Open.type
//  required: Closed.type
// valid.run(Open)
//           ^^^^
valid.run(Closed)
// res6: Eval[(Open.type, Unit)] = cats.Eval$$anon$1@7a103ba8