Existential types are not raw types

by Stephen Compall on Feb 26, 2015

technical

While this blog is typically strictly for Scala developers interested in strongly-typed programming, this particular article is of interest to Java developers as well. You don’t need to know Scala to follow along.

Scala makes a welcome simplification in its type system: type arguments are always required. That is, in Java, you may (unsafely) leave off the type arguments for compatibility with pre-1.5 code, e.g. java.util.List, forming a raw type. Scala does not permit this, and requires you to pass a type argument.

The most frequent trouble people have with this rule is being unable to implement some Java method with missing type arguments in its signature, e.g. one that takes a raw List as an argument. Let us see why they have trouble, and why this is a good thing.

Existentials are safe, raw types are not

Stripping the type argument list, e.g. going from java.util.List<String> to java.util.List is an unsafe cast. Wildcarding the same type argument, e.g. going from java.util.List<String> to java.util.List<?>, is safe. The latter type is written java.util.List[_], or java.util.List[T] forSome {type T}, in Scala. In both Java and Scala, this is an existential type. As compiled with -Xlint:rawtypes -Xlint:unchecked:

import java.util.Arrays;
import java.util.ArrayList;
import java.util.List;

public abstract class TestEx {
    public static List<String> words() {
        return new ArrayList<>(Arrays.asList("hi", "there"));
    }

    // TestEx.java:17: warning: [rawtypes] found raw type: List
    //  missing type arguments for generic class List<E>
    //  where E is a type-variable:
    //    E extends Object declared in interface List
    //                  ↓
    public static final List wordsRaw = words();

    // there is no warning for this
    public static final List<?> wordsET = words();
}

Also note that there is no warning for the equivalent to wordsET in Scala. Because it, like javac, knows that it’s safe.

scala> TestEx.words
res0: java.util.List[String] = [hi, there]

scala> val wordsET = TestEx.words : java.util.List[_]
wordsET: java.util.List[_] = [hi, there]

Raw Types are bad. Stop using them

The reason that existentials are safe is that the rules in place for values of existential type are consistent with the rest of the generic system, whereas raw types contradict those rules, resulting in code that should not typecheck, and only does for legacy code support. We can see this in action with two Java methods.

public static void addThing(final List xs) {
    xs.add(42);
}

public static void swapAround(final List<?> xs) {
    xs.add(84);
}

These methods are the same, except for the use of raw types versus existentials. However, the second does not compile:

TestEx.java:26: error: no suitable method found for add(int)
        xs.add(84);
          ^
    method Collection.add(CAP#1) is not applicable
      (argument mismatch; int cannot be converted to CAP#1)
    method List.add(CAP#1) is not applicable
      (argument mismatch; int cannot be converted to CAP#1)
  where CAP#1 is a fresh type-variable:
    CAP#1 extends Object from capture of ?

Why forbid adding 42 to the list? The element type of list is unknown. The answer lies in that statement: its unknownness isn’t a freedom for the body of the method, it’s a restriction. The rawtype version treats its lack of knowledge as a freedom, and the caller pays for it by having its data mangled.

public static void testIt() {
    final List<String> someWords = words();
    addThing(someWords);
    System.out.println("Contents of someWords after addThing:");
    System.out.println(someWords);
    System.out.println("Well that seems okay, what's the last element?");
    System.out.println(someWords.get(someWords.size() - 1));
}

And it compiles:

TestEx.java:23: warning: [unchecked] unchecked call to add(E) as a
                         member of the raw type List
        xs.add(42);
              ^
  where E is a type-variable:
    E extends Object declared in interface List

But when we try to run it:

scala> TestEx.testIt()
Contents of someWords after addThing:
[hi, there, 42]
Well that seems okay, what's the last element?
java.lang.ClassCastException: java.lang.Integer cannot be cast to
                              java.lang.String
  at rawtypes.TestEx.testIt(TestEx.java:32)
  ... 43 elided

It is a mistake to think that just because some code throws ClassCastException, it must be to blame for a type error. This line is blameless. It is the fault of the unchecked cast when we called addThing, and more specifically, the unsafe assumption about the List’s element type that was made in its body.

Existentials are much better

When we used the wildcard, we were forbidden from doing the unsafe thing. But what kinds of things can we do with the safe, existential form? Here’s one:

private static <E> void swapAroundAux(final List<E> xs) {
    xs.add(xs.get(0));
}

public static void swapAround(final List<?> xs) {
    swapAroundAux(xs);
}

In other words: let E be the unknown element type of xs. xs.get() has type E, and xs.add has argument type E. They line up, so this is okay, no matter what the element type of xs turns out to be. Let’s try a test:

scala> val w = TestEx.words
w: java.util.List[String] = [hi, there]

scala> TestEx.swapAround(w)

scala> w.get(w.size - 1)
res1: String = hi

The body of swapAround is guaranteed not to mangle its argument by the type checker, so we, as a caller, can safely call it, and know that our argument’s type integrity is protected.

Scala has more features to let us get away without swapAroundAux. This translation uses a lowercase type variable pattern to name the existential. To the right of the =>, we can declare variables of type e and use e to construct more types, while still referring to the _ in the xs argument’s type. But in this case, we just do the same as swapAroundAux above.

def swapAround(xs: java.util.List[_]): Unit =
  xs match {
    case xs2: java.util.List[e] => xs2.add(xs2.get(0))
  }

Crushing the existential

Let’s consider the xs.get() and xs.add methods, which have return type and argument type E, respectively. As you can’t write the name of an existential type in Java, what happens when we “crush” it, choosing the closest safe type we can write the name of?

First, we can simplify by considering every existential to be bounded. That is, instead of E, we think about E extends Object super Nothing, or E <: Any >: Nothing in Scala. While Object or Any is the “top” of the type hierarchy, which every type is a subtype of, Nothing is the “bottom”, sadly left out of Java’s type system, which every type is a supertype of.

For get, the E appears in the result type, a covariant position. So we crush it to the upper bound, Any.

scala> wordsET.get _
res2: Int => Any = <function1>

However, for add, the E appears in the argument type, a contravariant position. So if it is to be crushed, it must be crushed to the lower bound, Nothing, instead.

scala> (wordsET: java.util.Collection[_]).add _ : (Any => Boolean)
<console>:12: error: type mismatch;
 found   : _$1 => Boolean where type _$1
 required: Any => Boolean
              (wordsET: java.util.Collection[_]).add _ : (Any => Boolean)
                                                 ^
scala> (wordsET: java.util.Collection[_]).add _ : (Nothing => Boolean)
res8: Nothing => Boolean = <function1>

Each occurrence of an existential in a signature may be crushed independently. However, a variable that appears once but may be distributed to either side, such as in a generic type parameter, is invariant, and may not be crushed at that point. That is why the existential is preserved in the inferred type of wordsET itself.

scala> wordsET
res9: java.util.List[_] = [hi, there]

Herein lies something closer to a formalization of the problem with raw types: they crush existential occurrences in contravariant and invariant positions to the upper bound, Object, when the only safe positions to crush in this way are the covariant positions.

How do List and List<?> relate?

It is well understood that, in Java, List<String> is not a subtype of List<Object>. In Scala terms, this is because all type parameters are invariant, which has exactly the meaning it had in the previous section. However, that doesn’t mean it’s impossible to draw subtyping relationships between different Lists for different type arguments; they must merely be mediated by existentials, as is common in the Java standard library.

The basic technique is as follows: we can convert any T in List<T> to ? extends T super T. Following that, we can raise the argument to extends and lower the argument to super as we like. A ? by itself, I have described above, is merely the most extreme course of this formula you can take. So List<T> for any T is a subtype of List<?>. (This only applies at one level of depth; e.g. List<List<T>> is not necessarily a subtype of List<List<?>>.)

Does this mean that List is a subtype of List<?>? Well, kind of. Following the rule for specialization of method signatures in subclasses, we should be able to override a method that returns List<?> with one that returns List, and override a method that takes List as an argument with one that takes List<?> as an argument. However, this is like building a house on a foam mattress: the conversion that got us a raw type wasn’t sound in the first place, so what soundness value does this relationship have?

The frequent Java library bug

Let’s see the specific problem that people usually encounter in Scala. Suppose addThing, defined above, is an instance member of TestEx:

class TestEx2 extends TestEx {
    @Override
    public void addThing(final List<?> xs) {}
}

Or the Scala version:

class TestEx3 extends TestEx {
  override def addThing(xs: java.util.List[_]): Unit = ()
}

javac gives us this error:

TestEx.java:48: error: name clash: addThing(List<?>) in TestEx2 and
                addThing(List) in TestEx have the same erasure, yet
                neither overrides the other
    public void addThing(final List<?> xs) {}
                ^
TestEx.java:47: error: method does not override or implement a method
                from a supertype
    @Override
    ^

scalac is forgiving, though. I’m not sure how forgiving it is. However, the forgiveness is unsound: it lets us return less specific types when overriding methods than we got out.

How to fix it

  1. Stop using raw types.

  2. If you maintain a Java library with raw types in its API, you are doing a disservice to your users. Eliminate them.

  3. If you are using such a library, report a bug, or submit a patch, to eliminate the raw types. If you add -Xlint:rawtypes to the javac options, the compiler will tell you where you’re using them. Fix all the warnings, and you’re definitely not using raw types anymore.

  4. Help Java projects, including your own, avoid introducing raw types by adding -Xlint:rawtypes permanently to their javac options. rawtypes is more serious than unchecked; even if you do not care about unchecked warnings, you should still turn on and fix rawtypes warnings.

You may also turn on -Xlint:cast to point out casts that are no longer necessary now that your types are cleaner. If possible, add -Werror to your build as well, to convert rawtypes warnings to errors.

Why not just add wildcards automatically?

Adding wildcards isn’t a panacea. For certain raw types, you need to add a proper type parameter, even adding type parameters to your own API. The Internet has no copy and paste solutions to offer you; it all depends on how to model your specific scenario. Here are a few possibilities.

  1. Pass a type argument representing what’s actually in the structure. For example, replace List with List<String> if that’s what it is.

  2. Pass a wildcard.

  3. Propagate the type argument outward. For example, if you have a method List doThis(final List xs), maybe it should be <E> List<E> doThis(final List<E> xs). Or if you have a class Blah<X> containing a List, maybe it should be a class Blah<A, X> containing a List<A>. This is often the most flexible option, but it can take time to implement.

  4. Combine any of these. For example, in some circumstances, a more flexible version of #3 would be to define Blah<A, X> containing a List<? extends A>.

Wildcards and existentials are historically misunderstood in the Java community; Scala developers have the advantage of more powerful language tools for talking about them. So if you are unsure of how to eliminate some raw types, consider asking a Scala developer what to do! Perhaps they will tell you “use Scala instead”, and maybe that’s worth considering, but you’re likely to get helpful advice regardless of how you feel about language advocacy.

The Scala philosophy

As you can see, the Java compatibility story in Scala is not as simple as is advertised. However, I favor the strong stance against this unsound legacy feature. If Scala can bring an end to the scourge of raw types, it will have been worth the compatibility trouble.

This article was tested with Scala 2.11.5 and javac 1.8.0_31.