Thursday, February 10, 2011

The JVM and Type Erasure

Type erasure is the term for what is, in my opinion, one of the dirtiest hacks in existence.  Before getting into that, let's start with a story.

Originally, Java did not have support for generics.  That meant that code like:
List< Integer > list = new ArrayList< Integer >();
list.add( new Integer( 4 ) );
System.out.println( list.get( 0 ).intValue() + 5 );

...would instead be...
List list = new ArrayList();
list.add( new Integer( 4 ) );
System.out.println( ((Integer)list.get( 0 )).intValue() + 5 );

Ew.  Ew.  Granted, the type definition is shorter, but that's only because it has less information to store to begin with.  Then we have to do this ugly cast.  Not only that, we're free to do such terrible things as:
list.add( new Integer( 4 ) );
list.add( list );
list.add( new Random() ); 

The compiler puts absolutely no restriction on what goes in, as long as it's an object.  This can obviously lead to bugs, since it's easy to end up with a heterogeneous collection of objects if one isn't careful.  Though this could be checked at compile time, the check is instead deferred until the class cast at run time.  Just ew overall.
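To make the deferred check concrete, here is a minimal sketch.  The class name RawListDemo and its methods are made up for illustration.  It compiles (with warnings at most) and only blows up when the cast is actually executed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RawListDemo {
    // Sums a raw list on the assumption that every element is an Integer.
    @SuppressWarnings("rawtypes")
    static int sum(List list) {
        int total = 0;
        for (Object element : list) {
            // The only type check happens here, at run time.
            total += ((Integer) element).intValue();
        }
        return total;
    }

    // Returns true if summing a mixed list fails with a ClassCastException.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean failsAtRunTime() {
        List list = new ArrayList();
        list.add(Integer.valueOf(4));
        list.add(new Random()); // the compiler accepts this without complaint
        try {
            sum(list);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + failsAtRunTime());
    }
}
```

Nothing about the call to add hints at a problem; the failure surfaces later, inside sum, possibly far away from the line that caused it.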

In J2SE 5.0, this all changed.  Support for generics was included, and the people cheered.  But at what cost?

The JVM can't do this sort of magic, because objects at run time don't carry their type arguments.  (I've heard that .NET can, which is why this post is titled "The JVM and Type Erasure".  I've never dabbled with any of the .NET languages, though I'd consider this a plus for them.)  Java runs on the JVM, so then Java can't do it either.

But it does do it, right?  I mean, it has to.  We have type-safe collections and generic types!

No, it's lies, all lies!  Yes, there is a check performed, and this check asserts type safety.  However, the run-time semantics are identical to the pre-generics code.  In other words, your wonderful, generic-using type-safe code below:
List< Integer > list = new ArrayList< Integer >();
list.add( new Integer( 1 ) );
list.add( new Integer( 2 ) );
list.add( new Integer( list.get( 0 ).intValue() + list.get( 1 ).intValue() ) );

...is actually converted into this at compile time:
List list = new ArrayList();
list.add( new Integer( 1 ) );
list.add( new Integer( 2 ) );
list.add( new Integer( ((Integer)list.get( 0 )).intValue() + ((Integer)list.get( 1 )).intValue() ) );


It's a trick!  A dirty trick!  Yes, it's type safe, but this safety is asserted by the compiler at a certain stage of compilation.  After this stage, it technically isn't, but the previous stage made sure that nothing bad would happen from these class casts.
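If you want to see the trick for yourself, here is a small sketch.  ErasureDemo and its method names are hypothetical.  The first method shows that two differently parameterized lists share one run-time class.  The second goes behind the compiler's back through a raw reference (which javac flags with an unchecked warning) and shows that the hidden cast is the only thing standing guard:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // After erasure both lists are plain ArrayLists, so their classes match.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strings = new ArrayList<String>();
        return ints.getClass() == strings.getClass();
    }

    // Returns true if reading a smuggled String out of a List<Integer>
    // fails at the cast the compiler inserted.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean castFailsOnRead() {
        List<Integer> ints = new ArrayList<Integer>();
        List raw = ints;           // same object, viewed without type arguments
        raw.add("not an Integer"); // nothing checks this on the way in
        try {
            int value = ints.get(0).intValue(); // hidden (Integer) cast runs here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("same class: " + sameRuntimeClass());
        System.out.println("cast fails on read: " + castFailsOnRead());
    }
}
```

As long as the code compiles without unchecked warnings, the compiler's promise holds.  Once raw types sneak in, the guarantee is only as good as those inserted casts.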

Maybe I'm being too hard on it.  It is type safe, and it accomplishes roughly the same thing as a C++ template.  Who cares that it's implemented in such a way, besides maybe someone obsessed with performance?

Well, the evil does descend into other places.  For example, even though one has to pass concrete types to instantiate a generic type, this typing information isn't actually available to the programmer at run time.  Consider the type variable "T" in a generic class.  Even though "T" must be instantiated to some actual type to form instances of the class, the person writing the class has no idea what "T" is, and the compiled code doesn't know either.  The same loss shows up in overloading.  Consider the following overloaded method definition:
public int myMethod( List< Integer > list ) {...}
public int myMethod( List< String > list ) {...}


If you try to compile this, javac greets you with the following error message:
name clash: myMethod(java.util.List<java.lang.String>) and myMethod(java.util.List<java.lang.Integer>) have the same erasure

This is because after type erasure, these two methods actually both look like:
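public int myMethod( List list ) {...}

...because erasure simply throws away the type arguments, leaving the raw type "List".  (A bare type variable like "T" is replaced with its bound instead, which is "Object" unless you declare otherwise.)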
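You can confirm what the JVM sees with a bit of reflection.  In this sketch the class ErasedSignatureDemo and the method names countInts and countStrings are invented, and the two methods are deliberately given different names so that they can coexist:

```java
import java.lang.reflect.Method;
import java.util.List;

class ErasedSignatureDemo {
    // Hypothetical methods; distinct names avoid the name clash.
    public int countInts(List<Integer> list) { return list.size(); }
    public int countStrings(List<String> list) { return list.size(); }

    // Looks a method up by its erased signature and returns the
    // run-time type of its single parameter.
    static Class<?> erasedParameterType(String methodName) {
        try {
            Method m = ErasedSignatureDemo.class.getDeclaredMethod(methodName, List.class);
            return m.getParameterTypes()[0];
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedParameterType("countInts"));
        System.out.println(erasedParameterType("countStrings"));
    }
}
```

Both lookups use List.class as the parameter type and both succeed, which is exactly why two methods with the same name would collide.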

A related issue is that Java doesn't let you make a generic array.  For instance, if you try:
public class Test< T > {
  public T[] array = new T[ 5 ];

  ...
}

...javac greets you with:
generic array creation

So what if you want to create a type safe, generic array?

You don't.  Yay Java.

Well, ok, that's not *quite* true, but it's close.  First, you're strongly encouraged to use something that's already type safe like an ArrayList, which can do pretty much everything an array can do.  Then if you're still complaining, people tell you to use newInstance, from the java.lang.reflect.Array class.  newInstance will create an array of the right run-time type, but you need to pass it a Class object.  How do you get this?  You explicitly specify it.  Seriously?  Seriously?  This is not a solution, this is just asking for more errors.  Effectively, you end up specifying the type twice, once as < ClassName > for the generic type, and a second time as ClassName.class for the class object corresponding to the class.  newInstance itself accepts any Class object and hands back a plain Object, so unless your own code declares the parameter as Class< T >, there is nothing stopping you from specifying < String > and Integer.class.  So you end up with a type safe array, but creating it involves a very type unsafe operation.  So much ick.
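For the record, here is roughly what the newInstance dance looks like.  This is only a sketch, and TypedArray is a made-up class.  It assumes the constructor parameter is declared as Class<T>, which at least ties the class object to the type argument:

```java
import java.lang.reflect.Array;

class TypedArray<T> {
    private final T[] items;

    // Declaring the parameter as Class<T> (an assumption of this sketch)
    // forces the class token to agree with the type argument.
    @SuppressWarnings("unchecked")
    TypedArray(Class<T> componentType, int length) {
        // Array.newInstance returns Object, so this cast is unchecked.
        items = (T[]) Array.newInstance(componentType, length);
    }

    void set(int index, T value) { items[index] = value; }

    T get(int index) { return items[index]; }

    // Reports the component type the array really has at run time.
    Class<?> componentType() { return items.getClass().getComponentType(); }

    public static void main(String[] args) {
        TypedArray<String> names = new TypedArray<String>(String.class, 5);
        names.set(0, "erasure");
        System.out.println(names.get(0) + " stored in an array of " + names.componentType());
    }
}
```

With this signature, new TypedArray<String>(Integer.class, 5) is rejected by the compiler.  You still have to name the type twice, and the unchecked cast is still sitting there in the constructor, so the ick doesn't go away; it just gets confined to one line.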

Don't get me wrong.  Generics are better than no generics.  I'm just saying that there are a tremendous number of associated gotchas.
