Sunday, April 3, 2011

Type Erasure Part II: Bytecode and You

My last post ended in a mystery.  It seems that type erasure can occasionally be circumvented using different return types, despite the fact that return types are not actually part of a method's signature.  I asked this question on Stack Overflow, and I got an excellent response from "irreputable".  

For all of these examples, I'm importing java.util.*.  So here's something that doesn't compile:
public class Test {
        public int method( List< Integer > list ) {
                return 0;
        }
        public int method( List< Double > list ) {
                return 1;
        }
}



The error is that they "have the same erasure", meaning that List< Integer > and List< Double > both erase to the raw type List: the type arguments are dropped and the elements are effectively treated as Object, which is how type erasure typically works.  After erasure, these two methods are clearly the same.
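To see that erasure for yourself, here is a minimal sketch using reflection. The class name ErasureDemo and its helper methods are made up for illustration; only the reflection calls are standard library.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Hypothetical method, present only so reflection has something to inspect.
    public int method(List<Integer> list) {
        return 0;
    }

    // Both lists share one runtime class: the type argument does not survive.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<Double> doubles = new ArrayList<Double>();
        return ints.getClass() == doubles.getClass();
    }

    // The erased parameter type recorded for method(...) in the class file.
    static Class<?> erasedParameterType() {
        for (Method m : ErasureDemo.class.getDeclaredMethods()) {
            if (m.getName().equals("method")) {
                return m.getParameterTypes()[0];
            }
        }
        throw new IllegalStateException("method not found");
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());      // true
        System.out.println(erasedParameterType());   // interface java.util.List
    }
}
```

The parameter comes back as plain List, with no trace of Integer, which is exactly why the two methods above collide.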

Here's something else that doesn't work (the second method now returns double and takes List< Integer >):
public class Test {
        public int method( List< Integer > list ) {
                return 0;
        }
        public double method( List< Integer > list ) {
                return 1;
        }
}


In this case, the error is actually different: "method is already defined".  This is where things start to get weird.  Return types are not part of method signatures, so the differing return types should be ignored.  Even before type erasure, these two methods have identical signatures, so this shouldn't work, and "method is already defined" is a more appropriate error than "have the same erasure".


But here's something that does work:
public class Test {
        public int method( List< Integer > list ) {
                return 0;
        }
        public double method( List< Double > list ) {
                return 1;
        }
}


No errors, no warnings, and some simple testing shows that it works exactly like it looks like it should, at least with the compiler I'm using.  (Be aware that javac from Java 7 onward rejects this example as well, reporting a "same erasure" name clash.)  The return types aren't part of method signatures, so they can be ignored.  Ignoring return type, this is the exact same pair of methods as the first example, which didn't compile for type erasure reasons.  But this inexplicably works.

It turns out that method signatures in bytecode (the JVM specification calls them method descriptors) actually do include return types, and that this information can be used for overloading, even though the more basic tutorial-style documentation makes it seem like this doesn't happen.  With this in mind, going through all the examples, the bytecode signatures are as follows (factoring in type erasure):
  1. List -> int; List -> int
  2. List -> int; List -> double
  3. List -> int; List -> double
For #1, this very clearly isn't going to work.  But #2 and #3 do appear possible, since there is enough information to differentiate the methods.  #2 doesn't work because the compiler seems to look for duplicate methods before type erasure occurs, and at this stage List< Integer > and List< Integer > are clearly the same type.  #3 gets past this check because List< Integer > and List< Double > are different before type erasure.  Sometime after this point, type erasure occurs, and only then do the two methods collide under the traditional, return-type-free definition of a signature.  But since return types are technically part of the method signature in bytecode, this isn't an issue: it's possible to store both methods in the bytecode without a conflict.
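The bytecode signatures listed above can be printed in the JVM's own notation. This is just a sketch: DescriptorDemo and descriptor() are names I made up, and java.lang.invoke.MethodType needs Java 7 or later.

```java
import java.lang.invoke.MethodType;
import java.util.List;

class DescriptorDemo {
    // Descriptor of an erased method that takes a List and returns returnType.
    static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType, List.class)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The parameter part is identical; only the trailing return type differs.
        System.out.println(descriptor(int.class));     // (Ljava/util/List;)I
        System.out.println(descriptor(double.class));  // (Ljava/util/List;)D
    }
}
```

The trailing I and D are the return types, so "List -> int" and "List -> double" really are distinct entries as far as the class file is concerned.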

So the bytecode contains the correct methods.  But how can the compiler know which method is meant when one calls it?  One could infer this based on how the return value is used, as in double myDouble = method( List< Double > ).  However, we could have just as easily called it like method( List< Double > ), discarding the return value and leaving the compiler without any additional typing information.  So if we can't choose the correct method based on the return type, then how can we choose the method if type erasure makes everything an Object?

The answer is that it's only an Object at run time, not compile time.  Since overloading occurs at compile time, this information is available to the compiler.  To my knowledge, however, there is no way to access this information at compile time through code, despite the fact that it is available somewhere.
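That overloads are picked at compile time from static types, not from run-time classes, is easy to check without any generics at all. A small sketch (OverloadDemo and describe() are hypothetical names):

```java
class OverloadDemo {
    static String describe(Object o) { return "Object"; }
    static String describe(String s) { return "String"; }

    static String viaObjectReference() {
        Object o = "hello";   // run-time class is String, static type is Object
        return describe(o);   // the compiler binds this call to describe(Object)
    }

    static String viaStringReference() {
        String s = "hello";   // static type is String
        return describe(s);   // the compiler binds this call to describe(String)
    }

    public static void main(String[] args) {
        System.out.println(viaObjectReference());  // Object
        System.out.println(viaStringReference());  // String
    }
}
```

The same value gives two different answers depending only on the declared type of the variable. The compiler uses that same static information to tell a List< Integer > argument from a List< Double > one, even though both are plain List by the time the program runs.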

So what's the problem?  Why can't I overload with different generic type parameters if the compiler can make this judgement call?  The answer is you can, as long as the return types differ.  Again, the problem isn't in the compiler, it's in the actual bytecode/JVM.  The signature it uses to identify a method doesn't include generic type arguments, so there is no way to differentiate methods of the same name that differ only in generic type parameters.  But since that signature does include the return type (unlike the typical language-neutral definition of "method signature"), the return type can be used to separate the otherwise identical methods.
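There is a way to watch two methods with the same name and parameters coexist in one class file without relying on the generics trick: covariant return types. When an override narrows the return type, javac adds a synthetic "bridge" method that keeps the original return type. The sketch below assumes javac's behavior; Base, Derived and BridgeDemo are made-up names.

```java
import java.lang.reflect.Method;

class BridgeDemo {
    static class Base {
        Object get() { return "base"; }
    }

    static class Derived extends Base {
        @Override
        String get() { return "derived"; }   // covariant return type
    }

    // How many methods named "get" the compiled Derived class declares.
    static int getMethodCount() {
        int count = 0;
        for (Method m : Derived.class.getDeclaredMethods()) {
            if (m.getName().equals("get")) {
                count++;
            }
        }
        return count;
    }

    // Whether one of those methods is a compiler-generated bridge.
    static boolean hasBridge() {
        for (Method m : Derived.class.getDeclaredMethods()) {
            if (m.getName().equals("get") && m.isBridge()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (Method m : Derived.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + "()  bridge=" + m.isBridge());
        }
    }
}
```

Derived ends up with both a String get() and an Object get(), which could not share a class file if the return type weren't part of what identifies a method there.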


I've tested this with both Java and Scala, which both run on the JVM and both show this behavior.  If all of this is correct, then the same should hold for any language that runs on the JVM, assuming it allows method overloading and static typing.  (If it doesn't, then this doesn't apply to you anyway.)


Long story short: return types are part of the method signature in the JVM bytecode.
