Thursday, September 23, 2004

Puzzling Through Erasure: answer section

Erasure is the technique that Java 5 uses to implement generics.  Basically, the javac compiler performs its type-checking and then erases, or removes, the generic type information from the runtime representation.  Reification, on the other hand, is the opposite.  A generics implementation based on reification retains the type parameters at runtime as first-class entities, which allows it to perform certain type-based language and reflective operations on type parameters.  Sun chose to implement generics using erasure.  C# implements generics using reification.  A number of questions about erasure and reification have been raised in the context of the new version of the Java language, and misunderstanding and misinformation abound.  I hope this note will help illuminate the issues.
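To see erasure concretely, here is a minimal sketch: two lists with different type arguments share a single runtime class, because the type arguments exist only at compile time.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        // After erasure both are plain ArrayList; the type arguments are gone.
        System.out.println(strings.getClass() == ints.getClass());  // prints "true"
    }
}
```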

Why did Sun go with an erasure scheme to implement generics, rather than reifying type parameters as is done in C#? This is perhaps one of the least understood aspects of the Java generics design, but it is a natural consequence of the design goals of the language extension. The goal was to provide backwards compatibility of both source and object code, and also migration compatibility. The meaning of backwards compatibility is pretty obvious: you want existing source and class files to continue to be legal and continue to mean what they meant before. Although the definition of backwards compatibility is clear, it is actually very hard to change the language without violating it.

The other design constraint is migration compatibility, and this is a big one. I'll explain it by way of an example.

Migration Compatibility

Migration compatibility is the first constraint among the requirements (in section 2.5) for jsr14's design for a generic extension to Java:

C1) Upward compatibility with existing code. Pre-existing code must work on the new system. This implies not only upward compatibility of the class file format, but also interoperability of old applications with parameterized versions of pre-existing libraries, in particular those used in the platform library and in standard extensions.

Consider the following software vendors: Company A sells a Java library for manipulating Audio. Company B sells a Java library for controlling electronic Bugles (I think this product would be called Boogie Woogie Boy). Company C sells a software product that is an electronic Bugle which is built on the libraries from A and B. C's customers would normally license all three products.

Along comes JDK5, and these companies naturally want to migrate to generics. Companies A and B are in the comfortable position of being at the bottom of this software stack, and can immediately begin developing new versions of their APIs and libraries that take advantage of the new language features. Being a new release of the product, Company A would naturally also add new features. Their new API is called A2. Customers who require library A, for example because they have not modified their source to take advantage of generics yet, can still license it.

Company B, similarly, ships two libraries B and B2.

What does company C do? First, they cannot migrate to the new version of these libraries until the new libraries are available. And once they are available, C will want to migrate, for two reasons. First, it improves their software development process to use the improved language features. Second, their future customers might license the new (rather than the old) libraries A2 and B2. So company C provides two versions of their product, the old version C based on A and B, and C2 based on A2 and B2.

Shortly after releasing C2 (and settling the inevitable trademark infringement lawsuit with Coca Cola) Company C discovers that it has left a huge number of its customers out in the cold. These are customers who have licensed the new version of one of the libraries A or B, but not new versions of both. Neither of C's products will work with these combinations. So Company C promptly provides C2a and C2b, two new versions of their product for this subset of customers.

Things only get worse as we move up the software stack. First, it will be a very long time before everyone can migrate fully to the new APIs. Second, as a library writer you either require your customers to use the latest version of everything and deprecate your old APIs, or you support an astronomical number of variations of the underlying layers. If you're in the enviable position of controlling the entire software stack then this is no big deal -- except perhaps to your customers. Sun is not in that position and this situation is clearly unacceptable.

This is where migration compatibility comes in. Migration compatibility requires that the generic version of an API be compatible with the old version, in the sense that existing clients will continue to compile (for source) and run (for binaries). This requirement constrains the design space significantly.

In a system that supports migration compatibility, Companies A and B can independently retrofit their libraries with generics, and their customers will be able to use the new library without modifying their source code, even without recompiling it. Those customers gain the benefit of generics in those libraries only once they begin placing type parameters on the generic types, but Companies A and B can now ship a single version of their library.
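A sketch of what this means in practice: the client code below is written in the pre-generics style (raw types, an explicit cast on retrieval; `shout` is an illustrative name, not from any real library). It still compiles against the generified `java.util.List` (with "unchecked" warnings) and its old class file still links and runs, which is exactly the guarantee migration compatibility provides.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class OldClient {
    // Pre-JDK5 client code, unchanged: raw types and a cast on the way out.
    public static String shout(List words) {
        StringBuffer sb = new StringBuffer();
        for (Iterator i = words.iterator(); i.hasNext(); ) {
            sb.append(((String) i.next()).toUpperCase());
        }
        return sb.toString();
    }
    public static void main(String[] args) {
        List words = new ArrayList();  // raw type, as old code would write it
        words.add("hi");
        System.out.println(shout(words));  // prints "HI"
    }
}
```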

We do not know how to satisfy this constraint in a system that also provides reification, that is, runtime access to type parameters. This is not for lack of trying. In fact, we've explored the design space enough to conjecture that it simply can't be done. The details of that exploration are very interesting but far beyond the scope of this note. Bruce Eckel's injection idea is often proposed as a starting point, as is NextGen. Neither supports migration compatibility.

Backwards compatibility is hard

I mentioned earlier that even though the phrase "backwards compatibility" is easier to understand than migration compatibility, it is not easy to extend the language without breaking compatibility. Let's take, for example, Bruce's suggestion that some future version of Java would begin reifying type parameters and stop using erasure. His idea is that constructor invocations for generic classes would pass one or more extra parameters - type information for the instantiation's type parameters - which would be stored in the instance.

Setting aside some very difficult implementation and performance issues (do you really want every linked list node to double in size? How do you implement instanceof for generic interfaces? How do you implement array store checks?), this isn't really backward compatible. The fundamental problem is that existing Java 5 class files don't follow that convention, and so won't interoperate with new code. Existing code that instantiates Foo&lt;String&gt; doesn't in fact pass the extra parameter. Existing subclasses of Foo don't pass such data on to the superclass constructor. Existing implementations of a Foo&lt;T&gt; interface don't expect to receive that extra parameter in the constructor.

I would also like to address some other misconceptions in Bruce Eckel's two latest blog entries here and here.

Why can't I "new T()"?

For a type parameter T, the reason you can't do a "new T()" is that not every type has a no-arg constructor. I know Bruce is a big fan of what he calls "latent typing", and he would prefer to get such errors at runtime. But Java is a statically typed language, for better or worse, and you simply can't invoke a no-arg constructor on a type that isn't statically known to have one. It's also true that due to erasure there would be no way to generate this code, but for this construct that is beside the point.
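To make the objection concrete, here is a small sketch (the helper name is illustrative): Integer's only constructors take an int or a String, so a hypothetical "new T()" with T = Integer could never succeed, no matter how type parameters were implemented.

```java
public class NoArgCheck {
    // Illustrative helper: does this class declare a no-arg constructor?
    static boolean hasNoArgConstructor(Class<?> c) {
        try {
            c.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(Integer.class));  // prints "false"
        System.out.println(hasNoArgConstructor(Object.class));   // prints "true"
    }
}
```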

What is the point of type parameters if they are erased?

Static type checking! To belabor the point, if you're someone who prefers what Bruce calls latent typing, then you care about what you can do with a type at runtime more than the compile-time utility. If that's the case, any extension to the static type system is ho-hum uninteresting and not worth the brain cycles needed to understand it.

But for those of us who want to rely on the compiler to help us get our programs statically correct and prefer our errors be reported early, generics are a huge win. Generics can save you a great deal of work and mental energy that used to be spent mediating between untyped collections and a program that works with concrete types.
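A before-and-after sketch of that mediation (method names here are illustrative): in the raw style, every use site repeats a cast that the compiler cannot check and that can fail at runtime; in the generic style, the element type is checked once, at compile time, and the cast disappears.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GenericsWin {
    // Pre-generics style: the cast is written (and can fail) at every use site.
    static String firstRaw(List raw) {
        return (String) raw.get(0);
    }
    // Generic style: the element type is checked by the compiler; no cast.
    static String firstTyped(List<String> typed) {
        return typed.get(0);
    }
    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList("Neal"));
        System.out.println(firstTyped(names));  // prints "Neal"
    }
}
```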

How do I create an array of T?

There are two answers - a cheat that isn't typesafe but works in many situations, and a correct answer. Bruce found the cheat, and it's unfortunate that it is widely being copied as the "correct" way to do things. At the risk of sending the wrong message, the cheat is as follows:

 T[] a = (T[]) new Object[N]; // Bad style!

When you do this sort of thing the compiler will rightfully complain. After all, an Object array isn't a T array unless T happens to be Object. You shouldn't ignore this compiler warning. In fact, what you've written here is very dangerous. If you have a method in the API that returns a T[] and you return this value, and you have a client for which T==String who tries to store it into a variable of type String[], you will get a ClassCastException at runtime.
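The failure scenario just described can be demonstrated directly (class and method names below are illustrative): the cheat manufactures an Object[] and calls it a T[], and the moment a client with T = String receives it, the compiler-inserted cast to String[] fails.

```java
class ArrayCheat<T> {
    // The cheat: the object created is really an Object[], not a T[].
    @SuppressWarnings("unchecked")
    T[] makeArray(int n) {
        return (T[]) new Object[n];  // unchecked warning, rightly so
    }
}

public class CheatDemo {
    // Returns true if the predicted ClassCastException occurs.
    static boolean clientBlowsUp() {
        ArrayCheat<String> cheat = new ArrayCheat<String>();
        try {
            // Erasure puts a hidden (String[]) cast here, and the
            // Object[] coming back fails it.
            String[] a = cheat.makeArray(3);
            return a == null;  // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }
    public static void main(String[] args) {
        System.out.println(clientBlowsUp());  // prints "true"
    }
}
```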

But wait, you say, this kind of thing appears in the Collections implementation! So it must be the approved, accepted, proper approach. No. Mea culpa. I was lazy. It would have been better for me to use an Object array inside the collection, and add a cast (yes, to T) everywhere necessary when an element is removed from the array. In general, this cast is elided by javac and moved into the non-generic client, both effects of compiling by erasure. But you still get a complaint from the compiler on each cast. This warning is unavoidable, because at runtime existing clients simply do not provide enough type information to do the cast correctly. That's not due to erasure, it's due to language evolution.
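A minimal sketch of that better approach (the class is illustrative, not the actual Collections source): the backing store is an honest Object[], and the cast to T happens only when an element comes back out, where it is guaranteed to succeed because push only ever stored Ts.

```java
class SimpleStack<T> {
    private Object[] elements = new Object[10];  // an Object[], not a fake T[]
    private int size = 0;

    public void push(T e) {
        if (size == elements.length) {  // grow when full
            Object[] bigger = new Object[size * 2];
            System.arraycopy(elements, 0, bigger, 0, size);
            elements = bigger;
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    public T pop() {
        T result = (T) elements[--size];  // the cast to T, on the way out
        elements[size] = null;
        return result;
    }
}

public class StackDemo {
    public static void main(String[] args) {
        SimpleStack<String> s = new SimpleStack<String>();
        s.push("erasure");
        System.out.println(s.pop());  // prints "erasure"
    }
}
```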

But that's not the correct solution. The correct solution requires that you design your API to be aware of generics (unfortunately not possible when retrofitting an existing class) and relies on using the idea of a type token, which I introduced along with the generification of the core APIs. See section 8 of Gilad Bracha's Generics Tutorial for more details, as well as the checked collection wrappers in JDK5 for a straightforward example. If you're designing an API in which you really, really wish that type parameters were not erased, you can achieve almost the same effect as follows:

  class Generic<T> {
    private final Class<T> tClass;
    public Generic(Class<T> tClass) {
      this.tClass = tClass;
    }
  }

Now you can instantiate using reflection (tClass.newInstance()), create arrays (Array.newInstance), cast (Class.cast), and do instanceof tests (Class.isInstance), though with a slightly different syntax than you might prefer.
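A sketch of how such a class might wrap those four operations behind friendlier methods (the method names are illustrative, not from the JDK):

```java
import java.lang.reflect.Array;

class Generic<T> {
    private final Class<T> tClass;
    public Generic(Class<T> tClass) {
        this.tClass = tClass;
    }
    // Create a new T via reflection; T needs a public no-arg constructor.
    public T newInstance() throws Exception {
        return tClass.newInstance();
    }
    // Create a genuine T[], not an Object[] masquerading as one.
    @SuppressWarnings("unchecked")
    public T[] newArray(int length) {
        return (T[]) Array.newInstance(tClass, length);
    }
    // Checked cast and instanceof test, using the retained token.
    public T cast(Object o) { return tClass.cast(o); }
    public boolean isInstance(Object o) { return tClass.isInstance(o); }
}
```

For example, `new Generic<String>(String.class).newArray(10)` really is a String[] at runtime, so it can safely be handed to clients that expect one.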

This technique allows you to get the effect of what Bruce calls "type injection", almost undoing the erasure yourself. Why do I say "almost"? I'll leave that as a puzzle for you to think about. One hint: just as you feared, it has something to do with erasure.

Why can't I put bounds on an instantiation?

What would it mean? If you were able to make a new List&lt;? extends Foo&gt;(), then you could place into that list a Foo or any subclass of Foo. But a subclass of Foo is also a Foo, so you could also use a List&lt;Foo&gt; for the same purpose. In short, new List&lt;? extends Foo&gt;() would mean the same thing as new List&lt;Foo&gt;().
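A short sketch of the point (Foo is from the text; SubFoo and the method name are illustrative): a plain List&lt;Foo&gt; already accepts Foo and every subclass, and the wildcard goes on a use site such as a variable, not on the instantiation.

```java
import java.util.ArrayList;
import java.util.List;

class Foo {}
class SubFoo extends Foo {}

public class WildcardDemo {
    // A List<Foo> already accepts Foo and any subclass of Foo.
    static List<Foo> makeFoos() {
        List<Foo> foos = new ArrayList<Foo>();
        foos.add(new Foo());
        foos.add(new SubFoo());  // a SubFoo is a Foo
        return foos;
    }
    public static void main(String[] args) {
        // The wildcard belongs on a use site (variable, parameter, or
        // return type), never on the "new":
        List<? extends Foo> view = makeFoos();
        System.out.println(view.size());  // prints "2"
    }
}
```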

To look at it another way, consider method parameters. You can put a type on a method's value parameter, where the parameter's name is defined, but not on an argument to a method call, which is an expression. The same is true of type parameters and arguments: you can put a bound on the type parameter, where its name is defined, but not on the type argument.

Why do you say that C# breaks compatibility with their generics system?

The short answer is: I don't.

Microsoft is essentially deprecating the old collections classes by introducing a new, preferred set of generic collections classes. They are creating the kinds of problems described in the Company A-B-C scenario, only it might just be less of a problem for them because they control so much more of the software stack. I do not claim that they break backward compatibility. In fact, generally speaking I prefer reification when it doesn't conflict with the design goals. The other disadvantage (besides abandoning migration compatibility) that I see in the C# approach is that it is hard to reconcile with both the bipartite type system and wildcards, though perhaps that problem could be solved.

You might not agree with migration compatibility as a goal for the generics extension to Java. That would have been very useful feedback in 1999 when jsr14 was formed, and in fact the issue was discussed at great length at the time. Whatever your feelings, I hope you agree that now, a month before the product ships, is a poor time to reconsider the most fundamental design goals that were set five years earlier for the product and on which the existing design and implementation depends deeply.

9 comments:

Anonymous said...

Thanks for the clarification.

-tinou

Anonymous said...

Hi Neal,

I re-read your explanation. The thing that I would like to know is how common is the migration compatibility issue you talk about. I personally haven't faced this issue (using multiple vendor libraries), and would like to know if this is based on actual cases or more of dealing with potential problems.

- tinou

Neal Gafter said...

If we had actual cases where this had occurred then it would be too late to do anything about it.

Anonymous said...

oops, i re-read my question and of course there are no actual cases. what i was trying to gauge was the reasons/origins behind the upward compatibility requirement, whether it was based on actual past experiences (in another language) or whether this really hasn't been a problem in programming/software but Sun felt it might be a problem.

-tinou

Anonymous said...

The problem of migration is, I think, universal among any developer of a large system. Even if your code is entirely autochthonous, it would not be practical to migrate your whole codebase to effectively a new language in one shot. The existing code must be maintained, and all the changes have to be ported over, and then some day, there has to be a big jump. It just isn't realistic.

What would really happen is that you would establish a separate project for the migration; over a period of time, the original code would drift, some changes would be successfully ported, others wouldn't, and then when the Big Day arrived, a lot of features would be lost and your customers would be screaming. Or more likely, the migration project would simply be cancelled.

Jordan Zimmerman said...

Here and elsewhere I've seen your comment (and others') that the design goals required erasure. If so, then the design goals were severely flawed. Migration is a niche concern totally outweighed by how significantly hobbled Java Generics is with erasure.

Having started to use Java Generics, I realize that it's a lot of trouble with very little benefit. I'm afraid that this is terribly botched.

bolan said...

This is very cool.

Anonymous said...

Time tells: I'm working with a large 1.4 codebase and I say it's wonderful to just pretend it was generified. OTOH, I feel little use for reified type parameters (very occasionally I have to pass a class object). I say you made a very reasonable choice.

"Unchecked" warnings are too prominent, though. I can't sprinkle @SuppressWarnings everywhere. Must ... tame ... IDE.

Matthias

Craig S. said...

Well, I highly disagree with the decision to use erasure over reification, but it's been in Java for at least 3 different releases now (JRE 1.7 is available now), so I guess it's more or less water over the dam.

I wanted to clarify one point in regards to MS's decision to use reification with .NET. While it is true that the decision to use reification does cause "breaking changes" for forward migration of code, MS handled the problem in a way that greatly reduced this problem and did not require significant changes to existing code. As you mentioned, they implemented all new generic collections in a new namespace and prefer that developers use these new collection classes over the older non-generic collection classes. How can they do this and still keep forward migration problems to a minimum? In their BCL, every generic class that implements a generic version of a previously existing collection class implements two interfaces: the non-generic interface and the new generic interface; ICollection&lt;T&gt; and ICollection, for example. In this way, the BCL generic collections are "forward migration" compatible with older code.

It is also MS's suggestion to API and framework developers that when developing your own generic classes that you implement both the generic interfaces as well as the non-generic interfaces in order to have the highest compatibility.

This was a great decision on their part because it does provide for forward migration (as far as I can see). Since the old collection classes are still present, older code can still use them. And since the new collection classes implement the older interfaces, they can still "act like" older collections, when needed.

The bad part, if you can say that, is that now programmers must be sure to implement both interfaces. But I see this as a very small trade-off in order to have reification which provides me not only with static type-checking at compile time, but also strongly-typed objects at runtime.