Thursday, July 05, 2007

Constructor Type Inference

One of the ideas for improving the Java Programming Language is "type inference" on variable declarations. The idea is to simplify a pattern of code that now appears in programs due to generics:

Map<String,List<Thing>> map = new HashMap<String,List<Thing>>();

Surely we shouldn't have to give the same type parameters twice? The simplest proposal to relieve this redundancy allows

map := new HashMap<String,List<Thing>>();

This introduces the new colon-equals token and the declaration-assignment statement. The variable appearing on the left-hand-side of the statement is implicitly defined by this statement, and its type is the type of the expression on the right-hand-side. I don't like this proposal. It both goes too far and not far enough.

It goes too far in that it allows the programmer to elide the type in a variable declaration. The type in a variable declaration is valuable documentation that helps the reader understand the program, and this proposal reduces the readability of programs by allowing it to be elided. Worse, it assigns the wrong type to the variable. Following Effective Java (first edition, item 34), the type of a declared variable should be an interface type. This statement form forces the variable to be of the (likely more specific) type of the right-hand-side. Consequently, the programmer may inadvertently depend on features of the concrete implementation class when using the variable. That would make it more difficult to modify the program later by selecting a different implementation type.
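To make the concern concrete, here is a small sketch. The class and method names are invented for illustration and are not part of any proposal. When the variable has the concrete type, nothing stops the code from calling a method that only that implementation provides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; all names here are illustrative.
class DeclaredTypeDemo {
    static int countViaInterface() {
        // Declared as the interface: only List methods are reachable, so
        // replacing ArrayList with LinkedList would need no other change.
        List<String> names = new ArrayList<String>();
        names.add("a");
        names.add("b");
        return names.size();
    }

    static int countViaConcreteType() {
        // This is the type that ":=" would give the variable. The call to
        // ensureCapacity compiles, but it exists only on ArrayList, so a
        // later switch to LinkedList would break this method.
        ArrayList<String> names = new ArrayList<String>();
        names.ensureCapacity(16);
        names.add("a");
        names.add("b");
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(countViaInterface() + " " + countViaConcreteType());
    }
}
```

Both methods behave identically today; the difference only shows up when someone tries to change the implementation class.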

This syntax doesn't go far enough because the verbosity of creating generic classes is worth eliminating in other contexts as well. Programmers today work around the verbosity by providing static factory methods corresponding to constructors:

static <K,V> HashMap<K,V> makeHashMap() {
    return new HashMap<K,V>();
}

This addresses the immediate problem:

Map<String,List<String>> map = makeHashMap();

Unfortunately, this idiom replaces one form of boilerplate (in variable initialization) with another: trivial static factories. A generic class is typically created more than once, so adding a single static factory can simplify the code at every creation site. But with language support, we can do better.
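The boilerplate also grows with the class, since each constructor needs its own factory. A rough sketch of what that looks like for HashMap follows; the class name MapFactories and the two extra overloads are illustrative additions, not an existing library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical utility class; only the no-argument makeHashMap() appears in the text above.
class MapFactories {
    static <K,V> HashMap<K,V> makeHashMap() {
        return new HashMap<K,V>();
    }

    // One more trivial factory per constructor that callers want to use.
    static <K,V> HashMap<K,V> makeHashMap(int initialCapacity) {
        return new HashMap<K,V>(initialCapacity);
    }

    static <K,V> HashMap<K,V> makeHashMap(Map<? extends K, ? extends V> source) {
        return new HashMap<K,V>(source);
    }

    static int demo() {
        // K and V are inferred at each call site; none are written out.
        Map<String,List<String>> empty = makeHashMap();
        Map<String,List<String>> sized = makeHashMap(64);
        sized.put("key", new ArrayList<String>());
        Map<String,List<String>> copy = makeHashMap(sized);
        return empty.size() + copy.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```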

I propose a new form of class instance creation expression:

Map<String,List<Thing>> map = new HashMap<>();

Using empty type parameters on a class instance creation expression asks the language/compiler to perform type inference, selecting appropriate type parameters exactly as it would in the invocation of the equivalent trivial static factory.
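As a sketch of the intended equivalence, the two declarations below should end up with the same type arguments. This assumes a compiler that accepts the empty <> form (Java 7 and later do); the class name DiamondDemo is made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; requires a compiler that accepts "new HashMap<>()".
class DiamondDemo {
    // The "equivalent trivial static factory" that the text refers to.
    static <K,V> HashMap<K,V> makeHashMap() {
        return new HashMap<K,V>();
    }

    static boolean sameResult() {
        // Both right-hand sides get K = String, V = List<Integer>
        // from the declared type on the left.
        Map<String,List<Integer>> viaFactory = makeHashMap();
        Map<String,List<Integer>> viaDiamond = new HashMap<>();

        viaFactory.put("a", new ArrayList<Integer>());
        viaDiamond.put("a", new ArrayList<Integer>());
        return viaFactory.equals(viaDiamond);
    }

    public static void main(String[] args) {
        System.out.println(sameResult());
    }
}
```

Note that the variable keeps its declared interface type in both cases; only the right-hand side is inferred.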

Type inference today works on the right-hand-side of an assignment. I also propose that we enable this new form to be used in more situations by improving type inference for expressions appearing in other contexts:

  • the argument of a method call
  • the receiver of a method call
  • the argument of a constructor
  • the argument of an alternate constructor invocation

This would enable generic methods to be invoked in these contexts without providing explicit type parameters.
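For the method-argument case, the difference looks roughly like this. The sketch assumes a compiler that performs inference in argument position (Java 8 and later do); the class and method names are invented for the example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical example class; names are illustrative.
class ArgumentInferenceDemo {
    static int total(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static int withExplicitTypeArgument() {
        // Explicit type parameter: accepted even without argument-position inference.
        return total(Collections.<Integer>emptyList());
    }

    static int withInferredTypeArgument() {
        // Relies on inference from the parameter type List<Integer>
        // (assumption: the compiler supports this, as Java 8+ does).
        return total(Collections.emptyList());
    }

    public static void main(String[] args) {
        System.out.println(withExplicitTypeArgument() + withInferredTypeArgument());
    }
}
```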

36 comments:

Ricky Clarkson said...

Map<A,B> map=new HashMap(); already works, but gives a compile warning. One could merely disable that warning to achieve the same result.

Anonymous said...

Yes, I like it. Inferring things wrongly without asking me is bad, but inferring things when I've asked it to is good.

Rgds

Damon

Anonymous said...

That would make life a lot easier. On the other hand, you could introduce your own type, such as

public class NameThingMap extends HashMap <String,List<Thing>> {
...
}

--

BTW, I'm becoming less and less sceptical about the Closures concept. But that's a different topic.

Anonymous said...

Neal,
if I recall it correctly there were plans to drop raw types from future Java versions, thus in

Map map = new HashMap<String,List<String>>();

type inference could be applied to supply the missing type arguments in the variable's type declaration. This would allow types to be inferred not only for new expressions, but for any initializer. Also, deriving a substitution for the supertype from the subtype is always possible, while it will not always work in the opposite direction. There could still be a language feature to omit the type altogether when the variable's class is the same as the initializer's class.

As for additional type inference for method arguments, to me this is one of the most annoying decisions made during the initial generics design. It should be possible to infer from a method argument when there is a unique overload candidate; overload resolution, on the other hand, should probably be based on type parameter bounds.

Eugene.

Anonymous said...

The advantage of the first proposal is that it is not restricted to instance creation.
The construct can be used to store the return value of a method call for example which I think is more common:
    map := getMyThings();

Steven Brandt said...

I think it would be nice if one could use

Map<> map = new HashMap<String,List<Thing>>();

as well as

Map<String,List<Thing>> map = new HashMap<>();

Maybe you intended this?

(blank) said...

This is awesome. I hate the := syntax because it reminds me of Pascal/Delphi and is too foreign to the C-Java syntax. However, what you said makes perfect sense and I am 100% for it.

Neal Gafter said...

@Ricky: the existing warning when you use a raw type sometimes represents a real type error. Suppressing the warning would undermine the type system.
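A minimal sketch of the kind of error that warning can hide follows. The names are invented for the example, and the warning is suppressed here only to show what suppressing it would let through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; names are illustrative.
class RawTypeDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> copyAsStrings(List<Integer> numbers) {
        // Raw constructor: the compiler only warns, even though the new
        // list is filled with Integers. Writing new ArrayList<String>(numbers)
        // instead is rejected at compile time.
        List<String> strings = new ArrayList(numbers);
        return strings;
    }

    static boolean failsAtUse() {
        List<String> strings = copyAsStrings(Arrays.asList(1, 2, 3));
        try {
            // The compiler-inserted cast to String fails here, far from
            // the line that actually caused the problem.
            String first = strings.get(0);
            return first.length() < 0;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtUse());
    }
}
```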

Unknown said...

The following idiom:
...
Map<String,Integer> map = new HashMap<String,Integer>();
for (Map.Entry<String,Integer> entry: map.entrySet()) {

...
could be replaced by:
...
Map<String,Integer> map = new HashMap<>(); // inferred type in the constructor
for (Map.Entry<> entry: map.entrySet()) { // inferred type in the declaration
...


(I've not taken into account the Closures proposal here).

Anonymous said...

Neal,

I have some memory that at some point in the past you favored the syntax:

Map<String,List<Thing>> foo = Map.new();


And to have the compiler generate that new() method. Care to elaborate why you settled on the <> version?

Anonymous said...

This seems like a great improvement! It solves a problem that has been bothering every Java programmer since generics were added to the language, and it does so with little in the way of added complexity. I would love to see it happen soon, so that APIs are not bloated with static factories that become superfluous in conjunction with the proposed syntax.

I am very much in agreement that variable types should be declared, not inferred. In other words, I think the ":=" proposal and others of its ilk would do great harm to the language.

Anonymous said...

I have always disliked the replication of type names in Java, and in C++, where there is a clear type inference possible. I see no reason why declarations cannot be:

def x = Map<X,Y>() ;

there is the marker showing this to be a declaration but no replication of information.

C++ is heading down this road with the auto type, I think Java should follow the lead.

(Is Java going to become more and more like Scala over time? :-)

Jonathan Allen said...

> Worse, it assigns the wrong type to the variable. Following Effective Java (first edition, item 34), the type of a declared variable should be an interface type.

I don't know if the author is misunderstanding the book or if the book itself is off-base, but this recommendation is foolhardy, to say the least.

If you are working with local variables, you gain nothing by declaring it to be an interface type. Moreover, you are forced to use casting to access any of the non-interface methods on said type.

If used on a public class variable, that is, one marked public or exposed via a setter, it gets far worse. You can never use methods that aren't in the interface, even if you later add them to the type the class actually expects to use. You have basically killed any chance for future enhancements.

Interfaces were meant for very generic helper functions and, in Java, to serve as markers. COM has taught us that using interfaces as the only public API on your classes is very painful.

Unknown said...

I propose that every class implicitly have a static method named 'new', overloaded for each constructor.
(Yes, the keyword, to avoid collisions.)

Using these, we get rid of the constructor type parameters by inference:
Map<K,V> map;
map = HashMap.new();
map = HashMap.new(8);
map = HashMap.new(map);

one could also do:
import static java.util.HashMap.new;

Map<K,V> map;
map = new();
map = new(8);
map = new(map);

But only for one class, otherwise the inference would probably stop working..

Unknown said...

Looks good Neal. You got my vote. I really hope we don't get some "var :=" stuff into the language.

Cheers,
Mikael Grev

Brian Duff said...

@Jonathan

It always struck me that the way this particular item is described in Effective Java is less than ideal, even though I completely agree with what I think is its primary point.

Although it refers to interfaces, I think it's using the term "interface" in a slightly different sense from the Java interface construct. I believe the spirit of this point in EJ is, in a nutshell: "code against the most abstract type you can get away with".

In the collections API, the most abstract type you can get away with often happens to be an actual interface. For example, you construct an ArrayList, but store it in a variable of type List or Collection. You do this essentially because of encapsulation. If nothing that uses that variable needs to know that your object is an ArrayList (or even a List), why expose that implementation detail?

But this idea can equally be applied to abstract (or even concrete) super types.

Anonymous said...

What about simply:


Map<String, List<String>> map = new HashMap();



I know this is possible today, but it generates an "unchecked conversion" compile warning. If the compiler instead inferred the type parameters of the constructor, the code above would no longer generate that warning.

Any thoughts?

Stefan Schulz said...

"the existing warning when you use a raw type sometimes represents a real type error. Suppressing the warning would undermine the type system."

Maybe, I am missing the point here. If I write 1.5+ code, and let the RHS infer generic types, why wouldn't Ricky's proposal work as desired?
1. If there are no Generics to infer from, a raw type warning may be given.
2. If there are Generics to infer from, the construction will result in the according parameterized type.

Stephen Colebourne said...

Yes, I fully support the <> notation for type inference. I believe it to be in the style of Java, and a simple conceptual language extension.

And it's no surprise I like it. After all, I was proposing it back at Javapolis 2006 (whiteboard 8) and on my blog ;-)
Stephen

Unknown said...

I changed my mind and second the proposal with empty type parameters:

Map<K,V> map = new HashMap<>();

Actually, I really like the <> + type inference. It doesn't hide too much, and the syntax is improved. So count me in for type-inference-where-possible. :-)

Unknown said...

"this proposal reduces the readability of programs by allowing it to be elided"

How can you possibly say that the current, verbose, redundant version is more readable? IMO, inferred types are the way to go. While types are useful when looking at code, there is no reason to have to look at them all the time. There could be an option in IDE's to hide all the type information.

Anonymous said...

Neal,

I realize my comment above was redundant with Ricky's (I hadn't seen his) in proposing:

Map<String, List<String>> map = new HashMap();

and you answered with "the existing warning when you use a raw type sometimes represents a real type error. Suppressing the warning would undermine the type system."

but mixing raw types with generics is pointless: it brings no value whatsoever, since it is not type-safe. AFAIK the reason for allowing raw types to be mixed with generics is backward compatibility and incremental migration from old code to statically type-safe code. But as far as constructors go, I don't really see the point of allowing this.

I think the sequence <>() is an inexpressive way of solving this problem. Furthermore, var := is a lot worse.

Anonymous said...

The problem with the syntax below is it is ambiguous:

Map<String, List<String>> map = new HashMap();

If Java ever gets reified generics (i.e. erasure is removed and the types are kept at runtime) then:

Map<String, List<String>> map = new HashMap<>(); and
Map<String, List<String>> map = new HashMap();

mean two different things. The former means give me a typesafe map of String->List<String>, and the latter means give me a raw map (i.e. it compiles purely for backwards compatibility).

The last thing we want to do is make syntax changes that shut any doors to future reification of generics.

Anonymous said...

lordpixel,

The whole point is that mixing generics with raw types (especially for constructors) is pointless regardless of whether reification is adopted. Why would anyone want to construct a raw type and reference it by a parameterized type? It makes no sense.

(blank) said...

Neal, what do you do with all your language proposals? Are you going to create a JSR and submit them? As far as I can tell, JDK 7 is only getting bug fixes and no language improvements (yet).

koenhandekyn said...

i often thought that java should include a typedef equivalent. this very simple construct would already help (readability) a lot!

this remains complementary to your proposal which on first sight looks interesting.

going a step further I would even like something like below - as a playthought :

Map<> map = new HashMap<String, new LinkedList<Thing>()>()

let me explain:

1. the inferred type of map is Map<String, List<Thing>>. the rule is that types get inferred automatically to the most specific interface

2. instead of passing a TYPE variable one can pass an INSTANCE variable that is used to derive the type and is at the same time used as a default, so less nullpointer exceptions with mixed collections ;)

Unknown said...

That's the good one. +1
Please no ":=" with implicit typing...
And I really like the fact that the syntax can be used almost anywhere generic declarations are redundant, as in Edson's comment:

...
Map<String,Integer> map = new HashMap<>(); // inferred type in the constructor
for (Map.Entry<> entry: map.entrySet()) { // inferred type in the declaration
...


For me, this proves the beauty of this language syntax.

Neal Gafter said...

@steven, edson, koen, frederick: No, I am not proposing the <> syntax be allowed on the type of a declaration, such as in a for-each loop or a variable declaration. I am only proposing that syntax for an instance-creation-expression. The reason is that the type of the variable is important documentation about the program, even for local variables.

Anonymous said...

Map<String, List<Thing>> map = new HashMap<>();

is pretty great, it's subtle and shouldn't be in anyone's way.

"I have always disliked the replication of type names in Java, and in C++, where there is a clear type inference possible. I see no reason why declarations cannot be:

def x = Map<X,Y>();"


The possibility is irrelevant. The declared type of the variable takes precedence over any value assigned to it. A variable can only be declared once, but a value can be assigned more than once. The type is the most stable factor (it changes the least); even if you assign the value only once, the implementation type of the assigned value changes more often than the type of the variable.

Map<Type> foo = new HashMap<>()
Map<Type> foo = new TreeMap<>()


As far as a single point of definition goes, and figuring out where the generics are defined: I'd want the type to be declared where I declare the variable.

As for the for loop proposed in the comments, the above applies too, although less pressingly. However, because I'm against the above, I can't be in favor of the for syntax, as it would be conflicting/inconsistent (being allowed to use <> on the left side).

Anonymous said...

I concur wholeheartedly with the
Map<Foo,Bar> map = new HashMap<>();
syntax proposal!

I think the
def map = new HashMap<Foo,Bar>();
is a *very* bad idea. It pushes the type of map too far into the line making it hard to find. It starts a slippery slope whereon the next request will be for
def map = foo.bar();
where the type of map is to be determined by the return type of foo.bar(), which leads to horrific fragility when someone changes bar(). It also makes you hunt around to find the type when trying to read the code. Further it introduces another keyword, "def".

I also find the ":=" syntax to be a non-starter -- it's just too inconsistent with Java's other syntax.

Unknown said...

Don't you think this is a problem with, say:

Collection<? extends Number> coll = new LinkedList<>()?

How will it work with these?

Neal Gafter said...

avah: It will work precisely the way it would work currently using a static factory; you'll get a LinkedList<Number>.
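For reference, the static-factory version of that declaration can be sketched as follows. The factory makeLinkedList and the class name are made up for the example. Because of erasure the inferred type argument cannot be observed at runtime, so this only shows that the declaration is accepted.

```java
import java.util.Collection;
import java.util.LinkedList;

// Hypothetical example class; makeLinkedList is an illustrative trivial factory.
class WildcardDemo {
    static <T> LinkedList<T> makeLinkedList() {
        return new LinkedList<T>();
    }

    static Collection<? extends Number> make() {
        // The compiler picks T from the wildcard-typed target; no explicit
        // type argument is needed on the right-hand side.
        Collection<? extends Number> coll = makeLinkedList();
        return coll;
    }

    public static void main(String[] args) {
        System.out.println(make().isEmpty());
    }
}
```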

outlook said...

after typing in "new HashM" you can push control+space and eclipse will fill in the remaining part with the correct type parameter. For my purposes this is sufficient.

Cheers, Oliver

Anonymous said...

The original proposal

map := new HashMap<K,V>();

is not such a bad thing. If accepted, this can make our code a lot cleaner (even in cases where we do not use generics). The type of 'map' becomes HashMap, which is fine for most use cases. Take any method-level variable with in-place initialization: this can cut down the amount of code drastically.

Unknown said...

I'd like to have full type inference in foreachs like so:

...
for (entry: map.entrySet()) { // inferred type in the declaration
...

Also I like to have inferred return types where possible (i.e. not for abstract or recursive methods)

none@none.com said...

And why not?

id map := new HashMap<String,List<Thing>>();

you can easily solve the lack of type by using a void pointer, and get all methods to solve the id signature.

this is somehow the nicest way: with id map := you explicitly ask for type inference. Anyway, I am not a Java guy, just interested in language semantics. Is your proposal the scripting-language way? Is Java a scripting language? It might be.

8-)

Best,