Saturday, February 13, 2010

A Syntax Option for Project Lambda (Closures for Java)

It was noted recently on the Project Lambda mailing list that allowing arrays of function type would undermine the type system. The reason for this is a combination of Java's covariant arrays, the natural subtypes among function types (they are covariant on return type and contravariant on argument types), exception checking, and the erasure implementation of generics. We certainly can't remove covariant arrays or checked exceptions, and removing the subtype relationship among function types really reduces their utility. Unfortunately, it is almost certainly too late to reify generics too. Although you can't have an array of function type, there is no problem having, for example, an ArrayList of a function type. So while we might prefer not to have this restriction, it won't be much of a problem in practice.

There are many ways in which arrays of function type would undermine the type system, but to give you some flavor of the problem, the following is a method written assuming that arrays of function type are allowed. This method must fail in some way - either fail to compile, or throw a ClassCastException, or an ArrayStoreException, or something. In all of the implementations being considered, this method would throw IOException, even though that isn't declared in the throws clause of the method. We have undermined Java's exception checking without even a cast!

public void main(String[] args) {
    #void()[] arrayOfFunction = new #void()[1];
    Object[] objectArray = arrayOfFunction;
    objectArray[0] = #(){ throw new IOException(); };
    arrayOfFunction[0].(); // invoke the function out of the array
}

The realization that supporting arrays of function type would introduce a loophole in the type system makes it possible to consider some very nice syntax options that were previously eliminated because there was no syntactic way to write an array of function type. But if we disallow arrays of function type, those options can be considered.

My favorite of the alternatives is based on this grammar

FunctionType:
( TypeListopt Throwsopt ) -> ResultType
Expression:
LambdaExpression
LambdaExpression:
( FormalParameterListopt ) -> Expression

A function type is written with the argument types between parentheses, then a right arrow (dash greater-than), and then the result type.

The most important distinction between this proposal and the existing draft specification is that this proposal places the result type after the argument types, instead of before. The benefits of its simplicity compared to the current specification are most obvious when you combine function types and lambdas with other features.

Here is a simple example to show how the two proposals look. The example is currying. The following method takes as a parameter a function of two arguments, and returns a function of one argument that returns a function of one argument. Applying the resulting function to one value, and then the result of that to another value, should have the same effect as applying the original function to both values. To make it interesting, we make the argument and result types generic, and we allow the function to throw an exception.

Using the currently proposed syntax for Project Lambda, the code would look something like this:

static <T,U,V,X extends Throwable>
##V(U)(throws X)(T) curry(#V(T,U)(throws X) function) {
   return #(T t)(#(U u)(function.(t,u)));
}

On the other hand, with the proposal described above it looks something like this:

static <T,U,V,X extends Throwable>
(T)->(U throws X)->V curry((T,U throws X)->V function) {
   return (T t)->(U u)->function.(t,u);
}

I've intentionally selected an example that uses the new features heavily, so either may take a few moments of studying to understand. But I claim the latter is much easier to follow than the former, even once you've become familiar with the syntax.

You can easily write a function that yields an array as its result

()->int[] arrayLambda = ()->new int[]{1, 2, 3};

With this proposed syntax there is no way to write an array of functions. But that is exactly what we concluded could not be made to work within the type system anyway.

20 comments:

Brian Slesinsky said...

I think Java programmers will have a hard time recognizing the lambdas in either syntax. It doesn't look like an anonymous function unless there are curly braces around the body.

Something like this would at least be somewhat familiar to Javascript programmers:

return #(T t) {
return #(U u) {
return function(t, u);
}
}

Anonymous said...

What is that "arrayOfFunction[0].(); " maybe a function call?

"arrayOfFunction[0].(); " Is too ugly is better "arrayOfFunction[0]();" this is more function oriented less object oriented.

Anonymous said...

Please be careful. You need to remember that many people who don't know what currying is will read this post. They will now go away with the impression that adding closures is all about introducing scary new stuff they don't understand into the language for no reason except to do other stuff they don't understand and therefore have no reason to do.

How about a nice hello world example, say a call to sort() or something like that.

Neal Gafter said...

@Brian: functions are not methods. It is beneficial for the syntax to be readable but distinct from methods.

@Anonymous1: The syntax for invoking a function must be different from the syntax for invoking a method because methods and variables are in distinct namespaces in Java. Explaining that would probably be an article of its own.

@Anonymous2: This is not intended to be an introduction to closures or to motivate the feature. It is intended to explain one aspect of the evolution of the specification.

Ricky Clarkson said...

I'm unsure why there are so many parentheses in there. I'd prefer to read and write it as:

static <T, U, V, X extends Throwable> T -> U throws X -> V curry((T, U throws X) -> V function) {
  return t -> u -> function.(t, u);
}

I slipped type inference in there too, in the method body, but other than that, are the parentheses needed?

Unknown said...

I'm still all for currying and the design and run-time optimisation potential thereof.

Rgds

Damon

Michael Bien said...

It would be better to use realistic identifiers (e.g FooException) instead of one letter identifiers.

Since as soon as the code doesn't fit into one single line the PJD (plain java developer) would try to decompose it into multiple functions.

maybe the readability problem wouldn't be there since nobody would write such compact declarations in java in the first place.

new T(){void r(){out.println("Hello");}}.s();
is a bit unfair ;)

Neal Gafter said...

@Ricky - It certainly would be possible to add additional syntax forms for 1-argument function types (no parens), and for lambdas with one argument where the argument type can be inferred. Whem there is an exception thrown, I think the parens add clarity. I've presented only the basic syntax that covers all situations. It is unlikely that there is enough time before Java SE 7 is feature complete (~90 days now) for type inference to be specified and implemented, but this syntax keeps the door open to doing that later.

@Michael - Type parameters, such as X in my example, are traditionally a single letter, not CamelCase. I'm not sure how you'd break up a currying method into multiple methods.

Brian Slesinsky said...

Closures aren't actually new to programmers who have written scripts in JavaScript, Python, or Ruby. JavaScript functions look much like Java methods. In Python, methods and functions are very similar as well.

Inventing an entirely foreign syntax for a familiar concept does nobody any favors. It would be much better to borrow from mainstream scripting languages, so that people can learn by analogy. Languages are easier to learn when they're similar to something you already know; this is why Java borrowed heavily from C, after all.

Neal Gafter said...

@Brian: The proposed syntax here is familiar to programmers from Groovy, C#, Scala, Haskell, etc.

Anonymous said...

Most programmers with whom I've discussed closures - especially more junior developers - really get confused when the result type is to the right of argument types. In every other place in Java, most notably method signatures, the result is to the left of argument types. I strongly favor the existing proposal that keeps this convention; it makes it much, much easier for new developers to pick up.

Peter Hull said...

Is it OK to have arrays of functions provided that functions can't throw checked exceptions? i.e there would be a compile error at #(){ throw new IOException(); }; , namely unreported exception java.io.IOException; must be caught or declared to be thrown .

Neal Gafter said...

@Peter: No, the problems arise in a number of different ways, even without checked exceptions.

Gili said...

I fail to see how it's too late to reify generics but it's not too late to add closures to Java.

I'd much rather see existing features fixed (Generics) before adding new toys in the form of Closures. Just my 2 cents.

Neal Gafter said...

@Gili: Reifying generics would require dozens of man-years of work, and would be an incompatible change that breaks many applications. Closures is a much smaller change and will break nothing.

John Rose said...

Even beyond the canonical case of curried functions, the postfix option (as you propose) is known to work well in languages which are rich in function types.

The postfix syntax is apparently being added to C++0x. This is more evidence that prefix syntax for function types (return type first) is painful to work with, even when programmers have had a generation to learn it.

But, this syntax doesn't need to be tied to the non-existence of function-arrays. We could take a page from C and use parentheses to distinguish array-of-function from function-returning-array. It's ugly but simple and direct, and everybody knows it from C.

(int)->int[] // fn returning ar (common case)
((int)->int)[] // array of fn (ugly case)

rocket-gwt said...

Wouldnt it be great if "function" was added to the java grammar in such a way that it did not clash with vars, methods etc. If that was done one could remove the need to use some symbol for any new construct sun wish to add to java. I like the fact source that uses "extends" rather than ":" etc. Words IMHO look much more readable than a string if symbols separated by (optional) whitespace. I believe overloading existing java keywords and giving them new meanings is just wrong and amounts to operator overloading.

Each token or symbol in source should try and be unique with a single meaning. I appreciate this cannot be retrofitted but what about keeping to this mantr for future developments?

Andrei Navodnik said...

@Neal:

Just a short note about different proposals for syntax of closures before everything is chiselled in stone.

In my opinion, the proposed (official?) syntax for Lambda Expressions (closures) is not very good. At first sight, it's really a maze of parentheses. Your improvement is much better, although the modified syntax still can not compare with the BGGA 0.5. Of all syntax proposals for Lambda expressions (closures) I like the BGGA 0.5 the most.

My first complaint about existing draft specification and your improvement is the following. If lambda expressions (closures) are *defined as block of code*, then why can't they also *look like a block of code*. For example, I'd like to see that the function types and the closures would be defined with braces, and this is one of the reasons why I like BGGA 0.5 syntax for closures the most.

Another remark is about the function types. You've said without detailed explanation that you favour placing result type after the argument types. Generally speaking, what is the reason that result type couldn't be defined before argument types? Why can't function types look like functions (methods), so that that we could have result type defined before argument types?

I'm curious, if previous problems may be solved with the following idea. When I've studied your proposals for closures it occurred to me that in closures we might use another symbol to separate formal parameter list from other statements or expression. What is the reason that there has to be the same symbol in function type and in closure? For example '=>' in your BGGA 0.5 proposal and '->' in your improvement of existing draft specification? Is maybe the reason to make Java closures look like closures in other languages? Wouldn't it be possible to design function types and closures separately? My proposal would be that the symbol colon is used in closures instead. With this modification, the closures would be in a way similar to enhancement for-loop or assert statement, and, by the way, you've also used colon in control invocation syntax. I'm suggesting to reuse and recycle exiting symbols in Java, instead of adopting the new ones. For example, symbol # can stay inside Javadoc but I wouldn't like to see it inside the Java, and this is another reason why I don't like official draft specification and BGGA 0.6. That's my taste. You may respectfully disagree.

Maybe my ideas would be more clearer if I present a grammar so let me finish with my improvement of official draft specification of function types and lambda expressions.

FunctionType:
  { ResultType <- TypeListopt Throwsopt }

Expression:
  LambdaExpression

LambdaExpression:
  { FormalParameterListopt : Expression }

And the function curry would be written as

static <R,T,U,X extends Throwable>
{{R <- U throws X} <- T} curry({R <- T, U throws X} function) {
  return {T t : {U u :
    function.(t,u)
  }};
}

Maybe I digress. Could the same idea - using colon in closure definition and proposed function type syntax - be also applied to the BGGA 0.5? One advantage would be that this modification would enable single syntax for restricted and unrestricted closures. Another advantage, as I can see it, is that this modification would enable assignment of restricted closure to unrestricted function type. Last but not least, there would be also less typing. Would that be possible? Of course, in that case, you would have to invent syntax for unrestricted function types, maybe using symbol '<--'?

Please, accept my arguing as purely aesthetic point of view. I have to add that I'm not a programming language designer, so my ideas might face with insurmountable technical hurdles, I'm not aware of. If there such hurdles exist I'd like to know about them. If there are none then I'm just curious what other developers might think about. But in the end, let me say, that I'll accept any syntax *you* will put a seal on.

Andrei Navodnik said...

A few examples to clarify syntax in previous post (I couldn't include them before, because I have already reached the limit of the post):

{ String <- boolean } toString = { boolean b : b == true ? "true" : "false" };
{ int <- int } negate = { int x : -x };
{ int <- } answer = { : 42};
{ void <- } helloWorld = { : System.out.println("Hello World"); };
{ int <- String, String } comparator = { String s1, String s2 : s1.length - s2.length };
{ void <- Exception throws Exception } rethrow = {Exception e : throw e; };

Hypothetically speaking, would it be possible to modify BGGA 0.5 specification so that above syntax, ideas would be accepted? I assume, of course, that the above syntax is valid.

Neal Gafter said...

Andrei: you have a strange definition of "short".

I don't find your proposal particularly compelling. I find reading "curry" with your syntax about as confusing as using the draft spec syntax, and less readable than the syntax with the result on the right. "Looking like methods" is a not an advantage because there are significant differences between methods and lambdas.

Lambda expressions do not necessarily surround a "block" of code. This proposal illustrates that *expressions* are a primary use case. If I were to modify BGGA along the lines of the proposal I describe here, I'd also add "block expressions" from http://www.javac.info/closures-v06b.html