Sunday, October 22, 2006

Closures for Java (v0.3)

This post discusses a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

We've just completed a major revision and simplification of the Closures for Java specification. Rather than post the specification on this blog, it is on its own web page, here. There are two versions of the specification: one with function types and one without. There is a pulldown menu at the top of the specification that makes it display only the text relevant to the version of the specification you want to see. Keeping the specification in this form will allow us to more easily maintain the two parallel specifications so we can compare them and delay a decision on this issue until later. These specifications are the starting point for our prototype.

There are two significant changes in this revision. First, there is a completely new syntax for closures and function types. Using the new syntax, with function types, you can write

{int,int => int} plus = {int x, int y => x+y};

As you can see, we're proposing to add the new "arrow" token => to the syntax. Just to be perfectly clear, this code declares a variable of function type

{int,int => int}

which is a function that takes two ints and yields an int. The variable is named "plus" and it is assigned the closure value

{int x, int y => x+y}

which is a closure that receives two ints (named x and y) and yields their sum.

The second major change has to do with the treatment of "restricted" closures. We've done away with the "synchronized" parameters from the previous revision of the specification. Instead, you can inherit from a marker interface to restrict the closure conversion. If you don't use the marker interface, then closures are not restricted when converted to that type.

Another important change is to the meaning of a function type. It is now defined to be a system-provided interface type, and it is provided in a way that gives the required subtype relations among function types. That means that in order to invoke a value of function type, instead of simply placing arguments in parens after the function value, you use its "invoke" method. This significantly simplifies the name lookup rules for variables of function type. In fact, now there are no special rules at all.
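As a rough illustration of what "a function type is an interface type" means, the sketch below writes such an interface by hand in plain Java, without the proposed syntax. The names IntBinaryFunction and FunctionTypeSketch are made up for this example; only the method name "invoke" comes from the specification as described above.

```java
final class FunctionTypeSketch {
    // Hypothetical stand-in for the system-provided interface behind {int,int => int}.
    interface IntBinaryFunction {
        int invoke(int x, int y);
    }

    // Plain-Java counterpart of: {int,int => int} plus = {int x, int y => x+y};
    static final IntBinaryFunction PLUS = new IntBinaryFunction() {
        public int invoke(int x, int y) {
            return x + y;
        }
    };

    // The function value is called through its invoke method, not as PLUS(x, y).
    static int applyPlus(int x, int y) {
        return PLUS.invoke(x, y);
    }

    public static void main(String[] args) {
        System.out.println(applyPlus(2, 3));
    }
}
```

Because PLUS is an ordinary variable of an ordinary interface type, looking up the name and calling invoke on it follow the rules Java already has, which is presumably why no special lookup rules are needed.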

As always, your feedback and ideas are welcome.

20 comments:

Stepan said...

Is it a joke? :)

The previous version (0.2) was much better in my opinion. It seemed almost perfect.

The proposed syntax with the => token looks strange, because it is not similar to current Java constructs. Everywhere in Java, arguments are in round brackets (in method and constructor calls), and it's possible to keep this for closures. Also, I think it's much nicer to keep arguments outside the curly brackets. Also, your post about Tennent's Correspondence Principle for closures was great, and the new specification throws this out. Please revert the syntax to 0.2.

Also, I think that function types are not usable, because it's easy to define an interface in each place where you need a function type. Also, function types make code less readable. Interfaces make code self-documenting. Compare

{ T => U } (if we have new syntax)

with

interface Transformer<T, U> {
    U transform(T t);
}

It's absolutely clear what the second type is intended for, and it's not possible to tell what the first does.

I hope that function types will be rejected in the final specification.

Stepan said...

Also, the example with Closeable in the specification has an error. It should be:

<T extends java.io.Closeable, throws E>
void closeAtEnd(OneArgBlock<? super T,E> block, T t) throws E {
    try {
        block.invoke();
        t.close(); // this line is missing
    } finally {
        try { t.close(); } catch (IOException ex) {}
    }
}

Note the missing line. close() causes the OutputStream to be flushed, and if flush() fails (because of a network error, for example), the user of the original code won't see any error, yet the data won't be written to disk.

Examples should be error-free, because many people learn to code by reading examples.
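To make the failure mode concrete, here is a minimal sketch of the corrected pattern in standard Java. It is not the specification's code. Standard Java has no exception type parameters of the form "throws E", so the sketch uses an ordinary type parameter bounded by Exception and declares IOException explicitly. The OneArgBlock interface and the FailingResource helper are hypothetical, and the block is invoked with the resource as its argument.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseAtEndSketch {
    // Hypothetical one-argument block type; E is the checked exception the block may throw.
    interface OneArgBlock<T, E extends Exception> {
        void invoke(T t) throws E;
    }

    static <T extends Closeable, E extends Exception>
    void closeAtEnd(OneArgBlock<? super T, E> block, T t) throws E, IOException {
        try {
            block.invoke(t);
            t.close(); // normal path: a failed close (e.g. a failed flush) reaches the caller
        } finally {
            // Cleanup path: Closeable.close() is specified to have no effect on an
            // already-closed stream, so this matters only when something above threw.
            try { t.close(); } catch (IOException ignored) { }
        }
    }

    // Hypothetical test resource whose first close() fails, like a failing flush.
    static final class FailingResource implements Closeable {
        int closeCalls = 0;

        public void close() throws IOException {
            closeCalls++;
            if (closeCalls == 1) {
                throw new IOException("flush failed");
            }
        }
    }

    // Returns true if closeAtEnd reported the close failure to its caller.
    static boolean reportsCloseFailure() {
        FailingResource resource = new FailingResource();
        try {
            closeAtEnd(new OneArgBlock<FailingResource, RuntimeException>() {
                public void invoke(FailingResource r) { /* pretend to write data */ }
            }, resource);
            return false;
        } catch (IOException ex) {
            return true;
        }
    }
}
```

Without the close() call inside the try block, the only close would be the one in the finally clause, whose IOException is swallowed, so reportsCloseFailure() would return false.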

Richard Birenheide said...

Stepan,

in my opinion, the intrinsic error is to catch the exception silently in the finally block. At least log it, but preferably, if closeAtEnd is meant to be an API function, it should throw IOException and leave the treatment to the client.

Neal,

generally I would agree with Stepan that the syntax would be a bit hard to get used to. I'd prefer round brackets for the arguments.

Russel Winder said...

I suspect the:

{ int , int => int } plus = { int x , int y => x + y }

will irritate a lot of people due to having to replicate the type names. If the types are specified prior to the closure name, then why replicate in the closure body, just have:

{ int , int => int } plus = { x , y => x + y }

Of course this is getting quite close to the Groovy syntax:

plus = { x , y -> x + y }

or if you really want static types then:

plus = { int x , int y -> x + y }

francis said...

Thanks for the great job !!

Calum MacLean said...

To Stepan re closure syntax:

I actually wasn't too keen on the previous syntax for closures, using parentheses for the arguments: (int x){ x+2 }. The reason I didn't like it was that I found it difficult to visually parse, as it seemed to be two code fragments stuck together rather than a single unit. So in the middle of a big chunk of code, it wasn't always easy to pick out.

With the new syntax, the closure is all contained with braces: { int x => x + 2 } , which for me is easier to visually parse as a single unit, as it's always delimited by the surrounding braces.

Yes - it is different from current Java, but I think the syntax is fairly simple and understandable, and might be familiar from other languages. And for a new construct such as this, there's bound to be some tradeoff between ease of use and consistency with current syntax.

Calum MacLean said...

Re functional types.
I'm not convinced yet one way or the other (presumably the spec writers aren't either...).

However, I don't think it's as bad as Stepan suggests. In an actual usage of a transformer object, you might have:

public void addTransformer({ T => U } transformer) { ...

So it's clear here what the usage is.

Personally, I think that functional types can be useful to define a quick function inline. Currently, it can be a bit of a pain to have to define a new interface every time you want to do this - it seems rather wasteful and arbitrary. You can then potentially end up with lots of interfaces defined in different places which have the same parameter types and return types; but they've all got to be defined separately, and they're all incompatible.

The downside can be similar to what Russel said.
Also, if you end up passing this function object through your system quite a bit, then it's a) verbose to declare it and b) arguably Stepan's argument applies more here.

Question for Neal: would it be possible to define a class which implements a function type?
For example, if you want a reusable transformer class, can you "implement" the function type { T => U } ?

As I said at the start, I'm a bit undecided about all this.

Neal Gafter said...

Yes, you can "implement" a function type. That's what the spec means when it says they are interface types. Perhaps an example would be in order:

class Add implements {int,int=>int} {
    public int invoke(int x, int y) {
        return x+y;
    }
}

andrewmu said...

I'd be strongly in favour of the functional specification. It looks more concise (e.g. in the closeAtEnd example) and, I think, would very much improve the expressiveness of the language.
I am involved in a large enterprise system and we use Java in a functional style for map and filter operations.

Anonymous said...

The new syntax looks like "equal or greater than". How about changing it to -> ?

Xavi Miró said...

The new syntax of function types is much clearer than the previous one. This way one can quickly see that it is a block, with its arguments and what it returns...in my opinion a good choice, although I still prefer the use of interfaces only, because I think it fits better with the current Java language. On the other hand, the function types help to have a shorter notation, so I wouldn't decide myself on one choice or the other (whether using function types or not).

I must admit it's still hard for me to understand the whole proposal, especially the use of the Unreachable type (maybe the formal mathematical way of writing the proposal is the reason; I'm not used to the compiler language notation).

And also the last example with those two closeAtEnd invocations separated by a space and followed by the block of code is a bit unnatural for me. For this usage of closures I would prefer the proposal of automatic resource cleanup by Josh Bloch.

I see the effort in improving the proposal and I think it's becoming clearer and better in general.

Jochen "blackdrag" Theodorou said...

nice.. looks like the proposal is becoming usable more and more. I was a bit surprised about the step to use "=>", but as a Groovy user and one of the people who said "yes" to the "->" syntax I am of course lucky to see that here.

As already mentioned, you should think of going completely to "->" because "=>" looks much like "equal or bigger". Of course that is ">=" in Java, and since neither the types nor the "=>" are optional, until now there is no real problem with that. But think of it: if you want to remove the types from the parameter list, then you might run into issues with ">=". Sadly the "=>" can't be removed completely in general, because it collides with the array initialization syntax.

Anyway, I am looking forward to the day of being able to do this:
{int=>int} x = {long a => (int) a*2}

Anonymous said...

agree,
'=>' looks like a typo of '>='. Consider:
{int x, int y => x >= y}

Eirik said...

This is one of a class of problems that have been solved before. The question is how to transfer the best parts of previous solutions to the Java language without bringing along any serious flaws.

I guess what you want is A) to be able to send a "method parameter" to a method, and B) to have a way to declare the local code fragment that you want to send to the other method.

Simula-67 solved B) by allowing declaration of methods inside methods, just like classes can be defined anywhere: Any block-level named construct allows the declaration of any construct, limiting the visibility of the declaration to the enclosing block. A method declared inside a method is just as local as the variables.

Part A) was solved by allowing method parameters to methods: On the parameter list, something similar to an interface method declaration occurred. This introduced the name of the method-parameter in the scope of the defining method.

When calling a method that takes a method (closure) parameter, the name of the method was put on the argument list, without any parentheses or similar. The callee can then call the passed closure as if it were a local method.

The beauty of this solution is that there are no function pointers: No references to the closure can be passed out of the thread (or call-stack). Since the scope of the caller _must_ be on the stack when the closure executes, access to local non-final variables in the closure (inner method) is safe.

I suggest you look into this solution, which has its elegance in simplicity. It will solve the common-case-closures problem in a way that does not introduce any "strange" syntax: it will only allow existing syntax in two new places. Passing closures out of the stack must still be solved as today, by creating objects, since these are the only things that you can keep a reference to.

Neal Gafter said...

Eirik: we have looked at this solution. Unfortunately, a solution along those lines fails to address many of the use cases for closures. For example, the approach can't be used for control abstraction because, among other reasons, the lexical binding of the "return" statement is not captured from the enclosing context.

Ralf Ullrich said...

I think v0.3 might have problems with Generics. Consider this interface:

interface Filter<E> {
    boolean accept(E elem);
}

Which would be used like this:

public static <E> boolean addFiltered(
        Collection<? extends E> src,
        Collection<? super E> dst,
        Filter<? super E> filter) {
    boolean rv = false;
    for (E elem : src) {
        if (filter.accept(elem)) {
            rv |= dst.add(elem);
        }
    }
    return rv;
}

Now if you want to express this with functional types as of v0.3 you would have to write something like this:

public static <E> boolean addFiltered(
        Collection<? extends E> src,
        Collection<? super E> dst,
        {? super E => boolean} filter)

which is similar to a method declaration like:

void foo(? super E arg) {...}

and that is considered invalid syntax, because its effects are usually captured by the rules for selecting the correct method to invoke. However, with functional types you need to be able to express this wildcard, for the same reasons you need to do it with interfaces, as in the example at the beginning. The same would apply to extends-wildcards, but those can be worked around through type variables.

I hope it's clear what I meant; English is not my native language.

cu

Neal Gafter said...

Ralf: the spec already addresses this. A function type is covariant in its return type and contravariant in its argument types, so there is no need to use wildcards. In fact, a function type is an abbreviation for an interface type in which wildcards are used for the type parameters expressing the reference-typed arguments and returns.
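A rough way to picture that abbreviation in plain Java is sketched below. The interface name Function1 is hypothetical, and the result is boxed as Boolean only so that a wildcard can be shown on the return position; how the proposal treats primitive results is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class VarianceSketch {
    // Hypothetical interface standing in for a one-argument function type.
    interface Function1<A, R> {
        R invoke(A arg);
    }

    // A function type from E to a boolean result, spelled out with wildcards:
    // contravariant in the argument (? super E), covariant in the result (? extends Boolean).
    static <E> boolean addFiltered(
            Collection<? extends E> src,
            Collection<? super E> dst,
            Function1<? super E, ? extends Boolean> filter) {
        boolean changed = false;
        for (E elem : src) {
            Boolean keep = filter.invoke(elem);
            if (keep) {
                changed |= dst.add(elem);
            }
        }
        return changed;
    }

    // A filter written against Object is accepted where a filter on String is needed.
    static List<Object> demo() {
        Function1<Object, Boolean> nonEmpty = new Function1<Object, Boolean>() {
            public Boolean invoke(Object o) {
                return o.toString().length() > 0;
            }
        };
        List<String> src = Arrays.asList("a", "", "bc");
        List<Object> dst = new ArrayList<Object>();
        VarianceSketch.<String>addFiltered(src, dst, nonEmpty);
        return dst;
    }
}
```

With a function type, the wildcards on Function1 would be implied by the type itself rather than written out by the caller.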

Axel said...

I understand you do not want to get into scheme-style continuations. However, I don't like the name of the "RestrictedClosure" tag - even though the possibilities of a "RestrictedClosure" closure are "restricted", the receiver's usage of the closure actually becomes unrestricted.

At first I hoped for the "RestrictedClosure" tag to be related to stack discipline and escape analysis. As I realize now it's not directly related to optimization or implementation - even an untagged closure may be invoked out of scope. So each non-local break or return will need to be able to detect whether its original lexical scope still exists, unless the compiler is able to optimize this away.

I suppose non-local exits would have to be implemented using Exceptions? After all, a "break" statement might pop off several stack frames. Java code executing in these frames would still expect try/finally blocks to be executed. This might make break and friends too expensive to be used frequently in practice.
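The sketch below is one way to picture the exception-based approach this comment speculates about, written in plain Java. It is an illustration only, not how the proposal or its prototype implements non-local exits, and the names Block, ReturnSignal, and forEach are hypothetical.

```java
import java.util.List;

final class NonLocalExitSketch {
    // Hypothetical block type for a loop body.
    interface Block<T> {
        void invoke(T item);
    }

    // Hypothetical control-transfer exception carrying the value to return.
    static final class ReturnSignal extends RuntimeException {
        final Object value;

        ReturnSignal(Object value) {
            super(null, null, false, false); // skip stack-trace capture to keep it cheap
            this.value = value;
        }
    }

    // Counts how often the intermediate frame's finally block ran.
    static int finallyRuns = 0;

    static <T> void forEach(List<T> items, Block<? super T> body) {
        for (T item : items) {
            try {
                body.invoke(item);
            } finally {
                finallyRuns++; // frames between the body and its target still clean up
            }
        }
    }

    // Emulates "return the first negative number" issued from inside the loop body.
    static Integer firstNegative(List<Integer> numbers) {
        try {
            forEach(numbers, new Block<Integer>() {
                public void invoke(Integer n) {
                    if (n < 0) {
                        throw new ReturnSignal(n); // stands in for a non-local return
                    }
                }
            });
            return null; // no negative number found
        } catch (ReturnSignal signal) {
            return (Integer) signal.value;
        }
    }
}
```

This simple version does not tell apart signals aimed at different enclosing methods, and it cannot detect a body invoked after its enclosing method has already returned. Both would need extra bookkeeping.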

Have you thought about typing rules that enforce stack life times? (like VAR parameters in Pascal, or "yield" style iterators) Break and Return are stack-oriented anyway, and pretending that they are not would lead to continuations, which you are not willing to introduce; so the consistent thing to do might be special syntax and/or typing rules for blocks that can only be invoked or passed on to other block parameters. Certainly sufficient for loops.

axel said...

PS... typo: In all of the "closeAtEnd" examples, it should be "block.invoke(t);" instead of "block.invoke();".

Doesn't affect the point, but it's remarkable how easily such a typo is missed. Maybe "invoke" is a bit too clunky after all, syntax-wise?

Berin Loritsch said...

Umm, what's wrong with the established syntax for declaring closures in other languages:

something.each({|item, withThis|
item.doSomething(withThis);});

If you really want to specify types, then why not just include the type within the pipe delimiters? It is easy to visually parse, and it has been around since the Smalltalk days.

I really hate this double declaration thing. It's obtuse and it is reminiscent of C++ "closures", which are not closures. They are objects which override the operator() method.

Unless Java seeks to be more Smalltalk-oriented with its implementation of closures, the cure is worse than the disease. Why do we need to declare the "prototype" of closures to begin with?