Thursday, September 07, 2006

Closures for Java (version 0.1)

This post describes a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

This revision of the proposal drops local functions and the special nonlocal return syntax, adds support for generic type parameters that can be used to represent thrown exceptions, and has only one form of closure literal. It also adds an abbreviated invocation statement that supports very convenient control abstraction. Soon I'll present a parallel proposal based on interfaces (that is, without function types) and contrast the two proposals.

Gilad Bracha, Neal Gafter, James Gosling, Peter von der Ahé

Modern programming languages provide a mixture of primitives for composing programs. C#, Javascript, Ruby, Scala, and Smalltalk (to name just a few) have direct language support for function types and inline function-valued expression, called closures. A proposal for closures is working its way through the C++ standards committees as well. Function types provide a natural way to express some kinds of abstraction that are currently quite awkward to express in Java. For programming in the small, closures allow one to abstract an algorithm over a piece of code; that is, they allow one to more easily extract the common parts of two almost-identical pieces of code. For programming in the large, closures support APIs that express an algorithm abstracted over some computational aspect of the algorithm. We propose to add function types and closures to Java. We anticipate that the additional expressiveness of the language will simplify the use of existing APIs and enable new kinds of APIs that are currently too awkward to express using the best current idiom: interfaces and anonymous classes.

1. Function Types

We introduce a new syntactic form:

Type
Type ( TypeListopt ) FunctionThrowsopt
FunctionThrows
throws ThrowsTypeList
ThrowsTypeList
Type
ThrowsTypeList
ThrowsTypeList VBAR Type
VBAR
|

These syntactic forms designate function types. A function type is a kind of reference type. A function type consists of a return type, a list of argument types, and a set of thrown exception types.

Note: the existing syntax for the throws clause in a method declaration uses a comma to separate elements of the ThrowsTypeList. For backward compatibility we continue to allow commas to separate these elements in methods and constructors, but in function types we require the use of the '|' (vertical-bar) character as a separator to resolve a true ambiguity that would arise when a function type is used in a type list.

2. Namespaces and name lookup

The Java programming language currently maintains a strict separation between expression names and method names. These two separate namespaces allow a programmer to use the same name for a variable and for methods. Variables of closure type necessarily blur the distinction between these two namespaces, because they can be used in an expression and they can be invoked.

When searching a scope for a method name, if no methods exist with the given name then variables of the given name that are of function type are considered candidates. If more than one exists (for example, function-typed variable declarations are inherited from separate supertypes), the reference is considered ambiguous.

We allow a function to be invoked from an arbitrary (for example, parenthesized) expression:

Primary
Primary Arguments

3. Closure Literals

We also introduce a syntactic form for constructing a value of function type:

Expression3
Closure
Closure
FormalParameters { BlockStatementsopt Expressionopt }

Example

We can write a function that adds two to its argument like this:

    int(int) plus2 = (int x){ x+2 };

4. The type of a closure

The type of a closure is inferred from its form as follows:

The argument types of a closure are the types of the declared arguments.

If the body of the closure ends with an expression, the return type of the closure is the type of that expression; otherwise if the closure can complete normally, the return type is void; otherwise the return type is null.

The set of thrown types of a closure are those checked exception types thrown by the body.

Example

The following illustrates a closure being assigned to a variable with precisely the type of the closure.

void(int) throws InterruptedException sleeper = (int t) { Thread.sleep(t); };

5. Subtyping

A function type T is a subtype of function type U iff all of the following hold:

  • Either
    • The return type of T is either the same as the return type of U or
    • Both return types are reference types and the return type of T is a subtype of the return type of U, or
    • the return type of U is void.
  • T and U have the same number of declared arguments.
  • For each corresponding argument position in the argument list of T and U, either both argument types are the same or both are reference types and the type of the argument to U is a subtype of the corresponding argument to T.
  • Every exception type in the throws of T is a subtype of some exception type in the throws of U.

6. Exception checking

The invocation of a function throws every checked exception type in the function's type.

7. Reflection

A function type inherits all the non-private methods from Object. The following methods are added to java.lang.Class to support function types:

  public final class java.lang.Class ... {
public boolean isFunction();
public java.lang.reflect.FunctionType functionType();
public Object invokeFunction(Object function, Object ... args)
throws IllegalArgumentException | InvocationTargetException;
}
public interface java.lang.reflect.FunctionType {
public Type returnType();
public Type[] argumentTypes();
public Type[] throwsTypes();
}

Note: unlike java.lang.reflect.Method.invoke, Class.invokeFunction cannot throw IllegalAccessException, because there is no access control to enforce; the function value designates a closure, which does not have access modifiers. Access to function values is controlled at compile-time by their scope, and at runtime by controlling the function value.

8. The type of null

We add support for null and the type of null in function types. We introduce a meaning for the keyword null as a type name; it designates the type of the expression null. A class literal for the type of null is null.class. These are necessary to allow reflection, type inference, and closure literals to work for functions that do not return normally. We also add the non-instantiable class java.lang.Null as a placeholder, and its static member field TYPE as a synonym for null.class.

9. Referencing names from the enclosing scope

Names that are in scope where a closure is defined may be referenced within the closure. We do not propose a rule that requires referenced variables be final, as is currently required for anonymous class instances.

Note: a local variable that is referenced within a closure is in scope in the body of the closure. Consequently the lifetime of objects referenced by such variables may extend beyond the time that the block containing the local declaration completes.

10. Non-local control flow

One purpose for closures is to allow a programmer to refactor common code into a shared utility, with the difference between the use sites being abstracted into a closure. The code to be abstracted sometimes contains a break, continue, or return statement. This need not be an obstacle to the transformation. A break or continue statement appearing within a closure may transfer to any matching enclosing statement provided the target of the transfer is in the same innermost ClassBody.

A return statement always returns from the nearest enclosing method or constructor.

If a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer. (I suspect we can come up with a better name). Similarly, an UnmatchedNonlocalTransfer is thrown when a continue statement attempts to complete a loop iteration that is not executing in the current thread. Finally, an UnmatchedNonlocalTransfer is thrown when a NamedReturnStatement attempts to return from a function or method invocation that is not pending in the current thread. Likewise, an UnmatchedNonlocalTransfer is thrown when a return statement is executed if the method invocation to which the return statement would transfer control is not on the stack of the current thread.

11. Closure conversion

We propose the following closure conversion, to be applied only in those contexts where boxing currently occurs:

There is a closure conversion from every closure of type T to every interface type that has a single method with signature U such that T is a subtype of the function type corresponding to U.

It is a compile-time error if the closure being converted contains a break, continue or return statement whose execution could result in a transfer of control outside the closure. It is a compile-time error if the closure being converted refers to a non-final local variable or parameter declared in an enclosing scope.

We will want to generalize this rule slightly to allow the conversion when boxing or unboxing of the return type is required, e.g. to allow assigning a closure that returns int to an interface whose method returns Integer or vice versa.

Note: the closure conversion is applied only to closures (i.e. function literals), not to arbitrary expressions of function type. This enables javac to allocate only one object, rather than both a closure and an anonymous class instance. The conversion avoids hidden overhead at runtime. As a practical matter, javac will automatically generate code equivalent to our original client, creating an anonymous class instance in which the body of the lone method corresponds to the body of the closure.

Closure conversion applies only to closures that obey the same restrictions that apply to local and anonymous classes. The motivation for this is to help catch inadvertent use of non-local control flow in situations where it would be inappropriate. Examples would be when the closure is passed to another thread, or stored in a data structure to be invoked at a later time when the method invocation in which the closure originated no longer exists.

Examples

We can use the existing Executor framework to run a closure in the background:

  void sayHello(java.util.concurrent.Executor ex) {
ex.execute((){ System.out.println("hello"); });
}

Here is a definition of an API that locks and then unlocks a java.util.concurrent.Lock, before and after a user-provided block of code:

public static <E extends Exception>
void withLock(Lock lock, void() throws E block) throws E {
try {
lock.lock();
block();
} finally {
lock.unlock();
}
}

This can be used as follows:

withLock(lock, (){
System.out.println("hello");
});

You might find this syntax a bit tedious. Not to worry - see below.

12. Abbreviated invocation syntax.

A new invocation statement syntax is added to make closures usable for control abstraction:

AbbreviatedInvocationStatement
Primary ( ExpressionList ) Block
AbbreviatedInvocationStatement
Primary ( Formals ) Block
AbbreviatedInvocationStatement
Primary ( Formals : ExpressionList ) Block

This syntax is a shorthand for the following statement:

Primary ( ExpressionList, ( Formals ) Block );

Note that this syntax is only a convenience to make some kinds of closure-receiving APIs more convenient to use to compose statements.

We could use the shorthand to write our previous example this way

withLock(lock) {
System.out.println("hello");
}

This is not an expression form for a very good reason: it looks like a statement, and we expect it to be used most commonly as a statement for the purpose of writing APIs that abstract patterns of control. If it were an expression form, an invocation like this would require a trailing semicolon after the close curly brace. Forgetting the semicolon would probably be a common source of error. The convenience of this abbreviated syntax is evident for both synchronous (e.g. withLock) and asynchronous (e.g. Executor.execute) use cases.

13. Definite assignment

The body of a closure may not assign to a final variable declared outside the closure.

A closure expression does not affect the DA/DU status of any variables it names.

14. Exception type parameters

To support exception transparency, we add a new type of generic formal type argument to receive a set of thrown types. [This deserves a more formal treatment] What follows is an example of how the proposal would be used to write an exception-transparent version of the withLock() example we've been using. The syntax is quite tentative at this point (ideas are welcome).

public static <throws E extends Exception>
void withLock(Lock lock, void() throws E block) throws E {
try {
lock.lock();
block();
} finally {
lock.unlock();
}
}

This method uses a newly introduced form of generic type parameter. The type parameter E is declared to be used in throws clauses. If the extends clause is omitted, a type parameter declared with the throws keyword is considered to extend Exception (instead of Object, which is the default for other type parameters). This type parameter accepts multiple exception types. For example, if you invoke it with a block that can throw IOException or NumberFormatException the type parameter E would be IOException|NumberFormatException. In those rare cases that you want to use explicit type parameters, the keyword throws is required in the invocation, like this:

Locks.<throws IOException|NumberFormatException>withLock(lock) {
whatever();
}

You can think of this kind of type parameter accepting disjunction, "or" types. That is to allow it to match the exception signature of a closure that throws any number of different checked exceptions. If the block throws no exceptions, the type parameter would be the type null.

Type parameters declared this way can be used only in a throws clause or as a type argument to a generic method, interface, or class whose type argument was declared this way too.

15. Implementing a Function Type

A programmer may write a class that implements a function type. The implementation is written as a public method with the keyword do in place of the method's identifier. For example

class MyBlock implements void() {
public void do() {
// implementation of the function
}
}

In this way programmers can, for example, create implementations of function types that are serializable.

Acknowledgments

The authors would like to thank the following people whose discussions and insight have helped us craft, refine, and improve this proposal:

Lars Bak, Cedric Beust, Joshua Bloch, Martin Buchholz, Danny Coward, Erik Ernst, Christian Plesner Hansen, Doug Lea, "crazy" Bob Lee, Martin Odersky, Tim Peierls, John Rose, Ken Russell, Mads Torgersen, Jan Vitek, and Dave Yost.

12 comments:

Jochen "blackdrag" Theodorou said...

Looks like you made great progress. But I still have questions about NLRs.

You wrote: "A return statement always returns from the nearest enclosing method or constructor."

Enclosing method is what? The method calling the closure, the method the closure is defined in, the method that the closures is called with? What if the closure is used in other closures?

Maybe an example

void foo(void() closure) {
closure();
}

void bar(void() closure) {
foo(closure);
}

void doCall(){
bar ((){return;});
}

so "return" does return from foo, bar or doCall? I guess it is doCall. At last I would expect this.

Then break/continue... I guess the normal loops will be especially instrumented, yes? But how? Again an example:

void inner(void() closure){
closure();
}

void foo(List list2, void() closure) {
for (el:list2) {
inner(closure);
System.out.println(".");
}
}

void bar(List list1, List list2, void() closure) {
for (el : list1) {
foo(el,closure);
}
}

void doCall(List list1, List list2){
for (int i=0;i<10; i++) {
bar (list1,list2,(){continue;});
}
}

Which of these loops will be "continued"? In my eyes it can only be the loop in doCall. Of course you can also define that the most inner loop is affected, but I really don't like that idea. That is too much of a side effect I think. Of course this would also mean, that I can't use break/continue with a custom loop method, because there is no visible loop. And of course... what if the loop is implemented using recursive calls? For example writing a depthsearch algorithm working on trees. Will the implementor have to catch some kind of Excpetion to react to the continue? And how will he know his loop is ment?

The idea of storing the closure in a field or list or something and then using them in the loop leads me to closure conversion:

"It is a compile-time error if the closure being converted contains a break, continue or return statement whose execution could result in a transfer of control outside the closure. It is a compile-time error if the closure being converted refers to a non-final local variable or parameter declared in an enclosing scope."

So I can use break/continue and store the closures in lists and fields or whatever, but I can't use them when an interface is expected? And I can't use non-final variables too? This rule looks a bit strange for me. sure, you want to avoid multithreading issues and such, but don't you get the same issues with the function types? Can't the user go around the limitation by simply encapsulating the closure in another closures? I mean something like:


void sayHello(Executor ex) {
String message = "hello";
final ()void closure = {System.out.println(message);}
ex.execute((){ closure()});
message = "unsynchronized hello?"
}


Or do you want to disallow this somehow? I could extend the example with a closure given as parameter, a closure returned after put into the Executor. Well, I am sure you will find a better solution than doing a runtime check.

And one more question regarding Exceptions... this throw E mechanism is enough to throw an arbitrary, not predefined set of Exceptions? Would be very nice. But I guess it is not possible to for exmaple store an closure in a list of closures where all the closures do or do not throw different exceptions... yes? Of course that limits the usage again.

So, I see much improvement here. I still se problems with NLRs and Exceptions, but maybe this is to misunderstandings.

Anyway, this is a 0.1 draft, not a final. I will follow the evolution here.

PS: again sorry, but the pre tag is not allowed in comments

Roman Elizarov said...

Neal, you said that the helper method approach doesn't address the asynchronous use cases, where you don't want to allow nonlocal control-flow. I still think that non-local return problem shall be somehow addressed via type system.

All ways of normal and abrupt completion of a block of code (including return type, thrown exceptions, breaks, and continues) shall be expressed in its type. For example, it can be done via magical exceptions, so that the following closure:

() { if (condition) break loop; if (other_cond) throw new IOException; -1}

will have a type of “int() throws break loop | IOException”. With this approach helper methods do address asynchronous use case. Consider an asynchronous use-case of Executor.execute(Runnable). Now, you write the following helper method for execute:

void execute(Executor ex, void() block) {
ex.execute(new Runnable() {
void run() {
block();
}
});
}

And use it with execute(ex) { do_something; }. See that you cannot use any non-local control-flow in do_something, because it will add “checked exceptions” into block’s signature, but checked exceptions are not supported by your helper method. On the other hand, helper method for SwingUtilites.invokeAndWait can declare its support for generic checked exceptions:

<E> void invokeAndWait(void() block throws E) throws E {
final ResultHolder<E> res = new ResultHolder<E>();
SwingUtilities.invokeAndWait(new Runnable() {
void run() {
try {
block();
} catch (E e) {
res.setResult(e);
}
}
});
if (res.getResult() != null)
throw res.getResult();
}

Lo and behold! You are now able to use invokeAndWait as any other control construct with non-local control flow, non-final variable access and all the other goodies.

Rémi Forax said...

Hi, neal,
two questions :
- do you have read my answer to
function that doesn't returns normally on my blog
- is there a reason to use implements instead of extends when implementing a function type ?

Rémi

Tom Hawtin said...

Abbreviated invocation syntax:

Rather than the quasi-control flow syntax, how about a syntax which acknowledges the differences and uses the methods parentheses?

lock(lock:
System.out.println("hiya");
);
lock(lock: System.out.println("hiya"););

Perhaps even:

seed.weakUpdate((int oldseed):
(oldseed * multiplier + addend) & mask
);


Definite assignment:

Seems a little odd that this code would be illegal (although it is odd code):

boolean executed;
void() fn = (){
executed = true;
...
System.out.println(executed);
}; // Semicolon?
...
executed = false;
...
System.out.println(executed);


Exception type parameters:

Wouldn't it be better to extend generics generally with |, rather than making it a special case? Certainly the new throws bit can be inferred.

Neal Gafter said...

Tom: allowing "|" generally in type arguments introduces syntactic ambiguities: one can tell that "ID<ID> ID" is a declaration and not an expression, but "ID<ID|ID> ID" can be either a declaration or an expression. Furthermore, adding general disjunctive types to the type system creates some unsolved problems for generic type inference, which is one reason they weren't there before. It was our intent to keep it very simple.

Neal Gafter said...

blackdrag: "enclosing method" means exactly that: enclosing the appearance of the return in the text of the program. Similarly, "break" or "continue" break out of the switch/for/do-while that statically encloses its appearance; in a recursive program it is the same instance of that statement in which the closures expression was evaluated. In short, the behavior precisely matches the intuitively natural expectation; you don't need to think in terms of its implementation to understand the behavior.

Yes, the limitations imposed by the closure conversion can be explicitly worked around by writing wrapping code. If you are doing the synchronization yourself, for example, then you won't mind having multiple threads sharing data. Just as today, you can shoot yourself in the foot by sharing data across threads without proper synchronization; the constraints on the closure conversion just makes it harder to do by accident.

If you're storing closures in a list, you will have to select a common supertype of them to declare as the list element type. The subtyping rules for function types allow you to take the disjunction of the thrown exception types: that is, an element of the list can throw any exception type that might be thrown by one of them. That is the natural type you would want the list elements to have anyway.

Avner said...

Maybe the distinction between local and non-local return will be clearer if it is made explicit in the API itslef. Use a new keyword (for example, inline) to designate that control statements in the input code block(s) are non local.
For example: when you call void execute(void () func) then return statements are local (like in void exeute(Runnable obj), but when you call inline void withLock(Lock lock,void () func) then return statements are non-local .
Of course the code of withLock is not really inlined, but semantically it behaves as if it was.
I think people would intuitively expect void execute(void () func) to behave in the same way as void exeute(Runnable obj) and will find non-local returns surprising. Making this behavior explicit with a keyword like inline can help to make this new behavior clear. Also on the provider side, it will be obvious for someone writing an asynchronous invocation not to mark it as inline and so NonlocalTransfer exceptions will be less likely.

Jochen "blackdrag" Theodorou said...

Neal, form my experience with Groovy I can tell, that people want to make things like this (pseudo syntax):

list.each ((foo){
if (foo>6) continue
else println foo
}

which means print all elements of the list except the element with the value 6. If I understand you right, then the above is invalid. We have a loop-method, the "each", but we don't have a loop the continue can control.

If we look at this:

for (foo:list) {
if (foo>6) continue
else println foo
}

we see the constructs are not equal. As I said, I very well understand this way to solve it, I just thought you all would find a better way, than I myself.

If break/continue isn't allowed without a classic loop construct, then I will have to write code like this to go around the problem:

do {
list.each (()(element){
if (element==null) break
}
} while (false)

That will allow me to stop the processing int the each method... interesting way of catching methods ;) Ah, yes, no way to enable continue for the each method with this. My solution for groovy was to allow a break to influence the loop method. So the code would work in groovy without the surplus loop. Of course it doesn't allow continue then, unless a classic enclosing loop is given.

Well, I will follow your progress

Anonymous said...

I'm glad to see clarification about object lifecycles and scope, in the note under point 9.

Some questions:

1. Will this "new type of generic formal type argument" be available only for closures or for interfaces (anonymous classes, etc.), too?

2. When you say "that throws any number of different checked exceptions" do you mean zero or more exceptions or one or more exceptions? I mean, is the "null" type redundant? Necessary?

Thanks,
Antonio

Neal Gafter said...

Antonio: the "new type of generic formal type argument" will be available everywhere type arguments are allowed.

Any number of checked exceptions means zero or more. null means no checked exceptions.

Roman Elizarov said...

The ugliest thing about this proposal is the following piece of code:

void() fun = () {};

int method1() {
  fun = () { return 1; };
}

void method2() {
  fun();
}

This code will compile just fine according to proposal, but invoking method1() and then method2() will cause UnmatchedNonlocalTransfer to be thrown. This code should not compile in the first place. Assignment in method1 should not be allowed at compile-time. Closures with non-local control flow should only be allowed as arguments to exception-transparent methods to guarantee that non-local control flow does not escape its intended scope. All intended use-cases of closures will be covered in this case without UnmatchedNonlocalTransfer exception ever happening or being needed at all.

lf said...

I'm looking forward to this -- Java is getting less painful year by year :-)

I noticed the new syntax makes it possible to emulate "tuple varargs":

MapBuilder.with("foo", 42) ("bar", 27) ("baz", 123).build()

Yay!