Monday, December 03, 2007

Restricted Closures

Note: this discusses a feature of the Closures specification that was published back in February, but which is likely to change in an upcoming revision.

The Closures for Java specification, version 0.5, contains a special marker interface java.lang.RestrictedFunction. When a closure is converted to an interface that extends RestrictedFunction, this prevents the closure from doing certain operations. Specifically, it prevents accessing mutated local variables from an enclosing scope, or using a break, continue, or return to a target outside the closure. The idea is that APIs that are intended to be used in a concurrent setting would want to receive restricted rather than unrestricted closures to prevent programmers from shooting themselves in the foot.
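To make the marker-interface mechanism concrete, here is a minimal sketch in standard Java (8 or later), using a lambda in place of a BGGA closure. Everything in it is an assumption for illustration: `RestrictedFunction` is declared locally as a stand-in for the spec's `java.lang.RestrictedFunction`, and `RestrictedCombiner`, `combineOne`, `demo`, and `isMarked` are hypothetical names. A stock javac attaches no meaning to the marker, so the sketch shows only how an API would ask for the restricted flavor, not the restrictions being enforced.

```java
import java.util.function.BiFunction;

class MarkerInterfaceSketch {
    // Hypothetical stand-in for java.lang.RestrictedFunction from the 0.5 spec.
    // It has no methods; it only tags a type.
    interface RestrictedFunction {}

    // An API-side interface opts in to the restrictions by extending the marker.
    interface RestrictedCombiner<T, U, V> extends BiFunction<T, U, V>, RestrictedFunction {}

    // A concurrency-oriented API would accept only the restricted flavor.
    static <T, U, V> V combineOne(T left, U right, RestrictedCombiner<T, U, V> combiner) {
        return combiner.apply(left, right);
    }

    // Converts a lambda to the restricted interface and passes it to the API.
    static int demo() {
        RestrictedCombiner<Integer, Integer, Integer> add = (a, b) -> a + b;
        Integer sum = combineOne(2, 3, add);
        return sum;
    }

    // The converted lambda carries the marker at run time as well.
    static boolean isMarked() {
        RestrictedCombiner<Integer, Integer, Integer> add = (a, b) -> a + b;
        Object asObject = add;
        return asObject instanceof RestrictedFunction;
    }
}
```

Under the spec, the compiler would reject a closure converted to such an interface if it mutated a captured local or used a nonlocal break, continue, or return.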

Two weeks ago Mark Mahieu contacted me regarding his experience with the closures version of the fork-join framework. Because I had ported that API before I had implemented any of the operations that would be restricted, and before RestrictedFunction itself, I had simply not provided any restrictions at all. Mark was wondering how to do it:

I hadn't looked at the jsr166y javadoc before you linked to it on your blog, so I had the chance to compare the two versions on equal terms, and I can honestly say that I found the closures version of the API to be much more approachable at first blush. I also suspect that the majority of the Java programmers I work with would feel the same way, once comfortable with function type syntax.

One thing I did wonder was whether a method like ParallelArray.combine() could be declared as:

public <U,V,C extends {T,U=>V} & RestrictedFunction> ParallelArray<V> combine(ParallelArray<U> other, C combiner) { ... }

but my reading of the specification suggests that the type C won't be a valid target for closure conversion. Maybe I'm being greedy, but in certain cases (jsr166y being a good example) I'd ideally want both the clarity provided by using function types in place of a multitude of interfaces, and the compile-time checking afforded by RestrictedFunction. Having said that, I think the additional type parameter above negates those gains in clarity somewhat, even if it were an option.

I responded, describing what I had been planning to do in the next minor update of the spec:

I expect to make that work. However, I hope it won't be necessary. I expect to support function types like

{T,U=>V}&RestrictedFunction

directly. For example

public <U,V> ParallelArray<V> combine(ParallelArray<U> other, {T,U=>V}&RestrictedFunction combiner) { ... }

You will be allowed to intersect a function type with non-generic marker interfaces such as RestrictedFunction, Serializable, etc. Unfortunately, I will have to rev the spec to support this.
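For comparison, the closest thing standard Java (8 or later) offers to both forms can be sketched as follows. This is an illustration under assumptions, not the spec's design: `RestrictedFunction` is a locally declared stand-in, `BiFunction` plays the role of the function type `{T,U=>V}`, and `RestrictedBiFunction`, `combine`, `demo`, and `castDemo` are hypothetical names. The generic method mirrors Mark's extra type parameter `C`; the cast to an intersection type is the nearest existing analogue of writing `{T,U=>V}&RestrictedFunction` directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

class IntersectionBoundSketch {
    // Hypothetical stand-in for java.lang.RestrictedFunction.
    interface RestrictedFunction {}

    // Named interface used only to give a lambda a concrete restricted target type.
    interface RestrictedBiFunction<T, U, V> extends BiFunction<T, U, V>, RestrictedFunction {}

    // The extra type parameter C expresses "a function type AND the marker".
    static <T, U, V, C extends BiFunction<T, U, V> & RestrictedFunction>
    List<V> combine(List<T> left, List<U> right, C combiner) {
        List<V> out = new ArrayList<>();
        int n = Math.min(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            out.add(combiner.apply(left.get(i), right.get(i)));
        }
        return out;
    }

    // Combines two lists element by element using a restricted combiner.
    static List<Integer> demo() {
        List<Integer> left = Arrays.asList(1, 2, 3);
        List<Integer> right = Arrays.asList(10, 20, 30);
        RestrictedBiFunction<Integer, Integer, Integer> add = (a, b) -> a + b;
        return combine(left, right, add);
    }

    // A cast to an intersection type makes the lambda implement the marker too.
    static boolean castDemo() {
        BiFunction<Integer, Integer, Integer> mul =
            (BiFunction<Integer, Integer, Integer> & RestrictedFunction) (a, b) -> a * b;
        return mul instanceof RestrictedFunction && mul.apply(3, 4) == 12;
    }
}
```

The signature of `combine` shows the cost Mark points out: the intersection has to be smuggled in through a type parameter, which is noisier than naming the intersection at the parameter itself.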

Since that time I've been discussing this issue with a number of people. Some, who believe that the concurrent use cases are primary, or who believe that "Mort" programmers will blithely copy-and-paste code from anonymous inner classes (which have different semantics) into closures, suggest that the default is backwards: closures and function types should be restricted unless specific action is taken to make them otherwise. Reversing the sense of the marker interface doesn't work (it violates subtype substitutability), but there may be other ways to accomplish it. On the other hand, there are others who believe the synchronous use cases, such as control APIs, are primary (even when used in a concurrent setting), and prefer not to see the language cluttered with support for the restrictions at all. Instead, they would prefer that any such restrictions take the form of warnings (which the programmer might suppress or ask javac to escalate to errors). I have sympathy for both camps.

Another possibility would be to produce a warning whenever you use a nonlocal transfer at all and do away with RestrictedFunction. The way to suppress the warning would be with a @SuppressWarnings("nonlocal-transfer") annotation. Could we make it an error instead of a warning? This may make the interface easier to read, but it doesn't give the API designer any way to express a preference. It may make control APIs painful to use.

Finally, it would be possible to use a different syntax for restricted and unrestricted function types and closures. For example, one using the => token would be restricted, not allowing nonlocal transfers. One using a different token such as ==> or #> would be unrestricted, allowing nonlocal transfers. The idea is that if you want an unrestricted closure, you'd have to use the slightly more awkward syntax, and the receiving type must also be of the unrestricted variety. The control invocation syntax would be defined in terms of the unrestricted form. This enables API designers to express a preference for whether or not clients would be allowed to write unrestricted closures (and therefore, whether or not they would be allowed to use the control invocation syntax).

This can be made to work using only concepts already in the spec. The unrestricted form of a function type would be defined as an interface type as in the current spec. The restricted form would be the same but with RestrictedFunction mixed in. With this approach there is no need for the explicit "&" conjunction-type syntax for function types.
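The translation described above can be sketched with two ordinary interfaces. The names here are hypothetical (`IntBinaryUnrestricted`, `IntBinaryRestricted`, the method name `invoke`, and the locally declared `RestrictedFunction` are all assumptions), and standard Java lambdas stand in for closures, so the sketch shows only the subtype relationship between the two forms, not the restrictions themselves.

```java
class TwoFormsSketch {
    // Hypothetical stand-in for java.lang.RestrictedFunction.
    interface RestrictedFunction {}

    // Roughly what an unrestricted function type such as {int,int ==> int} might translate to.
    interface IntBinaryUnrestricted {
        int invoke(int a, int b);
    }

    // The restricted form {int,int => int}: the same shape with the marker mixed in.
    interface IntBinaryRestricted extends IntBinaryUnrestricted, RestrictedFunction {}

    // A control-style API that accepts either form.
    static int applyAny(IntBinaryUnrestricted f, int a, int b) {
        return f.invoke(a, b);
    }

    // A concurrency-style API that insists on the restricted form.
    static int applyRestricted(IntBinaryRestricted f, int a, int b) {
        return f.invoke(a, b);
    }

    static int demo() {
        IntBinaryRestricted add = (a, b) -> a + b;
        IntBinaryUnrestricted mul = (a, b) -> a * b;
        int viaAny = applyAny(add, 2, 3);               // restricted value where unrestricted is expected: allowed
        int viaRestricted = applyRestricted(add, 2, 3); // restricted value where restricted is required: allowed
        // applyRestricted(mul, 2, 3);                  // would not compile: mul lacks the marker
        return viaAny + viaRestricted + applyAny(mul, 2, 3);
    }
}
```

Because the restricted interface is a subtype of the unrestricted one, a restricted closure can be passed anywhere, while an unrestricted one is rejected by APIs that ask for the restricted form. This is also why reversing the sense of the marker would break substitutability.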

27 comments:

Rémi Forax said...

At the same time, currently in Java, if you want to express a type that implements two interfaces you need to create a type variable and sometimes rely on capture, which is far from ideal.

Suppose that I have two interfaces, I and J, that respectively contain a method i() and j(), and two classes A and B, each of which implements both interfaces. What is the type of XXX in this code?

boolean test = ...
XXX value = (test)?new A():new B();
value.i();
value.j();

Rémi
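The type-variable workaround mentioned in the comment above can be sketched as follows, assuming Java 8 or later for the inference on the conditional expression. The names I, J, A, and B come from the comment; the method bodies, the String return types, and the helpers `callBoth` and `demo` are made up for illustration.

```java
class IntersectionTypeSketch {
    interface I { String i(); }
    interface J { String j(); }

    static class A implements I, J {
        public String i() { return "A.i"; }
        public String j() { return "A.j"; }
    }

    static class B implements I, J {
        public String i() { return "B.i"; }
        public String j() { return "B.j"; }
    }

    // The type variable X plays the role of XXX: "something that is both an I and a J".
    static <X extends I & J> String callBoth(X value) {
        return value.i() + " " + value.j();
    }

    // The conditional's type cannot be written as a local variable type,
    // but it can be passed straight to the generic method.
    static String demo(boolean test) {
        return callBoth(test ? new A() : new B());
    }
}
```

The intersection never gets a name at the call site; it exists only as the inferred value of X, which is the awkwardness the comment is pointing at.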

vbb said...

Why are you using a special marker interface and not an annotation ?

Neal Gafter said...

@vbb: annotations are not an appropriate means to modify the language semantics.

Hamlet D'Arcy said...

Regarding how to declare a restricted vs. an unrestricted function type...

Using ==> or #> for an unrestricted function seems to be more an abbreviation than an abstraction. Since the symbols don't reveal much intention, I would favor the longer RestrictedFunction declaration because it says more to the reader.

I'm all for terser/shorter syntax as long as it is provided through an abstraction of concepts rather than an abbreviation.

Thanks for the post!

Howard L said...

If you required enclosed variables to be qualified with the enclosing name when there was ambiguity, like inner classes do, and required non-local returns, breaks, continues to be qualified then you wouldn't need restricted closures at all.

Neal Gafter said...

Howard: we can't require qualification of local variables from enclosing scopes (and the current inner class syntax doesn't either), because the block scopes in which those variables are defined don't have names.

Tom said...

One other thing potentially missing is to change not just the type signature at the method definition, but also the place where the closure instance is defined. That is, in C# you have ref and out parameters, but the args also need to be marked:

void loadText(out String text) {...}

This causes an error if you don't say "out" when you call it:

loadText(out myString);

This gets around the crazy case from C++ where ref parameters can wreak silent havoc. Maybe not so vital here for restricted closures (since it's just a matter of compile time errors anyway?), but something like that would allow for much clarity when copying and pasting or quickly converting old anonymous class code.

So, for instance, if you used the "==>" syntax at the parameter, you'd also have to use it when calling the method (or assigning the var or whatever). Maybe this wouldn't quite do as an exact recommendation. Just a general idea to keep in mind.

Rémi Forax said...

neal wrote:
> @vbb: annotations are not an appropriate means to modify the language semantics.

I agree if you are talking about 1.5 annotations, but why not use annotations on types (JSR 308)?
Even if, judging from your posts, you don't really like this JSR, I think it is worth considering.

Rémi

Bruce Chapman said...

1) If the syntax favours synchronous cases or asynchronous cases over the other for a closure then that is wrong, they are equal but different.

On the other hand, favouring one case over the other for function type syntax is less evil, because they'll be written less often.

Certainly if something is to be favoured it is the closure over the function type because that is what Mort will write mostly.

2) You can pass a RestrictedFunction|RestrictedClosure (whatever the name is) to a method that does not REQUIRE a restricted one. But not vice-versa. So there is a subtyping relationship here, which is exposed quite nicely with the {=>}&RestrictedClosure syntax.

You can pass a {=>}&RestrictedClosure or a {=>} to a method which expects a {=>}, but you must pass a {=>}&RestrictedClosure to a method which expects a {=>}&RestrictedClosure.

For function types, the intersection type using '&' feels right.

3) The compiler could infer which type a closure was, but then when it tries to resolve a method, it won't find one (or its arguments will not match) when you pass an unrestricted closure to a method that expects a restricted one.

Probably the compiler error will say it can't find a method with those arguments, or it will say you can't call (.. signature with {=>}&RestrictedClosure) with arguments ... {=>}, which makes it a bit tough for Mort to work out what he did wrong. Whereas if the surface syntax uses different tokens for the two types, then the compiler can tell Mort that he cannot access a non-final field (or return, break, continue) within a restricted closure, and can show him the actual location of the actual error. Of course, getting a useful compiler error message here relies on Mort using the correct closure token. Hopefully it's obvious to him; otherwise he's back at square one.

Helping the compiler tell you accurately what you got wrong is a useful consideration when deciding whether to have one or two surface syntaxes.

4) It might be worth considering whether there are ANY useful use cases for an unrestricted closure to be defined or passed in a situation other than with control invocation syntax. It might be that the only times you'd want to use return, break, continue, or modify a variable from the enclosing scope, is when you'd also want to use control invocation syntax. It might be OK to say the only way to define an unrestricted closure is with Control invocation syntax. I'm not convinced either way on that. For Further study.

If that turns out to be true, then a possible solution is
* use &RestrictedClosure with a function type.
* all closure literals are restricted
* control invocation statements are a shorthand for unrestricted closure.

Advantages:
* Compiler can give Mort a sensible error message.
* Only one closure literal token (this helps Mort too).
* Uses borrowed syntax (from generics) in function types to be explicit about the subtyping relationship.
* Might satisfy both the "concurrent use cases are primary", and the "synchronous use cases are primary" perspectives in Neal's post.

5) re: @SuppressWarnings("nonlocal-transfer")

YUCK!! - see comment 1.

Ivan Memruk said...

+1 for the annotation, interfaces are for defining method contracts and using them as "markers" is an obsolete workaround now that annotations exist. it's a matter of time to replace Serializable with @Serializable as well..

Eugene Vigdorchik said...

Seems like having the language construct for distinguishing restricted and unrestricted closures is too heavyweight. Instead why not have a compiler warning flag like -warnunrestrictedclosures the same way -warnunchecked works now? The default for this flag could be inverted however.

And from IDE side there could always be editor markings for nonlocal flow transfer and mutable state change.

Tom said...

Bruce, concepts like map, reduce, filter, and friends would be super common, synchronous use cases that wouldn't want to use the control invocation syntax. Maybe make these rules:

1) Function types on vars indicate unrestricted by default, for convenience.
2) Named interfaces are restricted by default (for backwards compatibility).
3) You can say "&RestrictedFunction" or "&UnrestrictedFunction" if you want to override the defaults.
4) Closure instance expressions. Hmm. Argh. Either say, "Who cares?", or it really would be nice to throw in a new context-sensitive keyword like "async" at the beginning instead of all this bother. Yowsers. Saying "&RestrictedFunction" at the end of a closure expression sounds ugly.

Tom said...

Thought a bit more, and distinguishing at the closure expression is overkill. Any errors would be compile-time errors anyway.

So just default named interfaces to restricted use and function types to unrestricted, allowing "&RestrictedFunction" if needed.

Václav Slováček said...

Hello Neal, I was just wondering: is there a way for someone (for example, me) to submit a proposal for how to solve closures in Java? Thank you.

Mark Mahieu said...

@bruce chapman: excellent point about compiler error messages. The two-token approach does seem to open the door for javac to use dedicated error messages referring to 'restricted'/'unrestricted' and/or the tokens themselves, which could well be clearer than a general 'incompatible types' or whatever. Not that it couldn't do so with the alternative options, but it feels more appropriate once you introduce dedicated syntax.

In fact (hijacking one of Peter Ahé's suggestions), perhaps we could eventually see error messages like this:

Test.java:3: attempt to assign a closure literal of unrestricted type {int,int #> int} to a variable with restricted type {int,int => int}
    {int, int => int} fn = {int a, int b #> a + b};
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I don't follow your point about Mort having to use the correct token in order to get a useful error though.

marius said...

I love the currying proposed by Mark Mahieu. Makes a lot of sense.

Bruce Chapman said...

Tom wrote:
> Bruce, concepts like map, reduce, filter, and friends would be super common, synchronous use cases that wouldn't want to use the control invocation syntax.


But although those are all synchronous uses, they are probably NOT the sorts of places where you'd want to perform any of the restricted operations, so the compiler would treat it as a restricted closure and pass it to the method which accepts unrestricted closures. No problem.

A problem only arises if you want to perform some unrestricted operation (such as a nonlocal transfer or a reference to a non-final local variable declared in an enclosing scope) in the closure. In the case of map, reduce, filter etc, I am finding it hard to think of a reason why you'd want to do any of those operations except read the value of a non-final local variable from an enclosing scope. In which case you'd either make it final, or copy it to a final variable. Doing that is possibly simpler than having two different syntaxes.

However I think there is value in distinguishing the two kinds of closure expressions so that the compiler can give you the right error message when you mess up. Java syntax generally leans toward making the developer intent explicit. I think the two kinds of closure should be explicit in some way.

Neal Gafter said...

Václav Slováček: If you have ideas for improving our closures spec and prototype, you're welcome to email me at neal AT gafter DOT com. If you have completely different ideas for how to approach the problem, I suggest you write a draft spec and/or implement a prototype.

Jean-Baptiste said...

Neal,

It really seems like the people involved in the decisions on Java's evolution are living in another dimension than me.

I worked with Java for several years, building business applications in different domains and mainly helping developers to solve problems with it.

I am sure you already know it, but Java is not only used by geeks or extreme coders but also by average developers who try to manage with it. Generics and annotations were huge steps in decreasing language simplicity and increasing confusion. If the complexity continues to increase, I am afraid that companies will decide not to deploy Java 7, or maybe choose another solution to implement their applications. Currently, in the company where I work, we are only starting to deploy Java 5.

Another point is that, for me, Java was an object language, aimed at implementing everything in an object way. It's not as pure as Smalltalk, but it's object-oriented. Adding functional concepts to it could certainly be interesting and powerful, but it changes the way things are implemented. And I'm not sure it's easy to mix these two conceptual approaches.

Please keep those points in mind.

Jean-Baptiste

5er said...

Regarding restricted closures and function types my humble opinion is that if closures in a form similar to BGGA get enough support to be included in the JDK 7 they should introduce as small a syntax change to the language as possible. Number of different tokens should be minimized. For this reason I think a '=>' vs '==>' (or '#>') is an unnecessary complication.

Instead, by complicating the compiler a bit, the same effect can be achieved by something like this:
- The declaration of variables and method parameters should use the following syntax:
{T,U=>V} & RestrictedFunction combiner = ...;
- In the absence of the "restricted context" the closure instances should be created to implement RestrictedFunction if they don't mutate state from outer context or use non-local returns. If they do those nasty things they should not implement RestrictedFunction.
- In the "restricted context" (by inferring from the type of method parameter or type of variable in assignment) the compiler should try to compile an inferred restricted closure. If it succeeds OK, if it does not (the closure mutates state from outer context or uses non-local returns) the compiler can produce meaningful compile errors (restricted closure can not do this and that... vs. can not apply method parameters or assign to variable).

What do you think?

Neal Gafter said...

5er: that is one of the alternatives, but it makes many (restricted) APIs harder to read.

Cow_woC said...

@Neal,

This is the same argument as operator overloading. Some of us believe that such constructs are best defined by the JCP, not by developers.

That is, I'd rather see Sun add a destructor-like construct for resource management than full-fledged closures.

Neal Gafter said...

@Cow_woC: the difference is that with operator overloading, the programmer changes the meaning of existing language constructs (operators). Control APIs are only invoked as method calls.

Lasse said...

If I understand this correctly, we have unrestricted closures for synchronous use (control structures) and restricted closures for what we would traditionally use anonymous classes for.

That type of distinction always lights a warning in my head. Are these two really supposed to be modelled by the same concept, or is the distinction a sign of a deeper difference?
I'd say it is, and they aren't.

Closures are one of the most powerful language features, the most powerful abstraction. You can implement anything using them. But that also means they can easily become a golden hammer.

If we look at language-defined control structures, e.g., "for" and "while", they define control flow, but abstract over the expressions, statements and declarations that are executed during that control flow.

The most direct implementation of that is using call-by-name semantics to pass unevaluated expressions and unexecuted statements to the control structure, and let it evaluate and execute them when appropriate.

Using call-by-name parameter passing, so that the only thing one can do with the parameter is evaluate it, means that the delayed expressions and statements are only denotable, and neither storable nor expressible, so they can't escape the method call they are passed to. This is exactly what the synchronous use of closures reflects, but doesn't enforce.

Passing of declarations so that they are visible to both the control structure and the delayed expression and statement arguments, could be handled by a call-by-reference parameter passing semantics. The variable is declared in the scope of the calling method, and is normally available there, but is also passed to the control structure method to modify.

With suitable syntax changes, one could call a method on a Collection<T> taking one declaration of a variable of type T, one boolean expression and one statement as, e.g,:
myColl.where(Foo elem: elem.isBar()) { elem.print(); }

Obviously this will have a lot of the same concerns as closures, since it too needs to evaluate expressions in the syntactic scope where they were created, but we do know that that scope is still alive.

Also, there is no problem distinguishing between local and non-local returns, since closures doing local returns are matched by expressions, whereas those doing non-local returns are matched by statements. Closures can do both, and can act as both statement and expression, but I can't think of any single closure that wants to do both.

I'm not saying that this suggestion is better than closures (obviously it's more restricted), but I do think it is a closer match for the existing language-defined control structures. It has the features that I see in the closure based implementation of control structures, and only some of the same challenges.

I guess my point is that I think closures are trying to do too much with one feature. That gives a feature that works one way in some cases and another in other cases, instead of two features that each do just the right thing.

/LRHN

Neal Gafter said...

@Lasse: We looked at this approach and it appears to be a much more complicated change, and less expressive as well; doing things as you suggest requires deep changes to both the language and the VM and their type systems (to prevent "blocks" from being stored). I believe it would be worse, not better, to have two different meanings for "return" rather than having it always return from the enclosing method. We need to define the language to flag or prevent likely programmer errors but keep a uniform semantics.

Neal Gafter said...

@lasse: Consider a "concurrent loop" API, which has the loop body executed concurrently in different threads. This is one of the more important use cases for control abstractions - a point I will expand on at more length soon - but it can't be handled by the kind of restricted approach you suggest.

Lasse said...

I can see the point of the concurrent control structure - after thinking about it for a while.
I still don't like the two different types of return, but it might be the lesser evil.
(Still, how does a closure within a closure return a value from the outer one? And if it can't, it's the kind of asymmetry that irks me :)

Concurrent control structures are probably also one of the hardest types of problem to solve elegantly and generally at the same time. It is both synchronous and concurrent, so it is the creator of the control structure who must ensure that no closure escapes and no thread survives the control structure execution, no matter what happens.

Another thought:
In the current specification, there appears to be no way for a concurrent control structure to allow non-local returns from its concurrent threads. A non-local return (or other non-local control transfer) in a different thread from the one creating the closure will cause an UnmatchedNonlocalTransfer exception. Even if the control structure implementation catches the exceptions that happen in its threads, it has no way of knowing which non-local control transfer was attempted.

Perhaps it would be useful to have the failure handling add a closure to the exception, one that could repeat the failed non-local transfer attempt (and do nothing else). It would have type {=>Unreachable}. That way the concurrent control structure could handle the exception, clean itself up, and re-execute the non-local transfer in the correct thread.
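The proposal above can be approximated in standard Java as a rough sketch. Every name here is hypothetical: `UnmatchedTransfer` stands in for the spec's UnmatchedNonlocalTransfer with the suggested extra field, `NonlocalReturn` simulates a "return from the enclosing method" as an exception, and a `Runnable` that always throws stands in for a closure of type {=>Unreachable}. Since plain Java has no nonlocal return, the body throws `UnmatchedTransfer` explicitly where a BGGA runtime would raise it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReplayTransferSketch {
    // Hypothetical exception carrying an action that repeats the failed transfer.
    static class UnmatchedTransfer extends RuntimeException {
        final Runnable replay;
        UnmatchedTransfer(Runnable replay) { this.replay = replay; }
    }

    // Simulates "return this value from the enclosing method".
    static class NonlocalReturn extends RuntimeException {
        final int value;
        NonlocalReturn(int value) { this.value = value; }
    }

    // Toy concurrent control structure: runs the body on a worker thread, cleans up,
    // then replays any attempted transfer on the caller's thread.
    static void runOnWorker(Runnable body) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable replay = null;
        try {
            pool.submit(body).get();
        } catch (ExecutionException e) {
            if (!(e.getCause() instanceof UnmatchedTransfer)) {
                throw new IllegalStateException(e.getCause());
            }
            replay = ((UnmatchedTransfer) e.getCause()).replay;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // no worker thread outlives the control structure
        }
        if (replay != null) {
            replay.run(); // re-execute the transfer in the thread that can honor it
        }
    }

    // Returns the first even value, or -1 if there is none.
    static int firstEven(int[] values) {
        try {
            runOnWorker(() -> {
                for (int v : values) {
                    if (v % 2 == 0) {
                        // On the worker thread the "return" cannot reach firstEven directly.
                        throw new UnmatchedTransfer(() -> { throw new NonlocalReturn(v); });
                    }
                }
            });
        } catch (NonlocalReturn r) {
            return r.value;
        }
        return -1;
    }
}
```

The control structure never needs to know which transfer was attempted; it only shuts itself down and runs the carried action on the calling thread.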

That is, if a concurrent control structure can use non-local returns. If it can't do that anyway, then it might not need to be a control structure, and we can use call-by-name anyway *evil grin*.

Regards
/L