Monday, March 17, 2008

Closures: Control Abstraction, Method References, Puzzler Solution

The Java Closures prototype now supports control abstraction and implements restricted closures and function types. The syntax has changed slightly. Also, as hinted in the draft JSR proposal, there is now support for eta abstraction, which Stephen Colebourne's FCM proposal calls method references. We haven't updated the specification yet, so this post will serve as a brief tutorial on the changes until we do. I don't know if this will be the syntax we end up with, but it will do for now. Finally, we look at solutions to the closure puzzler from my previous post.

Control Abstraction

The first thing you'll notice when using the new prototype is that the compiler gives a warning when a closure uses a local variable from an enclosing scope:

Example.java:4: warning: [shared] captured variable i not annotated @Shared
        Runnable r = { => System.out.println(i); };
                                             ^

There are a few ways to make this warning go away:

  • declare the variable final; or
  • annotate the variable @Shared; or
  • make sure the variable is not the target of any assignment expression; or
  • put @SuppressWarnings("shared") on an enclosing method or class; or
  • use an unrestricted closure, by using the ==> token instead of the => token (when possible).

The => token builds a restricted closure that triggers this warning. Restricted closures also do not allow a break or continue statement to a target outside the closure, nor a return statement from the enclosing method. You will rarely want to write an unrestricted closure; many (but not all) of the things you need to do with an unrestricted closure can be expressed more clearly with a control invocation statement instead.
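For comparison, lambda expressions in standard Java (added in Java 8, well after this prototype) behave much like the restricted form: a lambda may capture only local variables that are final or effectively final, a return inside it exits the lambda rather than the enclosing method, and break or continue cannot target a loop outside it. The sketch below is standard Java, not prototype syntax; the class and method names are hypothetical, and the one-element array is a common workaround for sharing a mutable local, playing roughly the role @Shared plays in the prototype.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;

class CaptureDemo {
    // Captures an effectively final local. Adding a later assignment to 'sum'
    // would make the lambda fail to compile.
    static IntSupplier addTo(int base, int delta) {
        int sum = base + delta;          // never reassigned, so effectively final
        return () -> sum;
    }

    // To mutate shared state from inside a lambda, wrap it in a mutable holder.
    // Here a one-element array stands in for a @Shared variable (an analogy only).
    static int countLongWords(List<String> words, int minLength) {
        int[] count = {0};               // the array reference itself is effectively final
        words.forEach(w -> {
            if (w.length() < minLength) {
                return;                  // leaves this lambda invocation only, like 'continue'
            }
            count[0]++;
        });
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(addTo(2, 3).getAsInt());
        System.out.println(countLongWords(Arrays.asList("a", "closure", "is", "handy"), 3));
    }
}
```

Standard Java has no counterpart to the unrestricted (==>) form, so there is no way to write a lambda whose return leaves the enclosing method.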

You're not allowed to assign an unrestricted closure to a restricted interface. A number of existing JDK interfaces, such as java.lang.Runnable, have been modified to be restricted.

Error: cannot assign an unrestricted closure to a restricted interface type
Runnable r = { ==> System.out.println(i); };
^

In the less common case that you're writing a method intended to be used as a control API, you can write a function type with the (new) ==> token to designate an unrestricted function (interface) type. Let's do that to write a method, with, that will automatically close a stream for us. The idea is to be able to replace this code

FileInputStream input = new FileInputStream(fileName);
try {
    // use input
} finally {
    try {
        input.close();
    } catch (IOException ex) {
        logger.log(Level.SEVERE, ex.getMessage(), ex);
    }
}

with this

with (FileInputStream input : new FileInputStream(fileName)) {
    // use input
}

which is an invocation of the following method

public static void with(FileInputStream t, {FileInputStream==>void} block) {
    try {
        block.invoke(t);
    } finally {
        try {
            t.close();
        } catch (IOException ex) {
            logger.log(Level.SEVERE, ex.getMessage(), ex);
        }
    }
}

This is among the simplest control APIs, but it has some limitations:

Completing the API by repairing these defects is left as an exercise to the reader. A solution will be discussed in my JavaOne talk Closures Cookbook.
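For readers comparing with today's Java, here is a hedged sketch of the with method above written in standard Java 8+, with the function type {FileInputStream==>void} approximated by java.util.function.Consumer. Widening the parameter from FileInputStream to any Closeable is an assumption of this sketch (it also makes testing with an in-memory stream easy). Because Consumer.accept declares no checked exceptions, the block cannot throw IOException, and because Java lambdas are restricted, a return in the block only exits the block.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class WithDemo {
    private static final Logger logger = Logger.getLogger(WithDemo.class.getName());

    // Runs 'block' on 't', then closes 't' even if the block throws.
    // A failure to close is logged rather than propagated, as in the prototype version.
    static <T extends Closeable> void with(T t, Consumer<? super T> block) {
        try {
            block.accept(t);
        } finally {
            try {
                t.close();
            } catch (IOException ex) {
                logger.log(Level.SEVERE, ex.getMessage(), ex);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage with an in-memory stream instead of a file.
        with(new ByteArrayInputStream(new byte[] {1, 2, 3}),
             in -> System.out.println(in.available() + " bytes available"));
    }
}
```

Java 7 later built this pattern into the language as the try-with-resources statement, which works with any AutoCloseable.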

Method References

A natural companion to closures is a way to refer to an existing method instead of writing a closure that accepts the same arguments and just invokes the method. This is sometimes known as eta abstraction or method references. We expect closures in their final form to include support for this convenient feature, which is why it is called out in the draft JSR proposal. The latest version of the prototype supports this, with a syntax based on javadoc conventions. Here are a few examples:

{ int => Integer } integerValue = Integer#valueOf(int);
{ Integer => String } integerString = Integer#toString();
{ int, int => int } min = Math#min(int, int);
{ String => void } println = System.out#println(String);
{ => String } three = new Integer(3)#toString();
{ Collection<String> => String } max = Collections#max(Collection<String>);
{ => Collection<String> } makeEmpty = Collections#<String>emptySet();
Runnable printEmptyLine = System.out#println();
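Java 8 later adopted method references with a :: separator instead of #. As a rough comparison, here are the same examples in standard Java, with each function type replaced by a java.util.function interface; the holder class MethodRefs is hypothetical. One difference appears immediately: Java selects the overload from the target type rather than from an explicit parameter list, which makes Integer::toString ambiguous as a Function<Integer, String>.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.IntBinaryOperator;
import java.util.function.IntFunction;
import java.util.function.Supplier;

class MethodRefs {
    // {int => Integer}: a static method
    static final IntFunction<Integer> integerValue = Integer::valueOf;
    // {Integer => String}: Integer::toString would be ambiguous here (static
    // toString(int) vs instance toString()), so name the instance method via Object
    static final Function<Integer, String> integerString = Object::toString;
    // {int, int => int}: the target type selects the min(int, int) overload
    static final IntBinaryOperator min = Math::min;
    // {String => void}: bound to the receiver System.out
    static final Consumer<String> println = System.out::println;
    // {=> String}: bound to one particular Integer object
    static final Supplier<String> three = Integer.valueOf(3)::toString;
    // {Collection<String> => String}
    static final Function<Collection<String>, String> max = Collections::max;
    // {=> Collection<String>}, with an explicit type argument
    static final Supplier<Set<String>> makeEmpty = Collections::<String>emptySet;
    // Runnable, bound to System.out and the no-argument println()
    static final Runnable printEmptyLine = System.out::println;
}
```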

Writing code as a method is sometimes more convenient than writing it as a closure:

void doTask() {
    // a complex task to be done in the background
}


Executor ex = ...;
ex.execute(this#doTask());
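The same pattern in standard Java 8+ is written this::doTask, again with the receiver spelled out. In this hypothetical sketch a counter stands in for the complex task, and the two-thread pool and ten-second timeout are arbitrary choices.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BackgroundWork {
    private final AtomicInteger runs = new AtomicInteger();

    void doTask() {
        // stands in for a complex task to be done in the background
        runs.incrementAndGet();
    }

    // Submits doTask() bound to this object 'times' times, waits for the pool
    // to drain, and reports how many runs completed.
    int runInBackground(int times) {
        ExecutorService ex = Executors.newFixedThreadPool(2);
        for (int i = 0; i < times; i++) {
            ex.execute(this::doTask);
        }
        ex.shutdown();
        try {
            ex.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return runs.get();
    }
}
```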

Puzzler Solution

A couple of weeks ago we looked at a Java puzzler involving closures, and a number of people discussed the underlying issue. My favorite is David's post "Color-flavor locking breaks chiral symmetry". Lessons include not exposing public fields (accessors are better) and being careful to avoid cyclic initialization dependencies.

The enum language feature provides support for one solution to the puzzle: specialize each instance of the enums.

import java.util.*;

enum Color {
    BROWN {
        public Flavor flavor() {
            return Flavor.CHOCOLATE;
        }
    },
    RED {
        public Flavor flavor() {
            return Flavor.STRAWBERRY;
        }
    },
    WHITE {
        public Flavor flavor() {
            return Flavor.VANILLA;
        }
    };
    abstract Flavor flavor();
}

enum Flavor {
    CHOCOLATE {
        public Color color() {
            return Color.BROWN;
        }
    },
    STRAWBERRY {
        public Color color() {
            return Color.RED;
        }
    },
    VANILLA {
        public Color color() {
            return Color.WHITE;
        }
    };
    abstract Color color();

}

class Neapolitan {

    static <T,U> List<U> map(List<T> list, {T=>U} transform) {
        List<U> result = new ArrayList<U>(list.size());
        for (T t : list) {
            result.add(transform.invoke(t));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Color> colors = map(Arrays.asList(Flavor.values()), { Flavor f => f.color() });
        System.out.println(colors.equals(Arrays.asList(Color.values())));

        List<Flavor> flavors = map(Arrays.asList(Color.values()), { Color c => c.flavor() });
        System.out.println(flavors.equals(Arrays.asList(Flavor.values())));
    }
}
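The map helper in Neapolitan takes the prototype function type {T=>U}. In standard Java 8+ the closest parameter type is java.util.function.Function, and a method reference can replace a closure that only calls one method, so { Flavor f => f.color() } would become Flavor::color. This is a hedged translation; the class name MapDemo and the string example are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class MapDemo {
    // Applies 'transform' to each element, preserving order.
    // The wildcards let, for example, a Function<Object, Integer> map a List<String>.
    static <T, U> List<U> map(List<T> list, Function<? super T, ? extends U> transform) {
        List<U> result = new ArrayList<U>(list.size());
        for (T t : list) {
            result.add(transform.apply(t));
        }
        return result;
    }

    public static void main(String[] args) {
        // String::length plays the role of { String s => s.length() }
        System.out.println(map(Arrays.asList("red", "brown", "white"), String::length));
    }
}
```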

Another elegant solution, due to 5er_levart, uses closures:

enum Color {
    BROWN({=>Flavor.CHOCOLATE}),
    RED({=>Flavor.STRAWBERRY}),
    WHITE({=>Flavor.VANILLA});

    private final {=>Flavor} flavor;

    public Flavor flavor() { return flavor.invoke(); }

    Color({=>Flavor} flavor) {
        this.flavor = flavor;
    }
}

enum Flavor {
    CHOCOLATE({=>Color.BROWN}),
    STRAWBERRY({=>Color.RED}),
    VANILLA({=>Color.WHITE});

    private final {=>Color} color;

    public Color color() { return color.invoke(); }

    Flavor({=>Color} color) {
        this.color = color;
    }
}

In both solutions the idea is to compute the value lazily, a key technique to break dependency cycles.
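To see why laziness matters, here is a hypothetical eager pair of enums that store the partner constant directly, next to a lazy pair in the spirit of 5er_levart's solution that stores a java.util.function.Supplier instead. This is standard Java 8+, not the prototype, and all names are illustrative. Whichever eager enum is initialized first triggers initialization of the other, which then reads the first enum's constants while they are still null.

```java
import java.util.function.Supplier;

// Eager: each constructor receives the partner constant directly.
enum EagerColor {
    BROWN(EagerFlavor.CHOCOLATE), RED(EagerFlavor.STRAWBERRY), WHITE(EagerFlavor.VANILLA);

    final EagerFlavor flavor;

    EagerColor(EagerFlavor flavor) { this.flavor = flavor; }
}

enum EagerFlavor {
    CHOCOLATE(EagerColor.BROWN), STRAWBERRY(EagerColor.RED), VANILLA(EagerColor.WHITE);

    final EagerColor color;

    EagerFlavor(EagerColor color) { this.color = color; }
}

// Lazy: each constructor receives a Supplier that is not invoked until
// flavor() or color() is called.
enum LazyColor {
    BROWN(() -> LazyFlavor.CHOCOLATE), RED(() -> LazyFlavor.STRAWBERRY), WHITE(() -> LazyFlavor.VANILLA);

    private final Supplier<LazyFlavor> flavor;

    LazyColor(Supplier<LazyFlavor> flavor) { this.flavor = flavor; }

    LazyFlavor flavor() { return flavor.get(); }
}

enum LazyFlavor {
    CHOCOLATE(() -> LazyColor.BROWN), STRAWBERRY(() -> LazyColor.RED), VANILLA(() -> LazyColor.WHITE);

    private final Supplier<LazyColor> color;

    LazyFlavor(Supplier<LazyColor> color) { this.color = color; }

    LazyColor color() { return color.get(); }
}

class CycleDemo {
    public static void main(String[] args) {
        // Touching EagerColor first starts its initialization; that initializes
        // EagerFlavor, whose constructors read EagerColor constants that are still null.
        System.out.println(EagerColor.BROWN.flavor + " " + EagerFlavor.CHOCOLATE.color);
        // The lazy pair is consistent in both directions.
        System.out.println(LazyColor.BROWN.flavor() + " " + LazyFlavor.CHOCOLATE.color());
    }
}
```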

23 comments:

Anonymous said...

Regarding the new, default restricted syntax...

Are these even closures anymore? Why separate the two constructs at all? Java is the only language I know of that has closures which only capture constants from the enclosing scope. Not only will the subtle distinction add confusion for those new to closures, but it will also be a frustrating constraint for those of us who've grown accustomed to closures in other languages like Scala or Ruby.

Aside from that, I'm quite fond of the current closures proposal and the syntax for method reference literals. It's unfortunately quite verbose, but that's to be expected when trying to add something like closures to a language like Java.

Neal Gafter said...

@Daniel: The main reason to separate the two constructs is to help detect errors. Either form of closure can use variables from enclosing scopes without restriction; at worst you get a warning, not an error.

Matthias said...

> The latest version of the prototype supports [method references], with a syntax based on javadoc conventions.

Excellent!

Dhanji R. Prasanna said...

Neal - is it that restricted types are more "likely" to be used in concurrent services (such as schedulers)?

I love the eta abstraction, this will improve the interoperability with existing APIs immensely.

This certainly is a more elegant solution than anonymous instances with CICE, which I don't believe is a closures proposal at all:

http://www.wideplay.com/thoughtsoncice

Dhanji.

Ricky Clarkson said...

I strongly recommend making these warnings into errors. There are no cases, as far as I can see, where it would be impossible to remove the warning (by changing => to ==> or by adding @Shared or final), so it seems strange to allow such easy-to-fix code.

I would rather the language didn't have restricted closures, and have those programmers who have need of visual cues for shared variables use an IDE that fontifies or colours them differently.

I know the arguments against relying on the IDE, but for this I can't think of a good reason for embedding the constraint in the language itself (even less so as a warning).

Unknown said...

Would Java Closures allow some limited currying? Your example sounds like currying to me.

{ => String } three = new Integer(3)#toString();

Alex Miller said...

Great stuff, Neal. Questions:

1) It makes perfect sense that you can't pass an unrestricted closure to a restricted interface. It seems you also cannot do the reverse, which might be useful. For example, I might want to define an API that allows for unrestricted closures but not all closures passed to it need to be. In that case, I'd prefer to allow unrestricted but have restricted work too. Possible? I get a NoSuchMethodError trying it now.

2) On method references, it seems like this#method() throws a compiler error. Any reason that shouldn't work? Seems as reasonable to me as "new Integer(3)#toString()". Also, it seems like #method() by itself should naturally use an implicit this.

Anonymous said...

Where is the best place to ask questions or make suggestions regarding the syntax?

Personally, I find the FCM syntax much easier to understand, for someone who had not encountered closures before. Maybe the current syntax used in the prototype makes sense to someone coming from Ruby or Haskell, but to your average Java developer I think the FCM syntax makes much more logical sense :P Are there any technical problems (e.g. ambiguities) with the FCM syntax that would make it unusable?

Thanks :-)

Anonymous said...

Is there a reason closure types don't put the return type first, followed by a new keyword like "closure" or "block" and then the argument types in parentheses, the way Java methods do? Like this:
{ int closure( String, int) } or
{ int block( String, int) }

I think this would be much more readable to the average programmer; Java is a little verbose anyway, but very readable. Remember, you need to win the hearts of the masses for closures. There's no need to save a few keystrokes in Java.

Unknown said...

Neal,

Along with method references, are you planning on supporting constructor and field references like FCM? I have found them very useful in my experiments with FCM (http://aberrantcode.blogspot.com/).

I am all for making a division between restricted and unrestricted closures. In fact, I think the ==> and => syntax is too subtle. Also, if the majority of the time one is going to be using restricted closures, and break and continue are not allowed, then can the return statement come back? Can it mean "return from the closure" and not "return from the method invoking the closure"? This would allow for early returns. It would also remove the "last line with no semicolon means return" syntax, which is not something you see in Java today.

If unrestricted closures are uncommon, then a slightly more complicated syntax for "return from the enclosing method" does not seem too unreasonable.

Collin

JP said...

Neal,

Method references are a good and valuable addition to our prototype.

Good job Neal.

Thanks
Prashant

Jordan Zimmerman said...

Neal, in the past you've said that you don't care that much about the ultimate syntax (if I remember correctly). However, I'll bet that the prototype will have enough inertia that it will become the syntax.

So, a plea from an architect whose team will struggle with this - please make the syntax more distinct. For example the difference between "=>" and "==>" is too subtle for most people to see.

Robby O'Connor said...

I agree with Jordan -- the syntax needs to be more distinct...

V. Sevel said...

Hi Neal,

1 - Method References: this is a very nice addition. It will definitely be a big part of the success of this proposal.

Note: You did mention this#doTask(). Is that for clarity, or do you forbid #doTask()? I think we should be able to write the latter.

2 - Syntax: the notation {int=>void} is hard to read for people used to Java. Traditionally, the return type appears first, then arguments. I believe the learning curve will be easier with some syntax like:
* void(int) instead of {int=>void}
* void() instead of {=>void}
* String(int) instead of {int=>String}
When I read the closure types as they are today, I always have to process it in reverse order to mentally rebuild the java signature I am used to.

Going one step further, people are used to defining arguments inside ( ) and code inside { }.
{int time=> System.out.println("waiting "+time)}
could be rewritten as
(int time){ System.out.println("waiting "+time) }.
Once again, I think people will find this easier to read.

3 - Unrestricted closures: The difference between restricted and unrestricted is too subtle for (almost) everybody to grasp. Once again, the learning curve is what will make closures succeed or fail. Smalltalk, like most languages, has unrestricted closures, and I can't recall any painful experiences with unrestricted closures that should have been restricted. We certainly had plenty of bugs (dynamic typing, plus a lack of unit testing in those days), but almost never because somebody returned improperly from a block. I believe this is a case where you have to trust the developer and testing a little more. Let the developer figure out his mistake, instead of making closures more complex.
* Clients will be tempted to always use unrestricted closures because it removes the need for final or @Shared. Remember the first time you wrote an inner class and realized that every local variable or parameter it used had to be declared final? What a pain! Hopefully, closures will relax this annoying constraint.
* Library writers will be tempted to accept unrestricted closures to make sure they did not miss a legitimate need from the client. This is the path of least resistance.
I think you should abandon restricted closures altogether.

4 – Exception transparency: I think <throws E> is too hard to grasp. It is like writing generics++. This signature is far too hard for anybody to read on a day-to-day basis: “static <throws E> void performTwice({ => void throws E } block) throws E”, especially when you realize that it is not even passing or returning anything. I am probably being naïve, but couldn't the compiler do the job for us, that is, automatically include in the signature any exception raised by the blocks? In other words, add the “<throws E>” and “throws E” parts for us?

I would like to write:
static void performTwice(void() block) {

}

Can the compiler be smart for us?


Other than that, I am hopeful we will see one day closures in Java. This is one of the things I have been missing from day one.

Neal Gafter said...

@v.sevel:

(1) the part before the # is required. Otherwise, within an instance method, you can't distinguish a method reference that binds the current object from one that doesn't.

(2) That was an earlier syntax for closures. I believe I explained why we changed. Learning curve is important, but so is productivity. New language features may increase the learning curve if more than offset by a corresponding increase in productivity.

(3) Perhaps.

(4) No, that doesn't work. Not all methods that receive a function type invoke it before returning.

kirillkh said...

re gafter: I think having method references is essential for completeness.


re v.sevel and gafter:
(3) strongly agree
(4) I have another problem with this. Why do we have to write <throws E>? What is the difference from the usual case of passing generic type into method? I would prefer the syntax to remain the usual <E>, as it is less verbose. Also, what if you want to use E for something else, besides throwing it out of closure/method? E.g. to declare a local variable:
E myvar = ...

Neal Gafter said...

kirillkh: One difference is in type inference: with a throws type variable, the lub() operation (JLS 15.12.2.7) produces a disjunction type rather than an intersection type. That's part of what makes exception transparency work. These types may be used to declare variables without any problem.
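Standard Java can express a limited form of this with an ordinary type parameter bounded by Exception and used in a throws clause, and since Java 8 it can infer that parameter from a lambda body. The sketch below is a hypothetical analogue of performTwice, not the proposal's <throws E> mechanism; the interface name Block is an assumption. Because Java infers a single type for E, a block that throws both IOException and SQLException would widen E to their common supertype, Exception, rather than to the disjunction described above.

```java
import java.io.IOException;

class Transparency {
    // A hypothetical block type whose checked exception is a type parameter.
    interface Block<E extends Exception> {
        void run() throws E;
    }

    // Propagates exactly the checked exception type inferred for the block.
    static <E extends Exception> void performTwice(Block<E> block) throws E {
        block.run();
        block.run();
    }

    static int count;

    public static void main(String[] args) {
        // The block throws no checked exceptions, so E is inferred as
        // RuntimeException and no try/catch is needed.
        performTwice(() -> count++);

        // The block throws IOException, so E is IOException and the caller must handle it.
        try {
            performTwice(() -> { throw new IOException("from block"); });
        } catch (IOException ex) {
            System.out.println("caught " + ex.getMessage());
        }
        System.out.println("count = " + count);
    }
}
```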

V. Sevel said...

Neal,

since you actually read comments ;)

2 - I tried to find the rationale for the syntax in a previous post, but could not find it. Would you have a link? I remember you said you had not made up your mind about a particular syntax, but you had to choose one to begin with. In a gesture of openness, it might be actually valuable to separate the syntax questions from the others (method reference, restricted, transparency, ...) by organizing a poll, and letting the community decide what flavor of syntax people like best.

4 - But the compiler knows which blocks are passed to and actually used within a method. So by looking at the signature of each block, it should be able to figure out the exceptions the method implicitly throws, in addition to the ones explicitly mentioned by the developer in the method's signature. I have a hard time figuring out why I have to help the compiler with the "throws E". But this is way above my head, and I can definitely trust you on this.

5 - While I am on the subject, I am not too fond of the shortcut that allows the last block in the argument list to be specified outside of the () like in:

with (FileInputStream input : new FileInputStream(fileName)) {
// use input
}

I find it clearer to treat the block as another true object, passed as an argument like any other. In this case I would rather do:

with (FileInputStream input : new FileInputStream(fileName), {
// use input
})

I believe I can already do the latter. I am arguing against allowing the former. I understand you are trying to improve readability, but it gives the false impression that the block is not a closure but a block of code following a control structure like for, while, ... This is one of the cases where people will have to mentally register that 'with' is not a keyword, and that the { } section is not a block but a closure. Allowing the shortcut adds one more mental path; I vote for one way of doing things, especially when that one way is not that bad and removes ambiguities.

6 – Loop constructs - Finally, as much as I liked return from Smalltalk blocks, I never missed being able to continue or break out of a while: loop. We would continue by doing an ifTrue: on the body of the block, and we would break by implementing the body in a different method that we could return from. I do not think at all that break and continue are essential to a closure implementation, especially if they imply one more use of the 'for' keyword.

I am perfectly happy with writing the forEach example from the spec this way:

forEachEntryDo(map.entrySet(), { Map.Entry<String,Integer> entry =>
    String name = entry.getKey();
    int value = entry.getValue();

    if ("end".equals(name)) return;
    if (!name.startsWith("com.sun.")) System.out.println(name + ":" + value);
});

There is no magic here, just plain closure stuff.

And new operations we have been truly missing since the beginning of java are just as easy:

List<String> set = collect(map.entrySet(), { Map.Entry<String,Integer> entry => entry.getKey() + " is named " + entry.getValue(); });

List<Map.Entry<String,Integer>> set = select(map.entrySet(), { Map.Entry<String,Integer> entry => entry.getKey().startsWith("com.sun."); });

Map.Entry<String,Integer> first = detect(map.entrySet(), { Map.Entry<String,Integer> entry => entry.getKey().startsWith("com.sun."); });

int count = inject(0, map.keySet(), { String name, int c => c + name.length(); });

If we get there, I believe the essentials will have been covered. I do not believe withLock, Runnable or Swing listeners are the primary use cases; I believe collecting and selecting on collections is. How many loops are there in a program, compared with all the other uses?

thanks for reading.

vincent
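Java 8 eventually provided these operations in its Stream API under different names: map corresponds to collect, filter to select, filter followed by findFirst to detect, and reduce to inject. The sketch below restates the four examples in standard Java over a Map<String, Integer>; the class name CollectionOps is hypothetical, and detect returns an Optional instead of null when nothing matches.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class CollectionOps {
    // collect: transform every entry
    static List<String> collect(Map<String, Integer> map) {
        return map.entrySet().stream()
                .map(e -> e.getKey() + " is named " + e.getValue())
                .collect(Collectors.toList());
    }

    // select: keep the entries whose key matches
    static List<Map.Entry<String, Integer>> select(Map<String, Integer> map) {
        return map.entrySet().stream()
                .filter(e -> e.getKey().startsWith("com.sun."))
                .collect(Collectors.toList());
    }

    // detect: the first matching entry, if any
    static Optional<Map.Entry<String, Integer>> detect(Map<String, Integer> map) {
        return map.entrySet().stream()
                .filter(e -> e.getKey().startsWith("com.sun."))
                .findFirst();
    }

    // inject: fold the key lengths into a running total starting at 0
    static int inject(Map<String, Integer> map) {
        return map.keySet().stream().reduce(0, (c, name) -> c + name.length(), Integer::sum);
    }
}
```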

Neal Gafter said...

@V. Sevel:

(2) There are a few reasons for the syntax change between the spec in versions 0.2 and 0.3. Top among them was to avoid confusion with methods, which have different scoping behavior for names, "return", etc. The issue of syntax will be taken up in the JSR expert group, supposing one is ever formed. I expect "->" as a replacement for "=>" is a likely outcome. Polls are a notoriously ineffective way to design programming languages.

(4) Using the body (implementation) of a method to define its type simply doesn't work when you start using interfaces, where the implementations are not available to the compiler.

(5) The block *is* a block, not a closure, though it is used as part of a closure passed to a method invocation. Like all method invocations, in order to understand its semantics you will want to refer to the documentation of the method being invoked. Similarly, well-named methods will allow the reader to build an intuition about how they behave so that it will not be necessary to refer to the documentation repeatedly.

(6) Smalltalk does not have break and continue. Java does. Perhaps we would not miss them in a Java-like language that omits them, but in the context of Java it makes sense to recognize them.

Anonymous said...

Any chance of doing some type inference for which method is selected?

e.g. instead of
{ int => Integer } integerValue = Integer#valueOf(int);
{ Integer => String } integerString = Integer#toString();
{ int, int => int } min = Math#min(int, int);
{ String => void } println = System.out#println(String);
{ => String } three = new Integer(3)#toString();
{ Collection<String> => String } max = Collections#max(Collection<String>);
{ => Collection<String> } makeEmpty = Collections#<String>emptySet();
Runnable printEmptyLine = System.out#println();

write instead

{ int => Integer } integerValue = Integer#valueOf;
{ Integer => String } integerString = Integer#toString;
{ int, int => int } min = Math#min;
{ String => void } println = System.out#println;
{ => String } three = new Integer(3)#toString;
{ Collection<String> => String } max = Collections#max;
{ => Collection<String> } makeEmpty = Collections#<String>emptySet;
Runnable printEmptyLine = System.out#println;

What do you think? Is it technically possible (ie. omitting the argument types doesn't cause ambiguities)? Is it desireable?

This is the sort of thing you can do when assigning a method reference to a delegate variable in C#.

Neal Gafter said...

@Ben - may be technically possible, but it would conflict with being able to omit the ".invoke" when using it.

Anonymous said...

Hi Neal,

This proposal is great; I'm especially looking forward to method references.

I have a question, though. Aberrant already mentioned it but didn't get a reply.

Your proposal is referring to FCM's method references which include:
- static method reference (A1)
- instance method references (A2)
- bound method references (A3)
- constructor references (A4)

This leaves out field literals.

Are you implicitly including them, or are you deliberately keeping them out of your proposal?

If so, could you please give us the reason?

Kind regards,

Cédric Vidal

UrbanVagabond said...

Method references sound like a great idea but I don't like the syntax. I think it should look like

#Type.method(int)

or even (the horror, the horror)

&Type.method(int)

this makes it clear that what is meant is "reference to the following method: ..."

Also, the following is very strange:

Collection#<List>(int)

shouldn't it be

Collection<List>#(int)

or better

&Collection<List>.(int)

Also, do you allow partial method references? E.g.

Type#method(int, 23)

which means the same as

lambda(int x): Type.method(x, 23)

(sorry, I haven't yet figured out the "Java" way of specifying lambda exprs)