Monday, October 02, 2006

Iterative control abstraction: user-defined loops

A couple of people have written about using Java closures to write methods that act as looping constructs. For example, iterating through a map is currently an ugly exercise in boilerplate:

Droid seekDroid(Map<String,Droid> map) {
    for (Map.Entry<String,Droid> entry : map.entrySet()) {
        String droidName = entry.getKey();
        Droid droid = entry.getValue();
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droidName);
            return droid;
        }
    }
}

It has been proposed that we add looping methods to Collection and Map that take advantage of closures, but it would be incompatible to add methods to an existing interface. We can accomplish the same thing using static methods. With one small additional twist to the closures proposal, we could make it even easier to write a method that iterates over maps. Here is what we want our client to look like:

import static java.util.Collections.eachEntry;
...
Droid seekDroid(Map<String,Droid> map) {
    for eachEntry(String droidName, Droid droid : map) {
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droidName);
            return droid;
        }
    }
}

To enable this, the following would be added to java.util.Collections:

public interface Block2<K,V,throws E> {
    void invoke(K k, V v) throws E;
}
public static <K,V,throws E>
void for eachEntry(Map<K,V> map, Block2<K,V,E> block) throws E {

    for (Map.Entry<K,V> entry : map.entrySet()) {
        block.invoke(entry.getKey(), entry.getValue());
    }
}

Notice the "for" qualifier on the declaration of eachEntry, and the for keyword on its invocation. Methods defined this way and then invoked using the control abstraction syntax would require the use of the for keyword on the invocation. The compiler arranges such control abstractions to act like loops, by properly interpreting break and continue statements within the body of the closure. Although this example doesn't use break or continue statements, the mere presence of the for keyword on the invocation helps the reader understand the code. In this way user-defined control constructs could be written that appear and act as loops.

27 comments:

Roman Elizarov said...

Now that is cool. By the way, there is some parallel between syntactic sugar for variable arity arguments that were added in Java 5.0 and nominal closure proposal that you are working on. Both have a special treatment for last method argument. However, in a variable arity syntax this is signified by a special treatment at call site (support for variable arity itself) and by a special declaration of the last method argument (ellipsis). Maybe for consistency abbreviated invocation syntax needs similar treatment. That is, it shall be allowed only for methods that were specially retrofitted with their last parameter declared in some special way. Strategically placed keyword "do" might do the trick.

Quintesse said...

for eachEntry(String droidName, Droid droid : map

Can't say I like this much. We've already got 2 different forms of FOR:

1. for (int i; ...; ...)

2. for (Object o : ...)

To add this third form just confuses the situation:

3. for something(Object o1 , Object o2 : ...)

I keep trying to read the something() as a plain method call possibly returning an iterator but the 2nd "parameter" is of course impossible.

I don't know it seems like there is too much "overloading" taking place of "visual structures" that were once simple.

I put the words between quotes because I have no good way of describing them differently.

What I mean is that loops and method calls are structures that are visually easily recognized and distinguished. In this example those structures are mixed and their meaning subtly changed which seems very confusing to me.

But maybe it just something you have to get used to.

Damon Hart-Davis said...

Hi,

I'm really drinking the Standard ML Kool-Aid today...

Why not just create a Collection{T2} = Collections.map(Collection{T1}, fn map {T1 -> T2}) method.

I liked the application of a mapper to a collection in SML. Very simple and potentially automatically embarrassingly parallel...

Rgds

Damon

Jochen "blackdrag" Theodorou said...

I have two things:

1) Am I guessing right, that the loop method must be void? I mean, I guess so, "for" is a statement, no expression. But what is the usage of such looping methods? If I can use only statements, then I won't have really a usage in combination with asynchronous closures. Ever thought about making for/while/do expressions? That would not break compatibility with older versions. But of course that would introduce the problem of break/continue/return in a totally new matter.

So what is the difference between your idea and a normal loop? I can't use map/find because the loop is no expression. In these cases I have to use a surplus local variable instead, even if the result is being processed directly. Using asynchronous closures here is not impossible but kind of useless. I mean you can't have break/continue/return then, and you can't have the nice access to local variables, which is needed because the loop is no expression.

2) Ever thought about removing the "synchronized" requirement from your closure proposal and use "for" instead. Not only I think that "for" is much better to read, you also don't have to use "synchronozed" in such a strange matter.. maybe I would use "do" instead of "for". Anyway, if you really ditch the "synchronized" and use "for" instead I would see this idea as a good progress for the closure proposal. Of course only if the eachEntry method can be used without "for" as normal method call expression in case I don't handle synchronous closures.

Russel Winder said...

Isn't the whole point of using closures that you invert the iteration construct and so make code more declarative -- which leads to less error. So in Groovy you can have:

def seekDroid ( map ) {
def droid = map.find { droid ->
droid.value.isSought ( )
}
if ( droid != null ) {
printf ( "'%s' is the droid we seek.%n", droid.value.name ) ;
}
droid
}

I know you said you want to avoid adding methods to Map but perhaps it is this decision that is leading to difficulties?

Of course, you could leave Java as is and use Groovy for closure-based programming :-)

Roman Elizarov said...

Comments are a bad place to have flame ware, but I’ll still make a comment on comments posted here. IMHO, it is a very bad to turn Java into functional language by adding full-blown closures with map, reduce, and all other functional constructs. Unfortunately, that is what so many people are trying to do. However, Java is so good and popular as a language because it is an imperative language. Let keep it this way, but let us extend this language by adding our own imperative constructs, like our own loops, etc.

Tom Hawtin said...

That is some ugly syntax. ;)

What's with the apparent requirement for a separate static method and subject? No chance to pass around or decorate. Why not go for an Iterable type approach in keeping with the enhanced for loop. So we can use it as:

import static java.util.Collections.entriesOf;
...
Droid seekDroid(Map<String,Driod> map) {
for (String droidName, Droid droid : entriesOf(map)) {
if (droid.isSought()) {
System.out.printf("'%s' is the droid we seek.%n", droidName);
return droid;
}
}
}

Much nicer, no?


The library implementation below is longer, but that's mostly down to supporting serialisation.

package java.lang;
public interface Iteration<B, throws EXC> {
void iterate(B block) throws EXC;
}
package java.util;
public class Collections {
...
public interface EntryConsumer<K,V,throws EXC> {
void consume(K key, V value) throws EXC;
}
public static <K,V,throws EXC>
Iteration<EntryConsumer<K,V,EXC>,EXC> entriesOf(Map<K,V> map) {
return new MapEntriesOf<K,V,EXC>(map.entrySet();
}
private static class MapEntriesOf<K,V,throws EXC>
implements
Iterationable<EntryConsumer<K,V,EXC>,EXC>,
Serializable
{
private static final serialVersionID = 1234567890123456789L;
private final Set<Map.Entry<K,V>> entrySet;
private MapEntriesOf(Set<Map.Entry<K,V>> entrySet) {
this.entrySet = entrySet;
}
public void iterate(EntryConsumer<K,V,EXC> consumer) throws EXC {
for (Map.Entry<K,V> entry : entrySet) {
consumer.consume(entry.getKey(), entry.getValue());
}
}
}
}

FWIW, I think the enhanced for loop syntax is the wrong way around. The declaration gets 'executed' every iteration, but the iterable expression does not. It should have looked something like:

List<String> strings; ...
for (strings : final String str) { ...}

Bit late now.

BTW: I can't see how in the suggested syntax the compiler knows what to do with continue. Not supporting continue is my favoured option. My first attempt at the above had Iteration returning an Iterator-like object that could be controlled by code generated within the calling method (like classic JSP tags). However, as anyone who has written class JSP tags will know, that causes even more boilerplate.

Josh Bloch said...

Like many of the examples of control abstraction closures, I don't find this compelling. Droids probably know their own names. So just iterate over the map values:

Droid seekDroid(Map<String,Droid> map) {
    for (Droid droid : map.values()) {
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droid.name());
            return droid;
        }
    }
}

In fact, don't pass in the Map to the function, just pass in the Collection:

Droid seekDroid(Collection>Droid< droids) {
    for (Droid droid : droids) {
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droid.name());
            return droid;
        }
    }
}

It is much less common to iterate over a Map's entries than its keys or values. And when you need to do it, it isn't that bad. Calling it "an ugly exercise in boilerplate" is unfair to the Collections API.

Incidentally, for those who haven't seen it yet, here is a much simpler closure proposal.

Don't code like my brother.

Neal Gafter said...

Like many of Josh's comments on closures, rather than address the requirement of a particular use case he imagines an alternative use case in which a solution isn't necessary. In my example, different maps may have different names for the same droid; a name is not an inherent property of the droid. While there are certainly alternative use cases that don't benefit from this solution, as Josh rightly points out, that doesn't negate the importance of addressing these use cases.

Neal Gafter said...

tom: regarding "I can't see how in the suggested syntax the compiler knows what to do with continue"

The block closure passed to a "for" method invocation is treated as follows: a "continue" appearing in it brances to the bottom of the closure body. A "break" branches out of the invocation just past the invocation statement. In this way breaks and continues appearing in the body act as they would in a loop.

Ed Burnette said...

Ewwww. Josh's looks much nicer. While less powerful it's more in the spirit of Java and lets programs be more concise. While you're at it, see if you can keep me from having to type (or read):

SomeLongDeclarationHere x = new SomeLongDeclarationHere(...);

This is especially bad with generic map types. Type inference anyone? Please? I'd like my assignments to go back to being on one line.

If you want to do something, fix the warts, don't make new ones.

Josh Bloch said...

Perhaps I didn't make myself clear. I don't think this use case is realistic. Moreover, I don't think the language is improved by allowing programmers to define their own loop constructs. The purpose of the for-each construct was to allow programmers to define their own "loopable" types, which drastically reduces the need for defining custom loop constructs. So the benefit in allowing custom loop constructs is small, but the costs are not.

One of Java's historical strengths has been its transparency: What you see is pretty much what you get. As a consequence, any Java programmer can easily read and maintain code written by any other. We take this property for granted, but there are many languages that do not have it. Allowing programmers to define their own looping constructs encourages the formation of dialects, reducing transparency. And it adds to the complexity of a language that historically prided itself on simplicity.

That is not to say that I am against providing such facilities in general. On the contrary, I think they are a great boon to many existing languages, such as Lisp and Smalltalk. I would seriously consider providing them in a new language being designed from scratch. But I do not think they belong in Java at this point in its development.

francis said...

After defining our own for-xxxx statement, will customizable if-xxxx statament come to earth from sun???

Neal Gafter said...

francis: I realize you're being sarcastic, but the basic closures proposal already allows you to do that kind of thing. The only reason something special is worth doing for loops is that they introduce a new binding for the break and continue statements.

Neal Gafter said...

francis: One more thing. I don't work for Sun.

Mark Mahieu said...

I can certainly think of a recent use case which I think mirrors your example to some extent - one in which a parent entity maintained a map of child entities for all defined primary keys (a finite set), but where there was not necessarily an explicit child entity for each possible key and so there was some rule whereby the parent would use a default child entity if one did not exist for a particular key, potentially resulting in multiple primary keys mapping to the same value.

The implementation was such that all possible key/value pairs were actually held in the Map (the rule was applied at the time the values were stored rather than retrieved), and both the parent's idea of what the key was, and the value's other attributes, could be used when queried. Perhaps most pertinently, the entities could be queried by some property other than the key, such as a the value's 'date' attribute being within a particular range (which is far more difficult to use as a key).

So there's code that queries by the Map's key, there's code that queries by the values' properties, and there's code that queries by both, and they all look different.

In retrospect I'm sure it could have been implemented in a better way, but in any case the 80/20 rule applies here: I use the collections API constantly, and 80% (or more) of that usage is trivial and easy to understand. The code that fits into the 'other 20%' is much like the paragraphs above which describe my use case; overly verbose, probably not sufficiently clear, and ultimately far more likely to be misunderstood and to contain inaccuracies.

Finding ways to cut through the unnecessary verbosity and to unify solutions to similar problems must be the goal (ie. querying the data - not dealing with the fact that it's in a java.util.Map) and like many people I do see something like myCollectionOrMap.find(a closure) and myCollectionOrMap.each(a closure) as being a potentially simpler way of dealing with the more complicated (more bug-ridden) use cases.

Where your idea actually fails for me though is its lack of API discoverability; to improve upon the collections API in a way that the majority of Java programmers will routinely take advantage of, it needs to be discoverable by an IDE so that it may be presented to them when they hit that '.' after the identifier (or something similar). Sad fact is that too many of the candidates I've interviewed for a Java programmer role have a 'deep understanding of the collections API' which begins and ends at ArrayList (or worse, Vector) - and too few of the decent ones have the time or opportunity to fully explore what's there; most of the code that ends up in production is written under pressure, and as IDEs have become more 'intelligent' they've narrowed the scope of commonly used documentation from the package-level to the type-level.

In my experience many programmers don't even use many of of the existing static methods on java.util.Collections when they should, because they either didn't know about them, or 'forgot'. Me included I expect.

So even if everyone loved the syntax, and the idea, I think there's a bigger problem to solve before an idea like this would actually make enough of a difference to be worthwhile. Personally I'm undecided about the syntax, and I have no backwards-compatible solutions to the discoverability problem I think it suffers, but I like the idea.

Reinier Zwitserloot said...

Is all this -really- neccessary? You can get 95% of all this (extensive!) functionality with fancy notation for anonymous classes and coding 'java-style' instead of looking at java as if it's some sort of domb stepchild of LISP.

Take a look at CICE for example (http://crazybob.org/2006/10/java-closure-spectrum.html).

I like that -way- better. Java6 is moving toward more flexibility on the JVM front with eg invokeVirtual. I'd strongly prefer java, the language, keeps its current vibe, but JVM, the platform, gains abilities like this by way of supporting entirely different languages.

A lot of enterprisey folks have already gotten a shocker with java5 (which only recently entered the J2EE scene, really). Shaking up things with yet another MAJOR re-structuring of the JLS itself, with, in fact, -far- further reaching effects than the java5 changes ever had... I dunno. Actually, I -do- know. I'm against it.

Jochen "blackdrag" Theodorou said...

Neal, I hope this is no offending question.. but what do you think about C# 3.0 lambda functions? I mean they are readable, they are easy and they won't really require new keywords. Of course something like Func can not exist in Java, because an interface that I can use like Func<A> and Func<A,B> is not doable with Java generics. Or am I wrong here?

Another thing is method pointers. Maybe you want to read this small article from Sundararajan

They could be added to Java very easily, no new keyword required, typing can be done at compile time.

But this leads back to the single interface for all types of closures (and method pointers). At the beginning I had the hope that adding closures to Java would help to prevent the interface explosion. But now I don't think this will happen, because when we do that basing on interfaces, then we don't get a general interface for all types of closures.

The names are always implying a functionality and I think that you wouldn't consider two interfaces as equal if one is named Runnable and the other is named DiskFormatingTaks. Even if they have both run methods of the same kind.

Of course that means we forget about break/continue/return, (a)synchronous closures and such. The thing gets even better with currying..

But as I said, the central problem is the inavailability of the generic Func interfaces. And the way covariance is hanlded in generics doesn't make the life more easy

Neal Gafter said...

Jochen: I think this is the most important basic question on this line of designing closures: do you add function types at the same time? As you observe, there are good reasons to do so. There are other reasons that you didn't touch upon, and I'll talk more about those later.

We've played with the syntax a lot, and it isn't clear we've settled on the best one yet.

There is no reason to abandon break/continue/return with lambdas. They are still necessary in a language in which these control statements are scoped, as they are in Java. Otherwise you're only closing over part of the lexical environment.

Jochen "blackdrag" Theodorou said...

I am surely nobody of the "don't do it, because it is too difficult" fraction ;)

I think adding function pointers or something that does look like a function automatically sets the semantics for break/continue/return. return = return from the function/closure, not from the method like you suggested. break/continue only valid if in the function is a loop and break/continue are inside this loop.

Only when seeing closures as blocks it makes sense to give break/continue/return a different meaning. But if you really want that, then you shouldn't restrict the access to lokal variables and don't make a difference between the synchronous or asynchronous case when accessing local variables.

But normal blocks don't have parameters! And closures without parameters are kind of half done only. When defining a closure like (String str, int i){str+i}, then it looks for me like a method, where I removed the name, the modifier and the return type.

I know Smalltalk, Ruby and of course Groovy as examples of languages with block closures. And all these languages are defining the parameters inside the block. I really think that is because of good reasons. They are not really functions, they are blocks you can give parameters - so the semantic is slightly different. The way Java allows to initialize arrays makes it a bit difficult here. Groovy removed the array initialization syntax in favor of the block based closures and uses the list syntax for arrays instead. A compiler with a big lookahead can surely solve that without problems, but I guess you don't want that in Java.

But even if all these problems are solved, for example by directly using the {|(parameterlist|statementlist} syntax from Ruby, we still face that fundamental problem of interface explosion.

A simple convert function would help here much, but overloading the cast operator is a no go. It violates the rule, that a cast doesn't change the reference. But any function to make that cast is too much.

Neal Gafter said...

Josh wrote: "I don't think this use case is realistic."

Perhaps Josh would find this use case more realistic if he knew that of the 21 for-each loops appearing in his recent joint programming project with me and Lexi, 4 of the loops (19%) were of this form.

Josh Bloch said...

I'm an 80-20 kind of guy: I target the 80%, not the 20, err, 19%:)

Seriously, I'm not saying no one ever iterates over the keys and values in a Map, just that it's far less common than iteration over iterables.

Tom Palmer said...

Hmm. After thinking some more, I'll go back to anti-CICE and back to supporting the for loop construct. For example, here's the CICE version of the Droid thing (and I've frequently looped on entries, too, plus all the other alternatives):

Droid seekDroid(Map<String,Droid> map) {
public Droid result = null;
eachEntry(map, Block2<String, Droid>(String droidName, Droid droid) {
if (droid.isSought()) {
System.out.printf("'%s' is the droid we seek.%n", droidName);
result = droid;
return false;
}
});
return result;
}


And eachEntry would also have to be more complicated.

In all, the for loop thing is far more readable and comprehensible than this CICE version. But I still dislike the "throws E" boilerplate. Exceptions in Java are crazy.

But in general, current lack of easy abstraction in Java leads to more concrete code. And more code duplication. But more concrete code also has its benefits. Just DRY folk like me have trouble swallowing it.

And the for loop is interesting still because it's the one control structure in Java that is callbacky in style (well, since Java 5 for loops, at least). I might prefer no need for "for" in front, though, so more like your older examples.

Juan C Nuno said...

I have to say that the direction the Java language is taking is leaving a sour taste in my mouth. I loved how the language used to be clear. I felt it was one of its strongest suits. How I could look at arbitrary code from where ever and know EXACTLY what it did.

As the language evolves I find myself having to look at the language specification more. Which I don't think is good. It's slowly turning into C++, which frankly is a mess of a language.

Of course, this is only my humble opinion.

theEdge said...

It seems to me the this proposal is from guys that don't like Object Oriented development.

The more I look at this the less I like it, and the more it looks like an excuse not to design classes and objects properly.

I can see that there will be things that can be done quicker this way but at the risk of it being a lot dirtier too.

At one time I had some contact with C and was familiar with the idea of passing function pointer into another function (eg sort) for invokation (which is sort of what you are trying to do here).

I have now been doing commercial Java Development for 8 years and have never missed this type of feature.

I think you are trying to get the language to allow you to do something that shouldn't be done.

jl235 said...

Closures work brilliantly in Ruby so I love the idea of having them in Java, primarily to use instead of for loops. However this doesn't look natural to me. I'd much rather see:

for eachEntry(String droidName, Droid droid : map) {
// do something
}

written as:

eachEntry(map) {String droidName, Droid droid =>
// do something
}

To me that clearly shows map being passed into eachEntry and that droidName and droid are parameters of the code block. I don't like the original syntax because it's not a for loop, it's a method taking a map and a block. As such it is obscuring what is actually going on.

Anonymous said...

I know it's blasphemy, but I think it is time for a new Collections package. It is a normal package after all (java.util) and has several other flaws due to its legacy:
- Stack is not an interface
- Map#get takes an Object, not a subclass of K
... to name just a few.

The Jakarta Commons Collections (see http://commons.apache.org/collections/) have a lot of nice Ideas (even something the named "Closure" ;-) that are worth a thought.

Regards