Monday, December 18, 2006

Closures Talk and Spec Update

This post discusses a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

My 2006 JavaPolis talk Closures for Java has now been published - slides synchronized with a soundtrack - and can be viewed online at the new Javapolis website, Parleys.com. If you're still wondering why all the fuss about closures, I recommend you listen to the talk.

We've also just published an update to the specification, version 0.4. The specification seems to be settling down quite a bit, as the changes are minor:

  • The throws clause of a function type is now placed between the curly braces.
  • The description of function types has been rewritten to emphasize more clearly that function types are interface types rather than a separate extension to the type system.
  • The closure conversion can now convert a closure that has no result expression to an interface type whose function returns java.lang.Void. This change helps support completion transparency. A completion transparent method is one written such that the compiler can infer that an invocation completes normally iff the closure passed to it completes normally.
  • Some examples have been modified to be completion transparent.
  • null is now a subtype of Unreachable, and a number of small related changes. Thanks to Rémi Forax for pointing out the issues.

I hope the talk, which is less technical than the specification, makes it easier for you to evaluate the proposal and compare it to the alternatives.

Friday, December 15, 2006

Javapolis 2006

I'm on my way back from Javapolis 2006, unfortunately missing the third day of the conference. One of the innovations this year was a bank of ten whiteboards where people brainstormed about the future of the Java language and platform. I had a camera with me but I realized too late that I should have taken snapshots of the contents of all those boards before I left. I hope someone else will. The only snapshots I took were the vote tallies that I discuss below.

There were three issues discussed on those boards that I'd like to share with you.

Closures

The first board contained a discussion of closures, including an informal vote of people "in favor of" and "against" adding support for closures. I gave a talk about closures yesterday afternoon, which explained the goals and design criteria, showed how they fit quite smoothly into the existing language and APIs, and more importantly explained in detail why anonymous instance creation expressions do not solve the problems closures were designed to solve. Before my talk about closures, the vote was about 55% in favor and 45% against. After my talk the vote was 71% in favor and 29% against. I don't think any "against" votes were added to the tally after my talk. There was also a BOF (Birds-Of-a-Feather) session in the evening discussing closures. Of the 20 or so attendees none admitted being against adding closures. I'm sure the fact that the BOF was scheduled opposite a free showing of Casino Royale didn't help, but I had hoped to hear from opponents more about their concerns. We discussed a number of issues and concerns, most of which had been discussed on my blog at one time or another.

One issue that I discussed with a few individuals earlier and then at the BOF was the idea of adding a statement whose purpose is to yield a result from the surrounding closure, which you could use instead of or in addition to providing a result as the last expression in the closure. It turns out that adding this feature makes closures no longer suitable for expressing control abstractions. In other words, adding this feature to the language would make closures less useful! This is a very counterintuitive result. I first understood the issue by analyzing the construct using Tennent's Correspondence Principle, but it is much easier for most people to understand when the result is presented as specific examples that fail to work as expected. For now I'll leave this as an exercise to the reader, but I'll probably write more about it later. Incidentally, I believe the folks who designed Groovy got closures wrong for exactly this reason.

There was a video made of my talk that will be posted to the Javapolis website. Ted Neward also interviewed me on video, and Bill Venners interviewed me on audio. As soon as these are available on the web I'll blog pointers to them.

Native XML Support

Mark Reinhold gave a talk on adding native (i.e. language-level) support for XML into Java. Though they were not presented as such, some people prefer to think of the proposal as separable into a language extension part and an API part. The proposed APIs appear to be an improvement over the existing alternatives. However, the language extension for writing XML literals appears to be only marginally more convenient than the XML construction techniques provided by libraries such as JDOM. I personally would like to see the new APIs pursued but XML creation provided in the JDOM way. Mark took a vote by show of hands on how people felt about the two issues, but I couldn't see the tally. There was also an informal tally about adding native XML support on one of the whiteboards. The result was 29% in favor and 71% against.

Constructor Invocation for Generics

Another much-discussed issue appeared on one of the whiteboards: the verbosity of creating instances of generic types. This is typical:

Map<Pair<String,Integer>,Node> map = new HashMap<Pair<String,Integer>,Node>();

The problem here is that the type parameters have to be written twice. One common "workaround" to this problem is to always create your generic objects using static factories, but the language should not force you to do that. A number of different syntax forms have been suggested for fixing this:

var map = new HashMap<Pair<String,Integer>,Node>();

This unfortunately requires the addition of a new keyword. Another:

final map = new HashMap<Pair<String,Integer>,Node>();

This reuses an existing keyword, but at the same time it also makes the variable final. Another variation on this idea:

map := new HashMap<Pair<String,Integer>,Node>();

In my opinion these three forms all suffer the same flaw: they place the type parameters in the wrong place. Since the variable is to be used later in the program, the programmer presumably wants control over its type and the reader wants the type to be clear from the variable declaration. In this case you probably want the variable to be a Map and not a HashMap. An idea that addresses my concern is:

Map<Pair<String,Integer>,Node> map = new HashMap();

Unfortunately, this is currently legal syntax that creates a raw HashMap. I don't know if it is possible to change the meaning of this construct without breaking backward compatibility. Another possibility:

Map<Pair<String,Integer>,Node> map = new HashMap<>();

You can see clearly by the presence of the empty angle brackets that the type parameters have been omitted where the compiler is asked to infer them. Of the alternatives, this is my favorite. I don't think it will be too hard to implement in javac using the same techniques that work for static factories of generic types.
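Until the language gets such a form, the static-factory workaround mentioned above gives a flavor of the inference involved. This is a sketch; the `Maps` class and its `newHashMap` method are my own illustrative names (similar helpers exist in the Google Collections library), not part of the JDK:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper; the class and method names are hypothetical,
// in the spirit of factories found in the Google Collections library.
class Maps {
    // javac infers K and V from the assignment context, so the
    // type parameters need not be repeated at the creation site.
    static <K, V> Map<K, V> newHashMap() {
        return new HashMap<K, V>();
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        // Compare: new HashMap<String, Integer>() repeats what the
        // declaration already says; the factory call does not.
        Map<String, Integer> counts = Maps.newHashMap();
        counts.put("dots", 40);
        System.out.println(counts.get("dots")); // prints 40
    }
}
```

The empty-angle-bracket proposal would, in effect, give the same inference at constructor invocations that these factories already get at method invocations.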

Tuesday, December 05, 2006

Type Literals

Yesterday I wrote about super type tokens, which is an API that acts like class literals but with two advantages over class literals: they work with generic types, and they provide full generic type information. The disadvantages are that they are more verbose than class literals, and (being a different type than java.lang.Class) they are incompatible with APIs that use type tokens. It turns out that it is possible to get the best of both worlds by adding super type tokens to the language. I would call the resulting construct type literals.

The basic idea would be to retrofit java.lang.reflect.Type with generics, in the same way that java.lang.Class was retrofitted with generics in JDK5. Then the type of List<String>.class would be Type<List<String>>, and the structure of the resulting object would contain all of the generic type information that a Type is capable of expressing. The code generated by javac could very well use the trick outlined in my previous blog post, though I expect there are much more efficient ways of doing it. You would then be able to write

Type<List<String>> x = List<String>.class; 

The "type literal" on the right is currently a syntax error, but this language extension would give meaning to the syntax. How does this fit with the existing meaning of class literals? Very nicely. The type java.lang.Class already extends java.lang.reflect.Type, so a class literal of type Class<String> could be assigned to a variable of type Type<String>. In other words, the following would be legal:

Type<String> x = String.class;  

Adding support for type literals to the language and retrofitting java.lang.reflect.Type would simplify the use of the THC pattern with generic types and make existing type-token-based APIs interoperate nicely with the pattern.

Monday, December 04, 2006

Super Type Tokens

When we added generics to Java in JDK5, I changed the class java.lang.Class to become a generic type. For example, the type of String.class is now Class&lt;String&gt;. Gilad Bracha coined the term type tokens for this. My intent was to enable a particular style of API, which Joshua Bloch calls the THC, or Typesafe Heterogeneous Container pattern. For some examples of where this is used see the APIs for annotations:

public <A extends Annotation> A java.lang.Class.getAnnotation(Class<A> annotationClass)

My earliest use of this feature (like my earliest use of all recent Java language features) appears in the compiler for the Java programming language (javac), in this case as a utility called Context, and you can find the code in the open source version. It was a utility that allowed the compiler to be written as a bunch of separate classes that all refer to each other, and solved the hard problem of getting all the parts created in an order such that they can be initialized with references to each other. The utility is also used to replace pieces of the compiler, for example to make related tools like javadoc and apt, the Annotation Processing Tool, and for testing. Today I would describe the utility as a simple dependency injection framework, but that wasn't a popular buzzword at the time.
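Here is a minimal sketch of the idea, not javac's actual Context code: a typesafe map from class objects to lazily created components. The real Context uses dedicated key objects and lets components register themselves at the start of their constructors, which is how mutually referential parts get wired up.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of a Context-style dependency injection utility.
// Illustrative only; javac's real Context uses key objects and
// self-registration from constructors to handle circular references.
class Context {
    interface Factory<T> { T make(Context c); }

    private final Map<Class<?>, Object> instances = new HashMap<Class<?>, Object>();
    private final Map<Class<?>, Factory<?>> factories = new HashMap<Class<?>, Factory<?>>();

    <T> void register(Class<T> key, Factory<T> factory) {
        factories.put(key, factory);
    }

    // Components are created lazily on first request and cached,
    // so every part of the compiler sees the same instance.
    <T> T get(Class<T> key) {
        Object o = instances.get(key);
        if (o == null) {
            o = factories.get(key).make(this);
            instances.put(key, o);
        }
        return key.cast(o);
    }
}
```

A tool that wants to replace a component registers a factory for its own subclass before anything asks for the component.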

Here is a simple but complete example of an API that uses type tokens in the THC pattern, from Josh's 2006 JavaOne talk:

public class Favorites {
    private Map<Class<?>, Object> favorites =
        new HashMap<Class<?>, Object>();
    public <T> void setFavorite(Class<T> klass, T thing) {
        favorites.put(klass, thing);
    }
    public <T> T getFavorite(Class<T> klass) {
        return klass.cast(favorites.get(klass));
    }
    public static void main(String[] args) {
        Favorites f = new Favorites();
        f.setFavorite(String.class, "Java");
        f.setFavorite(Integer.class, 0xcafebabe);
        String s = f.getFavorite(String.class);
        int i = f.getFavorite(Integer.class);
    }
}

A Favorites object acts as a typesafe map from type tokens to instances of the type. The main program in this snippet adds a favorite String and a favorite Integer, which are later taken out. The interesting thing about this pattern is that a single Favorites object can be used to hold things of many (i.e. heterogeneous) types but in a typesafe way, in contrast to the usual kind of map in which the values are all of the same static type (i.e. homogeneous). When you get your favorite String, it is of type String and you don't have to cast it.

There is a limitation to this pattern. Erasure rears its ugly head:

Favorites:15: illegal start of expression
f.setFavorite(List<String>.class, Collections.emptyList());
                          ^

You can't add your favorite List<String> to a Favorites because you simply can't make a type token for a generic type. This design limitation is one that a number of people have been running into lately, most recently Ted Neward. "Crazy" Bob Lee also asked me how to solve a related problem in a dependency injection framework he is developing. The short answer is that you can't do it using type tokens.

On Friday I realized you can solve these problems without using type tokens at all, using a library. I wish I had realized this three years ago; perhaps there was no need to put support for type tokens directly in the language. I call the new idea super type tokens. In its simplest form it looks like this:

public abstract class TypeReference<T> {}

The abstract modifier is intentional. It forces clients to subclass this in order to create a new instance of TypeReference. You make a super type token for List&lt;String&gt; like this:

TypeReference<List<String>> x = new TypeReference<List<String>>() {};

Not quite as convenient as writing List<String>.class, but this isn't too bad. It turns out that you can use a super type token to do nearly everything you can do with a type token, and more. The object that is created on the right-hand-side is an anonymous class, and using reflection you can get its interface type, including generic type parameters. Josh calls this pattern "Gafter's Gadget". Bob Lee elaborated on this idea as follows:

import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

/**
 * References a generic type.
 *
 * @author crazybob@google.com (Bob Lee)
 */
public abstract class TypeReference<T> {

    private final Type type;
    private volatile Constructor<?> constructor;

    protected TypeReference() {
        Type superclass = getClass().getGenericSuperclass();
        if (superclass instanceof Class) {
            throw new RuntimeException("Missing type parameter.");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    /**
     * Instantiates a new instance of {@code T} using the default, no-arg
     * constructor.
     */
    @SuppressWarnings("unchecked")
    public T newInstance()
            throws NoSuchMethodException, IllegalAccessException,
                   InvocationTargetException, InstantiationException {
        if (constructor == null) {
            Class<?> rawType = type instanceof Class<?>
                ? (Class<?>) type
                : (Class<?>) ((ParameterizedType) type).getRawType();
            constructor = rawType.getConstructor();
        }
        return (T) constructor.newInstance();
    }

    /**
     * Gets the referenced type.
     */
    public Type getType() {
        return this.type;
    }

    public static void main(String[] args) throws Exception {
        List<String> l1 = new TypeReference<ArrayList<String>>() {}.newInstance();
        List l2 = new TypeReference<ArrayList>() {}.newInstance();
    }
}

This pattern can be used to solve Ted Neward's problem, and most problems where you would otherwise use type tokens but you need to support generic types as well as reifiable types. Although this isn't much more than a generic factory interface, the automatic hook into the rich generic reflection system is more than you can get with simple class literals. With a few more bells and whistles (toString, hashCode, equals, etc) I think this is a worthy candidate for inclusion in the JDK.
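For instance, equality for these tokens can simply delegate to the captured Type, whose JDK implementations already compare structurally. This is a sketch of a trimmed variant (newInstance omitted) with those methods added:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Sketch of the "bells and whistles": value semantics delegated to the
// captured Type, so super type tokens can be used as map keys.
abstract class TypeReference<T> {
    private final Type type;

    protected TypeReference() {
        Type superclass = getClass().getGenericSuperclass();
        if (superclass instanceof Class) {
            throw new RuntimeException("Missing type parameter.");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    public Type getType() { return type; }

    // Two tokens are equal iff they capture the same generic type; the
    // JDK's ParameterizedType implementations define structural equality.
    @Override public boolean equals(Object o) {
        return o instanceof TypeReference<?>
            && ((TypeReference<?>) o).type.equals(type);
    }

    @Override public int hashCode() { return type.hashCode(); }

    @Override public String toString() { return type.toString(); }
}
```

With that, a Favorites-style container keyed on TypeReference instead of Class can hold a favorite List&lt;String&gt; and a favorite List&lt;Integer&gt; separately.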

Wednesday, November 22, 2006

Linear Time Sort Puzzler

Interviewing for an engineering position at Google can be grueling and humbling. You get less than an hour with each interviewer, and may get one technical problem after another to solve. If you have performance anxiety you may have a hard time focusing on the problem. The best approach, if you can do it, is to think of the interview as a technically challenging but friendly brainstorming session. If you get hired, that's what your life will be like.

My friend, Jim Davis, told me about someone he interviewed earlier this week. The candidate thought the interview wasn't going well and hoped to impress Jim by sharing an idea he had: a new algorithm for sorting in linear time. It isn't a sort based exclusively on comparisons; it can be proven that any such sort must perform Ω(n log n) comparisons. It isn't a bucket sort, which is a standard algorithm for this problem (assuming certain characteristics of the input set). But the candidate's algorithm helped Jim understand the candidate's capabilities.

The algorithm works like this: you start by making a linear scan through the n numbers to be sorted and note the largest and smallest of the elements. Using that, you compute a linear function f(x) that will map the smallest input element to zero and the largest to n. Then, for each input element xi, you fork a task that waits f(xi) milliseconds and then outputs xi. After n milliseconds the output contains the input elements in sorted order.
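A sketch of the candidate's algorithm in Java follows. The millisecond scale factor is my own addition, to space the wake-ups far enough apart that ties are unlikely on an idle machine:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// "Sleep sort" as described: each element forks a task that sleeps in
// proportion to its value, then appends itself to the output.
class SleepSort {
    static List<Integer> sort(List<Integer> input) throws InterruptedException {
        final int min = Collections.min(input);
        final double range = Math.max(1, Collections.max(input) - min);
        final int n = input.size();
        final List<Integer> output =
            Collections.synchronizedList(new ArrayList<Integer>());
        List<Thread> tasks = new ArrayList<Thread>();
        for (final Integer x : input) {
            // f(x) maps the smallest element to 0 and the largest to n;
            // the factor of 100ms per unit is for reliability only.
            final long delay = Math.round((x - min) / range * n * 100);
            Thread t = new Thread() {
                @Override public void run() {
                    try { Thread.sleep(delay); } catch (InterruptedException e) {}
                    output.add(x);
                }
            };
            tasks.add(t);
            t.start();
        }
        for (Thread t : tasks) t.join();
        return new ArrayList<Integer>(output);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sort(Arrays.asList(3, 1, 2))); // usually [1, 2, 3]
    }
}
```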

There may be problems when two tasks attempt to output their data at nearly the same time, but let's ignore that for a moment and assume an "ideal" system. If you have a truly concurrent computer, this may run in O(n) time using O(n) processors, thereby consuming a total of O(n²) processing time. There are far better concurrent sorting algorithms around. Much more interesting is the idea of running this algorithm on a sequential computer with a non-preemptive scheduler. The non-preemptive scheduler avoids the problem of two tasks producing data at the same time. The scheduler can also ensure that the tasks run in their proper order, even if the running time of some tasks causes others to start slightly "too late".

Jim explained the problem with the algorithm, concluding "so you've reduced the problem of sorting in linear time to a different linear time sorting algorithm." After Jim explained this remark, the candidate replied "well, the idea isn't really very well thought out." Here is the puzzle: what was Jim talking about?

Monday, November 20, 2006

A Thread Pool Puzzler

I participated in the design and development of a couple of concurrency libraries for shared-memory multiprocessors long before such machines were popular. So when I started using java.util.concurrent I was already somewhat comfortable with the concepts. But when I used it more intensely for production work in the Google Calendar server, I ran into a couple of "gotcha" situations. I'd like to tell you about one in particular, in part because it might help you avoid the problem yourself, and in part because I believe this issue exposes some missing functionality in the concurrency framework.

Many parallel programming problems can be expressed using fork-join parallelism, in which tasks spawn, or "fork", a number of subtasks that can be executed in parallel. The caller then waits for these subtasks to complete by "join"ing with them. Consider the following sequential program. It is an abstract model of some larger program that has three logical layers.

class Program {
    static final int N = 3;
    public static void main(String[] args) {
        doSomeWork();
        loopNtimes(N, new Runnable() {
                public void run() { doLayerOne(); }
            });
        System.out.println();
    }

    static void doLayerOne() {
        doSomeWork();
        loopNtimes(N, new Runnable() {
                public void run() { doLayerTwo(); }
            });
    }

    static void doLayerTwo() {
        doSomeWork();
        loopNtimes(N, new Runnable() {
                public void run() { doLayerThree(); }
            });
    }

    static void doLayerThree() {
        doSomeWork();
    }
    static void loopNtimes(int n, Runnable runnable) {
        for (int i=0; i<n; i++) runnable.run();
    }
    static void doSomeWork() {
        System.out.print(".");
        try { Thread.sleep(500L); } catch (InterruptedException _) {}
    }
}

This program prints 40 dots, taking a half second for each one. It runs to completion in about 20 seconds. Let's rewrite the loops as concurrent (instead of sequential) loops, using an idiom recommended by Martin Buchholz. To do that we replace the method loopNtimes with the following:

    static ExecutorService threadPool = Executors.newCachedThreadPool();

    static void loopNtimes(int n, Runnable runnable) {
        Collection<Callable<Object>> c = new ArrayList<Callable<Object>>();
        for (int i=0; i<n; i++) c.add(Executors.callable(runnable));
        Collection<Future<Object>> futures = null;
        try { futures = threadPool.invokeAll(c); } catch (InterruptedException _) {}
        if (futures != null) for (Future<Object> f : futures) {
            try { f.get(); }
            catch (InterruptedException ex) {}
            catch (ExecutionException ex) {
                ex.printStackTrace();
                System.exit(1);
            }
        }
    }

This requires a couple of other minor changes to the program (two import statements and System.exit(0) at the end of main), but the program now runs in two seconds instead of twenty. So far so good, but if N is larger, say a hundred, this program fails. It throws OutOfMemoryError because it tries to allocate too many threads. My first attempt to fix this replaced the thread pool by one containing a fixed number of threads:

    static ExecutorService threadPool = Executors.newFixedThreadPool(100);

This version of the program works and runs in 2 seconds. But why should we use 100 threads? If we imagine that the Thread.sleep statements represent computationally intensive parts of the program, it might make more sense to have a number of threads approximately the same as the number of physical processors. I'm running this on a machine with an Intel Centrino Duo processor, which acts roughly like 2 processors. Let's be generous, however, and make ten threads. So we modify this version of the program by changing 100 to 10. That won't be as fast as the version with 100 threads, but just how fast will it be?

If you haven't guessed the punch line by now, I'll tell you: with ten threads in the pool the program prints 11 periods and then deadlocks! If you use a debugger to examine the state of the program to figure out what's going on, you'll find the main thread waiting for invokeAll, three threads in doLayerOne waiting for invokeAll, seven threads in doLayerTwo waiting for invokeAll, and there are no threads left to do any of the work of calling doLayerThree. This is a classic thread starvation deadlock.

If you're just trying out this program to see what happens, you might be slightly annoyed and finally give up and hit control-C to quit the Java program, but when our program (Google Calendar) encounters this kind of problem our customers get annoyed, give up, and sign up for a competitor like Yahoo Calendar or 30Boxes. Hey, don't click those links; trust me, you really want Google Calendar. My point is that we can't leave this to chance.

What can or should we do about this problem? The first idea is to change the 10 back into 100, but those numbers are pulled out of thin air. Without analyzing the behavior and interaction of all the places where the thread pool is used, understanding the dynamic performance of the application under real loads, and placing bounds on the number of tasks that will be used at each level in the program's hierarchy, it is difficult or impossible to pick a number that will always avoid this kind of deadlock. Another idea is to use unbounded thread pools, but as we've seen under high load situations those can cause an explosion in the number of threads, resulting in the program failing by running out of memory.

What we did to address this issue is avoid the single monolithic thread pool altogether. Instead, we use a separate thread pool at every level in the hierarchy. In terms of this example, we would have a thread pool for use in main, one for use in doLayerOne, and one for use in doLayerTwo. Every subsystem that requires concurrency gets its own personal thread pool. That way every layer that uses concurrency is guaranteed to make progress when it has work to do, so this kind of deadlock cannot occur. But there is a cost to this as well: balancing the sizes of these thread pools is a black art. During operation we have hundreds of threads, most of which are sitting around doing nothing. Besides being a waste of resources, the generous surplus of "extra" threads make debugging more difficult than it should be. If the system doesn't break down so neatly into layers (perhaps because there are recursive loops in the call cycle of the subsystems) then even this solution can break down and result in thread starvation.
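In terms of the example program, the per-layer arrangement looks like the sketch below. The pool sizes are illustrative, not tuned values from Calendar; the point is that even tiny pools cannot deadlock this way:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Each layer that joins on subtasks owns a private pool, so no layer
// can starve the one below it, even with only two threads per pool.
class LayeredProgram {
    static final int N = 3;
    static final AtomicInteger dots = new AtomicInteger();
    static final ExecutorService topPool = Executors.newFixedThreadPool(2);
    static final ExecutorService midPool = Executors.newFixedThreadPool(2);
    static final ExecutorService leafPool = Executors.newFixedThreadPool(2);

    static void loopNtimes(ExecutorService pool, int n, Runnable r)
            throws InterruptedException {
        Collection<Callable<Object>> c = new ArrayList<Callable<Object>>();
        for (int i = 0; i < n; i++) c.add(Executors.callable(r));
        for (Future<Object> f : pool.invokeAll(c)) {
            try { f.get(); }
            catch (ExecutionException e) { throw new RuntimeException(e); }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        loopNtimes(topPool, N, new Runnable() { public void run() {
            try {
                loopNtimes(midPool, N, new Runnable() { public void run() {
                    try {
                        loopNtimes(leafPool, N, new Runnable() { public void run() {
                            dots.incrementAndGet();
                        }});
                    } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                }});
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }});
        topPool.shutdown(); midPool.shutdown(); leafPool.shutdown();
        System.out.println(dots.get() + " tasks ran"); // prints 27 tasks ran
    }
}
```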

The situation is not entirely hopeless. In my opinion, this kind of thread starvation should never occur because there is always one thread that can contribute processing power toward executing the subtasks: the thread that is waiting for the subtasks to complete. Here's the implementation of invokeAll as it appears in the JDK:

    public <T> List<Future<T>> invokeAll(Collection<Callable<T>> tasks)
        throws InterruptedException {
        if (tasks == null)
            throw new NullPointerException();
        List<Future<T>> futures = new ArrayList<Future<T>>(tasks.size());
        boolean done = false;
        try {
            for (Callable<T> t : tasks) {
                FutureTask<T> f = new FutureTask<T>(t);
                futures.add(f);
                execute(f);
            }
            for (Future<T> f : futures) {
                if (!f.isDone()) {
                    try { 
                        f.get(); 
                    } catch(CancellationException ignore) {
                    } catch(ExecutionException ignore) {
                    }
                }
            }
            done = true;
            return futures;
        } finally {
            if (!done)
                for (Future<T> f : futures) 
                    f.cancel(true);
        }
    }

This code does not use the current thread to do any of the work of invoking the callables. Below is a slightly modified version (I've added a line to the original and refactored it into a static method that we can put in the program) that uses the current thread to do any work that another thread hasn't already started. The added line is marked with a comment:

    public static <T> List<Future<T>> invokeAll(
            ExecutorService threadPool, Collection<Callable<T>> tasks)
        throws InterruptedException {
        if (tasks == null)
            throw new NullPointerException();
        List<Future<T>> futures = new ArrayList<Future<T>>(tasks.size());
        boolean done = false;
        try {
            for (Callable<T> t : tasks) {
                FutureTask<T> f = new FutureTask<T>(t);
                futures.add(f);
                threadPool.execute(f);
            }
            // force unstarted futures to execute using the current thread
            for (Future<T> f : futures) ((FutureTask<T>) f).run();
            for (Future<T> f : futures) {
                if (!f.isDone()) {
                    try { 
                        f.get(); 
                    } catch(CancellationException ignore) {
                    } catch(ExecutionException ignore) {
                    }
                }
            }
            done = true;
            return futures;
        } finally {
            if (!done)
                for (Future<T> f : futures) 
                    f.cancel(true);
        }
    }

Using this version of invokeAll, the program does not experience thread starvation. If the thread pool is reduced in size to just one thread, the program runs to completion in about 11 seconds, because two threads are contributing to doing the work (the main thread and the thread from the pool).

I discussed this issue with Doug Lea, and he warned me that selecting tasks for efficient scheduling in a fork-join concurrency framework is not trivial; the standard solution is to have a double-ended queue for each worker thread where it enqueues its subtasks. The worker removes the most recently generated task from this queue for itself to process, thereby simulating a depth-first execution strategy in the single-thread case. When a worker finds itself without any work to do, it steals work from the other end of the queue of another worker. That is, it should steal one of the least-recently created (coarse-grained) subtasks. In addition, it is beneficial to have a mechanism to avoid the queue altogether for the bottom of the call chain. Doug told me that this strategy was pioneered by the Cilk work, but I first learned about this strategy 10 years earlier reading WorkCrews: An Abstraction for Controlling Parallelism by Mark T. Vandervoorde and Eric S. Roberts. My implementation provides exactly this behavior but with a much simpler implementation. The invocation of run executes one of the tasks most recently generated by the current thread. When a thread has no more work to do, it removes work from the queue of the underlying ExecutorService, which is a FIFO queue, and so it takes the least-recently generated task of all workers. On the other hand, because this implementation shares a single queue among all worker threads, there may be additional synchronization overhead compared to the WorkCrews/Cilk solution.
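The deque discipline Doug described can be sketched as a data structure (threads and the actual stealing loop omitted). The owner works LIFO from one end; thieves take FIFO from the other:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a per-worker deque for fork-join scheduling. The owner
// pushes and pops newest-first (depth-first execution); an idle thief
// removes oldest-first, taking the coarsest-grained task available.
// Coarse synchronization here; Cilk-style implementations go to some
// lengths to avoid locking in the common, owner-only case.
class WorkDeque<T> {
    private final Deque<T> tasks = new ArrayDeque<T>();

    synchronized void push(T task) { tasks.addLast(task); }

    // Owner's end: most recently forked subtask first.
    synchronized T popNewest() { return tasks.pollLast(); }

    // Thief's end: least recently forked subtask first, or null if empty.
    synchronized T stealOldest() { return tasks.pollFirst(); }
}
```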

It is possible to use the existing concurrency utilities to work around the problem, if you don't mind the task scheduling being far from optimal. You can do that by setting the CallerRunsPolicy on a ThreadPoolExecutor and using a SynchronousQueue:

static ThreadPoolExecutor threadPool =
  new ThreadPoolExecutor(0, 10, 60L, TimeUnit.SECONDS,
                         new SynchronousQueue<Runnable>());
static {
    threadPool.setRejectedExecutionHandler(
        new ThreadPoolExecutor.CallerRunsPolicy());
}

Doug explained to me that the earlier public-domain version of the concurrency utilities had a full implementation of a framework for fork-join parallelism, but they didn't get included in JDK5:

"... The vast majority of such usages are nicest to support as "loop parallelism" utilities. And it is not so much that utilities based on FJ tasks are inconvenient to use that has kept them out, but instead uncertainty about basic APIs until closures are settled. All of the methods for aggregate operations on collections and arrays (applyToAll, map, reduce, mapReduce, findAny, findAll, etc) require function-type arguments (perhaps along with some sort of purity annotation as might be introduced via JSR305) that, depending on how things work out otherwise, would need to be introduced more generally."

Did you think I would get through an entire blog post without mentioning Closures?

Monday, November 13, 2006

Closures Esoterica: completion transparency

The Closures for Java draft specification (currently at v0.3) is the product of a lot of work. Everything in the spec has been discussed and scrutinized before being placed into the specification, and is there for a reason. From your point of view, you have seen snapshots of our specification, with very little explanation of why things are the way they are. Changes from one revision to another may seem arbitrary. I am not writing detailed notes on the progress and evolution of the specification and its prototype, because there is too much to explain and too little time would be left for me to work on the specification and prototype. I am, after all, working on this on my own time. I suspect few people would care for that level of detail anyway. Much of the work on the specification takes place during face-to-face meetings, after which I update the specification to reflect our decisions. Some issues get resolved through email discussions. We've been keeping an eye on the comments on this blog, various message boards, mailing lists, and elsewhere for input, and meeting with interested folks, and that has been a very useful part of the process. Thank you for your help!

I'm preparing the next revision of the specification, and we just resolved an issue by email. Unless you know what the issue is, the change in the upcoming version of the specification will appear totally arbitrary. Because the issue was resolved by email, I have a nice record of the problem and its resolution. Here is an edited excerpt of the start of our discussion:

We'd like programmers to be able to define their own control constructs. One thing that would make this easier would be if programmer-defined control constructs act like built-in ones for the purposes of handling reachability of statements (and "can complete normally"). That's why we added the bottom type java.lang.Unreachable to the specification. But just having that isn't enough. Watch as I try to define a withLock method that is completion transparent.

First, ignoring reachability, the library writer might define

<throws E> void withLock(Lock lock, {=>void throws E} block) throws E { ... }

This achieves exception transparency, but not completion transparency. For a block that returns a value, this would work:

<T, throws E> T withLock(Lock lock, {=>T throws E} block) throws E { ... }

This works because a closure that can't complete normally can be converted, via the closure conversion, to {=>Unreachable}. In practice, any specific invocation of withLock will either have a block that doesn't return anything (i.e. =>void) or one that can't complete normally (i.e. =>Unreachable). You might think, therefore, that the library writer need only write these two overloaded methods to achieve completion transparency, and let overload resolution take care of the rest.

Unfortunately it isn't that simple. With these two methods, an invocation of withLock using a closure that can't complete normally can be an invocation of either of these methods. That's because the closure conversion can convert a closure that results in Unreachable to an interface whose method returns void. Since both are applicable, the compiler must select the most specific. Neither of these methods is more specific than the other (the closures are unrelated, given our scheme of mapping function types to interfaces), so the invocation is ambiguous.

I don't propose that we mess with the specification for "most specific". I'm afraid that would be a disaster, though maybe you can reassure me that it isn't. Instead, I propose that the closure conversion be allowed to convert a closure that results in void to an interface whose method's return type is java.lang.Void. The generated code would naturally return a null value. Then the library writer would write only the second version, above, and it would work both for the void-returning case and the Unreachable-returning case. I think being able to write control abstractions as a single method (instead of two overloadings) is a significant advantage. Additionally, this API is more flexible because it can be used by programmers to pass a value back through from the closure to the caller.
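A sketch, in the draft's own function-type syntax, of how the proposed resolution plays out for callers (this is not compilable today, and computeName is a hypothetical method used only for illustration):

```
// The library writer provides only the single generic method:
<T, throws E> T withLock(Lock lock, {=>T throws E} block) throws E { ... }

// void-returning use: T is inferred as java.lang.Void under the
// proposed closure conversion, and the generated code returns null
withLock(lock, {=> System.out.println("hello"); });

// value-returning use: T is inferred as String, and the caller can
// pass a value back through from the closure
String s = withLock(lock, {=> computeName() });
```

The second use shows the extra flexibility mentioned above: the same API threads a result from the closure back to the caller.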

We discussed this issue and found the proposed resolution our best option. The next version of the proposal will include in the closure conversion a provision allowing conversion of a closure that returns void to an interface type whose method returns java.lang.Void. You can see hints in this email thread that the syntax of a function type has changed slightly (the throws clause has moved inside the curly braces), and that a function type is now specified to be an interface type, rather than having its own type rules. The specification is getting simpler, which is definitely a move in the right direction!

Sunday, November 05, 2006

Reified Generics for Java

Many people are unsatisfied with the restrictions caused by the way generics are implemented in Java. Specifically, they are unhappy that generic type parameters are not reified: they are not available at runtime. Generics are implemented using erasure, in which generic type parameters are simply removed at runtime. That doesn't render generics useless, because you get typechecking at compile-time based on the generic type parameters, and also because the compiler inserts casts in the code (so that you don't have to) based on the type parameters.
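The compiler-inserted casts can be seen directly. The following runnable fragment (my illustration, not from the post) shows that a generic access and its hand-erased equivalent are the same operation at runtime:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureCast {
    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("Neal");

        // With generics, no cast appears in the source...
        String s = names.get(0);

        // ...but after erasure the bytecode is equivalent to the
        // pre-generics form, with the cast written out explicitly:
        String t = (String) ((List) names).get(0);

        System.out.println(s.equals(t)); // prints true
    }
}
```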

Generics are implemented using erasure as a response to the design requirement that they support migration compatibility: it should be possible to add generic type parameters to existing classes without breaking source or binary compatibility with existing clients. I wrote about this two years ago. Migration compatibility is exploited widely in JDK5; all of the collection classes and interfaces use generics, yet existing code using collections continues to work. Without migration compatibility, the collection APIs could not be retrofitted to use generics; we would probably have added a separate, new set of collection APIs that use generics. That was the approach used by C# when generics were introduced, but Java did not take this approach because of the huge amount of pre-existing Java code using collections.

While solving one set of problems, erasure adds a set of its own problems. For a type parameter T, you can't write a class literal T.class. You can't use instanceof to test if an object is of type T. And you can't create an array of T. But it gets worse. You also can't write class literals for generic types like List<String>.class, or test if an object is an instanceof List<String>, or create an array of List<String>. The implementation of generics using erasure also causes Java to have unchecked operations, which are operations that would normally check something at runtime but can't do so because not enough information is available. For example, a cast to the type List<String> is an unchecked cast, because the generated code checks that the object is a List but doesn't check whether it is the right kind of list.

It isn't too late to add reified generics to Java. There are two basic approaches. The first uses the language as it exists today but adds the generic type information at runtime. In the ideal world, we wait until every bit of Java code in the world has been modified to use generics safely, and then go through a transition in which we start recording type parameters at runtime by using a new javac and VM. There are two main difficulties with this approach. First, it is not compatible. It is not source compatible because you would no longer be allowed to use raw types (i.e., it is impractical to wait until all code has been generified); it is not binary compatible because new clients would invoke new kinds of constructors for generic classes that also record the type parameters, but existing classes do not have those constructors. (A hybrid approach is being investigated in which the system records type parameters only when they are available, but I haven't yet seen a workable proposal along these lines.) The second problem is more insidious: lots of existing code uses generics but uses them in an unsafe way. For example, I have seen code that creates an ArrayList<Object>, but later casts it to a List<String> where the author knows that it only contains objects of type String. I would not be surprised to find such code in the JDK. Such code must fail at runtime when generics are reified, so some existing (but working) code would be broken.
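The "unsafe but working" pattern described above is easy to demonstrate. This fragment (my illustration) succeeds under erasure, because only the raw List type exists at runtime, but it is exactly the kind of code that would fail once type parameters are reified:

```java
import java.util.ArrayList;
import java.util.List;

public class UnsafeUnderErasure {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        // The author "knows" only Strings will ever be stored here.
        List<Object> objects = new ArrayList<Object>();
        objects.add("only strings, honest");

        // Unchecked cast: the generated code checks only that the
        // object is a List, not that it is a List of String, so this
        // works under erasure but must fail under reified generics.
        List<String> strings = (List<String>) (List<?>) objects;
        System.out.println(strings.get(0));
    }
}
```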

The other approach modifies the language so that the declaration of a generic parameter distinguishes reified type parameters from erased ones. It is a pure extension of the language. The existing syntax for generics would continue to designate generics by type erasure, but a newly introduced syntax would be used to declare reified type parameters. Perhaps something like this:

interface NewCollection<class E> extends Collection<E> { ... }
interface NewList<class E> extends NewCollection<E>, List<E> { ... }

The rules for these reified type parameters are straightforward. When using a reified generic class, the type argument has to be a reified type. You would not be allowed to create a NewCollection in its raw form. Code that uses only reified types would never get an unchecked warning from the compiler, because the compiler can always generate the correct checking code. If you have a reference of type Collection that refers to a NewCollection<String>, you would get a runtime error if you attempt to put anything other than a String into it. In this sense reified collections are like java.util.Collections.checkedCollection, only better because the element type can itself be generic and the guarantees are stronger. You can even create an array of E inside an implementation of this type. But there is a disadvantage: the strategy requires a new set of reified Collections APIs parallel to the existing erasure-based ones. This is why C# introduced new collections APIs when they introduced generics.
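The comparison to java.util.Collections.checkedCollection can be made concrete. The checked wrapper, which exists in JDK5 today, enforces the element type at insertion time, which is the weaker, non-generic version of the guarantee a reified NewCollection<String> would give:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

public class CheckedDemo {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Collection<String> strings = Collections.checkedCollection(
                new ArrayList<String>(), String.class);

        // Launder the reference through the raw type, as broken
        // legacy code might.
        Collection raw = strings;
        try {
            raw.add(Integer.valueOf(42)); // the wrapper rejects this
        } catch (ClassCastException expected) {
            System.out.println("rejected at insertion time");
        }
    }
}
```

A reified collection would be stronger still: the check would work even when the element type is itself generic, with no wrapper object needed.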

As the above declaration illustrates, the new reified collection interfaces could be extensions of the existing collection interfaces. Consequently, existing APIs that receive old collections can be passed new collections. For example, you could pass a NewCollection<String> to an API that is declared to receive a Collection<String>. In addition, the new reified collection classes (that is, the implementations) could be extensions of the existing collection classes. That allows the implementation of the old and new collection classes to be shared, and provides a nice migration path. Existing APIs that return old collections can be modified to return new collections. Over time, most libraries can be migrated to use the new APIs without breaking backward compatibility (though it would break unsafe clients). In this way we can have a more gradual migration than would be possible with the first approach.

I don't mean to trivialize the work involved: this requires a significant revision of the VM and language specifications and a significant effort from the teams who implement VMs, java compilers (i.e. javac and Hotspot), and the core JDK libraries. From the programmer's point of view, however, the language change is small and mostly involves removing restrictions.

Approaching the problem this way has a secondary benefit. Because the new reified collection interfaces don't exist yet, it is possible for methods to be added to them when they are introduced. Perhaps sorting methods belong in NewList? If reified generics are added at the same time as (or after) closures, a number of useful methods could be added to take advantage of closures. Filters, mappers, reducers, and visitors are just some of the ideas.

Sunday, October 22, 2006

Closures for Java (v0.3)

This post discusses a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

We've just completed a major revision and simplification of the Closures for Java specification. Rather than post the specification on this blog, it is on its own web page, here. There are two versions of the specification: one with function types and one without. There is a pulldown menu at the top of the specification that makes it display only the text relevant to the version of the specification you want to see. Keeping the specification in this form will allow us to more easily maintain the two parallel specifications so we can compare them and delay a decision on this issue until later. These specifications are the starting point for our prototype.

There are two significant changes in this revision. First, there is a completely new syntax for closures and function types. Using the new syntax, with function types, you can write

{int,int => int} plus = {int x, int y => x+y};

As you can see, we're proposing to add the new "arrow" token => to the syntax. Just to be perfectly clear, this code declares a variable of function type

{int,int => int}

which is a function that takes two ints and yields an int. The variable is named "plus" and it is assigned the closure value

{int x, int y => x+y}

which is a closure that receives two ints (named x and y) and yields their sum.

The second major change has to do with the treatment of "restricted" closures. We've done away with the "synchronized" parameters from the previous revision of the specification. Instead, you can inherit from a marker interface to restrict the closure conversion. If you don't use the marker interface, then closures are not restricted when converted to that type.

Another important change is to the meaning of a function type. It is now defined to be a system-provided interface type, and it is provided in a way that gives the required subtype relations among function types. That means that in order to invoke a value of function type, instead of simply placing arguments in parens after the function value, you use its "invoke" method. This significantly simplifies the name lookup rules for variables of function type. In fact, now there are no special rules at all.
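Putting the two pieces of the new syntax together, a sketch in the proposal's notation (not compilable today) of declaring and then invoking a value of function type:

```
{int,int => int} plus = {int x, int y => x+y};

// A function type is now a system-provided interface type, so the
// value is invoked through its "invoke" method rather than with
// bare parentheses after the function value:
int five = plus.invoke(2, 3);
```

Because plus is just a variable of an interface type, the ordinary name lookup rules apply unchanged.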

As always, your feedback and ideas are welcome.

Friday, October 13, 2006

Concurrent Loops Using Java Closures

The java.util.concurrent package has very nice support for writing concurrent loops. I used it recently to improve the performance of some code in the Google Calendar Server. In the Calendar Server, we have a class representing an event on a calendar. The following method on an event computed the set of attendees to the event. The old code looked something like this:

public Collection<Principal> getAttendees() {
    List<Principal> result = new ArrayList<Principal>();
    for (EventResponse r : getResponses()) {
        if (r.mayAttend()) {
            result.add(r.getAttendee());
        }
    }
    return result;
}

The problem with this code, it turned out, was that getAttendee() has to talk to a remote server to look up the Principal in a database, and although the remote server is very fast, it is far enough away that there is some latency. Since this loop is sequential, when there are many responses it spends most of its time waiting for a response from the remote server, over and over again. Consequently the call to getAttendees() was very slow for events with many attendees. One solution to this kind of problem would be to send a single batch request, and that is probably the best long-term solution to the problem, but the server in this case doesn't yet know how to handle batch requests. So our fix to the immediate performance problem was to use java.util.concurrent and make the requests in parallel; this did speed things up significantly:

public Collection<Principal> getAttendees() {
    final List<Principal> result = new ArrayList<Principal>();
    CompletionService<Void> ecs =
        new ExecutorCompletionService<Void>(threadPool);
    for (final EventResponse r : getResponses()) {
        ecs.submit(new Callable<Void>() {
            public Void call() {
                if (r.mayAttend()) {
                    try {
                        Principal attendee = r.getAttendee();
                        synchronized (result) {
                            result.add(attendee);
                        }
                    } catch (Throwable ex) {
                        LOGGER.log(Level.SEVERE, "Uncaught Exception", ex);
                    }
                }
                return null;
            }
        });
    }
    // wait for the tasks to complete
    for (final EventResponse r : getResponses()) {
        try {
            /*discard*/ ecs.take().get();
        } catch (InterruptedException ex) {
            throw new AssertionError(ex); // shouldn't happen
        } catch (ExecutionException ex) {
            LOGGER.log(Level.SEVERE, "Uncaught Exception", ex);
        }
    }
    return result;
}

When I find code like this I have a few reactions. First, my eyes glaze over at the complexity. Then, if I'm interested, I look carefully at the code to try to understand it. After all, I'm likely to want to do something like this again someday. Finally, I bookmark it so I can add it to my bag of tricks.

I don't think writing a concurrent loop should have to be so complex. Here is what I would like to have written:

public Collection<Principal> getAttendees() {
    List<Principal> result = new ArrayList<Principal>();
    for eachConcurrently(EventResponse r : getResponses(), threadPool) {
        if (r.mayAttend()) {
            Principal attendee = r.getAttendee();
            synchronized (result) {
                result.add(attendee);
            }
        }
    }
    return result;
}

You might think that in order to do this kind of thing we would need to add a concurrent looping statement to the language. Actually, it is possible to add the concurrent looping construct as a library method if you have closures in the language! It isn't trivial to write, but that's why we have people like Doug Lea in the world. An API like this "for eachConcurrently" thing should be written once by an expert and placed into the JDK for everyone to use.

What should happen if you use continue, break, or return within the body of this loop, or throw an exception? The continue case is easy: it just completes execution of that one iteration, and the other iterations proceed on their merry way. The semantics of break are a bit subtle, but obvious once you realize this is supposed to act like a loop: it completes the current iteration and cancels any other iterations that have not completed, and control returns to the caller of for eachConcurrently. Handling a return statement is similar: it cancels any uncompleted iterations and returns from the enclosing method, which in this case would be getAttendees. Finally, any exception that propagates out of a loop iteration cancels uncompleted iterations and propagates from the loop.
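A sketch of those semantics in the proposed syntax (not compilable today; seekFirst and matches are hypothetical names, while getResponses and threadPool come from the example above):

```
Principal seekFirst() {
    for eachConcurrently(EventResponse r : getResponses(), threadPool) {
        if (matches(r)) {
            // 'return' cancels all uncompleted iterations and then
            // returns from seekFirst, just as in a built-in loop.
            return r.getAttendee();
        }
        // 'continue' would end only this iteration; 'break' would
        // cancel the remaining iterations and fall out of the loop;
        // an uncaught exception would cancel them and propagate.
    }
    return null; // no response matched
}
```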

p.s.: Martin Buchholz offered the following improved version of the code using java.util.concurrent:

public Collection<Principal> getAttendees() {
    final Collection<Principal> result =
        new ConcurrentLinkedQueue<Principal>();
    final Collection<Callable<Object>> tasks =
        new ArrayList<Callable<Object>>();
    for (final EventResponse r : getResponses())
        tasks.add(Executors.callable(new Runnable() { public void run() {
            try {
                if (r.mayAttend())
                    result.add(r.getAttendee());
            } catch (Throwable ex) {
                LOGGER.log(Level.SEVERE, "Uncaught Exception", ex);
            }}}));
    try {
        threadPool.invokeAll(tasks);
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt(); // preserve interrupt status
    }
    return new ArrayList<Principal>(result);
}

Monday, October 02, 2006

Iterative control abstraction: user-defined loops

A couple of people have written about using Java closures to write methods that act as looping constructs. For example, iterating through a map is currently an ugly exercise in boilerplate:

Droid seekDroid(Map<String,Droid> map) {
    for (Map.Entry<String,Droid> entry : map.entrySet()) {
        String droidName = entry.getKey();
        Droid droid = entry.getValue();
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droidName);
            return droid;
        }
    }
    return null; // not found
}

It has been proposed that we add looping methods to Collection and Map that take advantage of closures, but it would be incompatible to add methods to an existing interface. We can accomplish the same thing using static methods. With one small additional twist to the closures proposal, we could make it even easier to write a method that iterates over maps. Here is what we want our client to look like:

import static java.util.Collections.eachEntry;
...
Droid seekDroid(Map<String,Droid> map) {
    for eachEntry(String droidName, Droid droid : map) {
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droidName);
            return droid;
        }
    }
    return null; // not found
}

To enable this, the following would be added to java.util.Collections:

public interface Block2<K,V,throws E> {
    void invoke(K k, V v) throws E;
}
public static <K,V,throws E>
void for eachEntry(Map<K,V> map, Block2<K,V,E> block) throws E {
    for (Map.Entry<K,V> entry : map.entrySet()) {
        block.invoke(entry.getKey(), entry.getValue());
    }
}

Notice the "for" qualifier on the declaration of eachEntry, and the for keyword on its invocation. Methods defined this way and then invoked using the control abstraction syntax would require the use of the for keyword on the invocation. The compiler arranges such control abstractions to act like loops, by properly interpreting break and continue statements within the body of the closure. Although this example doesn't use break or continue statements, the mere presence of the for keyword on the invocation helps the reader understand the code. In this way user-defined control constructs could be written that appear and act as loops.

Sunday, September 17, 2006

Failure of Imagination in Language Design

Failure of imagination - that is, failure to make language constructs orthogonal because you aren't quite sure how people will need to use them together - is one of the most common and severe errors you can make in programming language design. It is an error that is often impossible to correct after the fact. There is an approach to programming language design that you can take to maximize your opportunities to err in this way: consider the interactions of individual language features and decide on a case-by-case basis if the interaction will be allowed or not (or even what the semantics should be) based on whether or not you can find use cases and which semantics best fit those use cases.

This is an approach favored by expert API designers, because the domain of API design is best suited to this approach and these designers expect their expertise to carry from one domain to the other. In an API, the components of the API fit together in particular and well-defined patterns due to the typing and subtyping relationships between the elements; the job of the API designer is to ensure that the patterns you can build with the API elements satisfy a set of use cases.

In a programming language, on the other hand, the elements that are used to assemble programs are ideally orthogonal and independent. Programmers can assemble the elements in arbitrary ways, often unexpected but surprisingly useful ways. The "patterns" movement for example records some of these ideas by creating a common terminology for newly discovered ways of assembling programs from primitives.

To be sure, use cases also play a very important role in language design, but that role is a completely different one than the kind of role that they play in API design. In API design, satisfying the requirements of the use cases is a sufficient condition for completeness. In language design, it is a necessary condition.

Wednesday, September 13, 2006

Nominal Closures for Java (version 0.2)

This post describes a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

This revision of the closures proposal eliminates function types, introduces the Unreachable type, and gives related reachability rules. A closure's "return" type is now called its result type. The abbreviated invocation syntax now takes any statement, not just a block, to make nesting work nicely.

Gilad Bracha, Neal Gafter, James Gosling, Peter von der Ahé

Modern programming languages provide a mixture of primitives for composing programs. C#, Javascript, Ruby, Scala, and Smalltalk (to name just a few) have direct language support for delayed-execution blocks of code, called closures. A proposal for closures is working its way through the C++ standards committees as well. Closures provide a natural way to express some kinds of abstraction that are currently quite awkward to express in Java. For programming in the small, closures allow one to abstract an algorithm over a piece of code; that is, they allow one to more easily extract the common parts of two almost-identical pieces of code. For programming in the large, closures support APIs that express an algorithm abstracted over some computational aspect of the algorithm. This proposal outlines a specification of closures for Java without introducing function types.

1. Closure Literals

We introduce a syntactic form for constructing a closure value:

Expression3:
    Closure

Closure:
    FormalParameters { BlockStatementsopt Expressionopt }

Example

We can write a closure that adds two to its argument like this:

interface IntFunction {
    int invoke(int i);
}
IntFunction plus2 = (int x){ x+2 };

2. The type of a closure

A closure literal has a "closure type" that exists transiently at compile time. It is converted to some object type at compile-time; see Closure Conversion for details. It is a compile-time error if a closure is not subject to a closure conversion.

The type of a closure is inferred from its form as follows:

The argument types of a closure are the types of the declared arguments.

If the body of the closure ends with an expression, the result type of the closure is the type of that expression; otherwise if the closure can complete normally, the result type is void; otherwise the result type is Unreachable.

The set of thrown types of a closure are those checked exception types thrown by the body.

Example

The following illustrates a closure being assigned to a variable of a compatible object type.

interface VoidIntBlock<throws E> {
    void invoke(int i) throws E;
}
VoidIntBlock<InterruptedException> sleeper = (int t) { Thread.sleep(t); };

3. synchronized parameters

We allow the qualifier synchronized to be used on formal method arguments. When a closure is passed to such a parameter, the closure conversion (see Closure Conversion) is allowed to close over its context: the closure may use non-final local variables from enclosing scopes, and may use control-transfer statements such as break, continue, and return that transfer control outside of the body of the closure. An overriding method declaration may only add (i.e. may not remove) the synchronized qualifier to formal parameters compared to a method it overrides; an abstract method that overrides one whose parameter is declared synchronized must itself declare the parameter synchronized. If a closure is passed to a parameter that is not declared synchronized, then the only local variables from enclosing scopes it may access are those that are marked final.

Note: this qualifier is necessary to enable API writers to distinguish synchronous from asynchronous receivers of closures. For compatibility with existing APIs, most of which are asynchronous, the default is not allowing closing over the context. The constraint on overriders preserves the substitutability of subtypes: if a closure is allowed when passed to the superclass method, it must be allowed when passed to the subclass method. A consequence of the overriding constraint is that no existing overridable method can be retrofitted to accept synchronous closures without breaking source compatibility. Static methods such as AccessController.doPrivileged can be retrofitted. The synchronized keyword isn't an ideal choice, but it is the best we've found so far.

Names that are in scope where a closure is defined may be referenced within the closure. It is a compile-time error to access a local variable declared outside the closure from within a closure unless the closure is passed to a parameter that is declared synchronized or the variable was declared final.
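A sketch of the distinction in the v0.2 syntax (not compilable today; twice and count are hypothetical names used only for illustration):

```
interface Block { void invoke(); }

// The synchronized qualifier marks this parameter as receiving a
// synchronous closure, so the closure may close over its context:
static void twice(synchronized Block block) {
    block.invoke();
    block.invoke();
}

void count() {
    int n = 0;                    // a non-final local variable
    twice((){ n = n + 1; });      // legal: the parameter is synchronized

    // By contrast, Executor.execute(Runnable) is not declared
    // synchronized, so a closure passed there could read only
    // final locals and could not assign to n.
}
```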

Note: a local variable that is referenced within a closure is in scope in the body of the closure. Consequently the lifetime of objects referenced by such variables may extend beyond the time that the block containing the local declaration completes.

4. Closure conversion

A closure literal may be assigned to a variable or parameter of any compatible object type by the closure conversion:

There is a closure conversion from every closure of type T to every interface type that has a single method m with signature U such that T is compatible with U. A closure of type T is compatible with a method U iff all of the following hold:

  • Either
    • the result type of T is the same as the return type of U, or
    • there is an assignment conversion from the result type of T to the return type of U, or
    • the result type of T is Unreachable.
  • T and U have the same number of arguments.
  • For each corresponding argument position in the argument list of T and U, both argument types are the same.
  • Every exception type in the throws of T is a subtype of some exception type in the throws of U.

If the closure is not passed to a parameter that was declared synchronized then

  • it is a compile-time error if the closure being converted contains a break, continue or return statement whose execution would result in a transfer of control outside the closure
  • it is a compile-time error if the closure being converted refers to a non-final local variable or parameter declared in an enclosing scope.

The closure conversion to a non-synchronized parameter applies only to closures that obey the same restrictions that apply to local and anonymous classes. The motivation for this is to help catch inadvertent use of non-local control flow in situations where it would be inappropriate. Examples would be when the closure is passed to another thread, or stored in a data structure to be invoked at a later time when the method invocation in which the closure originated no longer exists.

Note: The closure conversion occurs at compile-time, not at runtime. This enables javac to allocate only one object, rather than both a closure and an anonymous class instance.

We are considering an additional qualifier on non-final local variables that allows a closure to access the variable.

Example

We can use the existing Executor framework to run a closure in the background:

void sayHello(java.util.concurrent.Executor ex) {
    ex.execute((){ System.out.println("hello"); });
}

5. Exception type parameters

To support exception transparency, we add a new type of generic formal type argument to receive a set of thrown types. [This deserves a more formal treatment] What follows is an example of how the proposal would be used to write an exception-transparent version of a method that locks and then unlocks a java.util.concurrent.Lock, before and after a user-provided block of code:

interface Block<throws E> {
    void invoke() throws E;
}
public static <throws E extends Exception>
void withLock(Lock lock, Block<E> block) throws E {
    try {
        lock.lock();
        block.invoke();
    } finally {
        lock.unlock();
    }
}

This can be used as follows:

withLock(lock, (){
    System.out.println("hello");
});

This uses a newly introduced form of generic type parameter. The type parameter E is declared to be used in throws clauses. If the extends clause is omitted, a type parameter declared with the throws keyword is considered to extend Exception (instead of Object, which is the default for other type parameters). This type parameter accepts multiple exception types. For example, if you invoke it with a block that can throw IOException or NumberFormatException, the type parameter E would be IOException|NumberFormatException. In those rare cases where you want to use explicit type parameters with multiple thrown types, the keyword throws is required in the invocation, like this:

Locks.<throws IOException|NumberFormatException>withLock(lock) {
    whatever();
}

You can think of this kind of type parameter as accepting disjunction ("or") types. That allows it to match the exception signature of a closure that throws any number of different checked exceptions. If the block throws no exceptions, the type parameter would be the null type.

Type parameters declared this way can be used only in a throws clause or as a type argument to a generic method, interface, or class whose type argument was declared this way too.

6. Non-local control flow

One purpose for closures is to allow a programmer to refactor common code into a shared utility, with the difference between the use sites being abstracted into a closure. The code to be abstracted sometimes contains a break, continue, or return statement. This need not be an obstacle to the transformation. A break or continue statement appearing within a closure may transfer to any matching enclosing statement provided the target of the transfer is in the same innermost ClassBody.

A return statement always returns from the nearest enclosing method or constructor.

If a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer. Similarly, an UnmatchedNonlocalTransfer is thrown when a continue statement attempts to complete a loop iteration that is not executing in the current thread. Finally, an UnmatchedNonlocalTransfer is thrown when a return statement is executed if the method invocation to which the return statement would transfer control is not on the stack of the current thread.

7. Definite assignment

The body of a closure may not assign to a final variable declared outside the closure.

A closure expression does not affect the DA/DU status of any variables it names.

8. The type Unreachable

We add the non-instantiable type java.lang.Unreachable, corresponding to the standard type-theoretic bottom. Values of this type appear only at points in the program that are formally unreachable. This is necessary to allow transparency for closures that do not return normally. Unreachable is a subtype of every type (even primitive types). No other type is a subtype of Unreachable. It is a compile-time error to convert null to Unreachable. It is an error to cast to the type Unreachable.
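Today's Java has no bottom type, but a generic helper method that never returns can approximate Unreachable for reference types (not primitives): because the method always throws, its result may claim any reference type. A sketch; the names unreachable and describe are illustrative only:

```java
public class Bottom {
    // Approximation of a bottom type: this method never returns normally,
    // so its result can be used where any reference type T is expected.
    static <T> T unreachable() {
        throw new AssertionError("unreachable");
    }

    static String describe(int n) {
        // unreachable() type-checks in a position requiring a String.
        return n >= 0 ? Integer.toString(n) : Bottom.<String>unreachable();
    }

    public static void main(String[] args) {
        System.out.println(describe(42));
    }
}
```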

Example

The following illustrates a closure being assigned to a variable of the correct type.

interface NullaryFunction<T, throws E> {
    T invoke() throws E;
}

NullaryFunction<Unreachable, null> thrower = () { throw new AssertionError(); };

9. Unreachable statements

An expression statement in which the expression is of type Unreachable cannot complete normally.

10. Abbreviated invocation syntax.

A new invocation statement syntax is added to make closures usable for control abstraction:

AbbreviatedInvocationStatement:
    Primary ( ExpressionList ) Statement
    Primary ( Formals ) Statement
    Primary ( Formals : ExpressionList ) Statement

This syntax is a shorthand for the following statement:

Primary ( ExpressionList, ( Formals ) { Statement } );

This syntax makes some kinds of closure-receiving APIs more convenient to use to compose statements.

Note: There is some question of the correct order in the rewriting. On the one hand, the closure seems most natural in the last position when not using the abbreviated syntax. On the other hand, that doesn't work well with varargs methods. Which is best remains an open issue.

We could use the shorthand to write our previous example this way:

withLock(lock) {
    System.out.println("hello");
}

This is not an expression form for a very good reason: it looks like a statement, and we expect it to be used most commonly as a statement for the purpose of writing APIs that abstract patterns of control. If it were an expression form, an invocation like this would require a trailing semicolon after the close curly brace of a controlled block. Forgetting the semicolon would probably be a common source of error. The convenience of this abbreviated syntax is evident for both synchronous (e.g. withLock) and asynchronous (e.g. Executor.execute) use cases. Another example of its use would be an API that closes a java.io.Closeable after a user-supplied block of code:

interface OneArgBlock<T, throws E> {
    void invoke(T t) throws E;
}

<T extends java.io.Closeable, throws E>
void closeAtEnd(T t, OneArgBlock<? super T, E> block) throws E {
    try {
        block.invoke(t);
    } finally {
        try { t.close(); } catch (IOException ex) {}
    }
}

We could use the shorthand with this API to close a number of streams at the end of a block of code:

closeAtEnd(FileReader in : makeReader()) closeAtEnd(FileWriter out : makeWriter()) {
    // code using in and out
}
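Without the proposed syntax, the same resource-closing pattern can be emulated today with an anonymous class. A runnable sketch; the interface and method names mirror the spec's example but the demo harness in main is an assumption:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.StringReader;

public class CloseAtEndDemo {
    // One-method interface standing in for the proposal's OneArgBlock.
    interface OneArgBlock<T, E extends Exception> {
        void invoke(T t) throws E;
    }

    // Runs the block, then closes the resource, swallowing close() failures
    // as the spec's sketch does.
    static <T extends Closeable, E extends Exception>
    void closeAtEnd(T t, OneArgBlock<? super T, E> block) throws E {
        try {
            block.invoke(t);
        } finally {
            try { t.close(); } catch (IOException ex) {}
        }
    }

    public static void main(String[] args) throws IOException {
        closeAtEnd(new StringReader("hi"), new OneArgBlock<StringReader, IOException>() {
            public void invoke(StringReader in) throws IOException {
                System.out.println((char) in.read());
            }
        });
    }
}
```

The bulk of the anonymous-class boilerplate here is precisely what the abbreviated invocation syntax is meant to eliminate.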

Acknowledgments

The authors would like to thank the following people whose discussions and insight have helped us craft, refine, and improve this proposal:

Lars Bak, Cedric Beust, Joshua Bloch, Martin Buchholz, Danny Coward, Erik Ernst, Christian Plesner Hansen, Kohsuke Kawaguchi, Doug Lea, "crazy" Bob Lee, Martin Odersky, Tim Peierls, John Rose, Ken Russell, Mads Torgersen, Jan Vitek, and Dave Yost.

Thursday, September 07, 2006

Closures for Java (version 0.1)

This post describes a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

This revision of the proposal drops local functions and the special nonlocal return syntax, adds support for generic type parameters that can be used to represent thrown exceptions, and has only one form of closure literal. It also adds an abbreviated invocation statement that supports very convenient control abstraction. Soon I'll present a parallel proposal based on interfaces (that is, without function types) and contrast the two proposals.

Gilad Bracha, Neal Gafter, James Gosling, Peter von der Ahé

Modern programming languages provide a mixture of primitives for composing programs. C#, Javascript, Ruby, Scala, and Smalltalk (to name just a few) have direct language support for function types and inline function-valued expressions, called closures. A proposal for closures is working its way through the C++ standards committees as well. Function types provide a natural way to express some kinds of abstraction that are currently quite awkward to express in Java. For programming in the small, closures allow one to abstract an algorithm over a piece of code; that is, they allow one to more easily extract the common parts of two almost-identical pieces of code. For programming in the large, closures support APIs that express an algorithm abstracted over some computational aspect of the algorithm. We propose to add function types and closures to Java. We anticipate that the additional expressiveness of the language will simplify the use of existing APIs and enable new kinds of APIs that are currently too awkward to express using the best current idiom: interfaces and anonymous classes.

1. Function Types

We introduce a new syntactic form:

Type:
    Type ( TypeListopt ) FunctionThrowsopt

FunctionThrows:
    throws ThrowsTypeList

ThrowsTypeList:
    Type
    ThrowsTypeList VBAR Type

VBAR:
    |

These syntactic forms designate function types. A function type is a kind of reference type. A function type consists of a return type, a list of argument types, and a set of thrown exception types.

Note: the existing syntax for the throws clause in a method declaration uses a comma to separate elements of the ThrowsTypeList. For backward compatibility we continue to allow commas to separate these elements in methods and constructors, but in function types we require the use of the '|' (vertical-bar) character as a separator to resolve a true ambiguity that would arise when a function type is used in a type list.

2. Namespaces and name lookup

The Java programming language currently maintains a strict separation between expression names and method names. These two separate namespaces allow a programmer to use the same name for a variable and for methods. Variables of closure type necessarily blur the distinction between these two namespaces, because they can be used in an expression and they can be invoked.

When searching a scope for a method name, if no methods exist with the given name then variables of the given name that are of function type are considered candidates. If more than one exists (for example, function-typed variable declarations are inherited from separate supertypes), the reference is considered ambiguous.

We allow a function to be invoked from an arbitrary (for example, parenthesized) expression:

Primary:
    Primary Arguments

3. Closure Literals

We also introduce a syntactic form for constructing a value of function type:

Expression3:
    Closure

Closure:
    FormalParameters { BlockStatementsopt Expressionopt }

Example

We can write a function that adds two to its argument like this:

    int(int) plus2 = (int x){ x+2 };

4. The type of a closure

The type of a closure is inferred from its form as follows:

The argument types of a closure are the types of the declared arguments.

If the body of the closure ends with an expression, the return type of the closure is the type of that expression; otherwise if the closure can complete normally, the return type is void; otherwise the return type is null.

The set of thrown types of a closure are those checked exception types thrown by the body.

Example

The following illustrates a closure being assigned to a variable with precisely the type of the closure.

void(int) throws InterruptedException sleeper = (int t) { Thread.sleep(t); };

5. Subtyping

A function type T is a subtype of function type U iff all of the following hold:

  • Either:
    • the return type of T is the same as the return type of U, or
    • both return types are reference types and the return type of T is a subtype of the return type of U, or
    • the return type of U is void.
  • T and U have the same number of declared arguments.
  • For each corresponding argument position in the argument lists of T and U, either both argument types are the same, or both are reference types and the argument type of U is a subtype of the corresponding argument type of T.
  • Every exception type in the throws of T is a subtype of some exception type in the throws of U.
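The return-type rule parallels the covariant overriding that Java methods already allow. A minimal sketch (class and interface names are illustrative only):

```java
public class CovariantDemo {
    static class Animal {}
    static class Dog extends Animal {}

    interface AnimalSource {
        Animal get();
    }

    // Overriding with a covariant return type: Dog get() still satisfies the
    // Animal get() contract, just as a function type returning Dog would be a
    // subtype of one returning Animal under the rules above.
    static class Kennel implements AnimalSource {
        public Dog get() { return new Dog(); }
    }

    public static void main(String[] args) {
        AnimalSource s = new Kennel();
        System.out.println(s.get() instanceof Dog);
    }
}
```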

6. Exception checking

The invocation of a function throws every checked exception type in the function's type.

7. Reflection

A function type inherits all the non-private methods from Object. The following methods are added to java.lang.Class to support function types:

public final class java.lang.Class ... {
    public boolean isFunction();
    public java.lang.reflect.FunctionType functionType();
    public Object invokeFunction(Object function, Object ... args)
        throws IllegalArgumentException | InvocationTargetException;
}

public interface java.lang.reflect.FunctionType {
    public Type returnType();
    public Type[] argumentTypes();
    public Type[] throwsTypes();
}

Note: unlike java.lang.reflect.Method.invoke, Class.invokeFunction cannot throw IllegalAccessException, because there is no access control to enforce; the function value designates a closure, which does not have access modifiers. Access to function values is controlled at compile-time by their scope, and at runtime by controlling the function value.

8. The type of null

We add support for null and the type of null in function types. We introduce a meaning for the keyword null as a type name; it designates the type of the expression null. A class literal for the type of null is null.class. These are necessary to allow reflection, type inference, and closure literals to work for functions that do not return normally. We also add the non-instantiable class java.lang.Null as a placeholder, and its static member field TYPE as a synonym for null.class.

9. Referencing names from the enclosing scope

Names that are in scope where a closure is defined may be referenced within the closure. We do not propose a rule that requires referenced variables be final, as is currently required for anonymous class instances.

Note: a local variable that is referenced within a closure is in scope in the body of the closure. Consequently the lifetime of objects referenced by such variables may extend beyond the time that the block containing the local declaration completes.
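The same lifetime extension can be observed today with an anonymous class, though under the current rules the captured local must be final (the proposal drops that requirement), so mutable state needs a holder such as a one-element array. A sketch with illustrative names:

```java
public class CaptureDemo {
    interface Counter {
        int next();
    }

    // The array referenced by the final local 'count' outlives this call:
    // the returned object keeps it reachable after makeCounter() returns.
    static Counter makeCounter() {
        final int[] count = {0};
        return new Counter() {
            public int next() {
                return ++count[0];
            }
        };
    }

    public static void main(String[] args) {
        Counter c = makeCounter();
        System.out.println(c.next()); // prints 1
        System.out.println(c.next()); // prints 2
    }
}
```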

10. Non-local control flow

One purpose for closures is to allow a programmer to refactor common code into a shared utility, with the difference between the use sites being abstracted into a closure. The code to be abstracted sometimes contains a break, continue, or return statement. This need not be an obstacle to the transformation. A break or continue statement appearing within a closure may transfer to any matching enclosing statement provided the target of the transfer is in the same innermost ClassBody.

A return statement always returns from the nearest enclosing method or constructor.

If a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer. (I suspect we can come up with a better name). Similarly, an UnmatchedNonlocalTransfer is thrown when a continue statement attempts to complete a loop iteration that is not executing in the current thread. Finally, an UnmatchedNonlocalTransfer is thrown when a NamedReturnStatement attempts to return from a function or method invocation that is not pending in the current thread. Likewise, an UnmatchedNonlocalTransfer is thrown when a return statement is executed if the method invocation to which the return statement would transfer control is not on the stack of the current thread.

11. Closure conversion

We propose the following closure conversion, to be applied only in those contexts where boxing currently occurs:

There is a closure conversion from every closure of type T to every interface type that has a single method with signature U such that T is a subtype of the function type corresponding to U.

It is a compile-time error if the closure being converted contains a break, continue or return statement whose execution could result in a transfer of control outside the closure. It is a compile-time error if the closure being converted refers to a non-final local variable or parameter declared in an enclosing scope.

We will want to generalize this rule slightly to allow the conversion when boxing or unboxing of the return type is required, e.g. to allow assigning a closure that returns int to an interface whose method returns Integer or vice versa.

Note: the closure conversion is applied only to closures (i.e. function literals), not to arbitrary expressions of function type. This enables javac to allocate only one object, rather than both a closure and an anonymous class instance. The conversion avoids hidden overhead at runtime. As a practical matter, javac will automatically generate code equivalent to our original client, creating an anonymous class instance in which the body of the lone method corresponds to the body of the closure.

Closure conversion applies only to closures that obey the same restrictions that apply to local and anonymous classes. The motivation for this is to help catch inadvertent use of non-local control flow in situations where it would be inappropriate. Examples would be when the closure is passed to another thread, or stored in a data structure to be invoked at a later time when the method invocation in which the closure originated no longer exists.

Examples

We can use the existing Executor framework to run a closure in the background:

  void sayHello(java.util.concurrent.Executor ex) {
ex.execute((){ System.out.println("hello"); });
}

Here is a definition of an API that locks and then unlocks a java.util.concurrent.Lock, before and after a user-provided block of code:

public static <E extends Exception>
void withLock(Lock lock, void() throws E block) throws E {
    try {
        lock.lock();
        block();
    } finally {
        lock.unlock();
    }
}

This can be used as follows:

withLock(lock, (){
    System.out.println("hello");
});

You might find this syntax a bit tedious. Not to worry - see below.

12. Abbreviated invocation syntax.

A new invocation statement syntax is added to make closures usable for control abstraction:

AbbreviatedInvocationStatement:
    Primary ( ExpressionList ) Block
    Primary ( Formals ) Block
    Primary ( Formals : ExpressionList ) Block

This syntax is a shorthand for the following statement:

Primary ( ExpressionList, ( Formals ) Block );

Note that this syntax is only a convenience to make some kinds of closure-receiving APIs more convenient to use to compose statements.

We could use the shorthand to write our previous example this way:

withLock(lock) {
    System.out.println("hello");
}

This is not an expression form for a very good reason: it looks like a statement, and we expect it to be used most commonly as a statement for the purpose of writing APIs that abstract patterns of control. If it were an expression form, an invocation like this would require a trailing semicolon after the close curly brace. Forgetting the semicolon would probably be a common source of error. The convenience of this abbreviated syntax is evident for both synchronous (e.g. withLock) and asynchronous (e.g. Executor.execute) use cases.

13. Definite assignment

The body of a closure may not assign to a final variable declared outside the closure.

A closure expression does not affect the DA/DU status of any variables it names.

14. Exception type parameters

To support exception transparency, we add a new type of generic formal type argument to receive a set of thrown types. [This deserves a more formal treatment] What follows is an example of how the proposal would be used to write an exception-transparent version of the withLock() example we've been using. The syntax is quite tentative at this point (ideas are welcome).

public static <throws E extends Exception>
void withLock(Lock lock, void() throws E block) throws E {
    try {
        lock.lock();
        block();
    } finally {
        lock.unlock();
    }
}

This method uses a newly introduced form of generic type parameter. The type parameter E is declared to be used in throws clauses. If the extends clause is omitted, a type parameter declared with the throws keyword is considered to extend Exception (instead of Object, which is the default for other type parameters). Such a type parameter accepts multiple exception types. For example, if you invoke withLock with a block that can throw IOException or NumberFormatException, the type parameter E would be IOException|NumberFormatException. In the rare cases where you want to supply explicit type arguments, the keyword throws is required in the invocation, like this:

Locks.<throws IOException|NumberFormatException>withLock(lock) {
    whatever();
}

You can think of this kind of type parameter as accepting disjunction ("or") types, which allows it to match the exception signature of a closure that throws any number of different checked exceptions. If the block throws no exceptions, the type parameter would be the type null.

Type parameters declared this way can be used only in a throws clause or as a type argument to a generic method, interface, or class whose type argument was declared this way too.

15. Implementing a Function Type

A programmer may write a class that implements a function type. The implementation is written as a public method with the keyword do in place of the method's identifier. For example:

class MyBlock implements void() {
    public void do() {
        // implementation of the function
    }
}

In this way programmers can, for example, create implementations of function types that are serializable.
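With the current idiom, the closest analogue is a named class that implements both a one-method interface and Serializable. A runnable sketch (interface and class names mirror the spec's example but the round-trip harness is an assumption):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableBlockDemo {
    interface Block {
        void invoke();
    }

    // Named class in place of the proposal's 'implements void()' form,
    // made serializable by also implementing Serializable.
    static class MyBlock implements Block, Serializable {
        public void invoke() {
            System.out.println("hello");
        }
    }

    public static void main(String[] args) throws Exception {
        // Round-trip the block through serialization, then invoke it.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new ObjectOutputStream(bytes).writeObject(new MyBlock());
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        Block b = (Block) in.readObject();
        b.invoke();
    }
}
```

A closure literal could not easily be serialized, since it may capture local state; a named class makes the captured state (here, none) explicit.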

Acknowledgments

The authors would like to thank the following people whose discussions and insight have helped us craft, refine, and improve this proposal:

Lars Bak, Cedric Beust, Joshua Bloch, Martin Buchholz, Danny Coward, Erik Ernst, Christian Plesner Hansen, Doug Lea, "crazy" Bob Lee, Martin Odersky, Tim Peierls, John Rose, Ken Russell, Mads Torgersen, Jan Vitek, and Dave Yost.