Wednesday, August 23, 2006

Tennent's Correspondence Principle and Returning From a Closure

I just had a very productive meeting with coauthors of the closure proposal. I think we made a lot of progress addressing most of the specific concerns people have with the proposal (with the exception of subjective comments like "I don't think you should do this kind of thing to Java").

One issue in particular has been nagging us, and in spite of Gilad's insistence that we comply with Tennent's Correspondence and Abstraction Principles (described in detail in the now out-of-print book Principles of Programming Languages by R.D.Tennent), Gilad reluctantly admitted before we published the first draft that there doesn't seem to be a way of doing that for returning from a closure without introducing inconsistencies with the rest of the language. In the end we discovered that there is indeed a way to do it, and it seems to be rather cleaner than any of the alternatives.

Setting aside the technical details of Tennent's approach, the principle applies directly to the closure construct. A closure can be used to abstract over an expression or a statement. The principle dictates that an expression or statement, when wrapped in a closure and then immediately invoked, ought to have the same meaning as it did before being wrapped in a closure. Any change in semantics when wrapping code in a closure is likely a flaw in the language.

To put this more concretely, an expression


should have the same meaning as the expression that wraps it in a closure and then evaluates it:


And a block statement

{ stmt; ... }

ought to have the same meaning as creating and invoking a closure that wraps the statement:

((){ stmt; ... })()

Tennent's principles are very powerful because violations of them tend to show up in the language as flaws, irregularities, unnecessary restrictions, unexpected interactions or complications, and so on. While exploring the language design space for closures, every time we were tempted to violate them we found that there were very real and unpleasant consequences; they are very powerful tools for detecting bad smells in a programming language design. The principles apply no less when extending an existing programming language, because what you are doing in that case is designing the next version of a language. We discovered that when a bad smell is detected by these principles, it is often straightforward to illustrate specific problems. The problems we identified also arise in many supposedly "simpler" proposals that attempt to provide merely a shorthand notation for construcing anonymous class instances. I'll show two specific issues that we identified based on bad smells, examples of the real problems caused, and how we solved them.

The meaning of this

Before we look at the question of how to return from a closure, and to help understand how the principles are applied, let's look at the application of Tennent's principles to the question of the meaning of this within a closure. One possibility would be that it designates the closure object itself. That might be useful if you want to define a recursive closure, as you now have a convenient way of invoking yourself. An alternative is that it designates the anonymous class instance that is created if the closure conversion is applied. That seems natural if you like to think of the closure expression as a shorthand for creating an anonymous class.

Unfortunately, both of these alternatives get you into trouble, and we can show specific ways that the language no longer fits together when you take these approaches. The former approach, for example, doesn't allow you to analyze the types of expressions appearing within the closure, because the closure type depends on the type of the returned expression, but the returned expression might use "this", and so may depend on the type of the closure. The latter approach has a similar chicken-and-egg problem, because the type of interface you're creating may depend on the types within the closure, but the types within the closure depend on the type of the interface you're creating.

To see the problem more simply, one issue is that the expression this no longer designates the same thing when wrapped by a closure. In order to abstract (i.e., wrap a closure around) a piece of code that uses this or uses any names that are inherited from Object, you would have to systematically rewrite those names to qualify them in some way to ensure that they refer to the appropriate object. In other words, you'd have to change every this to EnclosingClassName.this and every toString() to EnclosingClassName.this.toString(). To be most useful as an abstraction facility, programmers should be free to wrap a piece of code in a closure without worrying that its semantics will change subtly. To satisfy Tennent's Correspondence Principle, a closure should introduce into the scope of the closure no names that are not explicitly declared by the user - that is, only the closure's parameters.

Returning from a closure

The principle also applies to the problem of returning from a closure. The current draft proposal suggests that we use the existing return syntax for returning from a closure, but that violates the principles; you can detect the bad smell by observing that this statement

{ if (...) return; }

doesn't mean the same thing as this:

(){ if (...) return; } ()

The former returns to the enclosing method (not shown), while the latter returns from the closure, effectively doing nothing.

One idea, desribed in the further ideas section of the draft spec, is to introduce a new syntax for returning from a closure, and reserve the return statement for returning from a method, function, or constructor. We toyed with the statement syntax

^ expression;

Setting aside aesthetics, this seems to satisfy the requirements, because this isn't legal syntax outside a closure. Existing code does not have statements of this form, so wrapping such code could not change its meaning. However, on further reflection we realized that this syntax and all of the other new kinds of return statements that we considered have the same flaw as using return: while existing code doesn't use this new statement form, once we introduce it into the language we should expect people to start using it in new code. And once they do, any code that includes this statement cannot be wrapped in a closure without changing its meaning. Once we identified the bad smell this way, it wasn't hard to come up with realistic examples where it gets in the way of using the language.

Having eliminated any form of return statement, what are you left with? Referring back to Tennent's principles is instructive: you should be able to replace an expression expr with

(){ expr } ()

This is the proposed syntax for returning a value from a closure. It is simpler than the existing spec because no rules are needed to infer the return type; it is the type of the expression. Of course, we will want to allow a list of statements instead of an expression to allow the construct to close over statements (in which case the closure returns void), and allow a list of statements to precede a trailing expression (in which case the final expression is the result of the closure). An expression without a trailing semicolon isn't a legal statement form, so there is no confusion with any existing constructs. This nicely satisfies Tennent's principles. My only regret is that there is no single way to both designate the returned value and return from a closure before its end, which would be slightly more consistent with Java's existing control constructs.

And more...

We made progress on a number of issues:

  • We know how to address concerns about how the closure conversion will interact with APIs like concurrency and asynchronous notification systems where the closure isn't just called by the current thread; in those APIs you don't want to allow nonlocal control-flow or capturing nonfinal variables.
  • We think we can add support for type inference of exception types to allow writing APIs in which the set of exceptions from a block (closure) that is passed in to the API can propogate out of the API while being strongly type-checked.
  • We think it should be possible to write a class that "implements" a function type, as suggested by "crazy" Bob Lee, so that you can define a custom toString() to make debugging easier or make the implementation serializable.
  • We started looking at a number of specific use cases that have been suggested, such as a library that allows you to execute a block of code while holding a java.util.concurrent.locks.Lock, unlocking it at the end, or a library that closes your streams (or other Closable things) after your block of code is finished.
  • We've also made progress on detailing the set of implementation techniques that might be used to implement the facility.

I'll blog about these later.


Jochen "blackdrag" Theodorou said...

So if I understand you right, your solution for "return" is not to allow it then? I think the same will go for break and continue. But your explanation seems to suggest some kind of inner class for the implementation. At last this is the way we do it for Groovy. But if you do that and if you apply Tennent's Correspondence Principle on code like this:
int i=1; i=2;assert i==2
where you wrap the i=2;
in a closure, then you will have a bigger problem using current inner classes, or you use the compiler to wrap the integer in a reference, that is used for i instead. Of course this means loosing performance and usage of local variables will be significantly slower if a closure does use them, even if the closure is never executed. Maybe Gilad wants to do some magic here? Anyway, why do I say "even if the closure is never executed"? Because people knowing closures might expect code like this to work:

int i=0;
Closure c = {i++}
assert i=0
assert i==2
assert i==3
assert i==4

So to get this working you need reference semantics instead of the normal reference-copy semantic. If you don't want code like that to work, then you are violating Tennent's Correspondence Principle, or not?

Then "this". It's funny. In the Groovy project we decided the next release will map "this" to the class instance the closure was created in. And the reasons are of course the ones you talked about here.

Next point you should think about... In Groovy we decided not to care about that, but maybe your thoughts about this in Java are different. I am talking about the call syntax of closures. Making them look like a method call means to hide methods just as you do hide fields with local variables.

int c(){return 1}
int d(){return 1}
void method(){
(int) c = {2}
() d = {2}

Are these calls all valid? Or to say it different.. can closures act as if you overload/overwrite a method? With my experience in Groovy I can tell that people don't have a problem with the closure hiding the method. In Groovy the call c() is invalid, if the closures requires at last 1 parameter. So there is no overloading. Well, just thoughts for the future design ;)

Ah yes, one more thing. An expression at the end of a closure is used for return? Well, why not, we have the same for Groovy. But why only expressions? Why not statements? They could return null if anything else fails. But of course it wouldn't be to difficult to give the loops a return value too. And then of course... why only in Closures? Do it for methods too! It is very nice not to have to write "return". And.. well I could now make the proposal to remove ";" as well, but maybe this would go too far for Java - Java is not the same as Groovy ;)

Neal Gafter said...

To answer blackdrag's questions: right, the "return" statement is not a local return in a closure, rather it returns from the nearest enclosing method or function. "break" and "continue" mean the same as they always did, even if the target of the break or continue is outside the closure. Don't worry about the compiler/VM magic to make this happen; I'll explain how it all fits together later.

Yes, your example with a closure that increments "i" would work as you suggest, though the type of the closure is void(), not Closure.

You can't assign to a local function's name. Like variables, you can only have one of a given name in scope, so they do not overload.

A closure can, of course, end with a statement. Since it falls off the end of the closure body, it is considered to return void.

Anonymous said...

I'm not sure of the semantics of return in this new proposal.

If I do

map(T(S x) { if(!pred(x)) return; else x; }, listOfS);

does the return statement return break out of the map function returning a partial listOfT?

Also, why not use continuations for non-local transfer of control, or calling a method with null return type?

Finally, how about supporting currying, so that on any closure of type

R(A,B,C) closure = ...

one may call closure.curry(A x) and obtain a closure of type R(B,C), and so on.

The Java compiler could treat curry as a magic function defined on every closure that reduces the arity by 1 and automatically calculates the new return type statically.

Anonymous said...

Please don't take the following personal, but...

It seems to me that this is all about saving some keystrokes writing interfaces and implementations of them... the syntax is very strange and looks quite LISPesque to me.

((){ stmt; ... })()
int(int) plus2b = (int x) {return x+2 ; } ;

Oookey, that's what Java will look like in the future? Are you serious??
(I don't care if the above plus2b will more likely look like ( x+2; ) in the furture - both syntax versions are awkward!)

So if you are about to implement C style function pointers in Java why not also implement operator overloading, too? After all, C# has it! And it's a really nice oportunity to garble up otherwise perfectly readable code. Granted, it's useful only under certain circumstances, e.g. implementing complex numbers or matrix calculations, but it's soooooo nice to have that feature! And Micro$oft has it, too!! Gotta have it!!!
(for the records, that's just irony... no operator overloading in dolphin, please)

What I really dislike about Closures is that they are not OO in any way. Is there some petition I could sign to keep this from happening to Java?

Anonymous said...

Regarding the (yet-undocumented) implementation, I'd like there to be a requirement that the bytecode be always recognizable as a closure. (By counter-example, you can't determine the end of an exception handler in bytecode).) It would also be nice if the static type of the result/expression were known without processing the closure expression itself, and for any closure name to be in the bytecode (though I fully expect the latter, like other local variable names, not to be). With VM support for bytecode processing, users nowadays expect that bytecode processors understand the language that they do. Thanks!

Anonymous said...

Amazing how people think closure syntax is awkward but inner class syntax isn't!

I mean, new Interface() { ... }! Why not "new Foo implements Interface{}" or "new Class implements Interface". The Anonymous inner class syntax omits the name of the implementing class and is awkward as well when people first encounter it.

Closures omit the name of a surrounding Class/Interface/method. What's the big deal?

Howard Lovatt said...

Tennet's principle is a weak argument. If you took it to the extreme we would only have macro processing languages! Or perhaps other horrors like Algol's pass by name!

Stick with the semantics of inner classes, they fit Java like a glove. Shorten the syntax of inner classes instead:

Closures = Poor Man's Inner Class

EB said...

I hear the faint drumbeat of the semantic web approaching and I can't help thinking that Java might end up veering towards logic based programming syntax. Not really a problem, but OO keeps your 'posture' correct by forcing you to keep everything nice and tidy. Also, why are there so many brackets and punctuation marks? Everyone can instantly recognise wordier structures but having to learn the significance of punctuation markers can put people off. In a similar fashion, math books without math symbols are a lot easier to read (funnily enough).

Anonymous said...

"The principle dictates that an expression or statement, when wrapped in a closure and then immediately invoked, ought to have the same meaning as it did before being wrapped in a closure. Any change in semantics when wrapping code in a closure is likely a flaw in the language."

This premise is wrong, precisely because it leads to such a horribly hard to understand result. If a language feature is cripplingly confusing, *that* is likely a flaw in the language. I use closures in C#, and the behavior of return statements has no surprises whatsoever.