Tuesday, January 09, 2007

MethodNamesInPieces

In Smalltalk, the name of a method being invoked is interleaved with the arguments passed to the method. Consequently it is difficult to confuse the order of arguments. In Java, on the other hand, when you invoke a method that accepts three integers it is easy to get the order wrong. The compiler has no way to detect the problem, so APIs must be carefully designed with the artificial constraint that one should avoid "too many" arguments of "compatible" types.

In the context of closures, Smalltalk's syntax allows "built-in" statement forms such as if-then-else to be expressed as an ordinary method call. When we were putting together the original version of the closures proposal James Gosling suggested this idea to support do-while and if-else style syntax of user-defined control abstraction methods, something that was mentioned in the further ideas section. We placed this issue on the back burner once we found a nice syntax that works for many of the control-invocation use cases, but a recently submitted comment by Stefan Schulz on my blog reminded me of this issue. His use case is that he'd like to be able to write an API that allows him to refactor this

public String toString() {
    StringBuilder sb = new StringBuilder("[");
    boolean first = true;
    for (String s : someCollection) {
        if (first) {
            first = false;
        } else {
            sb.append(", ");
        }
        sb.append(s);
    }
    return sb.append("]").toString();
}

into this

public String toString() {
    StringBuilder sb = new StringBuilder("[");
    for each(String s : someCollection) {
        sb.append(s);
    } inBetween {
        sb.append(", ");
    }
    return sb.append("]").toString();
}

Presumably, the API method would be defined something like this:

<T> void for each(Iterable<T> it, {T=>void} body) inBetween({=>void} between) {
    boolean first = true;
    for (T t : it) {
        if (first) {
            first = false;
        } else {
            between.invoke();
        }
        body.invoke();
    }
}
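
For comparison, here is a minimal sketch of the same control abstraction written as an ordinary method, assuming a Java version with lambdas and the java.util.function interfaces rather than the proposed closure syntax. The names EachInBetween, eachInBetween, and render are hypothetical, chosen only for this illustration. Note that the body receives the current element.

```java
import java.util.Arrays;
import java.util.function.Consumer;

class EachInBetween {
    // Hypothetical helper: runs body for every element, and runs
    // between only between two consecutive elements.
    static <T> void eachInBetween(Iterable<T> it, Consumer<? super T> body, Runnable between) {
        boolean first = true;
        for (T t : it) {
            if (first) {
                first = false;
            } else {
                between.run();
            }
            body.accept(t);
        }
    }

    // The toString() use case, expressed with the helper above.
    static String render(Iterable<String> items) {
        StringBuilder sb = new StringBuilder("[");
        eachInBetween(items, s -> sb.append(s), () -> sb.append(", "));
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList("a", "b", "c")));
    }
}
```

The call site is less readable than the piecewise form because both blocks are passed as ordinary trailing arguments, which is exactly the gap the proposed syntax is meant to close.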

A related advantage of the Smalltalk syntax is that operator overloading comes almost for free. If operator overloading is on the table for JDK7, perhaps we can kill two birds with one stone, by making the name before the first argument optional:

static BigDecimal (BigDecimal left) plus (BigDecimal right) {
    return left.add(right);
}
static BigDecimal (BigDecimal left) times (BigDecimal right) {
    return left.multiply(right);
}

This would allow you to write code like this:

static BigDecimal f(BigDecimal x, BigDecimal y, BigDecimal z) {
    return (x) plus ((y) times (z));
}
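
For contrast, a minimal sketch of the same computation in existing Java syntax, where plus and times are ordinary static methods and must be called in prefix form. The class name PrefixOperators is an assumption made only for this example.

```java
import java.math.BigDecimal;

class PrefixOperators {
    // Ordinary static methods: the name comes first, then all arguments.
    static BigDecimal plus(BigDecimal left, BigDecimal right) {
        return left.add(right);
    }

    static BigDecimal times(BigDecimal left, BigDecimal right) {
        return left.multiply(right);
    }

    // Same computation as x plus (y times z), written in prefix form.
    static BigDecimal f(BigDecimal x, BigDecimal y, BigDecimal z) {
        return plus(x, times(y, z));
    }

    public static void main(String[] args) {
        System.out.println(f(new BigDecimal("2"), new BigDecimal("3"), new BigDecimal("4")));
    }
}
```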

It's probably a small step from here to allowing arbitrary symbols as operator names and eliding some parens. I don't think anything is required in the VM, as we can encode these method names using some non-identifier character in the VM signature. For example, the above methods could be translated in the VM to methods with the names "each~~inBetween", "~plus", and "~times" (the number of tilde characters in front of each later part of the name is the number of arguments that precede that part in the method signature).
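
The encoding can be sketched as a small function. This is only one reading of the three example names, not a specification: it assumes that the arguments following the last part of the name are not encoded at all, and that an omitted leading name is represented by the empty string. The names PiecewiseNames and mangle are hypothetical.

```java
class PiecewiseNames {
    // pieces[i] is the i-th part of the method name ("" for an omitted leading name).
    // argCounts[i] is the number of arguments that follow pieces[i] in the declaration.
    static String mangle(String[] pieces, int[] argCounts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            sb.append(pieces[i]);
            // Assumption: only arguments that precede a later name part are marked,
            // so nothing is appended after the final part.
            if (i < pieces.length - 1) {
                for (int k = 0; k < argCounts[i]; k++) {
                    sb.append('~');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // each(it, body) inBetween(between)
        System.out.println(mangle(new String[] {"each", "inBetween"}, new int[] {2, 1}));
        // (left) plus (right)
        System.out.println(mangle(new String[] {"", "plus"}, new int[] {1, 1}));
    }
}
```

Under these assumptions the two calls in main produce each~~inBetween and ~plus, matching the names given above.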

There are difficult syntax issues (for example, the each-inBetween example can also be parsed as two separate statements) and I'm not sure I would recommend any of this, but I wanted to share the idea.

15 comments:

Anonymous said...

The article reminds me of the syntax of MS LINQ:

From {
xxxxxxxx
} Where {
xxxxxxxx
} Select {
xxxxxxxxx
}

Anonymous said...

Or how about this:

public String toString() {
StringBuilder sb = new StringBuilder("[");
int pos = 0;
for (String s : someCollection) {
if (pos++ != 0)
sb.append(", ");
sb.append(s);
}
return sb.append("]").toString();
}

It's short and sweet and demands no language changes.

kristian said...

I actually prefer this:

public String toString() {
  StringBuilder sb = new StringBuilder("[");
  String sep = "";
  for (String s : someCollection) {
    sb.append(sep);
    sb.append(s);
    sep = ", ";
  }
  return sb.append("]").toString();
}

Which avoids the if in the loop.

Chris Quenelle said...

I think the third code block should have body.invoke(t) instead of body.invoke()

Anonymous said...

For information,
there is a RFE 5015163 about adding a method join to java.lang.String and java.lang.StringBuilder.

public String toString() {
return new StringBuilder().append('[').
join(',',someCollection).
append(']').toString();
}

Rémi

Stephen Colebourne said...

Although it's not the point of the article, here is the Jakarta Commons Lang solution:
StrBuilder buf = new StrBuilder();
buf.append('[');
buf.appendWithSeparators(someCollection, ", ");
buf.append(']');
Which is pretty simple :-)

On the language change, I definitely think this is worth exploring. You might also want to look at my series of 5 closure blogs starting at http://jroller.com/page/scolebourne?entry=closures_use_cases_looping (each exploring a different use case area). There hasn't been much discussion of use cases (ie. proposed library APIs), and perhaps that would help identify issues with closures and multi-closures.

Anonymous said...

Wow, didn't think this goes into its own blog entry. :)
Of course, there are solutions within the current Java libraries and stuff for the simple example on constructing a comma-separated String. The example was chosen for its simplicity for showing the idea.
Neal, very interesting how you have driven that Smalltalk-borrowed syntax further. Obviously, I was thinking about being able to implement replacements for Java control structures like if-else or do-while using Closures and its short-hand syntax. Omitting the first name altogether to open the door to operator overloading ... cool.
Going back to the idea's source, in Smalltalk each statement ends with either a full stop or a semicolon, which is why it's easy to identify all parts of the statement. In Java it might look foreign to have to end a Closure block with a semicolon (e.g., with(lock){...};), which would serve the same purpose.
Something Smalltalk lacks is the possibility of a variable number of pieced names. With the regular syntax proposed in Closures one could actually define a Vararg on Closures. Having names in pieces, one could extend this to an optional name for each such argument, so one could actually rebuild statements like a switch on arbitrary objects:
switch(M obj) default({=>} db) case...(M caseObj, {=>} cb)
But maybe this is driving too far from the purpose and goals of Closures in Java.
Cheers.

Laurence said...

Smalltalk's method name and parameter interleaving always struck me as a sort of poor imitation of keyword arguments. It's okay when the parameters have an obvious order, but it falls down when there is no obvious order.

I really wish Java had support for keyword arguments. This would be especially useful in constructors, and even more so if keyword arguments could optionally have default values.

By the way: do the current closure proposals really let you create an Iterable by saying "String s : someCollection"? What's the scope of "s"?

Unknown said...

How about targeting something more like Python:

return ", ".join(s for String s in someCollection)

Anonymous said...

Hi,

Method names in pieces seem like a very nice thing, as they allow building language-like, VERY expressive statements.

As a side note a code structure like:
From {
xxxxxxxx
} Where {
xxxxxxxx
} Select {
xxxxxxxxx
}

can be achieved in Scala by combining closures with infix form invocation of the methods.

Operator overloading is cool! ...I wonder if popular operators could be overloaded as well (< > etc.)

Regards

Anonymous said...

This reminds me of Scala, too. Anyway, I think it's too complicated, so thanks for your own disclaimer. I like the current closures proposal well enough as is. Well, I'd like it better if we could get rid of checked exceptions. I just don't think that will fly with popular opinion. (But maybe I'm wrong. Maybe people would go for it.)

I also like operator overloading since it makes code easier to read. (It can be abused, but I could also name my method things like "abtg()" or whatever. People will eschew bad uses of operator overloading in general.) I like Groovy's way of defining operator overloading, though. Much simpler in my opinion. Anyway, I don't think people will let operator overloading get into Java, either. Too bad, really. (And maybe I'm wrong again.)

Anonymous said...

Please, please give us operator overloading. If an API is so hard to get right the first time, at least let's have the means syntax-wise to make up for it. I never understood why String is overloaded through a compiler hack.

Anonymous said...

I'm not that excited about this closure thing because I think it might complicate the Java syntax too much. It looks like it can create arbitrary-looking Java statements, which might confuse a programmer. I hated the macros in C and C++ because they looked so arbitrary that I had no idea what they did until I went through the painful header file definitions.

Anonymous said...

The difference between the MethodNamesInPieces presented here and Smalltalk syntax is that in Smalltalk each such part of a method name (e.g. ifTrue:ifFalse:) expects one argument, whereas MethodNamesInPieces would allow splitting a method's signature into semantic blocks. This also differentiates it from keyword arguments.

After some thinking about the Varargs stuff, it is actually not what a programmer needs. Either using keyword arguments or MethodNamesInPieces one would like to be able to define cardinalities on arguments or pieces, providing optional, limited or unlimited definition of typed arguments. Revisiting the switch implementation, one could define something like:
void switch(T obj) case*(T ... cases, C cBlock) default°(C dBlock) {
  for (T[] cases, C cBlock : case) {
    for (T aCase : cases) {
      if (obj.equals(aCase)) {
        cBlock.invoke();
        break; // inner for
      }
    }
  }
  if (default.isDefined()) {
    default.dBlock.invoke();
  }
}
which obviously needs additional syntactic stuff to be added for operating on Pieces. The * and ° are shortcuts for defining [0..*] and [0..1] as cardinality for a Piece, and the default being [1..1]. One could also think of + for [1..*].

Anonymous said...

Just wanted to add that I wrote down my collected thoughts about multipiece method names at my freshly created blog (free of closures).
Thanks for picking up the topic, Neal.