Wednesday, August 14, 2019

Accidentally Quadratic Constant Folding

I just fixed a bug in the implementation of constant folding in the C# and VB.Net compilers. They used to require O(n^2) space. Now they require O(n) space.

The naive approach to implementing string concatenation for an expression such as "S1" + "S2" + "S3" is to recursively visit the left-hand and right-hand operands and compute their constant string values, and then concatenate these strings to produce the constant string for the concatenation. It seems obviously correct and the only reasonable way to do it. But it has a serious problem.

The concatenation operation is left associative, so for a long sequence of n concatenations, each of size k, we would produce intermediate results of size k, 2k, 3k, and so on up to nk. The total amount of space needed to represent this collection of strings is O(k n^2). For moderately-sized values of n, this would cause the compilers to run out of memory and crash. This is not merely a theoretical problem, as we had been getting customer reports of this problem about quarterly for the past 7-8 years.

Fixing the problem was fairly straightforward, using a technique I learned from the Cedar/Mesa compiler in the early '80s. Rather than representing string constants in the compilers using strings, they are now represented using a data structure called Ropes. Concatenating two ropes requires only a small constant amount of memory.

Any compiler for a language that supports constant folding of string concatenation will run into this problem unless there is a similarly low-space representation of two concatenated strings. This is a technique that should be in the bag of tricks of any compiler writer.

Monday, June 05, 2017

Making new language features #{ Stand Out }

There is a phenomenon I've noticed in the way community members react to proposed new language features. The new feature seems odd, and different, and contrary to the way the language was before. So they want the syntax of the feature to be odd, and different, and clearly distinguish the new feature from all of the old features of the language.

Succumbing to that request is a bad idea. When you do, (some) people are at first happy that they can so easily distinguish this new feature from the rest of the language. But users soon grow tired of the extra syntax surrounding the new feature, and request as yet another new feature that they be able to do the same thing, but without all the boilerplate.

It is better to design the feature so that it fits well with the existing features of the language, even if it might at first seem jarring that things have changed. Users will quickly get over the newness of the changes and learn to understand the language as a whole as it is (after the change).