Wednesday, August 14, 2019

Accidentally Quadratic Constant Folding


I just fixed a bug in the implementation of constant folding in the C# and VB.Net compilers. They used to require O(n^2) space. Now they require O(n) space.

The naive approach to implementing string concatenation for an expression such as "S1" + "S2" + "S3" is to recursively visit the left-hand and right-hand operands and compute their constant string values, and then concatenate these strings to produce the constant string for the concatenation. It seems obviously correct and the only reasonable way to do it. But it has a serious problem.

The concatenation operation is left associative, so for a long sequence of n concatenations, each of size k, we would produce intermediate results of size k, 2k, 3k, and so on up to nk. The total amount of space needed to represent this collection of strings is O(k n^2). For moderately-sized values of n, this would cause the compilers to run out of memory and crash. This is not merely a theoretical problem, as we had been getting customer reports of this problem about quarterly for the past 7-8 years.

Fixing the problem was fairly straightforward, using a technique I learned from the Cedar/Mesa compiler in the early '80s. Rather than representing string constants in the compilers using strings, they are now represented using a data structure called Ropes. Concatenating two ropes requires only a small constant amount of memory.

Any compiler for a language that supports constant folding of string concatenation will run into this problem unless there is a similarly low-space representation of two concatenated strings. This is a technique that should be in the bag of tricks of any compiler writer.