Wednesday, November 22, 2006

Linear Time Sort Puzzler

Interviewing for an engineering position at Google can be grueling and humbling. You get less than an hour with each interviewer, and may get one technical problem after another to solve. If you have performance anxiety you may have a hard time focusing on the problem. The best approach, if you can do it, is to think of the interview as a technically challenging but friendly brainstorming session. If you get hired, that's what your life will be like.

My friend, Jim Davis, told me about someone he interviewed earlier this week. The candidate thought the interview wasn't going well and hoped to impress Jim by sharing an idea he had: a new algorithm for sorting in linear time. It isn't a sort based exclusively on comparisons; it can be proven that any such sort must perform O(n log n) comparisons. It isn't a bucket sort, which is a standard algorithm for this problem (assuming certain characteristics of the input set). But the candidate's algorithm helped Jim understand the candidate's capabilities.

The algorithm works like this: you start by making a linear scan through the n numbers to be sorted and note the largest and smallest of the elements. Using that, you compute a linear function f(x) that will map the smallest input element to zero and the largest to n. Then, for each input element xi, you fork a task that waits f(xi) milliseconds and then outputs xi. After n milliseconds the output contains the input elements in sorted order.

There may be problems when two tasks attempt to output their data at nearly the same time, but let's ignore that for a moment and assume an "ideal" system. If you have a truly concurrent computer, this may run in O(n) time using O(n) processors, thereby consuming a total of O(n2) processing time. There are far better concurrent sorting algorithms around. Much more interesting is the idea of running this algorithm on a sequential computer with a non-preemptive scheduler. The non-preemptive scheduler avoids the problem of two tasks producing data at the same time. The scheduler can also ensure that the tasks run in their proper order, even if the running time of some tasks causes others to start slightly "too late".

Jim explained the problem with the algorithm, concluding "so you've reduced the problem of sorting in linear time to a different linear time sorting algorithm." After Jim explained this remark, the candidate replied "well, the idea isn't really very well thought out." Here is the puzzle: what was Jim talking about?

16 comments:

Matt said...

In the non-preemptive scheduler case the scheduler needs to sort the sleeping threads based on the length of time they are sleeping for in order to wake them up 'in order'.

Anonymous said...

Aside from the fact that the candidate's algorithm is essentially bucksort (with temporal buckets) (s)he missed a trick: why not compute a non-linear function which maps the smallest element to 0, and the largest to logn (or loglogn or whatever), all other details the same. This gets you a sort in O(logn) time and O(nlogn) work on a truly parallel machine ... hurrah!

Matthias said...

Did you already drop the keyword: "scheduler"? If n tasks ask to be rescheduled in f(x), the scheduler has to determine which one to schedule next, and the one after that, ... and thus sort the tasks which it can't do faster than O(n).

Anonymous said...

Computing f(x)?

Manu Thambi said...

I guess he converted the sorting problem into a scheduling problem. The scheduler has to sort the n tasks to schedule them in the proper order, which takes O(nlogn) time.

Anonymous said...

It's in the words: "sorting in linear time" is different from "linear time sorting", at least in this case.

harryhoppel said...

Jim was talking about closures :)

b0b0b0b said...

Computing the perfect hash function.

Sweeeeeeet, I finally answered a puzzler.

JJ said...

The difference is that the new algorithm is O(n) where n is the difference between the smallest and largest values in the set, not the number of items in the set as sorting is typically measured by.

Neal Gafter said...

CORRECT answers from: matt, matthias, manu. Bonus points for harryhoppel. All the other answers are incorrect.

Matthias said...

Can I get a job at Google now?

Neal Gafter said...

Winners: send me your resumes!

Anonymous said...

Take n pieces of spaghetti in your hand, hit them on the table and pick the longest. Now, isn't that sorting in O(1)?

Tim Davis said...

Reply to O(1) spaghetti sorting ... that only works if you can hold n spaghetti strands in your hand at one time. My hand (yours too, I assume...) can only hold a constant number of spaghetti strands. :-)

So there is no such algorithm.

Tim Davis, Univ of Florida

edd said...

You could have multiple number of hands performing the same thing, the same time. That is Linear.

edd said...

I was wondering if you could sort in a different domain. All sorting algorithms, at least the ones i know, sort in time domain.