Thu, 14 Sep 2006

N Squared Problems.

Important note to self : When working on an algorithm which displays N squared (or cubed or worse) behavior in Big Oh Notation fiddling with the algorithm is useless if N is large. The only ways to improve performance is to reduce the complexity to O(NlogN) or O(N) or alternatively, find a way to drastically reduce the size of the N.

This problem came up while working on my current main non-day-job application which will be released under the GPL when its ready. Even on reasonable input data sets, the value of N was well into the tens of thousands. If any significant processing needed to be done, the total time blows out really badly. This was simply not acceptable.

With many problems, processing can be traded off against memory usage. Unfortunately, for this problem, memory usage was also O(N2) and I didn't consider it acceptable for my application to chew up hundreds of megabytes of memory for reasonably small data sets.

What did finally work for this problem was a divide and conquer trick that allowed me to reduce N by two orders of magnitude and do more work on the smaller chunks of data instead.

I'll elaborate on this issue and its solution in the paper I'm writing for OSDC 2006. I'll also be submitting a paper proposal for Linux.conf.au 2007.

Posted at: 16:43 | Category: CodeHacking | Permalink