### Sun, 29 Dec 2013

#### Haskell : The Problem with Integer.

Haskellers may or not be aware that there are two libraries in the GHC sources
for implementing the `Integer` data type.

The first,
`integer-gmp`
links to the
GNU Multiple Precision Arithmetic Library
which is licensed under the GNU LGPL.
On most systems, libgmp is dynamically linked and all is fine.
However, if you want to create statically linked binaries from Haskell source code
you end up with your executable statically linking libgmp which means your binary
needs to be under an LGPL compatible license if you want to release it.
This is especially a problem on iOS which doesn't allow dynamic linking anyway.

The second `Integer` implementation is
`integer-simple`
which is implemented purely in Haskell (using a number of GHC extension) and is
BSD licensed.

So why doesn't everyone just the the BSD licensed one?
Well, `integer-simple` has a reputation for being slow.
Even more intriguingly, I seem to remember Duncan Coutts telling me a couple of
years ago that `integer-simple` was a little faster than `integer-gmp`
when the `Integer` was small enough to fit in a single machine `Word`,
but much slower when that was not the case.
At the time I heard this, I decided to look into it at some time.
That time has come.

A couple of weeks ago I put together some scripts and code to allow me to compile
the two `Integer` implementations into a single program and benchmark them
against each other.
My initial results looked like this:

That confirmed the slowness for multiplication and division if nothing else.

Taking a look at the code to `integer-simple` I found that it was storing
`Word#`s (unboxed machine sized words) in a Haskell list.
As convenient as lists are they are not an optimal data structure for a something
like the `Integer` library.

I have already started work on replacement for both versions of the current
`Integer` library with the following properties:

- BSD licensed.
- Implemented in Haskell (with GHC extensions) so there are no issues with linking to external C libraries.
- Fast. I'm aiming to outperform both
`integer-simple`and`integer-gmp`on as many benchmarks as possible. - Few dependencies so it can easily replace the existing versions.
Currently my code only depends on
`ghc-prim`and`primitive`.

So far the results are looking encouraging.
For `Integer` values smaller than a machine word, addition with my prototype
code is faster than both existing libraries and for adding large integers its
currently half the speed of `integer-gmp`, but I have an idea which will
likely make the new library match the speed of `integer-gmp`.

Posted at: 10:08 | Category: CodeHacking/Haskell | Permalink