m3ga blog
http://www.mega-nerd.com/erikd/Blog/CodeHacking/index.atom
Erik de Castro Lopo
http://www.mega-nerd.com/erikd/Blog/CodeHacking/index.atom
erikd@mega-nerd.com
Copyright 2006-2010 Erik de Castro Lopo
PyBlosxom http://pyblosxom.sourceforge.net/ 1.4.3 01/10/2008
2017-04-30T02:22:00Z
What do you mean ExceptT doesn't Compose?
http://www.mega-nerd.com/erikd/Blog/2017/04/30/what_do_you_mean.atom
2017-04-30T02:22:00Z
2017-04-30T02:22:00Z
<p>
Disclaimer: I work at Ambiata (our
<a href="https://github.com/ambiata/">Github</a>
presence), probably the biggest Haskell shop in the southern hemisphere.
Although I mention some of Ambiata's coding practices, in this blog post I am
speaking for myself and not for Ambiata.
However, the way I'm using <b><tt>ExceptT</tt></b> and handling exceptions in
this post is something I learned from my colleagues at Ambiata.
</p>
<p>
At work, I've been spending some time tracking down exceptions in some of our
Haskell code that have been bubbling up to the top level and killing a complex
multi-threaded program.
On Friday I posted a somewhat flippant comment to
<a href="https://plus.google.com/+ErikdeCastroLopo/posts/TbRiXuNucED">Google Plus</a>:
</p>
<blockquote>
Using exceptions for control flow is the root of many evils in software.
</blockquote>
<p>
Lennart Kolmodin, who I remember from my very earliest days of using Haskell in
2008 and who I met for the first time at ICFP in Copenhagen in 2011, responded:
</p>
<blockquote>
Yet what to do if you want composable code? Currently I have<br/>
type Rpc a = ExceptT RpcError IO a<br/>
which is terrible
</blockquote>
<p>
But what do we mean by "composable"?
I like the
<a href="https://en.wikipedia.org/wiki/Composability">wikipedia</a>
definition:
</p>
<blockquote>
Composability is a system design principle that deals with the inter-relationships
of components. A highly composable system provides recombinant components that
can be selected and assembled in various combinations to satisfy specific user
requirements.
</blockquote>
<p>
The ensuing discussion, which also included Sean Leather, suggested that these
two experienced Haskellers were not aware that with the help of some combinator
functions, <b><tt>ExceptT</tt></b> composes very nicely and results in more
readable and more reliable code.
</p>
<p>
At Ambiata, our coding guidelines strongly discourage the use of partial
functions.
Since the type signature of a function doesn't include information about the
exceptions it might throw, the use of exceptions is strongly discouraged.
When using library functions that may throw exceptions, we try to catch those
exceptions as close as possible to their source and turn them into
errors that are explicit in the type signatures of the code we write.
Finally, we avoid using <b><tt>String</tt></b> to hold errors.
Instead we construct data types to carry error messages and render functions
to convert them to <b><tt>Text</tt></b>.
</p>
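<p>
As a small illustration of those last two conventions, an error data type and
its render function might look like the following (the
<b><tt>ConfigError</tt></b> type here is hypothetical, invented purely for this
sketch):
</p>
<pre class="code">
{-# LANGUAGE OverloadedStrings #-}

import Data.Monoid ((<>))
import Data.Text (Text)
import qualified Data.Text as T

-- Each failure mode gets its own constructor, carrying whatever
-- context is useful for reporting.
data ConfigError
    = MissingKey Text
    | BadPort Int

-- Rendering is kept separate from the error type itself, and
-- produces Text rather than String.
renderConfigError :: ConfigError -> Text
renderConfigError ce =
    case ce of
        MissingKey key -> "Missing configuration key '" <> key <> "'."
        BadPort p -> "Invalid port number " <> T.pack (show p) <> "."
</pre>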
<p>
In order to properly demonstrate the ideas, I've written some demo code and
made it available in
<a href="https://github.com/erikd/exceptT-demo">this GitHub repo</a>.
It compiles and even runs (provided you give it the required number of command
line arguments) and hopefully does a good job of demonstrating how the bits fit
together.
</p>
<p>
So let's look at the naive version of a program that doesn't do any exception
handling at all.
</p>
<pre class="code">
import Data.ByteString.Char8 (readFile, writeFile)
import Naive.Cat (Cat, parseCat)
import Naive.Db (Result, processWithDb, renderResult, withDatabaseConnection)
import Naive.Dog (Dog, parseDog)
import Prelude hiding (readFile, writeFile)
import System.Environment (getArgs)
import System.Exit (exitFailure)

main :: IO ()
main = do
    args <- getArgs
    case args of
        [inFile1, infile2, outFile] -> processFiles inFile1 infile2 outFile
        _ -> putStrLn "Expected three file names." >> exitFailure

readCatFile :: FilePath -> IO Cat
readCatFile fpath = do
    putStrLn "Reading Cat file."
    parseCat <$> readFile fpath

readDogFile :: FilePath -> IO Dog
readDogFile fpath = do
    putStrLn "Reading Dog file."
    parseDog <$> readFile fpath

writeResultFile :: FilePath -> Result -> IO ()
writeResultFile fpath result = do
    putStrLn "Writing Result file."
    writeFile fpath $ renderResult result

processFiles :: FilePath -> FilePath -> FilePath -> IO ()
processFiles infile1 infile2 outfile = do
    cat <- readCatFile infile1
    dog <- readDogFile infile2
    result <- withDatabaseConnection $ \ db ->
        processWithDb db cat dog
    writeResultFile outfile result
</pre>
<p>
Once built as per the instructions in the repo, it can be run with:
</p>
<pre class="code">
dist/build/improved/improved Naive/Cat.hs Naive/Dog.hs /dev/null
Reading Cat file 'Naive/Cat.hs'.
Reading Dog file 'Naive/Dog.hs'.
Writing Result file '/dev/null'.
</pre>
<p>
The above code is pretty naive and there is zero indication of what can and
cannot fail or how it can fail.
Here's a list of some of the obvious failures that may result in an exception
being thrown:
</p>
<ul>
<li>Either of the two <b><tt>readFile</tt></b> calls.</li>
<li>The <b><tt>writeFile</tt></b> call.</li>
<li>The parsing functions <b><tt>parseCat</tt></b> and
<b><tt>parseDog</tt></b>.</li>
<li>Opening the database connection.</li>
<li>The database connection could terminate during the processing stage.</li>
</ul>
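<p>
Any one of those will surface as a runtime exception that nothing in the naive
code catches. As a standalone illustration (not from the demo repo),
<b><tt>Control.Exception.try</tt></b> is enough to see the first failure mode
and turn it into a value:
</p>
<pre class="code">
import Control.Exception (IOException, try)

-- Reading a file that may not exist: try converts the thrown
-- IOException into a Left value instead of killing the program.
tryReadFile :: FilePath -> IO (Either IOException String)
tryReadFile fpath = try (readFile fpath)
</pre>
<p>
Running <b><tt>tryReadFile "/no/such/file"</tt></b> returns a
<b><tt>Left</tt></b> containing the exception; the improved version below does
essentially this, but at every point where an exception can arise.
</p>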
<p>
So let's see how the use of the standard <b><tt>Either</tt></b> type,
<b><tt>ExceptT</tt></b> from the <b><tt>transformers</tt></b> package and
combinators from Gabriel Gonzalez's <b><tt>errors</tt></b> package can improve
things.
</p>
<p>
Firstly the types of <b><tt>parseCat</tt></b> and <b><tt>parseDog</tt></b> were
ridiculous.
Parsers can fail with parse errors, so these should both return an
<b><tt>Either</tt></b> type.
Just about everything else should be in the <b><tt>ExceptT e IO</tt></b>
monad.
Let's see what that looks like:
</p>
<pre class="code">
{-# LANGUAGE OverloadedStrings #-}

import Control.Exception (SomeException)
import Control.Monad.IO.Class (liftIO)
import Control.Error (ExceptT, fmapL, fmapLT, handleExceptT
                , hoistEither, runExceptT)
import Data.ByteString.Char8 (readFile, writeFile)
import Data.Monoid ((<>))
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Improved.Cat (Cat, CatParseError, parseCat, renderCatParseError)
import Improved.Db (DbError, Result, processWithDb, renderDbError
                , renderResult, withDatabaseConnection)
import Improved.Dog (Dog, DogParseError, parseDog, renderDogParseError)
import Prelude hiding (readFile, writeFile)
import System.Environment (getArgs)
import System.Exit (exitFailure)

data ProcessError
    = ECat CatParseError
    | EDog DogParseError
    | EReadFile FilePath Text
    | EWriteFile FilePath Text
    | EDb DbError

main :: IO ()
main = do
    args <- getArgs
    case args of
        [inFile1, infile2, outFile] ->
            report =<< runExceptT (processFiles inFile1 infile2 outFile)
        _ -> do
            putStrLn "Expected three file names, the first two are input, the last output."
            exitFailure

report :: Either ProcessError () -> IO ()
report (Right _) = pure ()
report (Left e) = T.putStrLn $ renderProcessError e

renderProcessError :: ProcessError -> Text
renderProcessError pe =
    case pe of
        ECat ec -> renderCatParseError ec
        EDog ed -> renderDogParseError ed
        EReadFile fpath msg -> "Error reading '" <> T.pack fpath <> "' : " <> msg
        EWriteFile fpath msg -> "Error writing '" <> T.pack fpath <> "' : " <> msg
        EDb dbe -> renderDbError dbe

readCatFile :: FilePath -> ExceptT ProcessError IO Cat
readCatFile fpath = do
    liftIO $ putStrLn "Reading Cat file."
    bs <- handleExceptT handler $ readFile fpath
    hoistEither . fmapL ECat $ parseCat bs
  where
    handler :: SomeException -> ProcessError
    handler e = EReadFile fpath (T.pack $ show e)

readDogFile :: FilePath -> ExceptT ProcessError IO Dog
readDogFile fpath = do
    liftIO $ putStrLn "Reading Dog file."
    bs <- handleExceptT handler $ readFile fpath
    hoistEither . fmapL EDog $ parseDog bs
  where
    handler :: SomeException -> ProcessError
    handler e = EReadFile fpath (T.pack $ show e)

writeResultFile :: FilePath -> Result -> ExceptT ProcessError IO ()
writeResultFile fpath result = do
    liftIO $ putStrLn "Writing Result file."
    handleExceptT handler . writeFile fpath $ renderResult result
  where
    handler :: SomeException -> ProcessError
    handler e = EWriteFile fpath (T.pack $ show e)

processFiles :: FilePath -> FilePath -> FilePath -> ExceptT ProcessError IO ()
processFiles infile1 infile2 outfile = do
    cat <- readCatFile infile1
    dog <- readDogFile infile2
    result <- fmapLT EDb . withDatabaseConnection $ \ db ->
        processWithDb db cat dog
    writeResultFile outfile result
</pre>
<p>
The first thing to notice is that changes to the structure of the main
processing function <b><tt>processFiles</tt></b> are minor but <i>all</i> errors
are now handled explicitly.
In addition, all possible exceptions are caught as close as possible to the
source and turned into errors that are explicit in the function return types.
Sceptical?
Try replacing one of the <b><tt>readFile</tt></b> calls with an
<b><tt>error</tt></b> call or a <b><tt>throw</tt></b> and see it get caught
and turned into an error as specified by the type of the function.
</p>
<p>
We also see that despite having many different error types (which happens when
code is split up into many packages and modules), a constructor for an error
type higher in the stack can encapsulate error types lower in the stack.
For example, this value of type <b><tt>ProcessError</tt></b>:
</p>
<pre class="code">
EDb (DbError3 ResultError1)
</pre>
<p>
contains a <b><tt>DbError</tt></b> which in turn contains a
<b><tt>ResultError</tt></b>.
Nesting error types like this aids composition, as does the separation of error
rendering (turning an error data type into text to be printed) from printing.
</p>
<p>
We also see that the use of combinators like
<a href="https://hackage.haskell.org/package/errors-2.2.0/docs/Data-EitherR.html#v:fmapLT"><b><tt>fmapLT</tt></b></a>,
together with the nested error types of the previous paragraph, means that
<b><tt>ExceptT</tt></b> monad transformers do compose.
</p>
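<p>
As a standalone illustration of that composition (the types and actions here are
invented for this sketch, and <b><tt>withExceptT</tt></b> from the
<b><tt>transformers</tt></b> package plays the same role as
<b><tt>fmapLT</tt></b>), a function with a lower-level error type slots straight
into a caller that uses a different one:
</p>
<pre class="code">
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE, withExceptT)

-- Hypothetical error types from two different layers.
data DbError = DbConnectFailed deriving Show
data AppError = EDatabase DbError | EOther String deriving Show

-- A lower-level action written in terms of its own error type.
getUserCount :: Bool -> ExceptT DbError IO Int
getUserCount ok =
    if ok then pure 42 else throwE DbConnectFailed

-- The caller lifts DbError into AppError and just keeps going.
runApp :: Bool -> ExceptT AppError IO Int
runApp ok = do
    count <- withExceptT EDatabase (getUserCount ok)
    pure (count * 2)
</pre>
<p>
Here <b><tt>runExceptT (runApp True)</tt></b> yields <b><tt>Right 84</tt></b>,
while <b><tt>runExceptT (runApp False)</tt></b> yields
<b><tt>Left (EDatabase DbConnectFailed)</tt></b>.
</p>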
<p>
Using <b><tt>ExceptT</tt></b> with the combinators from the
<b><tt>errors</tt></b> package to catch exceptions as close as possible to their
source and convert them to errors has numerous benefits, including:
</p>
<ul>
<li> Errors are explicit in the types of the functions, making the code easier
to reason about.</li>
<li> It's easier to provide better error messages and more context than what
is normally provided by the <b><tt>Show</tt></b> instance of most
exceptions.</li>
<li> The programmer spends less time chasing the source of exceptions in large
complex code bases.</li>
<li> More robust code, because the programmer is forced to think about and
write code to handle errors instead of error handling being an optional
afterthought.
</li>
</ul>
<p>
Want to discuss this?
Try <a href="https://www.reddit.com/r/haskell/comments/68kyfx/what_do_you_mean_exceptt_doesnt_compose/">reddit</a>.
</p>
Forgive me Curry and Howard for I have Sinned.
http://www.mega-nerd.com/erikd/Blog/2015/11/16/forgive_me.atom
2015-11-16T11:22:00Z
2015-11-16T11:22:00Z
<p>
Forgive me Curry and Howard for I have sinned.
</p>
<p>
For the last several weeks, I have been writing C++ code.
I've been doing some experimentation in the area of real-time audio Digital
Signal Processing, and for these experiments C++ actually is better than Haskell.
</p>
<p>
Haskell is simply not a good fit here because I need:
</p>
<ul>
<li> To be able to guarantee (by inspection) that there is zero memory
allocation/de-allocation in the real-time inner processing loop. </li>
<li> To handle things like IIR filters, which are inherently stateful, with their
internal state being updated on every input sample.</li>
</ul>
<p>
There is, however, one good thing about coding in C++: I am constantly reminded
of all the sage advice about C++ I got from my friend Peter Miller, who passed
away a bit over a year ago.
</p>
<p>
Here is an example of the code I'm writing:
</p>
<pre class="code">
class iir2_base
{
public :
    // An abstract base class for 2nd order IIR filters.
    iir2_base () ;

    // Virtual destructor does nothing.
    virtual ~iir2_base () { }

    inline double process (double in)
    {
        unsigned minus2 = (minus1 + 1) & 1 ;
        double out = b0 * in + b1 * x [minus1] + b2 * x [minus2]
                        - a1 * y [minus1] - a2 * y [minus2] ;
        minus1 = minus2 ;
        x [minus1] = in ;
        y [minus1] = out ;
        return out ;
    }

protected :
    // iir2_base internal state (all statically allocated).
    double b0, b1, b2 ;
    double a1, a2 ;
    double x [2], y [2] ;
    unsigned minus1 ;

private :
    // Disable copy constructor etc.
    iir2_base (const iir2_base &) ;
    iir2_base & operator = (const iir2_base &) ;
} ;
</pre>
Building the LLVM Fuzzer on Debian.
http://www.mega-nerd.com/erikd/Blog/2015/07/21/building-llvm-fuzzer.atom
2015-07-21T10:08:00Z
2015-07-21T10:08:00Z
<p>
I've been using the awesome
<a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a>
fuzzer since late last year but had also heard good things about the
<a href="http://llvm.org/docs/LibFuzzer.html">LLVM Fuzzer</a>.
Getting the code for the LLVM Fuzzer is trivial, but when I tried to use it, I
ran into all sorts of road blocks.
</p>
<p>
Firstly, the LLVM Fuzzer needs to be compiled with and used with Clang (GNU GCC
won't work) and it needs to be Clang >= 3.7.
Now Debian does ship a clang-3.7 in the Testing and Unstable releases, but that
package has a bug
<a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=779785">(#779785)</a>
which means the Debian package is missing the static libraries required by the
<a href="http://clang.llvm.org/docs/AddressSanitizer.html">Address Sanitizer</a>
options.
Use of the Address Sanitizers (and other sanitizers) increases the effectiveness
of fuzzing tremendously.
</p>
<p>
This bug meant I had to build Clang from source, which, unfortunately, is rather
poorly documented (I intend to submit a patch to improve this) and I only
managed it with help from the #llvm IRC channel.
</p>
<p>
Building Clang from the git mirror can be done as follows:
</p>
<pre class="code">
mkdir LLVM
cd LLVM/
git clone http://llvm.org/git/llvm.git
(cd llvm/tools/ && git clone http://llvm.org/git/clang.git)
(cd llvm/projects/ && git clone http://llvm.org/git/compiler-rt.git)
(cd llvm/projects/ && git clone http://llvm.org/git/libcxx.git)
(cd llvm/projects/ && git clone http://llvm.org/git/libcxxabi.git)
mkdir -p llvm-build
(cd llvm-build/ && cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=$HOME/Clang/3.8 ../llvm)
(cd llvm-build/ && make install)
</pre>
<p>
If all the above works, you will now have working <b><tt>clang</tt></b> and
<b><tt>clang++</tt></b> compilers installed in <b><tt>$HOME/Clang/3.8/bin</tt></b>
and you can then follow the examples in the
<a href="http://llvm.org/docs/LibFuzzer.html">LLVM Fuzzer documentation</a>.
</p>
Haskell : A neat trick for GHCi
http://www.mega-nerd.com/erikd/Blog/2014/10/18/ghci-trick.atom
2014-10-17T22:16:00Z
2014-10-17T22:16:00Z
<p>
Just found a really nice little hack that makes working in the GHC interactive
<a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>
a little easier and more convenient.
First of all, I added the following line to my <b><tt>~/.ghci</tt></b> file.
</p>
<pre class="code">
:set -DGHC_INTERACTIVE
</pre>
<p>
All that line does is define a <b><tt>GHC_INTERACTIVE</tt></b> pre-processor
symbol.
</p>
<p>
Then in a file that I want to load into the REPL, I need to add this to the top
of the file:
</p>
<pre class="code">
{-# LANGUAGE CPP #-}
</pre>
<p>
and then in the file I can do things like:
</p>
<pre class="code">
#ifdef GHC_INTERACTIVE
import Data.Aeson.Encode.Pretty
import qualified Data.ByteString.Lazy.Char8 as LBS

prettyPrint :: Value -> IO ()
prettyPrint = LBS.putStrLn . encodePretty
#endif
</pre>
<p>
In this particular case, I'm working with some relatively large chunks of JSON
and it's useful to be able to pretty print them when I'm in the REPL, but I have
no need for that function when I compile that module into my project.
</p>
Moving from Wai 2.X to 3.0.
http://www.mega-nerd.com/erikd/Blog/2014/06/11/wai_3.atom
2014-06-11T10:16:00Z
2014-06-11T10:16:00Z
<p>
Michael Snoyman has just released version 3.0 of
<a href="http://hackage.haskell.org/package/wai/">
Wai</a>,
the Haskell Web Application Interface library which is used with the
<a href="http://www.yesodweb.com/">
Yesod Web Framework</a>
and anything that uses the
<a href="http://hackage.haskell.org/package/warp">
Warp</a>
web server.
The important changes for Wai are listed in this
<a href="http://www.yesodweb.com/blog/2014/05/wai-3-0-alpha">
blog post</a>.
The tl;dr is that removing the Conduit library dependency makes the Wai
interface more easily usable with one of the alternative Haskell streaming
libraries, like Pipes, Stream-IO, Iterator, etc.
</p>
<p>
As a result of the above changes, the type of a web application changes as
follows:
</p>
<pre class="code">
-- Wai > 2.0 && Wai < 3.0
type Application = Request -> IO Response
-- Wai == 3.0
type Application = Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
</pre>
<p>
Typically a function of type <b><tt>Application</tt></b> will be run by the Warp
web server using <b><tt>Warp.run</tt></b> or one of the associated functions
which have type signatures of:
</p>
<pre class="code">
run :: Port -> Application -> IO ()
runSettings :: Settings -> Application -> IO ()
runSettingsSocket :: Settings -> Socket -> Application -> IO ()
runSettingsConnection :: Settings -> IO (Connection, SockAddr) -> Application -> IO ()
</pre>
<p>
It's important to note that the only thing that has changed about these Warp
functions is the <b><tt>Application</tt></b> type.
That means that if we have a function <b><tt>oldWaiApplication</tt></b> that we
want to interface to the new version of Wai, we can just wrap it with the
following function:
</p>
<pre class="code">
newWaiApplication :: Manager -> Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived
newWaiApplication mgr wreq receiver = oldWaiApplication mgr wreq >>= receiver
</pre>
<p>
and use <b><tt>newWaiApplication</tt></b> in place of <b><tt>oldWaiApplication</tt></b>
in the call to whichever of the Warp <b><tt>run</tt></b> functions you are using.
</p>
When QuickCheck Fails Me
http://www.mega-nerd.com/erikd/Blog/2014/01/08/quickcheck_fail.atom
2014-01-08T10:03:00Z
2014-01-08T10:03:00Z
<p>
This is an old trick I picked up from a colleague over a decade ago and have
re-invented or re-remembered a number of times since.
</p>
<p>
When implementing complicated performance critical algorithms and things
don't work immediately, the best idea is to drop back to the old formula of:
</p>
<ul>
<li>Make it compile.</li>
<li>Make it correct.</li>
<li>Make it fast.</li>
</ul>
<p>
Often that means implementing slow naive versions of parts of the algorithm
first and then one-by-one replacing the slow versions with fast versions.
For a given function of two inputs, this might give me two functions with the
identical type signatures:
</p>
<pre class="code">
functionSlow :: A -> B -> C
functionFast :: A -> B -> C
</pre>
<p>
that can be used interchangeably.
</p>
<p>
When it comes to implementing the fast versions, the slow versions can be used
to check the correctness of the fast version using a simple QuickCheck property
like:
</p>
<pre class="code">
\ a b -> functionFast a b == functionSlow a b
</pre>
<p>
This property basically just asks QuickCheck to generate a, b pairs, pass these
to both functions and compare their outputs.
</p>
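<p>
A self-contained version of the pattern looks like this (the sum-to-n functions
are invented for illustration; in practice they would be the slow and fast
versions of your algorithm):
</p>
<pre class="code">
-- A deliberately naive reference implementation.
sumToSlow :: Integer -> Integer
sumToSlow n = sum [1 .. n]

-- The fast version under test: Gauss's closed form.
sumToFast :: Integer -> Integer
sumToFast n = n * (n + 1) `div` 2

-- A QuickCheck property is just a Bool valued function; QuickCheck
-- supplies the random inputs. abs keeps the inputs non-negative.
prop_match :: Integer -> Bool
prop_match n =
    let m = abs n
    in sumToFast m == sumToSlow m
</pre>
<p>
With the QuickCheck package available, <b><tt>quickCheck prop_match</tt></b>
generates random inputs and reports any counter-example it finds.
</p>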
<p>
With something like this, QuickCheck usually finds all the corner cases really
quickly.
Except for when it doesn't.
QuickCheck uses a random number generator to generate inputs to the function
under test.
If for instance you have a function that takes a floating point number and only
behaves incorrectly when that input is say exactly 10.3, the chances of QuickCheck
generating exactly 10.3 and hitting the bug are very small.
</p>
<p>
For exactly this reason, using the technique above sometimes doesn't work.
Sometimes the fast version has a bug that QuickCheck wasn't able to find.
</p>
<p>
When this happens the trick is to write a third function:
</p>
<pre class="code">
functionChecked :: A -> B -> C
functionChecked a b =
    let fast = functionFast a b
        slow = functionSlow a b
    in if fast == slow
        then fast
        else error $ "functionFast " ++ show a ++ " " ++ show b
                ++ "\nreturns " ++ show fast
                ++ "\n should be " ++ show slow
</pre>
<p>
which calculates the function output using both the slow and the fast versions,
compares the outputs and fails with an error if the two function outputs are not
identical.
</p>
<p>
Using this in my algorithm I can then collect failing test cases that QuickCheck
couldn't find.
With a failing test case, it's usually pretty easy to fix the broken fast
version of the function.
</p>
Haskell : The Problem with Integer.
http://www.mega-nerd.com/erikd/Blog/2013/12/29/integer_pt1.atom
2013-12-28T23:08:00Z
2013-12-28T23:08:00Z
<p>
Haskellers may or may not be aware that there are two libraries in the GHC sources
for implementing the <tt>Integer</tt> data type.
</p>
<p>
The first,
<a href="http://git.haskell.org/packages/integer-gmp.git"><tt>integer-gmp</tt></a>
links to the
<a href="https://gmplib.org/">GNU Multiple Precision Arithmetic Library</a>
which is licensed under the GNU LGPL.
On most systems, libgmp is dynamically linked and all is fine.
However, if you want to create statically linked binaries from Haskell source code
you end up with your executable statically linking libgmp which means your binary
needs to be under an LGPL compatible license if you want to release it.
This is especially a problem on iOS which doesn't allow dynamic linking anyway.
</p>
<p>
The second <tt>Integer</tt> implementation is
<a href="http://git.haskell.org/packages/integer-simple.git"><tt>integer-simple</tt></a>
which is implemented purely in Haskell (using a number of GHC extensions) and is
BSD licensed.
</p>
<p>
So why doesn't everyone just use the BSD licensed one?
Well, <tt>integer-simple</tt> has a reputation for being slow.
Even more intriguingly, I seem to remember Duncan Coutts telling me a couple of
years ago that <tt>integer-simple</tt> was a little faster than <tt>integer-gmp</tt>
when the <tt>Integer</tt> was small enough to fit in a single machine <tt>Word</tt>,
but much slower when that was not the case.
At the time I heard this, I decided to look into it at some time.
That time has come.
</p>
<p>
A couple of weeks ago I put together some scripts and code to allow me to compile
the two <tt>Integer</tt> implementations into a single program and benchmark them
against each other.
My initial results looked like this:
</p>
<img src="http://www.mega-nerd.com/erikd/Img/integer-gmp-simple.png"
border="0" alt="Integer performance (GMP vs Simple)"></img>
<p>
That confirmed the slowness for multiplication and division if nothing else.
</p>
<p>
Taking a look at the code to <tt>integer-simple</tt> I found that it was storing
<tt>Word#</tt>s (unboxed machine sized words) in a Haskell list.
As convenient as lists are, they are not an optimal data structure for something
like the <tt>Integer</tt> library.
</p>
<p>
I have already started work on a replacement for both versions of the current
<tt>Integer</tt> library with the following properties:
</p>
<ul>
<li> BSD licensed.</li>
<li> Implemented in Haskell (with GHC extensions) so there are no issues
with linking to external C libraries.</li>
<li> Fast. I'm aiming to outperform both <tt>integer-simple</tt> and
<tt>integer-gmp</tt> on as many benchmarks as possible.</li>
<li> Few dependencies so it can easily replace the existing versions.
Currently my code only depends on <tt>ghc-prim</tt> and
<tt>primitive</tt>.</li>
</ul>
<p>
So far the results are looking encouraging.
For <tt>Integer</tt> values smaller than a machine word, addition with my prototype
code is faster than both existing libraries, and for adding large integers it's
currently half the speed of <tt>integer-gmp</tt>, but I have an idea which will
likely make the new library match the speed of <tt>integer-gmp</tt>.
</p>
parMap to the Rescue.
http://www.mega-nerd.com/erikd/Blog/2013/01/22/parMap.atom
2013-01-22T11:08:00Z
2013-01-22T11:08:00Z
<p>
I had a long running, CPU intensive Haskell program that I wanted to speed up.
The program was basically a loop consisting of a small sequential part
followed by a <tt>map</tt> of a CPU intensive pure function over a list of 1500
elements.
</p>
<p>
Obviously I needed some sort of parallel map function and I had a faint
recollection of a function called <tt>parMap</tt>.
The wonderful
<a href="http://www.haskell.org/hoogle/">
Hoogle search engine</a>
pointed me to the
<a href="http://hackage.haskell.org/packages/archive/parallel/latest/doc/html/Control-Parallel-Strategies.html#v:parMap">
<tt>parMap</tt> documentation</a>.
</p>
<p>
Changing the existing sequential <tt>map</tt> operation into a parallel map
required a 3 line change (one of them to import the required module).
I then added "<tt>-threaded</tt>" to the compile command line to enable the
threaded runtime system in the generated executable and "<tt>+RTS -N6</tt>" to
the executable's command line.
The resulting program went from using 100% of 1 CPU to using 560% of 1 CPU on an
8 CPU box.
Win!
</p>
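<p>
For reference, the change amounts to something like this sketch (using
<b><tt>parMap</tt></b> and <b><tt>rdeepseq</tt></b> from the
<b><tt>parallel</tt></b> package; the <b><tt>expensive</tt></b> function here is
just a stand-in for my actual CPU intensive function):
</p>
<pre class="code">
import Control.Parallel.Strategies (parMap, rdeepseq)

-- A stand-in for the CPU intensive pure function.
expensive :: Int -> Int
expensive n = sum [1 .. n * 1000]

-- Before:  map expensive inputs
-- After:   parMap rdeepseq expensive inputs
processAll :: [Int] -> [Int]
processAll inputs = parMap rdeepseq expensive inputs
</pre>
<p>
Compiling with <b><tt>-threaded</tt></b> and running with
<b><tt>+RTS -N6</tt></b> then spreads the work over six capabilities; the
results are identical to the sequential <b><tt>map</tt></b>.
</p>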
<p>
I wish all code was this easy to parallelize.
</p>
My Space is Leaking.
http://www.mega-nerd.com/erikd/Blog/2012/12/22/my_space_is_leaking.atom
2012-12-22T05:42:00Z
2012-12-22T05:42:00Z
<p>
Over the last couple of days I wrote a small Haskell program to read a large CSV
file (75Meg, approx. 1000 columns and 50000 rows) and calculate some statistics.
Since I would have to do this for much larger files as well, I decided to use
the
<a href="http://hackage.haskell.org/package/csv-conduit/">
csv-conduit</a>
library to read the data and use a function passed to <tt>Data.Conduit</tt>'s
<a href="http://hackage.haskell.org/packages/archive/conduit/latest/doc/html/Data-Conduit-Util.html#v:sinkState">
sinkState</a>
to calculate the statistics.
</p>
<p>
The code was pretty easy to put together, and only came to about 100 lines of
code.
Unfortunately, when I ran the program, it tried to consume all 8Gig of memory
on my laptop and when I actually let it run to completion, it took over an hour
to produce useful output.
</p>
<p>
A bit of quick profiling showed that the problem was with the state used to hold
the statistics.
The state itself wasn't huge, but Haskell's lazy evaluation meant there were a
huge number of thunks (pending calculations) piling up.
</p>
<blockquote>
Aside : Haskell does lazy (more correctly called non-strict) evaluation by
default.
This means that values are calculated when they are needed rather than when
the program hits that point in the code.
For instance if a value is generated by calling a pure function, the GHC runtime
will forgo actually calling the function and replace the value with a thunk
containing the function and its input parameters.
Later, when the value is actually needed, the runtime will call the function
stored in the thunk.
</blockquote>
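<p>
The classic small-scale demonstration of thunk build-up is accumulating with the
lazy <b><tt>foldl</tt></b> instead of the strict <b><tt>foldl'</tt></b> (a
generic example, not the code from my program):
</p>
<pre class="code">
import Data.List (foldl')

-- foldl builds a chain of pending (+) thunks the length of the
-- list before anything is evaluated; foldl' forces the
-- accumulator at each step and runs in constant space.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
</pre>
<p>
Both produce the same answer, but on a large enough input list the lazy version
exhausts memory long before the strict one is in any trouble.
</p>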
<p>
My first attempt to fix this problem was to add some strictness annotations to
my data types, but that didn't seem to help.
I then looked at the
<a href="http://hackage.haskell.org/package/deepseq/">
deepseq</a>
package and tried adding the
<a href="http://hackage.haskell.org/packages/archive/deepseq/1.3.0.1/doc/html/Control-DeepSeq.html#v:-36--33--33-">
<tt>$!!</tt></a>
operator in a few places.
This resulted in a compile error complaining about my data structures not having
an
<a href="http://hackage.haskell.org/packages/archive/deepseq/1.3.0.1/doc/html/Control-DeepSeq.html#t:NFData">
NFData</a>
instance.
A bit of googling for "custom NFData instance" showed up the
<a href="http://hackage.haskell.org/package/deepseq-th/">
deepseq-th</a>
package which uses Template Haskell to generate NFData instances.
</p>
<blockquote>
Aside : For a value to be an instance of the NFData typeclass means that
it can be fully evaluated, ie a thunk to calculate a value of this type can
be forced by deepseq to replace the thunk with the value.
</blockquote>
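<p>
To make that concrete, here is roughly what the fix looks like with a generically
derived NFData instance (a sketch with a made-up <b><tt>Stats</tt></b> type,
using the generics support that is now part of <b><tt>deepseq</tt></b> itself):
</p>
<pre class="code">
{-# LANGUAGE DeriveGeneric #-}

import Control.DeepSeq (NFData, force)
import Data.List (foldl')
import GHC.Generics (Generic)

-- A made-up accumulator standing in for the statistics state.
data Stats = Stats
    { statCount :: Int
    , statTotal :: Double
    }
    deriving (Show, Generic)

-- The Generic instance lets deepseq derive the traversal.
instance NFData Stats

step :: Stats -> Double -> Stats
step (Stats c t) x = Stats (c + 1) (t + x)

-- Forcing the accumulator on every step stops thunks piling up.
summarise :: [Double] -> Stats
summarise = foldl' (\s x -> force (step s x)) (Stats 0 0)
</pre>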
<p>
About 10 minutes later I had my code working again, but now it processed the same
file in a little over 2 minutes and used less than 0.1% of the 8Gig it was using
previously.
</p>
<p>
I was happy with this.
So happy that I decided to thank the author of deepseq-th, Herbert Valerio Riedel
(hvr) on the #haskell IRC channel.
Herbert was pleased to hear of my success, but suggested that instead of
deepseq-th I try using
<a href="http://hackage.haskell.org/package/deepseq-generics/">
deepseq-generics</a>.
Someone else on the channel suggested that this might be slower, but Herbert
said that he had not found that to be the case.
Switching from one to the other was trivially easy and pleasingly enough the
generics version ran just as fast.
</p>
<p>
That's when José Pedro Magalhães (dreixel in #haskell) said that he had a draft
paper
<a href="http://dreixel.net/research/pdf/ogpi_draft.pdf">
"Optimisation of Generic Programs through Inlining"</a>
explaining how and why this generic implementation is just as fast as the
Template Haskell version.
Basically it boils down to the compiler having all the information it needs at
compile time to inline and specialize the code to be just as fast as hand written
code.
</p>
<p>
Reflections:
</p>
<ol>
<li>Streaming I/O libraries like Data.Conduit (there's more than one) do
give guarantees about space usage so that when you get a space leak the
I/O is probably not the first place to look.
</li>
<li>For small programs it's relatively easy to reason about where the space
leak is happening.
</li>
<li>For a relatively experienced Haskeller, following the bread crumbs to
a working solution is relatively easy.
</li>
<li>Code that uses a struct to accumulate state is a common contributor to
space leaks.
</li>
<li>Interacting with the Haskell community can often get a better result
than the first thing you find (eg deepseq-generics instead of deepseq-th).
</li>
<li>Generics can be just as fast as Template Haskell generated code.
</li>
</ol>
Benchmarking and QuickChecking readInt.
http://www.mega-nerd.com/erikd/Blog/2012/01/24/read_int.atom
2012-01-24T00:52:00Z
2012-01-24T00:52:00Z
<p>
I'm currently working on converting my
<a href="http://hackage.haskell.org/package/http-proxy/">
http-proxy</a>
library from using the
<a href="http://hackage.haskell.org/package/enumerator">
Data.Enumerator</a>
package to
<a href="http://hackage.haskell.org/package/conduit/">
Data.Conduit</a>
(explanation of why in my
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/Haskell/telnet-conduit.html">
last blog post</a>).
</p>
<p>
During this conversion, I have been studying the sources of the
<a href="http://hackage.haskell.org/package/warp/">
Warp</a>
web server because my http-proxy was originally derived from the Enumerator
version of Warp.
While digging through the Warp code I found the following code (and comment)
which is used to parse the number provided in the <tt>Content-Length</tt> field
of an HTTP header:
</p>
<pre class="code">
-- Note: This function produces garbage on invalid input. But serving an
-- invalid content-length is a bad idea, mkay?
readInt :: S.ByteString -> Integer
readInt = S.foldl' (\x w -> x * 10 + fromIntegral w - 48) 0
</pre>
<p>
The comment clearly states that this function can produce garbage,
specifically if the string contains anything other than ASCII digits.
The comment is also correct that an invalid <tt>Content-Length</tt> is a bad
idea.
However, on seeing the above code, and remembering something I had seen recently
in the standard library, I naively sent the
<a href="https://github.com/yesodweb/wai/">
Yesod project</a>
a patch replacing the above code with a version that uses the <tt>readDec</tt>
function from the <tt>Numeric</tt> module:
</p>
<pre class="code">
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Numeric as N

readInt :: ByteString -> Integer
readInt s =
    case N.readDec (B.unpack s) of
        [] -> 0
        (x, _):_ -> x
</pre>
<p>
About 3-4 hours after I submitted the patch I got an email from
<a href="http://www.snoyman.com/">
Michael Snoyman</a>
saying that parsing the <tt>Content-Length</tt> field is a hot spot for the
performance of Warp and that I should benchmark it against the code I'm
replacing to make sure there is no unacceptable performance penalty.
</p>
<p>
That's when I decided it was time to check out Bryan O'Sullivan's
<a href="http://hackage.haskell.org/package/criterion/">
Criterion</a>
benchmarking library.
A quick read of the docs and a bit of messing around and I was able to prove to
myself that using <tt>readDec</tt> was indeed much slower than the code I wanted
to replace.
</p>
<p>
The initial disappointment of finding that a more correct implementation was
significantly slower than the less correct version quickly turned to joy as I
experimented with a couple of other implementations and eventually settled on
this:
</p>
<pre class="code">
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.Char as C
readIntTC :: Integral a => ByteString -> a
readIntTC bs = fromIntegral
    $ B.foldl' (\i c -> i * 10 + C.digitToInt c) 0
    $ B.takeWhile C.isDigit bs
</pre>
<p>
By using the <tt>Integral</tt> type class, this function converts the given
<tt>ByteString</tt> to any integer type (ie any type belonging to the
<tt>Integral</tt> type class).
When used, this function will be specialized by the Haskell compiler at the
call site to produce code to read string values into <tt>Int</tt>s,
<tt>Int64</tt>s or anything else that is a member of the <tt>Integral</tt>
type class.
</p>
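<p>
For reference, the three specialisations benchmarked below can be written as
simple type-annotated aliases of the generic function (a sketch; the exact
definitions are in the Gist linked below):
</p>
<pre class="code">
readInt :: ByteString -> Int
readInt = readIntTC

readInt64 :: ByteString -> Int64
readInt64 = readIntTC

readInteger :: ByteString -> Integer
readInteger = readIntTC
</pre>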
<p>
For a final sanity check I decided to use
<a href="http://hackage.haskell.org/package/QuickCheck">
QuickCheck</a>
to make sure that the various versions of the generic function were correct for
values of the type they returned.
To do that I wrote a very simple QuickCheck property as follows:
</p>
<pre class="code">
prop_read_show_idempotent :: Integral a => (ByteString -> a) -> a -> Bool
prop_read_show_idempotent freader x =
    let posx = abs x
    in posx == freader (B.pack $ show posx)
</pre>
<p>
This QuickCheck property takes the function under test, <tt>freader</tt>, and
QuickCheck then supplies it with randomly generated values of the correct type.
Since the function under test is designed to read <tt>Content-Length</tt> values
which are always positive, we only test using the absolute value of the value
randomly generated by QuickCheck.
</p>
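<p>
Since the property is polymorphic, each specialisation needs an explicit type
annotation before it is handed to QuickCheck. A minimal driver might look
something like this (a sketch, assuming the definitions of <tt>readIntTC</tt>
and the property above are in scope; the exact code is in the Gist linked
below):
</p>
<pre class="code">
import Data.Int (Int64)
import Test.QuickCheck (quickCheck)

main :: IO ()
main = do
    putStrLn "Quickcheck tests."
    quickCheck (prop_read_show_idempotent (readIntTC :: ByteString -> Int))
    quickCheck (prop_read_show_idempotent (readIntTC :: ByteString -> Int64))
    quickCheck (prop_read_show_idempotent (readIntTC :: ByteString -> Integer))
</pre>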
<p>
The complete test program can be found on Github
<a href="https://gist.github.com/1662654">
in this Gist</a>
and can be compiled and run as:
</p>
<pre class="code">
ghc -Wall -O3 --make readInt.hs -o readInt && ./readInt
</pre>
<p>
When run, the output of the program looks like this:
</p>
<pre class="code">
Quickcheck tests.
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.
+++ OK, passed 100 tests.
Criterion tests.
warming up
estimating clock resolution...
mean is 3.109095 us (320001 iterations)
found 27331 outliers among 319999 samples (8.5%)
4477 (1.4%) low severe
22854 (7.1%) high severe
estimating cost of a clock call...
mean is 719.4627 ns (22 iterations)
benchmarking readIntOrig
mean: 4.653041 us, lb 4.645949 us, ub 4.663823 us, ci 0.950
std dev: 43.94805 ns, lb 31.52653 ns, ub 73.82125 ns, ci 0.950
benchmarking readDec
mean: 13.12692 us, lb 13.10881 us, ub 13.14411 us, ci 0.950
std dev: 90.63362 ns, lb 77.52619 ns, ub 112.4304 ns, ci 0.950
benchmarking readRaw
mean: 591.8697 ns, lb 590.9466 ns, ub 594.1634 ns, ci 0.950
std dev: 6.995869 ns, lb 3.557109 ns, ub 14.54708 ns, ci 0.950
benchmarking readInt
mean: 388.3835 ns, lb 387.9500 ns, ub 388.8342 ns, ci 0.950
std dev: 2.261711 ns, lb 2.003214 ns, ub 2.585137 ns, ci 0.950
benchmarking readInt64
mean: 389.4380 ns, lb 388.9864 ns, ub 389.9312 ns, ci 0.950
std dev: 2.399116 ns, lb 2.090363 ns, ub 2.865227 ns, ci 0.950
benchmarking readInteger
mean: 389.3450 ns, lb 388.8463 ns, ub 389.8626 ns, ci 0.950
std dev: 2.599062 ns, lb 2.302428 ns, ub 2.963600 ns, ci 0.950
</pre>
<p>
At the top of the output is proof that all three specializations of the generic
function <tt>readIntTC</tt> satisfy the QuickCheck property.
From the Criterion output it's pretty obvious that the <tt>Numeric.readDec</tt>
version is about 3 times slower than the original function.
More importantly, all three versions of this generic function are an order of
magnitude faster than the original.
</p>
<p>
That's a win!
I will be submitting my new function for inclusion in Warp.
</p>
<p>
<b>Update : 14:13</b>
</p>
<p>
At around the same time as I submitted my latest version of <tt>readInt</tt>,
Vincent Hanquez
<a href="https://github.com/yesodweb/wai/pull/34#issuecomment-3626110">
posted a comment on the Github issue</a>
suggesting I look at the
<a href="http://www.haskell.org/ghc/docs/7.2.2/html/users_guide/syntax-extns.html#magic-hash">
GHC MagicHash extension</a>
and pointed me to
<a href="http://tab.snarc.org/posts/haskell/2011-11-15-lookup-tables.html">
an example</a>.
</p>
<p>
Sure enough, using the MagicHash technique resulted in something significantly
faster again.
</p>
<p>
<b>Update #2 : 2012-01-29 19:46</b>
</p>
<p>
In version 0.3.0 and later of the
<a href="http://hackage.haskell.org/package/bytestring-lexing">
bytestring-lexing</a>
package there is a function <tt>readDecimal</tt> that is even faster than the
MagicHash version.
</p>
A Simple Telnet Client Using Data.Conduit.
http://www.mega-nerd.com/erikd/Blog/2012/01/14/telnet-conduit.atom
2012-01-14T02:22:00Z
2012-01-14T02:22:00Z
<p>
Below is a simple telnet client written using Haskell's new
<a href="http://hackage.haskell.org/package/conduit/">
Conduit</a>
library.
This library was written by
<a href="http://www.snoyman.com/">Michael Snoyman</a>
the man behind the
<a href="http://www.yesodweb.com/">
Yesod</a>
Web Framework for Haskell.
</p>
<p>
The Conduit library is a second generation approach to the problem of
guaranteeing bounded memory usage in the presence of lazy evaluation.
The first generation of these ideas were libraries like
<a href="http://hackage.haskell.org/package/iteratee">
Iteratee</a>,
<a href="http://hackage.haskell.org/package/enumerator">
Enumerator</a>,
and
<a href="http://hackage.haskell.org/package/iterIO">
IterIO</a>.
All of these first generation libraries use the term <i>enumerator</i>
for data producers and <i>iteratee</i> for data consumers.
The new Conduit library calls data producers <i>"sources"</i> and data consumers
<i>"sinks"</i> to make them a little more approachable.
</p>
<p>
The other big difference between Conduit and the early libraries in this space
is to do with guaranteeing early clean up of potentially scarce resources like
sockets.
Although I have not looked in any detail at the IterIO library, both Iteratee and
Enumerator simply rely on Haskell's garbage collector to clean up resources
when they are no longer required.
The Conduit library on the other hand uses
<a href="http://hackage.haskell.org/packages/archive/conduit/latest/doc/html/Control-Monad-Trans-Resource.html">
Resource transformers</a>
to guarantee release of these resources as soon as possible.
</p>
<p>
The client looks like this
(<a href="https://gist.github.com/1596792">latest available here</a>):
</p>
<pre class="code">
import Control.Concurrent (forkIO, killThread)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Resource
import Data.Conduit
import Data.Conduit.Binary
import Network (connectTo, PortID (..))
import System.Environment (getArgs, getProgName)
import System.IO

main :: IO ()
main = do
    args <- getArgs
    case args of
        [host, port] -> telnet host (read port :: Int)
        _ -> usageExit
  where
    usageExit = do
        name <- getProgName
        putStrLn $ "Usage : " ++ name ++ " host port"

telnet :: String -> Int -> IO ()
telnet host port = runResourceT $ do
    (releaseSock, hsock) <- with (connectTo host $ PortNumber $ fromIntegral port) hClose
    liftIO $ mapM_ (`hSetBuffering` LineBuffering) [ stdin, stdout, hsock ]
    (releaseThread, _) <- with
        (forkIO $ runResourceT $ sourceHandle stdin $$ sinkHandle hsock)
        killThread
    sourceHandle hsock $$ sinkHandle stdout
    release releaseThread
    release releaseSock
</pre>
<p>
There are basically three blocks, a bunch of imports at the top, the program's
entry point <b><tt>main</tt></b> and the <b><tt>telnet</tt></b> function.
</p>
<p>
The <b><tt>telnet</tt></b> function is pretty simple.
Most of the function runs inside a <b><tt>runResourceT</tt></b> resource
transformer.
The purpose of these resource transformers is to keep track of resources such
as sockets, file handles, thread ids etc and make sure they get released in a
timely manner.
For example, in the <b><tt>telnet</tt></b> function, the <b><tt>connectTo</tt></b>
function call opens a connection to the specified host and port number and
returns a socket.
By wrapping the <b><tt>connectTo</tt></b> call in <b><tt>with</tt></b>,
the socket is registered with the resource transformer.
The <b><tt>with</tt></b> function has the following type signature:
</p>
<pre class="code">
with :: Resource m
     => Base m a           -- Base monad for the current monad stack
     -> (a -> Base m ())   -- Resource de-allocation function
     -> ResourceT m (ReleaseKey, a)
</pre>
<p>
When the resource is registered, the user must also supply a function that will
destroy and release the resource.
The <b><tt>with</tt></b> function returns a <b><tt>ReleaseKey</tt></b> for the
resource and the resource itself.
Formulating the <b><tt>with</tt></b> function this way makes it hard to misuse.
</p>
<p>
The other thing of interest is that because a telnet client needs to send data
in both directions, the server-to-client and client-to-server communication
paths run in separate GHC runtime threads.
The thread is spawned using <b><tt>forkIO</tt></b> and even though the thread
identifier is thrown away, the resource transformer still records it and will
later call <b><tt>killThread</tt></b> to clean up the thread.
</p>
<p>
The core of the program is the two lines containing the calls to
<b><tt>sourceHandle</tt></b> and <b><tt>sinkHandle</tt></b>.
The first of these lines pulls data from <b><tt>stdin</tt></b> and pushes it to
the socket <b><tt>hsock</tt></b> while the second pulls from the socket and
pushes it to <b><tt>stdout</tt></b>.
</p>
<p>
It should be noted that the final two calls to <b><tt>release</tt></b> are not
strictly necessary since the resource transformer will clean up these resources
automatically.
</p>
<p>
The experience of writing this telnet client suggests that the Conduit library
is certainly easier to use than the Enumerator or Iteratee libraries.
</p>
LLVM Backend for DDC : Very Nearly Done.
http://www.mega-nerd.com/erikd/Blog/2011/01/01/llvm_very_nearly_done.atom
2011-01-01T02:54:00Z
2011-01-01T02:54:00Z
<p>
The LLVM backend for
<a href="http://disciple.ouroborus.net/">
DDC</a>
that I've been working on sporadically
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/DDC/llvm_backend.html">
since June</a>
is basically done.
When compiling via the LLVM backend, all but three of the 100+ tests in the DDC
test suite pass.
The tests that pass when going via the C backend but fail via the LLVM backend
are of two kinds:
</p>
<ol>
<li>Use DDC's <tt><b>foreign import</b></tt> construct to name a C macro
to perform a type cast, where the macro is defined in one of the C header
files.
</li>
<li>Use static inline functions in the C backend to do peek and poke
operations on arrays of unboxed values.
</li>
</ol>
<p>
In both of these cases, DDC is using features of the C language to make code
generation easier.
Obviously, the LLVM backend needs to do something else to get the same effect.
</p>
<p>
Fixing the type casting problem should be relatively simple.
<a href="http://www.cse.unsw.edu.au/~benl/">
Ben</a>
is currently working on making type casts a primitive of the Core language
so that both the C and LLVM backends can easily generate code for them.
</p>
<p>
The array peek and poke problem is a little more complex.
I suspect that it too will require the addition of new Core language primitive
operations.
This is a much more complex problem than the type casting one and I've only just
begun to start thinking about it.
</p>
<p>
Now that the backend is nearly done, it's not unreasonable to look at its
performance.
The following table shows the compile and run times of a couple of tests in the
DDC test suite compiling via the C and the LLVM backend.
</p>
<br/>
<center>
<table class="simple">
<tr>
<th>Test name</th>
<th>C Build Time</th>
<th>LLVM Build Time</th>
<th>C Run Time</th>
<th>LLVM Run Time</th>
</tr>
<tr>
<th align="left">93-Graphics/Circle</th>
<td align="right">3.124s</td>
<td align="right">3.260s</td>
<td align="right">1.701s</td>
<td align="right">1.536s</td>
</tr>
<tr>
<th align="left">93-Graphics/N-Body/Boxed</th>
<td align="right">6.126s</td>
<td align="right">6.526s</td>
<td align="right">7.649s</td>
<td align="right">4.899s</td>
</tr>
<tr>
<th align="left">93-Graphics/N-Body/UnBoxed</th>
<td align="right">3.559s</td>
<td align="right">4.017s</td>
<td align="right">9.843s</td>
<td align="right">6.162s</td>
</tr>
<tr>
<th align="left">93-Graphics/RayTracer</th>
<td align="right">12.890s</td>
<td align="right">13.102s</td>
<td align="right">13.465s</td>
<td align="right">8.973s</td>
</tr>
<tr>
<th align="left">93-Graphics/SquareSpin</th>
<td align="right">2.699s</td>
<td align="right">2.889s</td>
<td align="right">1.609s</td>
<td align="right">1.604s</td>
</tr>
<tr>
<th align="left">93-Graphics/Styrene</th>
<td align="right">13.685s</td>
<td align="right">14.349s</td>
<td align="right">11.312s</td>
<td align="right">8.527s</td>
</tr>
</table>
</center>
<br/>
<p>
Although there is a small increase in compile times when compiling via LLVM, the
LLVM run times are significantly reduced.
The conceptual complexity of the LLVM backend is also low (the line count is
about 4500 lines, which will probably fall with re-factoring) and thanks to
LLVM's type checking being significantly better than C's, I think it's reasonable
to be more confident in the quality of the LLVM backend than the existing C
backend.
Finally, implementing things like proper tail call optimisation will be far
easier in the LLVM backend than in C.
</p>
<p>
All in all, I think doing this LLVM backend has been an interesting challenge
and will definitely pay off in the long run.
</p>
LLVM Backend for DDC : Milestone #3.
http://www.mega-nerd.com/erikd/Blog/2010/12/01/llvm_milestone3.atom
2010-12-01T09:41:00Z
2010-12-01T09:41:00Z
<p>
After my
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/DDC/llvm_backend2.html">
last post</a>
on this topic, I ran into some problems with the AST (abstract syntax tree) that
was being passed to my code for LLVM code generation.
After discussing the problem with Ben, he spent some time cleaning up the AST
definition, the result of which was that nearly all the code I already had
stopped working.
This was a little disheartening.
That and the fact that I was really busy, meant that I didn't touch the LLVM
backend for a number of weeks.
</p>
<p>
When I finally did get back to it, I found that it wasn't as broken as I had
initially thought.
Although the immediate interface between Ben's code and mine had changed
significantly, all the helper functions I had written were still usable.
Over a week and a bit, I managed to patch everything up again and get back to
where I was.
I also did a lot of cleaning up and came up with a neat solution to a problem
which was bugging me during my previous efforts.
</p>
<p>
The problem was that structs defined via the LLVM backend needed to have exactly
the same memory layout as the structs defined via the C backend.
This is a strict requirement for proper interaction between code generated via
C and LLVM.
This was made a little difficult by David Terei's Haskell LLVM wrapper code
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/DDC/llvm_backend.html">
(see previous post)</a>
which makes all structs packed by default, while structs on the C side were
not packed.
Another dimension of this problem was finding an easy way to generate LLVM code
to access structs in a way that was easy to read and debug in the code generator
and also not require different code paths for generating 32 and 64 bit code.
</p>
<p>
Struct layout is tricky.
Consider a really simple struct like this:
</p>
<pre class="code">
struct whatever
{   int32_t tag ;
    char * pointer ;
} ;
</pre>
<p>
On a 32 bit system, that struct will take up 8 bytes; 4 bytes for the
<tt><b>int32_t</b></tt> and 4 for the pointer.
However, on a 64 bit system, where pointers are 8 bytes in size, the struct
will take up 16 bytes.
Why not 12 bytes?
Well, some 64 bit CPUs (Alpha and Sparc64 are two I can think of) are not
capable of unaligned memory accesses; a read from memory into a CPU register
where the memory address (in bytes) is not an integer multiple of the size of
the register.
Other CPUs like x86_64 can read unaligned data, but reading unaligned data is
usually slower than reading correctly aligned data.
</p>
<p>
In order to avoid unaligned accesses, the compiler assumes that the start address of the
struct will be aligned to the correct alignment for the biggest CPU register
element in the struct, in this case the pointer.
It then adds 4 bytes of padding between the <tt><b>int32_t</b></tt> and the
pointer to ensure that if the struct is correctly aligned then the pointer will
also be correctly aligned.
</p>
<p>
Because structs are packed in David Terei's code, the above struct would
require a different definition on 32 and 64 bit systems, ie
</p>
<pre class="code">
; 32 bit version of the struct
%struct.whatever.32 = type <{ i32, i8 * }>
; 64 bit version of the struct
%struct.whatever.64 = type <{ i32, [4 * i8], i8 * }>
</pre>
<p>
where the 64 bit version contains 4 padding bytes.
However, the difference between these two definitions causes another problem.
To access fields within a struct, LLVM code uses the
<a href="http://llvm.org/docs/LangRef.html#i_getelementptr">
<tt><b>getelementptr</b></tt></a>
operator which addresses fields by index.
Unfortunately, the index (zero based) of the pointer is 1 for the 32 bit version
and 2 for the 64 bit version.
That would make code generation a bit of a pain in the neck.
</p>
<p>
The solution is to allow the specification of LLVM structs in Haskell as a list of
<tt><b>LlvmStructField</b></tt> elements, using
</p>
<pre class="code">
data LlvmStructField
    = AField String LlvmType    -- Field name and type.
    | APadTo2                   -- Pad next field to a 2 byte offset.
    | APadTo4                   -- Pad next field to a 4 byte offset.
    | APadTo8                   -- Pad next field to an 8 byte offset.
    | APadTo8If64               -- Pad next field to an 8 byte offset,
                                -- but only on 64 bit targets.
</pre>
<p>
Note that the <tt><b>AField</b></tt> constructor requires both a name and the
<tt><b>LlvmType</b></tt>.
I then provide functions to convert the <tt><b>LlvmStructField</b></tt> list
into an opaque <tt><b>LlvmStructDesc</b></tt> type and provide the following
functions:
</p>
<pre class="code">
-- | Turn a struct specified as an LlvmStructField list into an
--   LlvmStructDesc and give it a name. The LlvmStructDesc may
--   contain padding to make it conform to the definition.
mkLlvmStructDesc :: String -> [LlvmStructField] -> LlvmStructDesc

-- | Retrieve the struct's LlvmType from the LlvmStructDesc.
llvmTypeOfStruct :: LlvmStructDesc -> LlvmType

-- | Given an LlvmStructDesc and the name of a field within the
--   LlvmStructDesc, retrieve the field's index within the struct and its
--   LlvmType.
structFieldLookup :: LlvmStructDesc -> String -> (Int, LlvmType)
</pre>
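<p>
As a hypothetical example (the field type names <tt><b>i32</b></tt> and
<tt><b>i8Ptr</b></tt> are assumed from David Terei's wrapper, not taken from
the DDC sources), the <tt><b>whatever</b></tt> struct from earlier could be
described like this:
</p>
<pre class="code">
whateverDesc :: LlvmStructDesc
whateverDesc = mkLlvmStructDesc "whatever"
    [ AField "tag" i32      -- the int32_t field
    , APadTo8If64           -- 4 bytes of padding, on 64 bit targets only
    , AField "pointer" i8Ptr
    ]
</pre>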
<p>
Once the <tt><b>LlvmStructDesc</b></tt> is built for a given struct, fields
within the struct can be addressed in the LLVM code generator by name, making
the Haskell code generator code far easier to read.
</p>
<p>
Pretty soon after I got the above working I also managed to get enough LLVM
code generation working to compile a complete small program which then runs
correctly.
I consider that to be milestone 3.
</p>
Functional Programing, Tail Call Recursion and Javascript.
http://www.mega-nerd.com/erikd/Blog/2010/11/30/fp-tail-js.atom
2010-11-30T10:47:00Z
2010-11-30T10:47:00Z
<p>
About 6 weeks ago, I got an email from
<a href="http://blogs.atlassian.com/developer/csharkie/">
Craig Sharkie</a>,
who runs the Sydney Javascript group,
<a href="http://sydjs.com/">
SydJS</a>.
He was contacting me because I run the
<a href="http://groups.google.com/group/fp-syd">
Sydney functional programming group</a>
and he was asking if I knew anyone who might be able to give a presentation
about tail call recursion at SydJS.
In the spirit of FP-Syd outreach I volunteered to do it, even though I haven't
really done all that much Javascript.
</p>
<p>
On the night, I showed up, had a beer and then presented
<a href="http://www.mega-nerd.com/erikd/Blog/files/js-tail-call.pdf">
my slides</a>.
I started off explaining what functional programming is and why it is
interesting (hint; common language features like garbage collection, dynamic
typing, lambda expressions and type inference all started in functional
languages).
</p>
<p>
I used the factorial function as an example of a function that can be implemented
recursively and I demoed the
<a href="http://www.mega-nerd.com/erikd/Blog/files/js-demo/">
Javascript versions</a>
in a web browser.
I gave the standard recursive form whose stack usage grows linearly with
<tt><b>n</b></tt>:
</p>
<pre class="code">
function factorial (n)
{
    /* Termination condition. */
    if (n <= 1)
        return 1 ;

    /* Recursion. */
    return n * factorial (n - 1) ;
}
</pre>
<p>
followed by the tail recursive form:
</p>
<pre class="code">
function factorial (n)
{
    function fac_helper (n, fac)
    {
        if (n <= 1)
            return fac ;
        return fac_helper (n - 1, n * fac) ;
    }

    return fac_helper (n, 1) ;
}
</pre>
<p>
Unfortunately, even though this is written in tail recursive form, it still doesn't
run in constant stack space.
That's because neither the ECMAScript 3 nor the ECMAScript 5 standard mandates
tail call optimisation, and few Javascript engines implement it.
</p>
<p>
For languages whose compilers do implement TCO, the above function will
run in constant stack space and I demonstrated this using the same function
<a href="http://www.mega-nerd.com/erikd/Blog/files/mlfactorial.ml">
written in Ocaml</a>:
</p>
<pre class="code">
(* Compile using: ocamlopt nums.cmxa mlfactorial.ml -o mlfactorial *)

open Big_int

(* val mult_int_big_int : int -> big_int -> big_int
   Multiplication of a big integer by a small integer. *)
let ($*) = mult_int_big_int

let factorial n =
    let rec fac_helper x fac =
        if x <= 1 then
            fac
        else
            fac_helper (x - 1) (x $* fac)
    in
    fac_helper n unit_big_int

let () =
    let n = int_of_string Sys.argv.(1) in
    let facn = factorial n in
    Printf.printf "factorial %d = %s\n" n (string_of_big_int facn)
</pre>
<p>
When this program is run through the Ocaml compiler, the compiler detects that
the factorial function is written in tail recursive form and that it can
therefore apply the Tail Call Optimisation and create an executable that runs in
constant stack space.
I demonstrated the constant stack space usage by running it under valgrind using
valgrind's DRD tool:
</p>
<pre class="code">
<b>></b> valgrind --quiet --tool=drd --show-stack-usage=yes ./factorial 5
factorial 5 = 120
==3320== thread 1 finished and used 11728 bytes out of 8388608 on its stack. Margin: 8376880 bytes.
<b>></b> valgrind --quiet --tool=drd --show-stack-usage=yes ./factorial 10
factorial 10 = 3628800
==3323== thread 1 finished and used 11728 bytes out of 8388608 on its stack. Margin: 8376880 bytes.
<b>></b> valgrind --quiet --tool=drd --show-stack-usage=yes ./factorial 20
factorial 20 = 2432902008176640000
==3326== thread 1 finished and used 11728 bytes out of 8388608 on its stack. Margin: 8376880 bytes.
</pre>
<p>
Regardless of the value of <tt><b>n</b></tt> the stack space used is constant
(although, for much larger values of <tt><b>n</b></tt>, the
<tt><b>Big_int</b></tt> calculations start eating a little more stack, but
that's much less of a problem).
</p>
<p>
Finally, I showed a way of doing TCO by hand using a technique I found in
Spencer Tipping's
<a href="https://github.com/spencertipping/js-in-ten-minutes/">
<i>"Javascipt in Ten Minutes"</i></a>.
The solution adds a couple of new properties to the prototype of the
<tt><b>Function</b></tt> object to provide delimited continuations (another
idea from functional programming).
See the
<a href="http://www.mega-nerd.com/erikd/Blog/files/js-demo/demo5-factorial.js">
the code</a>
for the details.
Suffice to say that this solution is really elegant and should be safe to run
in just about any browser whose Javascript implementation is not completely
broken.
</p>
<p>
As far as I am concerned, my presentation was received very well and the Twitter
responses (all two of them) ranged from
<a href="https://twitter.com/sydjs/status/4821816115728384">
<i>"brain melting"</i></a>
to
<a href="https://twitter.com/pamelafox/status/4884534680092672">
<i>"awesome"</i></a>.
</p>
<p>
I then hung around for the rest of the meeting, had another beer and chatted to
people.
One interesting part of the meeting was called "Di-<i>script</i>-ions", where a
member of the audience would put up small 4-10 line snippets of Javascript code
and ask the audience what they did and why.
What was surprising to me was that in some cases the semantics of a small piece of
Javascript code is completely non-obvious.
Javascript seems to have some very weird interactions between scoping rules,
whether functions are defined directly or as a variable and the sequence of
initialisation.
Strange indeed.
</p>
<p>
Anyway, thanks to Craig Sharkie for inviting me.
I had a great time.
</p>
libsndfile Malware on Windows.
http://www.mega-nerd.com/erikd/Blog/2010/10/16/malware.atom
2010-10-16T01:11:00Z
2010-10-16T01:11:00Z
<p>
I just found a very suspicious bit torrent download available here:
</p>
<center>
<p>
<a href="http://www.torrentzap.com/torrent/1581031/Libsndfile+%2864-Bit%29+1.0.23">
http://www.torrentzap.com/torrent/1581031/Libsndfile+%2864-Bit%29+1.0.23</a>
</p>
</center>
<p>
The file being shared is intended to look like the Windows 64 bit installer for
libsndfile-1.0.23 and seems to be widely available on this and a number of other
torrent sites.
</p>
<p>
However, the file on the torrent sites is called
<b><tt>libsndfile-64-bit-1.0.23.exe</tt></b> while the one I distribute is
called
<a href="http://www.mega-nerd.com/libsndfile/files/libsndfile-1.0.23-w64-setup.exe">
<b><tt>libsndfile-1.0.23-w64-setup.exe</tt></b></a>.
</p>
<p>
I haven't analyzed the torrent version of the file; I simply don't have the
tools or the knowledge to investigate it.
I don't even have access to a machine that runs 64 bit Windows.
The setup file on my website was cross compiled from Linux to 64 bit Windows
using the very wonderful
<a href="http://mingw-w64.org/">
MinGW w64</a>
tools and the setup installer created using
<a href="http://www.jrsoftware.org/isinfo.php">
INNO Setup</a>
running under
<a href="http://www.winehq.org/">
Wine</a>.
However, the file is named differently and has a different md5sum.
That in itself is more than enough reason to be suspicious.
</p>
<p>
The valid file that I distribute has the following md5 and sha256 sums:
</p>
<pre class="code">
md5sum : efe73b7cb52724e7db7bb7d6ce145929
sha256sum : 30896dac1002a7b509b6f4620317dad730d8ad761e4ff0402db5a94b0d4c09a2
</pre>
<p>
I'm not really aware of how problems like this are addressed on Windows.
Is there a safe, secure, verifiable way of distributing Windows software
packages?
If so, I'd appreciate it if someone could let me know how it's done.
</p>
<p>
For Linux this is much easier.
Firstly, the vast majority of people on Linux install libsndfile via their Linux
distribution.
The person who packages libsndfile for any given distribution grabs the source
code tarball from my
<a href="http://www.mega-nerd.com/libsndfile/#Download">
web site</a>.
At the same time they should also grab the GPG signature file and verify that
the source code tarball is correct and valid.
</p>
<p>
I don't know what happens in all distributions, but in Debian, the person doing
the packaging GPG signs the package before uploading to the Debian servers.
Once the GPG signed package is uploaded, the packager's GPG signature is checked
before it goes into the unstable distribution.
From there the validity of the package is tracked all the way to where an end
user installs it on a machine via the
<a href="http://www.debian.org/doc/manuals/securing-debian-howto/ch7#s7.4.1">
process documented here</a>.
This process means that it's very difficult to get malware onto a Linux machine
via the distribution's package manager.
</p>
<p>
I suppose this is one more reason why people should be running Linux rather than
Windows.
</p>
The (Problems with the) RF64 File Specification.
http://www.mega-nerd.com/erikd/Blog/2010/10/07/rf64_specs.atom
2010-10-07T10:36:00Z
2010-10-07T10:36:00Z
<p>
One of the very common sound file formats that
<a href="http://www.mega-nerd.com/libsndfile/">
libsndfile</a>
reads and writes is the
<a href="http://en.wikipedia.org/wiki/WAV">
WAV</a>
format.
This format uses unsigned 32 bit integers internally to specify chunk lengths
which limits the total size of a well formed file to about 4 gigabytes.
On modern systems with high bit widths, multiple channels and high sample rates,
this 4Gig limit can be run into very easily.
For instance at a sample rate of 96kHz, with 24 bit samples, a 5.1 surround
sound recording will run into the 4Gig limit after about 41 minutes.
</p>
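<p>
That 41 minute figure is easy to check with a back-of-the-envelope calculation
(a standalone snippet, not code from libsndfile):
</p>
<pre class="code">
-- 96kHz sample rate, 24 bit (3 byte) samples, 6 channels.
bytesPerSecond :: Integer
bytesPerSecond = 96000 * 3 * 6

-- Seconds until the 2^32 byte limit : 2485, ie about 41.4 minutes.
limitSeconds :: Integer
limitSeconds = 2 ^ 32 `div` bytesPerSecond
</pre>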
<p>
In order to overcome the limitations of WAV, the
<a href="http://www.ebu.ch/">
European Broadcasting Union</a>
decided in 2006 to start the specification of an extended WAV file format
capable of handling 64 bit file offsets.
The document that resulted from this specification process was first released in
2006 and the latest update was made in 2009 and is
<a href="http://tech.ebu.ch/docs/tech/tech3306-2009.pdf">
available here</a>.
I have a number of problems with this specification document.
</p>
<p>
First and foremost, in section 3.5, the document states:
</p>
<blockquote><i>
In spite of higher sampling frequencies and multi-channel audio, some production
audio files will inevitably be smaller than 4 Gbyte and they should therefore
stay in Broadcast Wave Format.
<br/><br/>
The problem arises that a recording application cannot know in advance whether
the recorded audio it is compiling will exceed 4 Gbyte or not at end of
recording (i.e. whether it needs to use RF64 or not).
<br/><br/>
The solution is to enable the recording application to switch from BWF to RF64
on the fly at the 4 Gbyte size-limit, while the recording is still going on.
<br/><br/>
This is achieved by reserving additional space in the BWF by inserting a 'JUNK'
chunk 3 that is of the same size as a 'ds64' chunk. This reserved space has no
meaning for Broadcast Wave, but will become the 'ds64' chunk, if a transition
to RF64 is necessary.
</i></blockquote>
<p>
In short, the suggestion above for writing a file boils down to:
</p>
<ol>
<li>Open the file and write a RIFF/WAV file header with a JUNK section big
enough to allow the header to be replaced with an RF64 header if needed.
</li>
<li>If the file ends up bigger than 4 gigabytes, go back and replace the
existing header with an RF64 header.
</li>
</ol>
<p>
There are two problems with this suggestion; it makes testing difficult and it
makes the software more complex, which means it's more likely to contain bugs.
The testing problem arises because testing that the RF64 header is written
correctly can only be done by writing a 4 gigabyte file.
Programmers can then either choose not to test this (which means the software
is highly unlikely to work as specified) or test by writing a full 4 Gig file.
However, programmers also want their tests to run quickly (so that they can be
run often) and writing 4 gigabytes of data to disk is definitely not going to
be quick.
Of course, a smaller unit test might be able to bypass the requirement of
writing 4 gigabytes, but it would still be prudent to do a real test at the
WAV to RF64 switch over point.
The complexity problem is simply that writing a WAV file header first and then
overwriting it with an RF64 header later is far more complicated than just
writing an RF64 header to begin with.
Complexity breeds bugs.
</p>
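<p>
For what it's worth, a test at the switch over point need not write 4
gigabytes of real data; on most Unix file systems a sparse file can stand in
for it. This is only a sketch of the idea, not libsndfile's actual test code:
</p>
<pre class="code">
#define _FILE_OFFSET_BITS 64

#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;
#include &lt;unistd.h&gt;
#include &lt;fcntl.h&gt;
#include &lt;sys/stat.h&gt;

int main (void)
{	const int64_t limit = 0x100000000LL ;	/* 4 Gbyte. */
	int fd = open ("sparse.wav", O_CREAT | O_WRONLY | O_TRUNC, 0644) ;
	struct stat st ;

	/* Seek to just below the limit and write one byte ; the skipped
	** range occupies (almost) no disk space on file systems that
	** support sparse files. */
	lseek (fd, (off_t) (limit - 1), SEEK_SET) ;
	write (fd, "", 1) ;
	close (fd) ;

	stat ("sparse.wav", &amp;st) ;
	printf ("size %lld\n", (long long) st.st_size) ;
	unlink ("sparse.wav") ;
	return 0 ;
}
</pre>
<p>
Appending real audio data to a file pre-sized like this exercises the WAV to
RF64 transition without the cost of writing 4 gigabytes to disk.
</p>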
<p>
The libsndfile project has had, from the very beginning, a pretty comprehensive
test suite and the running of that test suite takes about 30 seconds on current
hardware.
In order to comprehensively test the reading and writing of RF64 files,
libsndfile disregards the rather silly suggestion of the EBU to convert on the
fly between WAV and RF64 files.
If the software calling libsndfile specifies that an RF64 file be generated,
libsndfile will write an RF64 file, even if that file only contains 100 bytes.
</p>
<p>
A second problem with the RF64 specification is that it is ambiguous in a very
subtle way.
The problem is with how the binary chunks within the file are specified.
For WAV files, chunks are specified in
<a href="http://www-mmsp.ece.mcgill.ca/documents/audioformats/wave/Docs/riffmci.pdf">
this document</a>
as:
</p>
<pre class="code">
typedef unsigned long DWORD ;
typedef unsigned char BYTE ;
typedef DWORD FOURCC ; // Four-character code
typedef FOURCC CKID ; // Four-character-code chunk identifier
typedef DWORD CKSIZE ; // 32-bit unsigned size value
typedef struct { // Chunk structure
CKID ckID ; // Chunk type identifier
CKSIZE ckSize ; // Chunk size field (size of ckData)
BYTE ckData [ckSize] ; // Chunk data
} CK;
</pre>
<p>
This specifies that a chunk has a 4 byte identifier, followed by a 4 byte chunk
size, followed by the chunk data.
The important thing to note here is that the chunk size does not include the
4 byte chunk identifier and the 4 byte chunk size field.
Inspecting real WAV files found in the wild will confirm that this is the case
for all common chunks found in WAV files.
</p>
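<p>
To make the convention concrete, here is a small sketch (my own example data,
not from either specification) that walks a minimal in-memory header using
the RIFF rule:
</p>
<pre class="code">
#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;

/* Walk the chunks of a minimal in-memory WAV header. The 8 byte id and
** size fields are NOT counted in ckSize, so the next chunk starts at
** offset + 8 + ckSize (plus a pad byte if ckSize is odd). */

static uint32_t u32_le (const uint8_t *p)
{	return p [0] | (p [1] &lt;&lt; 8) | (p [2] &lt;&lt; 16) | ((uint32_t) p [3] &lt;&lt; 24) ;
}

int main (void)
{	/* 'fmt ' chunk with a 16 byte payload, then an empty 'data' chunk. */
	const uint8_t wav [] =
	{	'R','I','F','F', 36,0,0,0, 'W','A','V','E',
		'f','m','t',' ', 16,0,0,0,
		1,0, 1,0, 0x44,0xac,0,0, 0x88,0x58,0x01,0, 2,0, 16,0,
		'd','a','t','a', 0,0,0,0
		} ;
	size_t offset = 12 ;	/* Skip RIFF header and WAVE form type. */

	while (offset + 8 &lt;= sizeof wav)
	{	uint32_t cksize = u32_le (wav + offset + 4) ;
		printf ("%.4s %u\n", (const char *) (wav + offset), cksize) ;
		offset += 8 + cksize + (cksize &amp; 1) ;
		} ;
	return 0 ;
}
</pre>
<p>
Change the ckSize interpretation and every chunk boundary after the first is
computed incorrectly, which is why the ambiguity below matters.
</p>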
<p>
Now contrast the above with how the chunks are specified in the EBU document.
For instance the <b><tt>'fmt '</tt></b> chunk (which is common to both WAV and
RF64) is specified as:
</p>
<pre class="code">
struct FormatChunk5 // declare FormatChunk structure
{
char chunkId[4]; // 'fmt '
unsigned int32 chunkSize; // 4 byte size of the 'fmt ' chunk
unsigned int16 formatType; // WAVE_FORMAT_PCM = 0x0001, etc.
unsigned int16 channelCount; // 1 = mono, 2 = stereo, etc.
unsigned int32 sampleRate; // 32000, 44100, 48000, etc.
unsigned int32 bytesPerSecond; // only important for compressed formats
unsigned int16 blockAlignment; // container size (in bytes) of one set of samples
unsigned int16 bitsPerSample; // valid bits per sample 16, 20 or 24
unsigned int16 cbSize; // extra information (after cbSize) to store
char extraData[22]; // extra data of WAVE_FORMAT_EXTENSIBLE when necessary
};
</pre>
<p>
Here, the <b><tt>chunkSize</tt></b> field is simply the <i>"size of the 'fmt '
chunk"</i> and nowhere in the EBU document is it specified exactly how that
<b><tt>chunkSize</tt></b> field should be calculated.
However, if you give the EBU documentation to any experienced software engineer
with no previous knowledge of RIFF/WAV files, they would almost certainly assume
that the <b><tt>chunkSize</tt></b> field should be the size of the whole chunk,
including the <b><tt>chunkID</tt></b> and <b><tt>chunkSize</tt></b> fields.
By contrast, someone who knows about RIFF/WAV files will be less likely to
follow that path.
</p>
<p>
This leaves the programmer implementing code to read and write this format with
a couple of possibilities:
</p>
<ul>
<li>Assume that the document is right and should be followed to the letter.
</li>
<li>Assume the document is wrong and that the <b><tt>'fmt '</tt></b> chunk of
an RF64 file should be identical to that of a WAV file.
</li>
<li><s>Give up and go home.</s>
</li>
</ul>
<p>
However, the last part of section 3.5 of the EBU/RF64 document describes how a
WAV file is to be upgraded to an RF64 file, and that description makes no
mention of the <b><tt>'fmt '</tt></b> chunk being modified during that upgrade.
One can only assume from this that the <b><tt>'fmt '</tt></b> chunk in an RF64
file should be identical to that of a WAV file and that the EBU/RF64
specification is misleading.
</p>
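<p>
The difference between the two readings is easy to quantify for a plain PCM
<b><tt>'fmt '</tt></b> chunk (a hypothetical illustration, not code from
either document):
</p>
<pre class="code">
#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;

/* The 'fmt ' payload for plain PCM is 16 bytes. Under the RIFF reading,
** chunkSize counts only the payload ; under a literal reading of the EBU
** text it would also count the 8 byte id and size fields. */
int main (void)
{	const uint32_t payload = 2 + 2 + 4 + 4 + 2 + 2 ;	/* formatType .. bitsPerSample. */

	printf ("RIFF reading: %u\n", payload) ;
	printf ("literal reading: %u\n", payload + 4 + 4) ;
	return 0 ;
}
</pre>
<p>
A reader expecting 16 and finding 24 (or vice versa) will mis-locate every
following chunk, so the two interpretations produce mutually unreadable files.
</p>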
<p>
For libsndfile, I have decided to assume that the specification is indeed
misleading.
Unfortunately, I'm pretty sure that at some point I will be asked to at least
<i>read</i> files which strictly adhere to the literal interpretation of the
document.
I'm also pretty sure that implementing code to read files written to conform to
both interpretations of the spec will be a very painful exercise.
</p>
<!--
http://www-mmsp.ece.mcgill.ca/documents/audioformats/wave/Docs/riffmci.pdf
http://www-mmsp.ece.mcgill.ca/documents/audioformats/wave/Docs/RIFFNEW.pdf
-->
Distros and Test Suites.
http://www.mega-nerd.com/erikd/Blog/2010/10/03/distros_and_test_suites.atom
2010-10-03T11:58:00Z
2010-10-03T11:58:00Z
<p>
libsndfile is cross platform and is expected to run on 32 and 64 bit CPUs on
any system that is reasonably POSIX compliant (ie even Windows).
It also has a lot of low level code that does things like endian swapping and
bit shifting etc.
Although I compile and test the code on all the systems I have access to, I
don't have access to everything.
That's why libsndfile has a test suite.
</p>
<p>
The libsndfile test suite is as comprehensive as I can make it.
It's taken a lot of work, over many years, to get to where it is, but it has
saved me many times that amount of work tracking obscure bugs.
</p>
<p>
The test suite is important.
That's why I suggest that anyone building libsndfile from source should run the
test suite before using the library.
This is especially true for people packaging libsndfile for distributions.
That's why it is so disappointing to see something like this
<a href="https://bugs.gentoo.org/335728">
Gentoo bug</a>.
</p>
<p>
Gentoo managed to mess up their build meta-data resulting in a libsndfile binary
that was horribly broken on 32 bit systems.
It was broken in such a way that just about every single test in the libsndfile
test suite would have failed.
Unfortunately, since Gentoo didn't run the test suite they distributed their
broken build meta-data to users.
And the users started emailing me with weird bug reports.
</p>
<p>
Fortunately, other distributions like Debian get it right.
Debian even keeps
<a href="https://buildd.debian.org/">
build logs</a>
for all releases of all packages on all architectures and makes them available
on the web.
For instance, the build log for libsndfile version 1.0.21-3 on the MIPS can be
<a href="https://buildd.debian.org/fetch.cgi?&pkg=libsndfile&ver=1.0.21-3&arch=mips&stamp=1280847502&file=log">
found here</a>.
</p>
<p>
If anyone is using a distro which does not routinely run the test suite when
building packages which supply a test suite, I recommend that they switch to
a distro that does.
</p>
LLVM Backend for DDC : Milestone #2.
http://www.mega-nerd.com/erikd/Blog/2010/08/22/llvm_milestone2.atom
2010-08-22T03:43:00Z
2010-08-22T03:43:00Z
<p>
For a couple of weeks after AusHac 2010 I didn't manage to find any time to
work on DDC at all, but I'm now back on it and late last week I reached the
second milestone on the
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/DDC/llvm_backend.html">
LLVM backend for DDC</a>.
The backend now has the ability to box and unbox 32 bit integers and perform
simple arithmetic operations on valid combinations of them.
</p>
<p>
Disciple code that can currently be compiled correctly via LLVM includes basic
stuff like:
</p>
<pre class="code">
identInt :: Int -> Int
identInt a = a
plusOneInt :: Int -> Int
plusOneInt x = x + 1
addInt :: Int -> Int -> Int
addInt a b = a + b
addInt32U :: Int32# -> Int32# -> Int32#
addInt32U a b = a + b
addMixedInt :: Int32# -> Int -> Int
addMixedInt a b = boxInt32 (a + unboxInt32 b)
cafOneInt :: Int
cafOneInt = 1
plusOne :: Int -> Int
plusOne x = x + cafOneInt
</pre>
<p>
where <b><tt>Int32#</tt></b> specifies an unboxed 32 bit integer and
<b><tt>Int32</tt></b> specifies the boxed version.
</p>
<p>
While writing the Haskell code for DDC, I'm finding that it's easiest to generate
LLVM code for a specific narrow case first and then generalize it as more cases
come to light.
I also found that the way I had been doing the LLVM code generation was tedious
and ugly, involving lots of concatenation of small lists.
To fix this I built myself an <b><tt>LlvmM</tt></b> monad on top of the
<b><tt>StateT</tt></b> monad:
</p>
<pre class="code">
type LlvmM = StateT [[LlvmStatement]] IO
</pre>
<p>
Using this I can then generate a block of LLVM code as a list of
<b><tt>LlvmStatement</tt></b>s and add it to the monad using an
<b><tt>addBlock</tt></b> function which basically pushes the blocks of code
down onto a stack:
</p>
<pre class="code">
addBlock :: [LlvmStatement] -> LlvmM ()
addBlock code
= do state <- get
put (code : state)
</pre>
<p>
The <b><tt>addBlock</tt></b> function is then used as the base building block
for a bunch of more specific functions like these:
</p>
<pre class="code">
unboxInt32 :: LlvmVar -> LlvmM LlvmVar
unboxInt32 objptr
| getVarType objptr == pObj
= do int32 <- lift $ newUniqueReg i32
iptr0 <- lift $ newUniqueNamedReg "iptr0" (pLift i32)
iptr1 <- lift $ newUniqueNamedReg "iptr1" (pLift i32)
addBlock
[ Comment [ show int32 ++ " = unboxInt32 (" ++ show objptr ++ ")" ]
, Assignment iptr0 (GetElemPtr True objptr [llvmWordLitVar 0, i32LitVar 0])
, Assignment iptr1 (GetElemPtr True iptr0 [llvmWordLitVar 1])
, Assignment int32 (Load iptr1) ]
return int32
readSlot :: Int -> LlvmM LlvmVar
readSlot 0
= do dstreg <- lift $ newUniqueNamedReg "slot.0" pObj
addBlock [ Comment [ show dstreg ++ " = readSlot 0" ]
, Assignment dstreg (Load localSlotBase) ]
return dstreg
readSlot n
| n > 0
= do dstreg <- lift $ newUniqueNamedReg ("slot." ++ show n) pObj
r0 <- lift $ newUniqueReg pObj
addBlock [ Comment [ show dstreg ++ " = readSlot " ++ show n ]
, Assignment r0 (GetElemPtr True localSlotBase [llvmWordLitVar n])
, Assignment dstreg (Load (pVarLift r0)) ]
return dstreg
readSlot n = panic stage $ "readSlot with slot == " ++ show n
</pre>
<p>
which are finally hooked up to do things like:
</p>
<pre class="code">
llvmVarOfExp (XUnbox ty@TCon{} (XSlot v _ i))
= do objptr <- readSlot i
unboxAny (toLlvmType ty) objptr
llvmVarOfExp (XUnbox ty@TCon{} (XForce (XSlot _ _ i)))
= do orig <- readSlot i
forced <- forceObj orig
unboxAny (toLlvmType ty) forced
</pre>
<p>
When the code generation of a single function is complete, the list of
<b><tt>LlvmStatement</tt></b> blocks is retrieved, reversed and
concatenated to produce the list of <b><tt>LlvmStatement</tt></b>s for the
function.
</p>
<p>
With the <b><tt>LlvmM</tt></b> monad in place converting DDC's Sea AST into LLVM
code is now pretty straightforward.
It's just a matter of finding and implementing all the missing pieces.
</p>
From Gedit to Geany.
http://www.mega-nerd.com/erikd/Blog/2010/08/04/gedit_geany.atom
2010-08-04T11:17:00Z
2010-08-04T11:17:00Z
<p>
After effectively
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/rip_nedit.html">
giving up on Nedit</a>,
my text editor of choice for the last fifteen years, I gave Gedit a serious try.
</p>
<p>
For a full two weeks, I stuck with Gedit, including the intense 2½ day
hacking session of
<a href="http://random.axman6.com/blog/?p=219">
AusHac2010</a>.
Unfortunately, switching from a very full featured editor like Nedit to Gedit
was painful.
There were a bunch of features that I had grown used to that were just absent or
inconvenient in Gedit.
The problem is that Gedit aims to be a relatively full featured programmer's
editor while still being the default easy-to-use editor in GNOME.
As far as I am concerned, these two aims are in conflict, making Gedit an
adequate simple text editor and a poor editor for advanced coders.
</p>
<p>
After butting my head against basic usability issues with Gedit I even
considered either modifying it extensively using plugins or maybe even forking
it and maintaining a forked version.
Yes, that would be a huge pain in the neck, but fortunately that will not now
be necessary.
</p>
<p>
In response to my blog post titled
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/rip_nedit.html">
<i>"R.I.P. Nedit"</i></a>
fellow Haskell hacker and Debian Haskell Group member
<a href="https://www.joachim-breitner.de/blog/">
Joachim Breitner</a>
suggested I have a look at the
<a href="http://www.geany.org/">
Geany text editor and IDE</a>.
</p>
<p>
Geany is obviously a tool aimed squarely at full time, committed
programmers.
It's also much more than just an editor, in that it has many features of an IDE
(Integrated Development Environment).
In fact, when I first fired it up it looked like this (click for a larger view):
</p>
<br/>
<center>
<a href="http://www.mega-nerd.com/erikd/Img/geany-default.png">
<img src="http://www.mega-nerd.com/erikd/Img/geany-default-small.png"
border="0" alt="Geany default window" />
</a>
</center>
<br/>
<p>
On seeing this I initially thought Geany was not for me.
Fortunately I found that the extra IDE-like features can easily be hidden,
providing me with a simple-to-use, highly configurable, advanced text editor.
The features I really like are:
</p>
<ul>
<li>High degree of configurability, including key bindings.
</li>
<li>Syntax highlighting (configurable) for a huge number of languages.
</li>
<li>Custom syntax highlighting (ie user definable highlighting for languages
not currently supported by Geany).
</li>
<li>Regex search and search/replace.
</li>
<li>Search and replace within a selected area only.
</li>
<li>Highlighting of matching braces and brackets.
</li>
<li>Language specific editing modes and auto indentation.
</li>
<li>Go to specified line number.
</li>
<li>Plugins.
</li>
</ul>
<p>
There are still a few little niggles, but nothing like the pain I experienced
trying to use Gedit.
For instance, when run from the command line, Geany will open new files in a
tab of an existing Geany instance.
With multiple desktop workspaces, this is sub-optimal.
It would be much nicer if Geany would start a new instance if there was not
already an instance running on the current workspace.
After a brief inspection of the Gedit sources (Gedit has the desired feature),
I came up with a fix for this issue which I will be submitting to the Geany
development mailing list after a couple of days of testing.
</p>
<p>
Another minor issue (shared with Gedit) is that of fonts.
Nedit uses bitmap fonts while Geany (and Gedit) use TrueType fonts.
When I choose light coloured fonts on a black background I find the fonts in
Geany (and Gedit) a lot fuzzier than the same size fonts in Nedit.
I've tried a number of different fonts including
<a href="http://www.levien.com/type/myfonts/inconsolata.html">
Inconsolata</a>
but I've currently settled on
<a href="http://dejavu-fonts.org/wiki/Main_Page">
DejaVu Sans Mono</a>
although I'm not entirely satisfied.
</p>
<p>
Currently my Geany setup (editing some Haskell code) looks like this:
</p>
<br/>
<center>
<img src="http://www.mega-nerd.com/erikd/Img/geany-modded.png"
border="0" alt="Geany modified config" />
</center>
<br/>
<p>
Light text on a black background with highlighting using a small number of
colours; red for types, green for literals, yellow for keywords etc.
</p>
<p>
Geany is a great text editor.
For any committed coders currently using either Nedit or Gedit and not entirely
happy, I strongly recommend that you give Geany a try.
</p>
R.I.P. Nedit
http://www.mega-nerd.com/erikd/Blog/2010/07/27/rip_nedit.atom
2010-07-27T12:18:00Z
2010-07-27T12:18:00Z
<p>
For serious programmers, the text editor they use is an intensely personal
thing.
Try suggesting to an Emacs user that they should switch to Vim or vice-versa.
Most would shudder at the thought.
</p>
<p>
My choice of editor for the last 15 years has been Nedit, the
<a href="http://www.nedit.org/">
Nirvana Editor</a>.
Nedit has been an outstanding editor; feature full yet easy to use.
When I first started using it, Nedit was a closed source binary-only download
but sometime in the late 1990s, it was released under the GNU GPL.
</p>
<p>
Unfortunately Nedit has been suffering from bit rot and neglect for a number
of years.
The main problem is that it uses the
<a href="http://en.wikipedia.org/wiki/Motif_%28widget_toolkit%29">
Motif widget toolkit</a>.
For open source, there are basically two options for Motif;
<a href="http://lesstif.sourceforge.net/">
Lesstif</a>,
an LGPL reimplementation of Motif which has been basically unmaintained for
a number of years, or
<a href="http://www.opengroup.org/openmotif/">
OpenMotif</a>
released under a license which is in no way
<a href="http://www.opensource.org/">
OSI approved</a>.
On top of that, Nedit still doesn't support UTF-8, mainly because Lesstif
doesn't support it.
</p>
<p>
I have, in the past, tried to fix bugs in Nedit, but the bugs are not really
in Nedit itself, but in an interaction between Nedit, whichever Motif library
it is linked against, and the underlying X libraries.
Depending on whether Nedit is linked against Lesstif or OpenMotif, Nedit will
display different sets of bugs.
I have tried fixing bugs in Nedit linked against Lesstif, but got absolutely
nowhere.
Lesstif is one of the few code bases I have ever worked on that I was
completely unable to make progress on.
</p>
<p>
With Nedit getting flakier with each passing year I finally decided to switch
to a new editor.
I had already discounted Emacs and Vim; switching from Nedit to either of those
two archaic beasts was going to be way too painful.
Of all the FOSS editors available,
<a href="http://projects.gnome.org/gedit/">
Gedit</a>
seemed to be the closest in features to Nedit.
</p>
<p>
Unfortunately, Gedit does not compare well with Nedit feature wise.
To me it seems to try to be simultaneously as simple as possible and to have as
many features as possible, and the features don't seem to fit together all that
well from a usability point of view.
On top of that, it lacks the following:
</p>
<ul>
<li>Regex search and regex search/replace.
Apparently there is a regex search/replace plugin, but that uses a
different hot key combination than literal search/replace.
Nedit on the other hand uses the same dialog box for literal and
regex search/replaces; with a toggle button to switch between literal
and regex searches.
</li>
<li>Search and replace within the selected area only.
</li>
<li>Highlighting of matching braces and brackets.
</li>
<li>Language specific editing modes and auto indentation.
</li>
<li>A macro language allowing further customisation.
</li>
<li>A simple, quick way to go to a particular line number (for Gedit,
Control-L is supposed to work, but doesn't).
</li>
</ul>
<p>
On top of that Gedit could also do with some improved key bindings and some
improvements to its syntax highlighting patterns.
The Ocaml syntax highlighting is particularly poor.
</p>
<p>
I'm now going to try to use Gedit, by customising its setup and using the
plugin system to see if I can regain the features that made Nedit such a
pleasure to use.
</p>
LLVM Backend : Milestone #1.
http://www.mega-nerd.com/erikd/Blog/2010/07/18/llvm_milestone1.atom
2010-07-18T12:18:00Z
2010-07-18T12:18:00Z
<p>
About 3 weeks ago I started work on the
<a href="http://www.mega-nerd.com/erikd/Blog/CodeHacking/DDC/llvm_backend.html">
LLVM backend for DDC</a>
and I have now reached the first milestone.
</p>
<p>
Over the weekend I attended
<a href="http://www.haskell.org/haskellwiki/AusHac2010">
AusHac2010</a>
and during Friday and Saturday I managed to get DDC modified so I could
compile a Main module via the existing C backend and another module via the
LLVM backend to produce an executable that ran, but gave an incorrect answer.
</p>
<p>
Today, I managed to get a very simple function actually working correctly.
The function is trivial:
</p>
<pre class="code">
identInt :: Int -> Int
identInt a = a
</pre>
<p>
and the generated LLVM code looks like this:
</p>
<pre class="code">
define external ccc %struct.Obj* @Test_identInt(%struct.Obj* %_va)
{
entry:
; _ENTER (1)
%local.slotPtr = load %struct.Obj*** @_ddcSlotPtr
%enter.1 = getelementptr inbounds %struct.Obj** %local.slotPtr, i64 1
store %struct.Obj** %enter.1, %struct.Obj*** @_ddcSlotPtr
%enter.2 = load %struct.Obj*** @_ddcSlotMax
%enter.3 = icmp ult %struct.Obj** %enter.1, %enter.2
br i1 %enter.3, label %enter.good, label %enter.panic
enter.panic:
call ccc void ()* @_panicOutOfSlots( ) noreturn
br label %enter.good
enter.good:
; ----- Slot initialization -----
%init.target.0 = getelementptr %struct.Obj** %local.slotPtr, i64 0
store %struct.Obj* null, %struct.Obj** %init.target.0
; ---------------------------------------------------------------
%u.2 = getelementptr inbounds %struct.Obj** %local.slotPtr, i64 0
store %struct.Obj* %_va, %struct.Obj** %u.2
;
br label %_Test_identInt_start
_Test_identInt_start:
; alt default
br label %_dEF1_a0
_dEF1_a0:
;
br label %_dEF0_match_end
_dEF0_match_end:
%u.3 = getelementptr inbounds %struct.Obj** %local.slotPtr, i64 0
%_vxSS0 = load %struct.Obj** %u.3
; ---------------------------------------------------------------
; _LEAVE
store %struct.Obj** %local.slotPtr, %struct.Obj*** @_ddcSlotPtr
; ---------------------------------------------------------------
ret %struct.Obj* %_vxSS0
}
</pre>
<p>
That looks like a lot of code but there are a couple of points to remember:
</p>
<ul>
<li>This includes code for DDC's garbage collector.
</li>
<li>DDC itself is still missing a huge number of optimisations that can be
added <b>after</b> the compiler actually works.</li>
</ul>
<p>
I have found David Terei's LLVM AST code that I pulled from the
<a href="http://darcs.haskell.org/ghc/compiler/llvmGen/">
GHC sources</a>
very easy to use.
Choosing this code was definitely not a mistake and I have been corresponding
with David, which has resulted in a few updates to this code, including
<a href="http://web.archiveorange.com/archive/v/j7U5dKOOHeJ2TsDJkIiW">
a commit</a>
with my name on it.
</p>
<p>
LLVM is also conceptually very, very sound and easy to work with.
For instance, variables in LLVM code are allowed to contain the dot character,
so it's easy to avoid name clashes between C function/variable names and
names generated during the generation of LLVM code, by making generated names
contain a dot.
</p>
<p>
Finally, I love the fact that LLVM is a typed assembly language.
There would have been dozens of times over the weekend that I generated LLVM code
that the LLVM compiler rejected because it wouldn't type check.
Just like when programming with Haskell, once the code type checked, it
actually worked correctly.
</p>
<p>
Anyway, this is a good first step.
Lots more work to be done.
</p>
LLVM Backend for DDC.
http://www.mega-nerd.com/erikd/Blog/2010/06/29/llvm_backend.atom
2010-06-28T20:51:00Z
2010-06-28T20:51:00Z
<p>
With the blessing of
<a href="http://www.cse.unsw.edu.au/~benl/">
Ben Lippmeier</a>
I have started work on a new backend for his
<a href="http://www.haskell.org/haskellwiki/DDC">
DDC</a>
compiler.
Currently, DDC has a backend that generates C code which then gets run
through GNU GCC to generate executables.
Once it is working, the new backend will eventually replace the C one.
</p>
<p>
The new DDC backend will target the very excellent
<a href="http://llvm.org">
LLVM</a>,
the
<a href="http://en.wikipedia.org/wiki/Low_Level_Virtual_Machine">
Low Level Virtual Machine</a>.
Unlike C, LLVM is specifically designed as a general retargetable compiler
backend.
It became the obvious choice for DDC when the GHC Haskell compiler
<a href="http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM">
added an LLVM backend</a>
which almost immediately showed great promise.
Its implementation was of relatively low complexity in comparison to the
existing backends and it also provided pretty impressive performance.
This GHC backend was implemented by
<a href="http://www.cse.unsw.edu.au/~davidt/papers.html">
David Terei</a>
as part of an undergraduate thesis in the
<a href="http://www.cse.unsw.edu.au/~pls/PLS/PLS.html">
Programming Languages and Systems</a>
group at UNSW.
</p>
<p>
Since DDC is written in Haskell, there are two obvious ways to implement an
LLVM backend:
</p>
<ol>
<li>Using the haskell LLVM bindings available on
<a href="http://hackage.haskell.org/package/llvm">
hackage</a>.
</li>
<li>Using David Terei's code that is part of the GHC compiler.
</li>
</ol>
<p>
At first glance, the former might well be the more obvious choice, but the LLVM
bindings have a couple of drawbacks from the point of view of using them in
DDC.
In the end, the main factor in choosing which to use was Ben's interest in
<a href="http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29">
bootstrapping</a>
the compiler (compiling the compiler with itself) as soon as possible.
</p>
<p>
The existing LLVM bindings use a number of advanced Haskell features, that is,
features beyond that of the
<a href="http://www.haskell.org/onlinereport/">
Haskell 98</a>
standard.
If we used the LLVM bindings in DDC, that would mean the DDC would have to
support all the features needed by the binding before DDC could be
bootstrapped.
Similarly, the LLVM bindings use GHC's
<a href="http://en.wikipedia.org/wiki/Foreign_function_interface">
Foreign Function Interface (FFI)</a>
to call out to the LLVM library.
DDC currently does have some FFI support, but this was another mark against
the bindings.
</p>
<p>
By way of contrast, David Terei's LLVM backend for GHC is pretty much standard
Haskell code and since it generates text files containing LLVM's
<a href="http://llvm.org/docs/LangRef.html">
Intermediate Representation (IR)</a>,
a high-level, typed assembly language, there is no FFI problem.
The only downside of David's code is that the current version in the GHC Darcs
tree uses a couple of modules that are private to GHC itself.
Fortunately, it looks like these problems can be worked around with relatively
little effort.
</p>
<p>
Having decided to use David's code, I started hacking on a little test
project.
The aim of the test project was to set up an LLVM Abstract Syntax Tree (AST) in
Haskell for a simple module.
The AST is then pretty printed as a textual LLVM IR file and assembled using
LLVM's <b><tt>llc</tt></b> compiler to generate native assembler.
Finally the assembler code is compiled with a C module containing a
<b><tt>main</tt></b> function which calls into the LLVM generated code.
</p>
<p>
After managing to get a basic handle on LLVM's IR code, the test project
worked; calling from C into LLVM generated code and getting the expected
result.
The next step is to prepare David's code for use in DDC while making it easy
to track David's upstream changes.
</p>
GHC 6.12.1 in Debian Testing.
http://www.mega-nerd.com/erikd/Blog/2010/03/14/ghc6.12_testing.atom
2010-03-14T07:58:00Z
2010-03-14T07:58:00Z
<p>
Joachim Breitner recently
<a href="http://lists.debian.org/debian-haskell/2010/03/msg00102.html">
announced</a>
on the
<a href="http://lists.debian.org/debian-haskell/">
Debian Haskell mailing list</a>
that version 6.12.1 of the Glasgow/Glorious Haskell compiler was about to
transition from Debian unstable to Debian testing.
That has now happened.
This means there is a very good chance it will be part of the next stable
release of Debian.
</p>
<p>
A big thanks is due to Kari Pahula, the Debian maintainer for GHC who managed
to get this version of GHC working on a bunch of CPU architectures not
officially supported by the upstream GHC maintainers.
Deserving of equal attention are Joachim Breitner and Marco Túlio Gontijo e
Silva who did a large amount of real quality work to improve the way Haskell
libraries are packaged in Debian.
</p>
<p>
The big change recently was drastic improvements in the way library dependencies
are tracked across packages which will make it much easier to write tools to
automatically check for broken dependency chains.
Packaging Haskell libraries for Debian is now a relatively trivial and
foolproof exercise.
Packaging a library which is on
<a href="http://hackage.haskell.org/packages/archive/pkg-list.html">
Hackage</a>
can take as little as 5 minutes.
</p>
<p>
With the current version of GHC in Debian and a large and growing collection of
Haskell libraries, writing Haskell code on Debian using nothing but Debian
packages is now a pleasure.
Ubuntu and other Debian and Ubuntu derived distributions of course also benefit
from this work.
</p>
Intel Embedded Graphics Driver Fail.
http://www.mega-nerd.com/erikd/Blog/2010/03/11/intel_graphics_fail.atom
2010-03-11T09:24:00Z
2010-03-11T09:24:00Z
<p>
In my day job I do Linux embedded work and as people in the embedded world know,
Linux is a pretty commonly used embedded OS.
Today I was evaluating a new board and found it had an Intel graphics chip that
was not properly detected by Ubuntu 9.10.
The ever trusty <tt><b>lspci</b></tt> said this:
</p>
<pre class="code">
00:02.0 VGA compatible controller: Intel Corporation System Controller Hub
(SCH Poulsbo) Graphics Controller (rev 07)
</pre>
<p>
We all know that Intel employs a bunch of well known
<a href="http://www.x.org/wiki/">
Xorg</a>
developers, so this shouldn't be a problem, right?
</p>
<p>
Unfortunately, it is a problem.
Intel's offering for this chipset is the
<a href="http://edc.intel.com/Software/Downloads/IEGD/">
Intel® Embedded Graphics Drivers</a>
web page where they offer a 124 megabyte download (registration required).
After registration you get to choose which driver pack you want and which OS
you are downloading it for.
Ubuntu was not on the list and neither was Debian.
I chose Fedora 10 (released in 2008) as that was the most recent one.
</p>
<p>
Now, you can imagine my surprise when the driver download for Fedora Linux
contained just four files:
</p>
<pre class="code">
Archive: /tmp/IEGD_10_3_GOLD.zip
testing: UsersGuide_10_3_1525.pdf OK
testing: IEGD_10_3_GOLD_1525.exe OK
testing: IEGD_SU_10_3_GOLD.pdf OK
testing: RELNOTES_10_3_1525.txt OK
No errors detected in compressed data of /tmp/IEGD_10_3_GOLD.zip.
</pre>
<p>
Yep, that's right, the driver download for Fedora Linux contains two PDF files,
a text file and an executable installer for <i>Windows</i>.
</p>
<p>
Being the curious (and paranoid) type I decided to explore this further,
by running the installer under
<a href="http://www.winehq.com/">
WINE</a>
in a
<a href="http://en.wikipedia.org/wiki/Chroot">
chroot</a>.
After running the installer you are left with several metric craploads of Java
Jar files, and another Windows executable <tt><b>iegd-ced.exe</b></tt> that
supposedly configures this nightmare.
I ran it (again, under WINE in a chroot) but it didn't seem to do anything
sensible or worthwhile so I looked around amongst the other installed files
and found <tt><b>IEGD_10_3_Linux.tgz</b></tt>.
</p>
<p>
Inside that tarball there are a bunch of Xorg library binaries (for several
different versions of Xorg), a large chunk of source code that gets compiled
into the Linux kernel and even better yet, a couple of Microsoft Visual
Studio project files.
WTF?
</p>
<p>
Unbe-fscking-lievable.
Needless to say, I will avoid any hardware which uses this chipset and
any other hardware that requires binary only kernel blobs packaged this
badly.
Doing so makes my life easier.
</p>
<p>
The people at Intel who thought this was a good idea must have their own
personal mother-lode of stupid.
</p>
Debugging Memory Leaks in a GTK+ House of Cards.
http://www.mega-nerd.com/erikd/Blog/2010/01/03/house_of_cards.atom
2010-01-03T00:38:00Z
2010-01-03T00:38:00Z
<p>
Recently I've been hacking on
<a href="http://blog.kfish.org/">
Conrad Parker</a>'s
sound editing, audio mangling and DJing tool
<a href="http://www.metadecks.org/software/sweep/">
Sweep</a>.
As part of my
<a href="http://trac.metadecks.org/timeline?from=01%2F01%2F10&daysback=30&changeset=on">
bug fixing and clean up work</a>
I ran Sweep under the Linux world's favourite memory debugging tool
<a href="http://valgrind.org/">
Valgrind</a>.
Even after running valgrind with the
<a href="http://live.gnome.org/Valgrind">
officially sanctioned</a>
environment variables and gtk suppressions file, the resulting
<a href="http://www.mega-nerd.com/erikd/Blog/files/sweep-valgrind.txt.gz">
500k gzipped output file</a>
was a little shocking.
</p>
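<p>
For reference, each entry in a Memcheck suppressions file like the gtk
suppressions file mentioned above matches one allocation call stack and tells
valgrind to stay quiet about it. An entry looks roughly like this (the entry
name and stack frames here are illustrative, not copied from the real file):
</p>
<pre class="code">
{
   glib_once_per_run_cache
   Memcheck:Leak
   fun:malloc
   fun:g_malloc
   fun:g_type_init
}
</pre>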
<p>
Now I'm pretty sure a number of those leaks are in Sweep, but a significant
number of them seem to be in GTK+ and Glib.
Since trying to differentiate the leaks in Sweep from the leaks in GTK+ was
proving to be a very difficult and frustrating task, I decided to look at the
behaviour of a simple GTK+ program under Valgrind.
The program I chose was the
<a href="http://library.gnome.org/devel/gtk-tutorial/stable/c39.html#SEC-HELLOWORLD">
helloworld</a>
example from the
<a href="http://library.gnome.org/devel/gtk-tutorial/stable/">
GTK+ tutorial</a>.
</p>
<p>
Compiling that on Ubuntu 9.10 and running it under valgrind using the following
commands:
</p>
<pre class="code">
export G_SLICE=always-malloc
export G_DEBUG=gc-friendly
valgrind --tool=memcheck --leak-check=full --leak-resolution=high \
--num-callers=50 --show-reachable=yes --suppressions=gtk.suppression \
helloworld > helloworld-vg.txt 2>&1
</pre>
<p>
resulted in a memcheck summary of:
</p>
<pre class="code">
==22566== LEAK SUMMARY:
==22566== definitely lost: 1,449 bytes in 8 blocks
==22566== indirectly lost: 3,716 bytes in 189 blocks
==22566== possibly lost: 4,428 bytes in 107 blocks
==22566== still reachable: 380,505 bytes in 7,898 blocks
==22566== suppressed: 35,873 bytes in 182 blocks
</pre>
<p>
The full memcheck report is
<a href="http://www.mega-nerd.com/erikd/Blog/files/helloworld-vg.txt.gz">
available here</a>.
</p>
<p>
The simplest GTK+ hello world program is 100 lines of code and results in a leak
report of over 8000 leaked blocks even when using the recommended valgrind
suppressions file and GTK+ debugging environment variables.
If someone modifies that code and adds another leak, trying to find that leak
needle in the GTK+ leak haystack is going to be a needlessly difficult task.
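<p>
One way to shrink the haystack is to have valgrind emit a ready-made
suppression entry for every leak the library itself produces, and append those
entries to the suppressions file so that only leaks introduced afterwards
stand out. A sketch (the <tt><b>--gen-suppressions</b></tt> option is standard
Memcheck; the file names are assumptions):
</p>

```shell
# Print a suppression entry after every reported leak; the entries can
# then be pasted into gtk.suppression for later runs.
valgrind --tool=memcheck --leak-check=full --gen-suppressions=all \
    ./helloworld > helloworld-supp.txt 2>&1
```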
</p>
<p>
Researching this some more, I found that GTK+ is known to make a large number
of allocations once per program run that are never released.
Furthermore, the GTK+ developers seem to think this is OK, and from the point
of view of a user running a GTK+ program it is.
However for developers coding against GTK+ and hoping to use Valgrind to find
leaks in their own code, this is a royal PITA.
Leaks in the developer's code can easily be swamped and hidden by GTK+ leaks.
My guess is that most people don't even bother checking unless their program's
memory footprint grows over time for no good reason.
</p>
<p>
Obviously, I'm not the first to realise how hard it is to debug memory leaks
in a program when the library it links against throws up so many warnings.
In fact, back in 2001
<a href="https://bugzilla.gnome.org/show_bug.cgi?id=64096">
a bug was raised in the GTK+ bug tracker</a>
requesting the addition of a call to be used only during debugging that would
release all memory so that client programs are easier to debug.
That bug has remained open and without action for over 8 years.
</p>
<p>
As far as I am concerned, this is completely unacceptable.
<del>If this was my code, I would be too ashamed to put my name on it.</del>
<b>Edit:</b> Being able to valgrind GTK+ client code is worth the effort and
cost of changing the otherwise perfectly reasonable behaviour of not accounting
for lifetime-of-the-app data structures (thanks Andrew).
</p>
<p>
<b>Note:</b> Anyone who wishes to comment on this can do so
<a href="http://www.reddit.com/r/programming/comments/akxak/debugging_memory_leaks_in_a_gtk_house_of_cards/">
on reddit</a>.
</p>