Sat, 01 Jan 2011

LLVM Backend for DDC : Very Nearly Done.

The LLVM backend for DDC that I've been working on sporadically since June is basically done. When compiling via the LLVM backend, all but three of the 100+ tests in the DDC test suite pass. The tests that pass via the C backend but fail via the LLVM backend are of two kinds:

  1. Use DDC's foreign import construct to name a C macro that performs a type cast, where the macro is defined in one of the C header files.
  2. Use static inline functions in the C backend to do peek and poke operations on arrays of unboxed values.

In both of these cases, DDC is using features of the C language to make code generation easier. Obviously, the LLVM backend needs to do something else to get the same effect.

Fixing the type casting problem should be relatively simple. Ben is currently working on making type casts a primitive of the Core language so that both the C and LLVM backends can easily generate code for them.

The array peek and poke problem is more complex. I suspect that it too will require the addition of new Core language primitive operations, but it is a significantly harder problem than the type casting one and I've only just begun thinking about it.
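
To make the shape of the fix concrete, here is a hedged sketch, entirely my own guess rather than DDC's actual definitions, of what such Core primitives might look like on the Haskell side:


  -- Hypothetical sketch only; these names and shapes are assumptions,
  -- not DDC's actual Core definitions.
  data PrimType
        = PrimInt32 | PrimInt64 | PrimFloat32 | PrimFloat64 | PrimAddr
        deriving (Eq, Show)

  data Prim
        = PrimCast PrimType PrimType    -- type cast, replacing the C macro
        | PrimPeek PrimType             -- read an unboxed value through a pointer
        | PrimPoke PrimType             -- write an unboxed value through a pointer
        deriving (Eq, Show)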

Now that the backend is nearly done, it's not unreasonable to look at its performance. The following table shows the compile and run times of a handful of tests in the DDC test suite, compiling via the C and the LLVM backends.


  Test name                    C Build Time   LLVM Build Time   C Run Time   LLVM Run Time
  93-Graphics/Circle           3.124s         3.260s            1.701s       1.536s
  93-Graphics/N-Body/Boxed     6.126s         6.526s            7.649s       4.899s
  93-Graphics/N-Body/UnBoxed   3.559s         4.017s            9.843s       6.162s
  93-Graphics/RayTracer        12.890s        13.102s           13.465s      8.973s
  93-Graphics/SquareSpin       2.699s         2.889s            1.609s       1.604s
  93-Graphics/Styrene          13.685s        14.349s           11.312s      8.527s

Although there is a small increase in compile times when going via LLVM, the run times are significantly reduced. The conceptual complexity of the LLVM backend is also low (about 4500 lines, a count that will probably fall with re-factoring), and because LLVM's type checking is significantly stricter than C's, I think it's reasonable to have more confidence in the quality of the LLVM backend than in the existing C backend. Finally, implementing things like proper tail call optimisation will be far easier in the LLVM backend than in the C one.

All in all, I think doing this LLVM backend has been an interesting challenge and will definitely pay off in the long run.

Posted at: 13:54 | Category: CodeHacking/DDC | Permalink

Wed, 01 Dec 2010

LLVM Backend for DDC : Milestone #3.

After my last post on this topic, I ran into some problems with the AST (abstract syntax tree) that was being passed to my code for LLVM code generation. After discussing the problem with Ben, he spent some time cleaning up the AST definition, with the result that nearly all the stuff I already had stopped working. This was a little disheartening. That, combined with being really busy, meant that I didn't touch the LLVM backend for a number of weeks.

When I finally did get back to it, I found that it wasn't as broken as I had initially thought. Although the immediate interface between Ben's code and mine had changed significantly, all the helper functions I had written were still usable. Over a week and a bit, I managed to patch everything up again and get back to where I was. I also did a lot of cleaning up and came up with a neat solution to a problem which was bugging me during my previous efforts.

The problem was that structs defined via the LLVM backend needed to have exactly the same memory layout as the structs defined via the C backend. This is a strict requirement for proper interaction between code generated via C and via LLVM. It was made a little difficult by David Terei's Haskell LLVM wrapper code (see previous post), which makes all structs packed by default, while structs on the C side were not packed. Another dimension of the problem was finding a way to generate LLVM struct accesses that is easy to read and debug in the code generator, without requiring different code paths for 32 and 64 bit targets.

Struct layout is tricky. Consider a really simple struct like this:


  struct whatever
  {   int32_t tag ;
      char * pointer ;
  } ;

On a 32 bit system, that struct takes up 8 bytes: 4 bytes for the int32_t and 4 for the pointer. However, on a 64 bit system, where pointers are 8 bytes in size, the struct takes up 16 bytes. Why not 12 bytes? Well, some 64 bit CPUs (Alpha and Sparc64 are two I can think of) are not capable of unaligned memory accesses, that is, a read from memory into a CPU register where the memory address (in bytes) is not an integer multiple of the size of the register. Other CPUs like x86_64 can read unaligned data, but unaligned reads are usually slower than correctly aligned ones.

In order to avoid unaligned accesses, the compiler assumes that the start address of the struct will be aligned to the correct alignment for the biggest element in the struct, in this case the pointer. It then adds 4 bytes of padding between the int32_t and the pointer to ensure that if the struct is correctly aligned then the pointer will be too.
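
The layout rule is easy to capture in code. Here is a minimal self-contained sketch, my own illustration rather than anything from a real compiler, that computes field offsets by aligning each field to its own size:


  import Data.List (mapAccumL)

  -- Given field sizes in bytes, return the padded struct size and the
  -- byte offset of each field. Each field is aligned to its own size and
  -- the total size is rounded up to the largest alignment in the struct.
  fieldOffsets :: [Int] -> (Int, [Int])
  fieldOffsets sizes
   = (roundUp end maxAlign, offsets)
   where
          maxAlign       = maximum (1 : sizes)
          (end, offsets) = mapAccumL place 0 sizes
          place off sz   = (roundUp off sz + sz, roundUp off sz)
          roundUp off a  = (off + a - 1) `div` a * a

  -- fieldOffsets [4, 4]  ==  (8, [0, 4])    -- 32 bit: no padding needed
  -- fieldOffsets [4, 8]  ==  (16, [0, 8])   -- 64 bit: 4 bytes of padding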

Because structs are packed in David Terei's code, the above struct would require a different definition on 32 and 64 bit systems, i.e.


  ; 32 bit version of the struct
  %struct.whatever.32 = type <{ i32, i8 * }>

  ; 64 bit version of the struct
  %struct.whatever.64 = type <{ i32, [4 x i8], i8 * }>

where the 64 bit version contains 4 padding bytes. However, the difference between these two definitions causes another problem. To access fields within a struct, LLVM code uses the getelementptr operator which addresses fields by index. Unfortunately, the index (zero based) of the pointer is 1 for the 32 bit version and 2 for the 64 bit version. That would make code generation a bit of a pain in the neck.

The solution is to allow LLVM structs to be specified in Haskell as a list of LlvmStructField elements:


  data LlvmStructField
        = AField String LlvmType    -- Field name and type.
        | APadTo2                   -- Pad next field to a 2 byte offset.
        | APadTo4                   -- Pad next field to a 4 byte offset.
        | APadTo8                   -- Pad next field to an 8 byte offset.

        | APadTo8If64               -- Pad next field to an 8 byte offset,
                                    -- but only on 64 bit targets.

Note that the AField constructor requires both a name and the LlvmType. I then provide functions to convert the LlvmStructField list into an opaque LlvmStructDesc type and provide the following functions:


  -- | Turn a struct specified as an LlvmStructField list into an
  -- LlvmStructDesc and give it a name. The LlvmStructDesc may
  -- contain padding to make it conform to the definition.
  mkLlvmStructDesc :: String -> [LlvmStructField] -> LlvmStructDesc

  -- | Retrieve the struct's LlvmType from the LlvmStructDesc.
  llvmTypeOfStruct :: LlvmStructDesc -> LlvmType

  -- | Given an LlvmStructDesc and the name of a field within the
  -- LlvmStructDesc, retrieve the field's index within the struct and its
  -- LlvmType.
  structFieldLookup :: LlvmStructDesc -> String -> (Int, LlvmType)

Once the LlvmStructDesc is built for a given struct, fields within the struct can be addressed in the LLVM code generator by name, making the Haskell code generator code far easier to read.
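
As an example of how this fits together, here is a hedged usage sketch for the whatever struct from above; it assumes the wrapper provides i8 alongside the i32 and pLift helpers used elsewhere in this post, and the field types are illustrative:


  whateverDesc :: LlvmStructDesc
  whateverDesc
   = mkLlvmStructDesc "whatever"
        [ AField "tag" i32              -- the int32_t tag
        , APadTo8If64                   -- padding needed on 64 bit targets only
        , AField "pointer" (pLift i8) ] -- the char * pointer

  -- Address the pointer field by name; the returned index is correct
  -- for both the 32 and 64 bit layouts.
  pointerField :: (Int, LlvmType)
  pointerField = structFieldLookup whateverDesc "pointer"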

Pretty soon after I got the above working, I managed to get enough LLVM code generation going to compile a complete small program which then runs correctly. I consider that to be milestone 3.

Posted at: 20:41 | Category: CodeHacking/DDC | Permalink

Sun, 22 Aug 2010

LLVM Backend for DDC : Milestone #2.

For a couple of weeks after AusHac 2010 I didn't manage to find any time to work on DDC at all, but I'm now back on it, and late last week I reached the second milestone on the LLVM backend for DDC. The backend now has the ability to box and unbox 32 bit integers and perform simple arithmetic operations on valid combinations of them.

Disciple code that can currently be compiled correctly via LLVM includes basic stuff like:


  identInt :: Int -> Int
  identInt a = a

  plusOneInt :: Int -> Int
  plusOneInt x = x + 1

  addInt :: Int -> Int -> Int
  addInt a b = a + b

  addInt32U :: Int32# -> Int32# -> Int32#
  addInt32U a b = a + b

  addMixedInt :: Int32# -> Int -> Int
  addMixedInt a b = boxInt32 (a + unboxInt32 b)

  cafOneInt :: Int
  cafOneInt = 1

  plusOne :: Int -> Int
  plusOne x = x + cafOneInt

where Int32# specifies an unboxed 32 bit integer and Int32 specifies the boxed version.

While writing the Haskell code for DDC, I'm finding that it's easiest to generate LLVM code for a specific narrow case first and then generalize it as more cases come to light. I also found that the way I had been doing the LLVM code generation was tedious and ugly, involving lots of concatenation of small lists. To fix this I built myself an LlvmM monad on top of the StateT monad transformer:


  type LlvmM = StateT [[LlvmStatement]] IO

Using this I can then generate a block of LLVM code as a list of LlvmStatements and add it to the monad using an addBlock function which basically pushes the blocks of code down onto a stack:


  addBlock :: [LlvmStatement] -> LlvmM ()
  addBlock code
   = do   state <- get
          put (code : state)

The addBlock function is then used as the base building block for a bunch of more specific functions like these:


  unboxInt32 :: LlvmVar -> LlvmM LlvmVar
  unboxInt32 objptr
   | getVarType objptr == pObj
   = do     int32    <- lift $ newUniqueReg i32
            iptr0    <- lift $ newUniqueNamedReg "iptr0" (pLift i32)
            iptr1    <- lift $ newUniqueNamedReg "iptr1" (pLift i32)
            addBlock
                    [ Comment [ show int32 ++ " = unboxInt32 (" ++ show objptr ++ ")" ]
                    , Assignment iptr0 (GetElemPtr True objptr [llvmWordLitVar 0, i32LitVar 0])
                    , Assignment iptr1 (GetElemPtr True iptr0 [llvmWordLitVar 1])
                    , Assignment int32 (Load iptr1) ]
            return  int32


  readSlot :: Int -> LlvmM LlvmVar
  readSlot 0
   = do   dstreg    <- lift $ newUniqueNamedReg "slot.0" pObj
          addBlock  [ Comment [ show dstreg ++ " = readSlot 0" ]
                    , Assignment dstreg (Load localSlotBase) ]
          return    dstreg

  readSlot n
   | n > 0
   = do   dstreg    <- lift $ newUniqueNamedReg ("slot." ++ show n) pObj
          r0        <- lift $ newUniqueReg pObj
          addBlock  [ Comment [ show dstreg ++ " = readSlot " ++ show n ]
                    , Assignment r0 (GetElemPtr True localSlotBase [llvmWordLitVar n])
                    , Assignment dstreg (Load (pVarLift r0)) ]
          return    dstreg

  readSlot n = panic stage $ "readSlot with slot == " ++ show n
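
readSlot presumably has a counterpart for storing into a GC slot. Here is a hedged sketch of what a writeSlot might look like; the name, and the assumption that the wrapper's Store statement takes the value followed by the pointer, are mine rather than DDC's:


  writeSlot :: LlvmVar -> Int -> LlvmM ()
  writeSlot var n
   = do   r0        <- lift $ newUniqueReg pObj
          addBlock  [ Comment [ "writeSlot " ++ show n ]
                    , Assignment r0 (GetElemPtr True localSlotBase [llvmWordLitVar n])
                    , Store var (pVarLift r0) ]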

These helper functions are finally hooked up to do things like:


  llvmVarOfExp (XUnbox ty@TCon{} (XSlot v _ i))
   = do   objptr    <- readSlot i
          unboxAny (toLlvmType ty) objptr

  llvmVarOfExp (XUnbox ty@TCon{} (XForce (XSlot _ _ i)))
   = do   orig      <- readSlot i
          forced    <- forceObj orig
          unboxAny (toLlvmType ty) forced

When code generation for a single function is complete, the list of LlvmStatement blocks is retrieved, reversed and concatenated to produce the full list of LlvmStatements for the function.
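
In code, that final step might look something like this (a sketch; the function name is my invention):


  getFunctionBlocks :: LlvmM [LlvmStatement]
  getFunctionBlocks
   = do   blocks <- get
          put []
          return $ concat (reverse blocks)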

With the LlvmM monad in place, converting DDC's Sea AST into LLVM code is now pretty straightforward. It's just a matter of finding and implementing all the missing pieces.

Posted at: 13:43 | Category: CodeHacking/DDC | Permalink

Sun, 18 Jul 2010

LLVM Backend : Milestone #1.

About 3 weeks ago I started work on the LLVM backend for DDC and I have now reached the first milestone.

Over the weekend I attended AusHac2010 and during Friday and Saturday I managed to get DDC modified so I could compile a Main module via the existing C backend and another module via the LLVM backend to produce an executable that ran, but gave an incorrect answer.

Today, I managed to get a very simple function actually working correctly. The function is trivial:


  identInt :: Int -> Int
  identInt a = a

and the generated LLVM code looks like this:


  define external ccc %struct.Obj* @Test_identInt(%struct.Obj* %_va)  
  {
  entry:
      ; _ENTER (1)
      %local.slotPtr = load %struct.Obj*** @_ddcSlotPtr
      %enter.1 = getelementptr inbounds %struct.Obj** %local.slotPtr, i64 1
      store %struct.Obj** %enter.1, %struct.Obj*** @_ddcSlotPtr
      %enter.2 = load %struct.Obj*** @_ddcSlotMax
      %enter.3 = icmp ult %struct.Obj** %enter.1, %enter.2
      br i1 %enter.3, label %enter.good, label %enter.panic
  enter.panic:
      call ccc void ()* @_panicOutOfSlots(  ) noreturn
      br label %enter.good
  enter.good:
      ; ----- Slot initialization -----
      %init.target.0 = getelementptr  %struct.Obj** %local.slotPtr, i64 0
      store %struct.Obj* null, %struct.Obj** %init.target.0
      ; ---------------------------------------------------------------
      %u.2 = getelementptr inbounds %struct.Obj** %local.slotPtr, i64 0
      store %struct.Obj* %_va, %struct.Obj** %u.2
      ; 
      br label %_Test_identInt_start
  _Test_identInt_start:
      ; alt default
      br label %_dEF1_a0
  _dEF1_a0:
      ; 
      br label %_dEF0_match_end
  _dEF0_match_end:
      %u.3 = getelementptr inbounds %struct.Obj** %local.slotPtr, i64 0
      %_vxSS0 = load %struct.Obj** %u.3
      ; ---------------------------------------------------------------
      ; _LEAVE
      store %struct.Obj** %local.slotPtr, %struct.Obj*** @_ddcSlotPtr
      ; ---------------------------------------------------------------
      ret %struct.Obj* %_vxSS0
  }

That looks like a lot of code, but there are a couple of points to remember: the function does only one line of real work (returning its argument), and nearly everything else is the standard _ENTER/_LEAVE GC slot handling and slot initialization that brackets every function.

I have found David Terei's LLVM AST code that I pulled from the GHC sources very easy to use. Choosing this code was definitely not a mistake and I have been corresponding with David, which has resulted in a few updates to this code, including a commit with my name on it.

LLVM is also conceptually very, very sound and easy to work with. For instance, variables in LLVM code are allowed to contain the dot character, so it's easy to avoid name clashes between C function/variable names and names generated during LLVM code generation simply by making the generated names contain a dot.

Finally, I love the fact that LLVM is a typed assembly language. There were dozens of times over the weekend when I generated LLVM code that the LLVM compiler rejected because it wouldn't type check. Just like when programming in Haskell, once the code type checked, it actually worked correctly.

Anyway, this is a good first step. Lots more work to be done.

Posted at: 22:18 | Category: CodeHacking/DDC | Permalink

Tue, 29 Jun 2010

LLVM Backend for DDC.

With the blessing of Ben Lippmeier I have started work on a new backend for his DDC compiler. Currently, DDC has a backend that generates C code which then gets run through GNU GCC to generate executables. Once it is working, the new backend will eventually replace the C one.

The new DDC backend will target the very excellent LLVM, the Low Level Virtual Machine. Unlike C, LLVM is specifically designed as a general retargetable compiler backend. It became the obvious choice for DDC when the GHC Haskell compiler added an LLVM backend which almost immediately showed great promise. Its implementation was of relatively low complexity in comparison to the existing backends and it also provided pretty impressive performance. This GHC backend was implemented by David Terei as part of an undergraduate thesis in the Programming Languages and Systems group at UNSW.

Since DDC is written in Haskell, there are two obvious ways to implement an LLVM backend:

  1. Using the Haskell LLVM bindings available on Hackage.
  2. Using David Terei's code that is part of the GHC compiler.

At first glance, the former might well seem the more obvious choice, but the LLVM bindings have a couple of drawbacks from the point of view of their use in DDC. In the end, the main factor in choosing between them was Ben's interest in bootstrapping the compiler (compiling the compiler with itself) as soon as possible.

The existing LLVM bindings use a number of advanced Haskell features, that is, features beyond those of the Haskell 98 standard. If we used the LLVM bindings in DDC, then DDC would have to support all the features needed by the bindings before it could be bootstrapped. Similarly, the LLVM bindings use GHC's Foreign Function Interface (FFI) to call out to the LLVM library. DDC currently does have some FFI support, but this was another mark against the bindings.

By way of contrast, David Terei's LLVM backend for GHC is pretty much standard Haskell code and since it generates text files containing LLVM's Intermediate Representation (IR), a high-level, typed assembly language, there is no FFI problem. The only downside of David's code is that the current version in the GHC Darcs tree uses a couple of modules that are private to GHC itself. Fortunately, it looks like these problems can be worked around with relatively little effort.

Having decided to use David's code, I started hacking on a little test project. The aim of the test project was to set up an LLVM Abstract Syntax Tree (AST) in Haskell for a simple module. The AST is then pretty printed as a textual LLVM IR file and assembled using LLVM's llc compiler to generate native assembler. Finally, the assembler code is compiled with a C module containing a main function which calls into the LLVM generated code.
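
The back half of that pipeline is just command invocation. Here is a rough sketch of driving it from Haskell once the IR has been pretty printed to a file; the file names are purely illustrative, not the test project's actual ones:


  import System.Process (system)

  buildAndRun :: IO ()
  buildAndRun
   = do   -- LLVM IR -> native assembler.
          _ <- system "llc test.ll -o test.s"
          -- Link the generated assembler with the C module holding main.
          _ <- system "gcc test.s main.c -o test"
          _ <- system "./test"
          return ()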

After managing to get a basic handle on LLVM's IR code, the test project worked: calling from C into LLVM generated code returned the expected result. The next step is to prepare David's code for use in DDC while making it easy to track David's upstream changes.

Posted at: 06:51 | Category: CodeHacking/DDC | Permalink

Tue, 17 Nov 2009

DDC : Man or Boy?

Computer scientist Donald Knuth came up with something he called the Man or Boy Test as a way of evaluating implementations of the ALGOL60 language (standardized in 1963) to distinguish compilers that correctly implemented "recursion and non-local references" from those that did not. Knuth said:

"I have written the following simple routine, which may separate the 'man-compilers' from the 'boy-compilers'."

My first attempt at solving this problem in Disciple resulted in me raising bug #148 in the DDC bug tracker with the following code:


  -- Compiler needs a little help inferring the types.
  a :: Int -> a -> a -> a -> a -> a -> Int
  a k x1 x2 x3 x4 x5
   = do    b () = do { k := k - 1 ; a k b x1 x2 x3 x4 }
           if k <= 0 then x4 () + x5 () else b ()

  fn n = \() -> n

  main ()  -- Function 'a' should return -67
   = do    out = a 10 (fn 1) (fn -1) (fn -1) (fn 1) (fn 0)
           if out /= -67
               then println $ "Output was " % show out % ". Should have been -67."
               else println "Passed!"

Fiddling around with the problem a bit, I suddenly realised that the Disciple language has call-by-reference semantics by default (by way of contrast, the C programming language has default call-by-value semantics with optional call-by-reference semantics using pointers).

While chatting with Ben on IRC, he suggested using copy to create a local copy of the function parameter that gets mutated, so that the mutation doesn't change the value outside the call frame.

Here are two correct solutions to the Man or Boy problem:


  a0 :: Int -> a -> a -> a -> a -> a -> Int
  a0 k x1 x2 x3 x4 x5
   = do   b () = do { k := k - 1 ; a0 (copy k) b x1 x2 x3 x4 }
          if k <= 0 then x4 () + x5 () else b ()


  a1 :: Int -> a -> a -> a -> a -> a -> Int
  a1 k x1 x2 x3 x4 x5
   = do   m = copy k
          b () = do { m := m - 1 ; a1 m b x1 x2 x3 x4 }
          if k <= 0 then x4 () + x5 () else b ()

  fn n = \() -> n

  main ()
   = do   out0 = a0 10 (fn 1) (fn -1) (fn -1) (fn 1) (fn 0)
          out1 = a1 10 (fn 1) (fn -1) (fn -1) (fn 1) (fn 0)

          println "All outputs below should be equal to -67."
          println $ "Output 0 : " % show out0
          println $ "Output 1 : " % show out1

Both of these Disciple solutions are significantly less complex than the equivalent Haskell solution, sketched below.
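
For reference, here is a sketch along the lines of the usual Haskell approach; the mutable parameter k has to become an explicit IORef, which is what makes it more involved than the Disciple versions:


  import Data.IORef

  -- Knuth's routine with the mutable parameter k as an explicit IORef.
  a :: Int -> IO Int -> IO Int -> IO Int -> IO Int -> IO Int -> IO Int
  a k x1 x2 x3 x4 x5
   = do   r <- newIORef k
          let b = do k' <- atomicModifyIORef r (\v -> (v - 1, v - 1))
                     a k' b x1 x2 x3 x4
          if k <= 0
             then do v4 <- x4
                     v5 <- x5
                     return (v4 + v5)
             else b

  main :: IO ()
  main = a 10 (return 1) (return (-1)) (return (-1)) (return 1) (return 0) >>= print
  -- prints -67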

While I have no problem with function parameters being passed by reference, I don't think it's a good idea to have those parameters be mutable by default (i.e. with the values also changing in the calling function).

I need to play with this some more.

Posted at: 22:03 | Category: CodeHacking/DDC | Permalink

Sun, 15 Nov 2009

Hacking DDC.

Over the last couple of months I've been doing a bit of hacking on an experimental compiler called DDC. This has been some of the most interesting, gratifying and challenging hacking I have done in years. Having this much fun should probably be illegal!!

I was introduced to DDC at the April 2008 meeting of FP-Syd when Ben Lippmeier, its author, gave a presentation titled "The Disciplined Disciple Compiler". The two main reasons this compiler is interesting are the Disciple language itself and how compact the implementation is.

The Disciple language is very Haskell-like, but has some extra features in the type system which allow the compiler to track mutability and side effects. The important differences between the Disciple language and the Haskell language are listed on the DDC web page.

Obviously a compiler that is doing all this really clever stuff has to be pretty complicated, but it still only weighs in at about 50k lines of code.

The main challenge in working on this is that I am not a very experienced Haskell programmer. There are also large chunks of the compiler doing some very complicated stuff that I don't have a hope of understanding without reading and understanding Ben's PhD thesis.

Despite that, Ben was willing to give me commit access to the Darcs repo and I have been able to significantly reduce the number of bugs in the DDC bug tracker. Since I was already pretty familiar with the concepts of lexing and parsing, as well as with Parsec (probably the most widely used parsing tool in the Haskell community), I started off fixing some simple lexer and parser bugs.

I then managed to hack in support for Int64 and Float64 (#106), followed by some significant re-factoring of the Parsec parser which reduced the usage of the Parsec.try construct, allowing Parsec to produce much better error messages (a toy example of this kind of refactoring is sketched below).
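
This toy example is not DDC's actual grammar, but it shows the idea: left-factoring a shared prefix removes the need for try, so a failure in the second keyword is reported where it actually occurs rather than after backtracking:


  import Text.Parsec
  import Text.Parsec.String (Parser)

  -- Before: try backtracks over the shared "foreign" keyword, so an
  -- error inside " import" is reported back at the start of the token.
  before :: Parser String
  before = try (string "foreign" >> string " import") <|> string "foreign"

  -- After: the shared prefix is parsed once, and errors point at the
  -- place where parsing really failed.
  after :: Parser String
  after = string "foreign" >> option "" (string " import")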

Once I'd done all that, I ran into a very busy time at work and didn't mess with DDC for a couple of months. When I finally got back to looking at DDC, I realised that nearly all of the remaining bugs were much deeper than the bugs I had tackled so far. Tackling these deeper bugs required a new strategy as follows:

  1. Scan the bug list for reports that either had test cases already or give enough information for me to proceed.
  2. Create a new darcs branch for each bug. This allowed me to work on multiple different bugs at once so that if I got stuck on any one specific bug, I could just leave it and move on to another.
  3. Create a reproducible test case if one didn't exist already.
  4. Create a shell script in the root directory of each branch which ran make and then ran the specific test case for that bug.
  5. Use Haskell's Debug.Trace module in conjunction with Haskell's very wonderful Show type class to add debug statements to the code (see the sketch after this list).
  6. Use the Wolf Fencing debugging technique to narrow the problem down to a specific area of the code.
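
As a tiny illustration of point 5: a derived Show instance plus Debug.Trace lets you dump intermediate values from pure code without restructuring it. The Exp type here is made up for the example:


  import Debug.Trace (trace)

  data Exp = EVar String | EApp Exp Exp deriving Show

  simplify :: Exp -> Exp
  simplify e
   = trace ("simplify : " ++ show e)
   $ e    -- the real transformation would go here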

Once the problem had been narrowed down to a piece of code, all that remained was to develop a fix. In many cases this resulted in me asking Ben, in email or on IRC, how he'd like it fixed. I also often came up with an ugly fix at first, which was refined and cleaned up before being applied and pushed upstream.

With the above methodology I was able to fix a number of deeper and more complex bugs.

I'm now getting a pretty good idea of how the compiler is put together and I'm stretching my hacking into feature enhancements.

My enthusiasm for DDC was recently validated by functional programming guru Oleg Kiselyov's comment on the haskell-cafe mailing list:

"One may view ML and Haskell as occupying two ends of the extreme. ML assumes any computation to be effectful and every function to have side effects. Therefore, an ML compiler cannot do optimizations like reordering (even apply commutative laws where exists), unless it can examine the source code and prove that computations to reorder are effect-free. .....
Haskell, on the other hand, assumes every expression pure. Lots algebraic properties become available and can be exploited, by compilers and people. ....
Hopefully a system like DDC will find the middle ground."

Anyway, back to hacking ....

Posted at: 21:28 | Category: CodeHacking/DDC | Permalink