Thu, 22 Mar 2007

Lazy Lists.

Lazy evaluation is a default feature of the Haskell programming language and an optional feature of Ocaml. Most programming languages (Ocaml, C, C++, Perl, Python, Java etc) use eager evaluation; where a result specified by a line of code is calculated as soon as the program gets to that line. Lazy evaluation on the other hand, defers the calculation of a result until that result is needed.

The real beauty of lazy evaluation is that a result that is never used is never evaluated. Lazy evaluation also allows the specification of lists which are effectively infinite, as long as the programmer doesn't actually try to access every element in the list. Obviously, attempting to do so would take infinite time and and require infinite memory to actually hold the list :-).

While searching for information on Ocaml's lazy programming features I came across a post at the enchanted mind blog. That post is ok, but the code is just snippets and when put together as it is, doesn't actually work.

After a bit of fiddling around, I managed to get it working. However, once I understood it, I didn't think the example was as good as it could be. Firstly, the input to the lazy list is just a standard finite length Ocaml list, but more importantly it doesn't give any idea of how to do a potentially infinite list which is a much more interesting case.

That left the field open for a nice blog post demonstrating lazy lists in Ocaml. Read on.

Anybody who has done high school or higher mathematics would probably have come across recurrence relations the most well know of which is the Fibonacci sequence.

The Fibonacci sequence is often used as example for teaching the concept of recursion in computer science (even if some people think there are better examples). The Fibonacci sequence can be expressed recursively in Ocaml like this:


  let rec fibonacci n =
      match n with
      |    1 -> 1
      |    2 -> 1
      |    x -> (fibonacci (n - 1)) + (fibonacci (n - 2))

If one wanted to generate a list containing say the first 20 Fibonacci numbers using the above recursive function, the 19th number in the sequence would be calculated twice, the 18th number three times so on. Its simply not efficient.

A better solution is to use a lazy list, which calculates new values of the sequence as they are needed, based on entries already in the list. Here's an example that creates a lazy list of the fibonacci numbers:


  type lazy_fib_t =
      Node of int * lazy_fib_t Lazy.t

  let create_fib_list () =
      let rec fib_n minus_2 minus_1 =
          let n = minus_1 + minus_2 in
          Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
          Node (n, lazy (fib_n minus_1 n))
      in
      lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))

  let print_fib_list depth lst =
      let rec sub_print current remaining =
          if current > depth then ()
          else
          match Lazy.force remaining with
          |    Node (head, tail) ->
                  Printf.printf "%3d : %d\n" current head ;
                  sub_print (current + 1) tail
      in
      sub_print 0 lst

  let _ =
      let fib_list = create_fib_list () in
      print_fib_list 4 fib_list ;
      print_endline "------------" ;
      print_fib_list 6 fib_list ;

This is a complete working Ocaml program. To run it, just save the text to a file, say "lazy_fib.ml" and then do:


  ocaml lazy_fib.ml

We'll look at the output in detail later. First lets break it down; looking at the program, from top to bottom we have:


  type lazy_fib_t =
      Node of int * lazy_fib_t Lazy.t

The above two lines define a recursive type called lazy_fib_t, which has a single variant called Node which contains a tuple of an integer and the head of a lazy list.


  let create_fib_list () =
      let rec fib_n minus_2 minus_1 =
          let n = minus_1 + minus_2 in
          Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
          Node (n, lazy (fib_n minus_1 n))
      in
      lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))

The function above, create_fib_list, creates a lazy list. It also contains an internal function, fib_n, which we'll look at later. The last line of the function is where all the magic is; it creates three nodes of a lazy list, the first two containing the first two integers of the Fibonacci sequence and a third node which is a closure, containing a call to the internal function fib_n with the correct parameters to generate the next number in the sequence.

The internal function fib_n takes two parameters, the values of the sequence for n - 1 and n - 2. From these two values, it generates the value for n, prints a message and then constructs a new Node containing the value for n and a lazy evaluation for the next value.

The next function is the function which prints the first n elements of a lazy list. It looks like this:


  let print_fib_list depth lst =
      let rec sub_print current remaining =
          if current > depth then ()
          else
          match Lazy.force remaining with
          |    Node (head, tail) ->
                  Printf.printf "%3d : %d\n" current head ;
                  sub_print (current + 1) tail
      in
      sub_print 0 lst

The print_fib_list function contains an internal function sub_print which is called with a current depth of zero and the head of the lazy list to be printed. The internal function recursively moves down the list until current is greater than depth, which cause the recursion to complete and unwind.

At each node of the lazy list where current is less than or equal to depth, the function forces the evaluation of the node. The forcing will only evaluate a node if it hasn't already been evaluated. Once the node has been force evaluated, the value is printed and the function is called recursively.

Finally, the main function of the program is this:


  let _ =
      let fib_list = create_fib_list () in
      print_fib_list 4 fib_list ;
      print_endline "------------" ;
      print_fib_list 6 fib_list ;

All it does is call the function create_fib_list, and then print the first four Fibonacci numbers of the list, prints a dashed line and then prints the first six Fibonacci numbers of the list. Its important to note that the print function is called with the same list on both occasions.

When the program is run, the output should look like this:


    0 : 1
    1 : 1
  fib_n 1 1 -> 2
    2 : 2
  fib_n 1 2 -> 3
    3 : 3
  fib_n 2 3 -> 5
    4 : 5
  ------------
    0 : 1
    1 : 1
    2 : 2
    3 : 3
    4 : 5
  fib_n 3 5 -> 8
    5 : 8
  fib_n 5 8 -> 13
    6 : 13

As can be seen above, the first time the print function is called, the fib_n closure is called for all values of n greater than one. Each time fib_n is called a new node is generated in the list. When the print function is called the second time, it fib_n is only called for values that weren't evaluated on the first call to the print function just as was expected.

One of the few problems with the above implementation is that it uses integers which in Ocaml on 32 bit CPU platforms is only a 31 bit integer. It would however be relatively easy to use Ocaml's Big_int module which provides arbitrary length integers.

Posted at: 21:43 | Category: CodeHacking/Ocaml | Permalink