m3ga blog

Calling from C into Ocaml.

Tue, 24 Feb 2009 11:31 GMT

I've got a project where it would be nice to be able to call Ocaml code from a C program. Although interfacing Ocaml and C is covered in the official manual and the O'Reilly Ocaml book, neither of these sources have a complete example.

As a firm believer in the idea that a 100 lines of working code is worth a thousand lines of explanatory text in a book or on the web, I thought I'd put together a small but complete example. First off, here is the Ocaml code (download ocaml-called-from-c.ml):


  let ocaml_puts name =
      Printf.printf "Program name is '%s'.\n" name ;
      (* Must flush stdout before returning to C. *)
      flush stdout

  let ocaml_string_join join arr =
      (* Create and return a string. *)
      String.concat join (Array.to_list arr)

  (* On program initialisation, register functions to be called from C. *)
  let () =
      Callback.register "ocaml_puts" ocaml_puts ;
      Callback.register "ocaml_string_join" ocaml_string_join

There are two functions that will be called from C, ocaml_puts and ocaml_string_join and both functions must be registered as callbacks with the Ocaml runtime using Callback.register. To find the function signatures of these functions we can use the ocamlc program:


  prompt > ocamlc -i ocaml-called-from-c.ml
  val ocaml_puts : string -> unit
  val ocaml_string_join : string -> string array -> string

The first function has a single string parameter and returns nothing, while the second takes two parameters, a string and an array of strings and returns a string.

The C program which calls these two functions looks like this (download c-main-calls-ocaml.c):


  #include 
  #include 
  #include 
  
  #include 
  #include 
  #include 
  #include 
  
  
  static void
  call_ocaml_void (const char * name)
  {   CAMLparam0 () ;
      CAMLlocal1 (ostr) ;
  
      ostr = caml_copy_string (name);
  
      value * func = caml_named_value ("ocaml_puts") ;
  
      if (func == NULL)
          puts ("caml_named_value failed!") ;
      else
          caml_callback (*func, ostr) ;
  
      CAMLreturn0 ;
  } /* call_ocaml_void */
  
  
  static void
  call_ocaml_string (char * join, char const ** argv)
  {   CAMLparam0 () ;
      CAMLlocal3 (ojoin, oargv, ores) ;
  
      ojoin = caml_copy_string (join);
      oargv = caml_alloc_array (caml_copy_string, argv) ;
  
      value * func = caml_named_value ("ocaml_string_join") ;
  
      if (func == NULL)
          puts ("caml_named_value failed!") ;
      else
          ores = caml_callback2 (*func, ojoin, oargv) ;
  
      printf ("Ocaml returned : '%s'\n", String_val (ores)) ;
  
      CAMLreturn0 ;
  } /* call_ocaml_string */
  
  
  int
  main (int argc, char ** argv)
  {   const char * progname ;
      int k, count ;
      
      progname = argv [0] ;
      if (strstr (progname, "./") == progname)
          progname += 2 ;
  
      if (argc < 2)
      {   puts ("Need at least 1 command line argument.") ;
          exit (1) ;
          } ;
  
      count = argc >= 2 ? atoi (argv [1]) : 1 ;
      count = count < 1 ? 1 : count ;
      printf ("Count : %d\n", count) ;

      /* Must call this before calling any Ocaml code. */
      caml_startup (argv) ;
  
      for (k = 0 ; k < count ; k++)
          call_ocaml_void (progname) ;
  
      for (k = 0 ; k < count ; k++)
          call_ocaml_string (" ", (char const **) (argv + 1)) ;
  
      return 0 ;
  } /* main */

The main function is mostly self explanatory; the only thing to note is that if we want to call any Ocaml code from C, we must call caml_startup first. Looking at the functions that call into Ocaml, note that these functions begin with a call to CAMLparam0 and ends with a call to CAMLreturn0. These are both macros, the first of which sets up the Ocaml specific stack requirements and the second of which cleans up after the first. The '0' at the end of their names indicates that there are zero Ocaml managed data objects passed into and returned from the C function respectively.

For values to be passed to Ocaml, we use local Ocaml managed variables set up with CAMLlocal1 if we only have one, or CAMLlocal3 if we have 3. Data can be copied into these local Ocaml variables using the caml_copy_* and caml_alloc_* families of functions.

The Ocaml functions we want to call can be looked up by name using caml_named_value and the function actually called using caml_callback if we only have one parameter to pass or caml_callback2 for two parameters.

For the call to ocaml_string_join which returns a string, we can extract the return value from the Ocaml wrapper using String_val. There are also other functions to retrieve other data types, the only real caveat being that if the type isn't atomic (eg int or double) and you want to return it from the C function it will be necessary allocate memory for it and copy it because the memory area returned from Ocaml will be invalid after the call to CAMLreturn0.

Finally, building this simple example can be done as follows (using version 3.10.2 of the Ocaml compiler):


  ocamlopt -c ocaml-called-from-c.ml -o ocaml-called-from-c.cmx
  ocamlopt -output-obj -o camlcode.o ocaml-called-from-c.cmx 
  gcc -g -Wall -Wextra  -c c-main-calls-ocaml.c -o c-main-calls-ocaml.o
  gcc camlcode.o c-main-calls-ocaml.o -ldl -lm -L /usr/lib/ocaml/3.10.2 \
         -lasmrun -o c-main-calls-ocaml

The first line compiles to Ocaml file into an Ocaml object (*.cmx) using the native code compiler, the second takes the Ocaml object and all the other Ocaml objects needed and generates a object file (camlcode.o) that can be linked to C code. The last two lines compile the C code into an object file and then links all the C objects and required libraries into an executable.


  prompt > ./c-main-calls-ocaml 4 abc wxyz
  Count : 4
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'

At this point its probably a good idea to run the program under valgrind and vary the first parameter to prove to oneself that un-freed memory when the program terminates is a constant (due to the Ocaml runtime) and does not vary in proportion to the number of times the Ocaml code is called (which would indicate a memory leak in the interface code).

Ocaml : Null Cursor in lablgtk.

Tue, 02 Dec 2008 10:36 GMT

I need to be able to hide the cursor in an Ocaml/lablgtk/Cairo program I'm writing (a touchscreen calibration utility to go with the Zytouch driver). Usually this is done by creating a small 1x1 pixel cursor that is transparent, but I couldn't find any existing Ocaml/lablgtk code to do it. With a little help from Jacques Garrigue on the lablgtk mailing list I came up with the following function to create a cursor:


  let create_null_cursor win =
      let w, h = 1, 1 in
      let mask = Gdk.Bitmap.create ~window:win ~width:w ~height:h () in
      let pixmap = Gdk.Pixmap.create ~window:win ~width:w ~height:h ~depth:1 () in
      let color = Gdk.Color.alloc (Gdk.Color.get_system_colormap ()) (`RGB (0, 0, 0)) in
      Gdk.Cursor.create_from_pixmap pixmap mask color color w h

which I use as follows:


  let win = GWindow.window ~resizable:true () in
  ignore (win#connect#destroy GMain.quit) ;

  (* More code here. *)

  win#show () ;

  (* Must set the cursor after win#show () or we get a Gpointer.NULL exception. *)
  let cursor = create_null_cursor win#misc#window in
  Gdk.Window.set_cursor win#misc#window cursor ;

  GMain.main ()

Problem solved.

Just Drawing Stuff on the Screen.

Wed, 20 Aug 2008 12:29 GMT

Richard Jones laments that drawing stuff on the screen is harder than it should be. I haven't seen his code, but it looks like he might be trying to do it with Ocaml and GTK which probably is more difficult than it should be. GTK isn't really meant for that sort of stuff.

Fortunately, there is a really well designed and thoroughly thought out library for doing graphics called Cairo, which even has a really great set of Ocaml bindings. On Debian/Ubuntu, the Cairo bindings can be installed using:


   sudo apt-get install libcairo-ocaml-dev

I messed about with Ocaml and Cairo about a year ago and came up with this little demo.


  (*
  **    http://www.e-dsp.com/what-are-fourier-coefficients-and-how-to-calculate-them/
  **
  **    http://en.wikipedia.org/wiki/Fourier_series#Definition
  *)

  type fourier_series_t =
  {   a0 : float ;
      an : float array ;
      bn : float array ;
      }


  let initial_size = 200

  let two_pi = 8.0 *. atan 1.0

  let sum_float_array ary =
      Array.fold_left (fun x y -> x +. y) 0.0 ary


  let calc_series max_n ary =
      (*
      **    This uses a rough numerical approximation to integration.
      **    As long as the array is long enough (say 1000 or more elements), the
      **    results should be reasonable.
      *)
      let len = float_of_int (Array.length ary) in
      let calc_Xn trig_func n =
          let n = n + 1 in
          let ary = Array.mapi (
		  		fun i x -> x *.
                trig_func ((float_of_int (n * i)) *. two_pi /. (len -. 1.0))
				) ary
          in
          2.0 *. (sum_float_array ary) /. len
      in
      let a0 = (sum_float_array ary) /. len in
      let an = Array.init max_n (calc_Xn cos) in
      let bn = Array.init max_n (calc_Xn sin) in
      { a0 = a0 ; an = an ; bn = bn }


  let waveform_of_series outlen series =
      (*
      **  Given a fourier series, calculate a single cycle waveform of the
      **  specified length.
      *)
      let calc_point i =
          let x = two_pi *. (float_of_int i) /. (float_of_int (outlen - 1)) in
          let asum = sum_float_array (Array.mapi (
                    fun i an -> an *. (cos (float_of_int (i + 1) *. x))) series.an
                    )
          in
          let bsum = sum_float_array (Array.mapi (
                    fun i bn -> bn *. (sin (float_of_int (i + 1) *. x))) series.bn
                    )
          in
          series.a0 +. asum +. bsum
      in
      Array.init outlen calc_point


  let fold_over_clipped_sine gain len =
      let point i =
          let x = gain *. sin (two_pi *. (float_of_int i) /. (float_of_int len)) in
          if x > 1.0 then x -. 2.0
          else if x < -1.0 then x +. 2.0
          else x
      in
      Array.init len point


  let redraw w series _ =
      let cr = Cairo_lablgtk.create w#misc#window in
      let { Gtk.width = width ; Gtk.height = height } = w#misc#allocation in
      Cairo.save cr ;
      (   Cairo.identity_matrix cr ;
          let border = 20.0 in
          Cairo.move_to cr border border ;
          Cairo.line_to cr border (float_of_int height -. border) ;
          Cairo.stroke cr ;

          let wave_width = width - 100 - (int_of_float border) in
          let middle = float_of_int height /. 2.0 in
          let wave_height = 0.7 *. (middle -. border) in

          Cairo.move_to cr border middle ;
          Cairo.line_to cr (border +. float_of_int wave_width) middle ;
          Cairo.stroke cr ;

          Cairo.move_to cr (border +. float_of_int wave_width) border ;
          Cairo.line_to cr (border +. float_of_int wave_width)
                                      (float_of_int height -. border) ;
          Cairo.stroke cr ;

          Cairo.set_source_rgb cr 1.0 0.0 0.0 ;
          let wave_data = waveform_of_series wave_width series in
          Cairo.move_to cr border (float_of_int height /. 2.0) ;
          Array.iteri (fun i x ->
                        Cairo.line_to cr (border +. float i)
						      (middle -. wave_height *. x))
                        wave_data ;
          Cairo.stroke cr ;
          ) ;
      Cairo.restore cr ;
      true


  let () =
      if Array.length Sys.argv != 2 then
      (   Printf.printf "Usage : %s \n\n" Sys.argv.(0) ;
          exit 0 ;
          ) ;

      let series_len = int_of_string (Sys.argv.(1)) in

      let w = GWindow.window ~title:"Fourier Series Demo" ~width:600 ~height:400 () in
      ignore (w#connect#destroy GMain.quit) ;

      let b = GPack.vbox ~spacing:6 ~border_width:12  ~packing:w#add () in
      let f = GBin.frame ~shadow_type:`IN ~packing:(b#pack ~expand:true ~fill:true) () in
      let area = GMisc.drawing_area ~width:initial_size ~height:initial_size
                    ~packing:f#add ()
      in
      let array_len = 1000 in
      let wave = fold_over_clipped_sine 1.2 array_len in
      let series = calc_series series_len wave in

      ignore (area#event#connect#expose (redraw area series)) ;

      w#show () ;
      GMain.main ()

The above code can be compiled using:


    ocamlopt -I +cairo -I +lablgtk2 cairo.cmxa lablgtk.cmxa cairo_lablgtk.cmxa \
	    gtkInit.cmx fsdemo.ml -o fsdemo

and the output looks like this:

So while I agree that the 140 of lines of code here is about 30 times as much as Richard's code from his ZX80 days, I also think the results are at least 30 times as good.

Ocaml and Unix.select.

Sat, 19 Jul 2008 11:10 GMT

At the June meeting of FP-Syd, Tim Docker gave a presentation about his Tuple Space Server written in Haskell. This presentation rather intrigued me because I have had a long term interest in numerical analysis and numerical optimisation problems which lend themselves very well to parallel and distributed computing. I decided I should write a Tuple Space Server myself, in Ocaml.

Tim's Tuple Space server used threads and Software Transactional Memory (STM) to handle the connection of multiple masters and workers to the server itself. Although the Ocaml CoThreads library does have an STM module I thought there was probably an easier way.

In my day job I'm working on some C++ code that handles multiple network sockets and open file descriptors using the POSIX select system call. On Linux at least, there is a select tutorial man page which gives a example of using select written in C.

The beauty of select is that it allows a single process to multiplex multiple sockets and/or file descriptors without resorting to threads. However, the C example in the tutorial clearly demonstrates that this system call is a bit of a pain to use directly. Fortunately, for the project at work, I had some really great C++ base classes written by my colleague Peter to build on top of. These base classes hide all the nastiness of dealing with the system call itself by wrapping the select call into a daemon class and providing a simple base class which clients of the select call can inherit from.

For Ocaml there is a thin wrapper around the C library function in the Unix module and it has the following signature:


  val select :
    file_descr list -> file_descr list -> file_descr list -> float ->
      file_descr list * file_descr list * file_descr list

It takes three lists of file descriptors (one descriptor list for each of read, write and exceptions), a float value for a timeout and returns a tuple of three lists; one each for the file descriptors ready for reading, writing and exception handling.

Whereas the C++ solution had a daemon class, the Ocaml version instead has a daemon function. The daemon function operates on a set of tasks, with one file descriptor per task. Each file descriptor was embedded in a struct which I named task_t:


  type task_t =
  {   fd : Unix.file_descr ;
  
      mutable wake_time : float option ;
  
      mutable select_on : bool ;

      mutable process_read : task_t -> bool * task_t list ;
  
      mutable process_wake : task_t -> bool * task_t list ;
  
      finalize : task_t -> unit ;
      }

The fields of the struct are as follows:

fd : This is the file descriptor or socket that this task is operating on.
wake_time : This is an optional, per task timeout value. It is specified as either None for no timeout or Some with a value in seconds (ie Some 10.0 specifies a ten second timeout). A typical use might be when writing an network daemon where you want to close/drop any client connections which have been silent for more than a specified amount of time.
select_on : If this is true, the daemon will add this task's file descriptor to the list of files waiting for read. This field (set to false) in conjunction with the process_wake can be used to implement deferred actions on a descriptor that ignore readiness for read.
process_read : This is the function that is called when the file descriptor has data ready to be read. The function takes the task_t struct as a parameter and returns a tuple containing a bool and a task_t list, which may be empty. There's more on what happens with the return values of this function below.
process_wake : This is the function that is called when a wake_time value has been set and there has be no activity on the socket for the specified amount of time. This function's return type is the same as for process_read.
finalize : This function will be called when either of the above two functions notifies the daemon that the file descriptor should be closed.

The first thing to note in the above is the careful use of an immutable field for the file descriptor and mutable fields for process_read, process_wake and wake_time. The file descriptor is immutable so that any client code does not change its value behind the back of the daemon.

The others fields of the struct are purposely made to be mutable so that they can be changed on the fly. The functions process_read and process_wake both return their results in the same manner, a tuple containing two items:

A bool which indicates whether the daemon should hold onto the task (if true) or if false, the daemon should run the task's finalize function and then drop it to be cleaned up by the garbage collector.
A task_t list (possibly empty) of new tasks for the daemon to manage. Typically the daemon's server socket would be managed by the daemon function just like any other socket. When the server socket's process_read function accepts a new client connection it would return the new client task in this list.

The actual daemon run loop keeps the tasks in a hash table where the key is the file descriptor. Once the initial set of tasks is in the hash table, the loop basically does the following:

Find the file descriptors of all the tasks in the hash table which their select_on field set to true (uses Hashtbl.fold).
Find the minimum wake_time timeout of all the tasks (this is actually done on the same pass over all items in the hash tables as step 1.).
Pass the file descriptors from step 1. to the select with the timeout value found in 2. (The lists for writable and exception file descriptors are empty.)
When select returns a list or file descriptors ready to be read, map the file descriptor to a task using the hash table and then run the process_read function of each readable task.
For each task whose wake_time is exceeded, run its process_wake function.
For steps 4. and 5., if a task's process function returns false as the first element of the tuple it returns, remove the task from the hash table and run the task's finalize function. Also if the second element in the tuple is a non-empty list, then add the tasks to the hash table.

The above code was placed in a module named Daemon. Using this module, I've whipped up a simple demo program, an echo server the source code of which is available here. The tarball contains four files:

`Makefile`	The project's Makefile.
`daemon.ml`	The Daemon module.
`echo-server.ml`	The Echo server.
`tcp.ml`	A module of TCP/IP helper functions.

To compile this you will need the Ocaml native compiler which can be installed on Debian or Ubuntu using:


  sudo apt-get install ocaml-nox

The server can be built using make and when run, you can connect to the server using:


  telnet localhost 9301

All lines sent to the server will be immediately echoed back to you.

Objects vs Modules.

Fri, 23 May 2008 21:45 GMT

Although I've been using Ocaml for a several years now, I've not yet been in a situation where I've needed to write an Ocaml class to define a C++/Java/Python/Smalltalk/OO style object. I've found that most of the problems I encountered could be easily solved using functional code and that Ocaml's objects didn't provide an obviously better solution. Until now (or so I thought).

The problem was one of moving around the filesystem keeping track of the old directories so they were easy to return to. The obvious model for this was the pushd and popd built-ins in command shells like GNU Bash. This functionality can be easily wrapped up in an Ocaml object as in the following example and demo code (which needs to be linked to the Unix module):


  class dirstack = object
      val mutable stack = []

      method push dirname =
          (* Find the current working directory. *)
          let cwd = Unix.getcwd () in
          (* Change to the new directory. *)
          Unix.chdir dirname ;
          (* If successful, push old cwd onto the stack. *)
          stack <- cwd :: stack

      method pop () =
          match stack with
          |    [] -> failwith "Directory stack is empty."
          |    head :: tail ->
                  Unix.chdir head

  	end

  let () =
      print_endline (Unix.getcwd ()) ;
      let dstack = new dirstack in
      dstack#push "/tmp" ;
      print_endline (Unix.getcwd ()) ;
      dstack#push "/bin" ;
      print_endline (Unix.getcwd ()) ;
      dstack#pop () ;
      print_endline (Unix.getcwd ()) ;
      dstack#pop () ;
      print_endline (Unix.getcwd ())

However, there are some problems with the above code. Firstly, if the push and pop methods need to be used throughout the program, the dstack object needs to be made more widely accessible using one of the following three methods:

Being placed in the global scope.
Being made into a Singleton objecct.
Being passed around as a parameter to whatever function may need it.

Yuck! Yuck! Double yuck! Suddenly, this object oriented solution didn't look like such a great idea.

Then it struck me. This object can be easily transformed into an Ocaml module like this:


  module Dirstack = struct
      let stack = ref []

      let push dirname =
          (* Find the current working directory. *)
          let cwd = Unix.getcwd () in
          (* Change to the new directory. *)
          Unix.chdir dirname ;
          (* If successful, push old cwd onto the stack. *)
          stack := cwd :: !stack

      let pop () =
          match !stack with
          |    [] -> failwith "Directory stack is empty."
          |    head :: tail ->
                  stack := tail ;
                  Unix.chdir head

     end

  let () =
      print_endline (Unix.getcwd ()) ;
      Dirstack.push "/tmp" ;
      print_endline (Unix.getcwd ()) ;
      Dirstack.push "/bin" ;
      print_endline (Unix.getcwd ()) ;
      Dirstack.pop () ;
      print_endline (Unix.getcwd ()) ;
      Dirstack.pop () ;
      print_endline (Unix.getcwd ())

This solution using a module is much better than the one using an object. The Dirstack module itself is globally accessible and is already a singleton while the stack used to hold past directories is implemented as a list whose scope is limited to the module itself. (Furthermore, if Dirstack is implemented in its own file instead of using a module defined within a larger file, then the stack variable can be hidden completely by not listing it in the Dirstack interface file.)

So while I'm pleased with this solution, it does mean that I'll have to continue my hunt for a problem where an object provides a better solution than any other feature of the Ocaml language. This is particularly ironic because when choosing between two strict statically typed languages, Haskell and Ocaml, I chose Ocaml because I thought I needed objects. However, I stuck with Ocaml because of its pragmatism.

Ocaml : Exception Back Traces in Native Code.

Sun, 06 Apr 2008 02:48 GMT

Some time ago I wrote a blog post about exception back traces which at the time of that post only existed for the Ocaml byte code compiler.

However, version 3.10 of the Ocaml compiler which was released about a year ago, included exception back traces for native code as well as byte code. With the imminent release of Ubuntu's Hardy Heron, version 3.10 of the compiler is about to become much more widely available .

Enabling exception back traces is as simple as adding the "-g" option to the ocamlopt command line and then setting a single environment variable as follows.


  export OCAMLRUNPARAM="b1"

Functional Programming and Testing.

Sun, 24 Feb 2008 12:26 GMT

I read quite a lot of programming related blogs, but its rare for me to find one as muddle headed as this one titled "Quality Begs for Object-Orientation" on the O'Reilly network.

The author, Michael Feathers, starts the post by mentioning that he is dabbling in Ocaml and then makes the assertion that:

"I think that most functional programming languages are fundamentally broken with respect to the software lifecycle."

Now I'm not too sure why he brings up software lifecycle, because all he talks about is testing. However, he does give an example in Java involving testing and wraps up his post by saying that his Java solution is difficult to do in Ocaml, Haskell and Erlang.

Feathers gets two things wrong. Firstly he seems to be writing Java code using Ocaml's syntax and then complains that Ocaml is not enough like Java. His conclusion is hardly surprising. Ocaml is simply not designed for writing Java-like object oriented code.

The second problem is his claim that testing in functional languages is more difficult than with Java. While this may be true when writing Java code with Ocaml's syntax, it is not true for the more general case of writing idiomatic Ocaml or functional code.

So lets look at the testing of Object Oriented code in comparison to Functional code.

With the object orientated approach, a bunch of data fields are bundled up together in an object and methods defined some of which may mutate the state of the object's data fields. When testing objects with mutable fields, its important to test that the state transitions are correct under mutation.

By way of contrast, when doing functional programming, one attempts to write pure functions; functions which have no internal state and where outputs depend only on inputs and constants.

The really nice thing about pure functions is that they are so easy to test. The absence of internal state means that there are no state transitions to test. The only testing left is to collect a bunch of inputs that test for all the boundary conditions, pass each through the function under test and validate the output.

Since testing pure functions is easier that testing objects with mutable state, I would suggest that assuring quality using automated testing is easier for functional code than for object oriented code. This conclusion directly contradicts the title of Feathers' blog post: "Quality Begs for Object-Orientation".

The lesson to be learned here is that if anyone with a purely Java background wants to learn Ocaml or any other functional language, they have to be prepared for a rather large paradigm shift. Old habits and ways of thinking need to be discarded. For Ocaml, that means ignoring Ocaml's object oriented and imperative programming features for as long as possible and attempting to write nothing but pure stateless functions.

Update : 2008-02-26 17:04

Conrad Parker posted this to to reddit and the ensuing discussion was quite interesting.

Ocaml Snippet : Sqlite3.

Sat, 24 Nov 2007 03:20 GMT

One of the really nice things about using Ocaml on Debian and Ubuntu is the large number of really well packaged third party libraries.

Most of these libraries are also well documented from doc strings extracted from the source code files using ocamldoc. However, the documentation for most ocaml libraries is purely reference documentation and its not always obvious how to use the library simply from reading the reference docs. What's really needed is example code to be read in conjunction with the reference docs.

I'm working on a program where I needed a small, fast easy to administer database. With those requitements, Sqlite is really hard to beat and best of all, someone has already written Ocaml bindings. On Debian or Ubuntu, the Ocaml Sqlite bindings can be installed using:


  sudo apt-get install libsqlite3-ocaml-dev

In order to get a feel for using it and take my first steps into the world of SQL (which I'd had very minimal exposure to before now), I wrote a small program to test out the features provided by the library.

The following stand alone program should be taken as an example of how to access a Sqlite database from Ocaml. Since I am not an SQL expert, the actual SQL usage should be taken with a grain of salt.


  exception E of string

  let create_tables db =
      (* Create two tables in the database. *)
      let tables =
      [    "people", "pkey INTEGER PRIMARY KEY, first TEXT, last TEXT, age INTEGER" ;
          "cars", "pkey INTEGER PRIMARY KEY, make TEXT, model TEXT" ;
          ]
      in
      let make_table (name, layout) =
          let stmt = Printf.sprintf "CREATE TABLE %s (%s);" name layout in
          match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> Printf.printf "Table '%s' created.\n" name
          |    x -> raise (E (Sqlite3.Rc.to_string x))
      in
      List.iter make_table tables


  let insert_data db =
      (* Insert data in both the tables. *)
      let people_data =
      [    "John", "Smith", 23;
          "Helen", "Jones", 29 ;
          "Adam", "Von Schmitt", 32 ;
          ]
      in
      let car_data =
      [    "bugatti", "veyron" ;
          "porsche", "911" ;
          ]
      in
      let insert_people (first, last, age) =
          (* Use NULL for primary key and Sqlite will generate a unique key. *)
          let stmt = Printf.sprintf "INSERT INTO people values (NULL, '%s', '%s', %d);"
                                     first last age
          in
          match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
      in
      let insert_car (make, model) =
          let stmt = Printf.sprintf "INSERT INTO cars values (NULL, '%s', '%s');"
                                     make model
		  in
          match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
      in
      List.iter insert_people people_data ;
      List.iter insert_car car_data ;
      print_endline "Data inserted."


  let list_tables db =
      (* List the table names of the given database. *)
      let lister row headers =
          Printf.printf "    %s : '%s'\n" headers.(0) row.(0)
      in
      print_endline "Tables :" ;
      let code = Sqlite3.exec_not_null db ~cb:lister
                          "SELECT name FROM sqlite_master;"
      in
      (    match code with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
          ) ;
      print_endline "------------------------------------------------"


  let search_callback db =
      (* Perform a simple search using a callback. *)
      let print_headers = ref true in
      let lister row headers =
          if !print_headers then
          (    Array.iter (fun s -> Printf.printf "  %-12s" s) headers ;
              print_newline () ;
              print_headers := false
              ) ;
          Array.iter (Printf.printf "  %-12s") row ;
          print_newline ()
      in
      print_endline "People under 30 years of age :" ;
      let code = Sqlite3.exec_not_null db ~cb:lister
                                 "SELECT * FROM people WHERE age < 30;"
      in
      match code with
      |    Sqlite3.Rc.OK -> ()
      |    x -> raise (E (Sqlite3.Rc.to_string x))



  let search_iterator db =
      (* Perform a simple search. *)
      let str_of_rc rc =
          match rc with
          |    Sqlite3.Data.NONE -> "none"
          |    Sqlite3.Data.NULL -> "null"
          |    Sqlite3.Data.INT i -> Int64.to_string i
          |    Sqlite3.Data.FLOAT f -> string_of_float f
          |    Sqlite3.Data.TEXT s -> s
          |    Sqlite3.Data.BLOB _ -> "blob"
      in
      let dump_output s =
          Printf.printf "  Row   Col   ColName    Type       Value\n%!"  ;
          let row = ref 0 in
          while Sqlite3.step s = Sqlite3.Rc.ROW do
              for col = 0 to Sqlite3.data_count s - 1 do
                  let type_name = Sqlite3.column_decltype s col in
                  let val_str = str_of_rc (Sqlite3.column s col) in
                  let col_name = Sqlite3.column_name s col in
                  Printf.printf "  %2d  %4d    %-10s %-8s   %s\n%!"
                                 !row col col_name type_name val_str ;
                  done ;
              row := succ !row ;
              done
      in
      print_endline "People over 25 years of age :" ;
      let stmt = Sqlite3.prepare db "SELECT * FROM people WHERE age > 25;" in
      dump_output stmt    ;
      match Sqlite3.finalize stmt with
      |    Sqlite3.Rc.OK -> ()
      |    x -> raise (E (Sqlite3.Rc.to_string x))


  let update db =
      print_endline "Helen Jones has just turned 30, so update table." ;
      print_endline "Should now only be one person under 30." ;
      let stmt = "UPDATE people SET age = 30 WHERE " ^
                      "first = 'Helen' AND last = 'Jones';"
      in
      (    match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
          ) ;
      search_callback db


  let delete_from db =
      print_endline "Bugattis are too expensive, so drop that entry." ;
      let stmt = "DELETE FROM cars WHERE make = 'bugatti';" in
      match Sqlite3.exec db stmt with
      |    Sqlite3.Rc.OK -> ()
      |    x -> raise (E (Sqlite3.Rc.to_string x))


  let play_with_database db =
      print_endline "" ;
      create_tables db ;
      print_endline "------------------------------------------------" ;
      list_tables db ;
      insert_data db ;
      print_endline "------------------------------------------------" ;
      search_callback db ;
      print_endline "------------------------------------------------" ;
      search_iterator db ;
      print_endline "------------------------------------------------" ;
      update db ;
      print_endline "------------------------------------------------" ;
      delete_from db ;
      print_endline "------------------------------------------------"


  (* Program main. *)

  let () =
      (* The database is called test.db. Delete it if it already exists. *)
      let db_filename = "test.db" in
      (    try Unix.unlink db_filename
          with _ -> ()
          ) ;

      (* Create a new database. *)
      let db = Sqlite3.db_open db_filename in

      play_with_database db ;

      (* Close database when done. *)
      if Sqlite3.db_close db then print_endline "All done.\n"
      else print_endline "Cannot close database.\n"

The above code can be run as a script using:


  ocaml -I +sqlite3 sqlite3.cma unix.cma sqlite_test.ml

or compiled to a native binary using:


  ocamlopt -I +sqlite3 sqlite3.cmxa unix.cmxa sqlite_test.ml -o sqlite_test

When run, the output should look like this:


  Table 'people' created.
  Table 'cars' created.
  ------------------------------------------------
  Tables :
      name : 'people'
      name : 'cars'
  ------------------------------------------------
  Data inserted.
  ------------------------------------------------
  People under 30 years of age :
    pkey          first         last          age
    1             John          Smith         23
    2             Helen         Jones         29
  ------------------------------------------------
  People over 25 years of age :
    Row   Col   ColName    Type       Value
     0     0    pkey       INTEGER    2
     0     1    first      TEXT       Helen
     0     2    last       TEXT       Jones
     0     3    age        INTEGER    29
     1     0    pkey       INTEGER    3
     1     1    first      TEXT       Adam
     1     2    last       TEXT       Von Schmitt
     1     3    age        INTEGER    32
  ------------------------------------------------
  Helen Jones has just turned 30, so update table.
  Should now only be one person under 30.
  People under 30 years of age :
    pkey          first         last          age
    1             John          Smith         23
  ------------------------------------------------
  Bugattis are too expensive, so drop that entry.
  ------------------------------------------------
  All done.

Lazy Lists.

Thu, 22 Mar 2007 10:43 GMT

Lazy evaluation is a default feature of the Haskell programming language and an optional feature of Ocaml. Most programming languages (Ocaml, C, C++, Perl, Python, Java etc) use eager evaluation; where a result specified by a line of code is calculated as soon as the program gets to that line. Lazy evaluation on the other hand, defers the calculation of a result until that result is needed.

The real beauty of lazy evaluation is that a result that is never used is never evaluated. Lazy evaluation also allows the specification of lists which are effectively infinite, as long as the programmer doesn't actually try to access every element in the list. Obviously, attempting to do so would take infinite time and and require infinite memory to actually hold the list .

While searching for information on Ocaml's lazy programming features I came across a post at the enchanted mind blog. That post is ok, but the code is just snippets and when put together as it is, doesn't actually work.

After a bit of fiddling around, I managed to get it working. However, once I understood it, I didn't think the example was as good as it could be. Firstly, the input to the lazy list is just a standard finite length Ocaml list, but more importantly it doesn't give any idea of how to do a potentially infinite list which is a much more interesting case.

That left the field open for a nice blog post demonstrating lazy lists in Ocaml. Read on.

Anybody who has done high school or higher mathematics would probably have come across recurrence relations the most well know of which is the Fibonacci sequence.

The Fibonacci sequence is often used as example for teaching the concept of recursion in computer science (even if some people think there are better examples). The Fibonacci sequence can be expressed recursively in Ocaml like this:


  let rec fibonacci n =
      match n with
      |    1 -> 1
      |    2 -> 1
      |    x -> (fibonacci (n - 1)) + (fibonacci (n - 2))

If one wanted to generate a list containing say the first 20 Fibonacci numbers using the above recursive function, the 19th number in the sequence would be calculated twice, the 18th number three times so on. Its simply not efficient.

A better solution is to use a lazy list, which calculates new values of the sequence as they are needed, based on entries already in the list. Here's an example that creates a lazy list of the fibonacci numbers:


  type lazy_fib_t =
      Node of int * lazy_fib_t Lazy.t

  let create_fib_list () =
      let rec fib_n minus_2 minus_1 =
          let n = minus_1 + minus_2 in
          Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
          Node (n, lazy (fib_n minus_1 n))
      in
      lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))

  let print_fib_list depth lst =
      let rec sub_print current remaining =
          if current > depth then ()
          else
          match Lazy.force remaining with
          |    Node (head, tail) ->
                  Printf.printf "%3d : %d\n" current head ;
                  sub_print (current + 1) tail
      in
      sub_print 0 lst

  let _ =
      let fib_list = create_fib_list () in
      print_fib_list 4 fib_list ;
      print_endline "------------" ;
      print_fib_list 6 fib_list ;

This is a complete working Ocaml program. To run it, just save the text to a file, say "lazy_fib.ml" and then do:


  ocaml lazy_fib.ml

We'll look at the output in detail later. First lets break it down; looking at the program, from top to bottom we have:


  type lazy_fib_t =
      Node of int * lazy_fib_t Lazy.t

The above two lines define a recursive type called lazy_fib_t, which has a single variant called Node which contains a tuple of an integer and the head of a lazy list.


  let create_fib_list () =
      let rec fib_n minus_2 minus_1 =
          let n = minus_1 + minus_2 in
          Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
          Node (n, lazy (fib_n minus_1 n))
      in
      lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))

The function above, create_fib_list, creates a lazy list. It also contains an internal function, fib_n, which we'll look at later. The last line of the function is where all the magic is; it creates three nodes of a lazy list, the first two containing the first two integers of the Fibonacci sequence and a third node which is a closure, containing a call to the internal function fib_n with the correct parameters to generate the next number in the sequence.

The internal function fib_n takes two parameters, the values of the sequence for n - 1 and n - 2. From these two values, it generates the value for n, prints a message and then constructs a new Node containing the value for n and a lazy evaluation for the next value.

The next function is the function which prints the first n elements of a lazy list. It looks like this:


  let print_fib_list depth lst =
      let rec sub_print current remaining =
          if current > depth then ()
          else
          match Lazy.force remaining with
          |    Node (head, tail) ->
                  Printf.printf "%3d : %d\n" current head ;
                  sub_print (current + 1) tail
      in
      sub_print 0 lst

The print_fib_list function contains an internal function sub_print which is called with a current depth of zero and the head of the lazy list to be printed. The internal function recursively moves down the list until current is greater than depth, which cause the recursion to complete and unwind.

At each node of the lazy list where current is less than or equal to depth, the function forces the evaluation of the node. The forcing will only evaluate a node if it hasn't already been evaluated. Once the node has been force evaluated, the value is printed and the function is called recursively.

Finally, the main function of the program is this:


  let _ =
      let fib_list = create_fib_list () in
      print_fib_list 4 fib_list ;
      print_endline "------------" ;
      print_fib_list 6 fib_list ;

All it does is call the function create_fib_list, and then print the first four Fibonacci numbers of the list, prints a dashed line and then prints the first six Fibonacci numbers of the list. Its important to note that the print function is called with the same list on both occasions.

When the program is run, the output should look like this:


    0 : 1
    1 : 1
  fib_n 1 1 -> 2
    2 : 2
  fib_n 1 2 -> 3
    3 : 3
  fib_n 2 3 -> 5
    4 : 5
  ------------
    0 : 1
    1 : 1
    2 : 2
    3 : 3
    4 : 5
  fib_n 3 5 -> 8
    5 : 8
  fib_n 5 8 -> 13
    6 : 13

As can be seen above, the first time the print function is called, the fib_n closure is called for all values of n greater than one. Each time fib_n is called a new node is generated in the list. When the print function is called the second time, it fib_n is only called for values that weren't evaluated on the first call to the print function just as was expected.

One of the few problems with the above implementation is that it uses integers which in Ocaml on 32 bit CPU platforms is only a 31 bit integer. It would however be relatively easy to use Ocaml's Big_int module which provides arbitrary length integers.

Xtreme Numerical Accuracy.

Wed, 21 Mar 2007 09:49 GMT

I'm working on a digital filter design program in Ocaml which was suffering from some numerical issues with Ocaml's native 64 bit floats. The problem was that the algorithm operates on both large floating point numbers and small floating point numbers. These numbers eventually end up in a matrix, and I then use Gaussian elimination to solve a set of simultaneous equations.

Anyone who has done any numerical computation will know that adding large floating point numbers to small floating numbers is a recipe for numerical inaccuracy. For me, the numerical issues were screwing things up badly.

When faced with a problem like this there are two possible solutions:

Do all the computations symbolically, and only substitute numbers at the very last stage and then being careful to process the numerical parts in a way to minimize rounding and truncation error.
Replace the floating point operations with operations on a number type which can provide higher (and maybe even arbitrary) precision.

The first option, doing all the computations symbolically was not practical due to the complexity of the computation. That left only the second option.

Looking around for what was available for Ocaml, I found the contfrac project on Sourceforge. As all the math geeks (hi Mark) have probably guessed by now, contfrac expresses numbers in terms of a really cool mathematical concept called continued fractions.

The idea is that any number can be represented by a (potentially infinite) list of integers [ a0 ; a1, a2, a3, ...]. Given the list of integers, the number itself can be calculated using:

All rational numbers have a finite length continued fraction expansion. For example, the rational number 75/99 is expressed as [ 0 ; 1, 3, 8 ].

Not surprisingly, all the irrational numbers have infinite length continued fraction expansions. The surprising thing (for me at least) is that many of the irrational numbers have CF expansions that are surprisingly regular. The square root of two is expressed as [ 1 ; 2, 2, 2, ...] with an infinitely repeating list of 2s. The natural logarithm e is expressed as [ 2 ; 1, 2, 1, 1, 4, 1, 1, 6, ...] which again has a regular pattern, as does the golden ratio, [ 1 ; 1, 1, 1, ...]. While all the previous CF expansions have a degree of regularity, the expansion of pi, is [ 3 ; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13,...], which looks completely random.

With numbers expressed as continued fractions, the Ocaml contfrac module then implements addition, subtraction, multiplication and division. Once the four arithmetic operations are defined, contfrac then implements a number of trigonometric and transcendental functions in terms of the same continued fractions.

Unfortunately, the module doesn't implement everything I need so I'm going to have to hack on some extra functionality. The actual Ocaml implementation uses Ocaml's lazy lists which is an aspect of Ocaml I hadn't played with yet. Time for some fiddling with lazy lists.

Ocaml : Exception Backtraces.

Sun, 22 Oct 2006 00:39 GMT

There's a paper dated December 2002 by Kevin Murphy where he explains why he was looking at Ocaml. That article was recently linked on programming.reddit.com and there was a comment complaining that Ocaml couldn't print out backtraces on exceptions. Someone posted later that this was not right, but I've heard this complaint often enough that I thought I should blog about how to do it.

First off, Ocaml has two compilers, one which produces bytecode and one which produces native binaries. The native code compiler is not currently able to produce exception backtraces and this is where the Reddit commenter got the idea. However, there is a patch in the Ocaml bug tracker which adds backtrace capabilities. I'm hoping that this goes into the compiler proper in the next release or two.

For a project that is currently compiling with ocamlopt (the native code compiler), changing the to bytecode compiler is as simple as editing the Makefile and replacing all invocations of "ocamlopt" with "ocamlc -g" where the "-g" turns on exception backtraces. You can then rebuild the application. The final step is to turn on backtraces in the bytecode run time environment which is done by setting an environment variable:


  export OCAMLRUNPARAM="b1"

Once compiled to bytecode and with the environment variable set, the application can be run and should produce the required backtrace. The following is an example of a backtrace from something I'm working on at the moment (I hacked the code to make sure I could get one).


  Fatal error: exception Invalid_argument("index out of bounds")
  Raised by primitive operation at unknown location
  Called from file "meyers_diff.ml", line 93, characters 1-31
  Called from file "meyers_diff.ml", line 200, characters 10-52
  Called from file "meyers_diff.ml", line 221, characters 16-60
  Called from file "meyers_diff.ml", line 264, characters 11-148
  Called from file "meyers_diff.ml", line 305, characters 17-50
  Called from file "array.ml", line 130, characters 31-51
  Called from file "meyers_diff.ml", line 323, characters 1-316

Obviously it would be nicer if function names were included here, but this is more than sufficient for debugging purposes.

Ocaml : Code for Variant Types and Pattern Matching.

Mon, 11 Sep 2006 08:41 GMT

Since my blog post on Ocaml's Variant Types and Pattern Matching I've had two requests for the code, so here it is:


type expr_t =
    |  Var of string
    |  Plus of expr_t * expr_t
    |  Times of expr_t * expr_t
    |  Divide of expr_t * expr_t


let rec simplify expr =
    match expr with
        |   Times (a, Plus (b, c)) ->
                Plus (simplify (Times (a, b)), simplify (Times (a, c)))
        |   Divide (Divide (a, b), Divide (c, d)) ->
                Divide (simplify (Times (a, d)), simplify (Times (b, c)))
        |   anything -> anything (* Comment : default case *)


let rec str_of_expr expr =
    match expr with
        |   Var v -> v
        |   Plus (a, b) ->
                "(" ^ (str_of_expr a) ^ "+" ^ (str_of_expr b) ^ ")"
        |   Times (a, b) ->
                (str_of_expr a) ^ "*" ^ (str_of_expr b)
        |   Divide (a, b) ->
                (str_of_expr a) ^ " / " ^ (str_of_expr b)


let _ =
    let expr = Times (Var "x", Plus (Var "y", Var "z")) in
    Printf.printf "  orig : %s\n" (str_of_expr expr) ;
    let expr = simplify expr in
    Printf.printf "  new  : %s\n" (str_of_expr expr)

The above code is a single self contained program; to run that program, save it to to a file named say "cas.ml" then run it (assuming you have ocaml installed) using the command "ocaml cas.ml" which should result in the following output:


  orig : x*(y+z)
  new  : (x*y+x*z)

Obviously, this is just just a demo, but it should be pretty clear that this code could easily be extended to something more useful.

Ocaml : Variant Types and Pattern Matching.

Sun, 03 Sep 2006 02:53 GMT

At last month's SLUG meeting, Mark Greenaway asked if anybody knew of any good Computer Algebra Systems (CAS) available under Linux. I spoke up and told him that I looked around for the same thing some time ago, couldn't find anything that I liked so I ended up writing something that fit my particular needs from scratch.

Later that night I was talking to Robert Collins and Andrew Cowie about stretch languages; languages that differ radically from the languages a programmer already knows so that learning the new language teaches them new programming concepts and paradigms.

For me, my last stretch language was Ocaml which I started using around mid 2004. I discovered Ocaml when I went looking for a language to implement my Computer Algebra System (CAS) in. I did do a trial implementation in C, but that was simply too much of a pain in the neck. I also knew that C++ was not a sufficiently big step away from C to be useful for this project. My other main language at the time was Python, but I knew Python's dynamic typing would make my life difficult.

It was at about this time that I asked my friend Andr� Pang to suggest a language. Andr� had recently given a talk at SLUG titled "Beyond C, C++, Python and Perl" and seemed to know about a whole bunch of different languages. I told him that I was looking for something that was strongly typed, statically typed, had garbage collection for memory management and had Object Oriented (OO) features.

One of Andr�'s suggestions was Java which I was already familiar with. However, I disliked the fact that Java does not allow one to write code outside of a class and Java was also a little too verbose for my tastes. He also tried to push Haskell, but Haskell doesn't have traditional OO features. In retrospect this wouldn't have been a problem, but at the time I rejected Haskell for this reason. However, his final suggestion was Ocaml which seemed to fit all of my requirements. While investigating Ocaml I found a small example on the Ocaml Tutorial that implemented a bare bones CAS.

The two things that makes Ocaml really great for CAS are variant types and pattern matching on these variants. I'll look at these separately.

Variant Types in Ocaml.

Ocaml's variant types are a little like a type safe, bullet proof version of unions in C and C++. In Ocaml one defines a variant type like this:

type expr_t =
    |  Var of string
    |  Plus of expr_t * expr_t
    |  Times of expr_t * expr_t
    |  Divide of expr_t * expr_t

So, here we have a type named expr_t (a mathematical expression) that can hold one of four things:

A variable name which is just a string.
A plus node with a left sub expression and a right sub expression.
A times node with a left sub expression and a right sub expression.
A divide node with a numerator sub expression and a denominator sub expression.

All of the sub expressions are of the same type, expr_t, which makes this a recursive type. Using this recursive variant type, an expression like "x * (y + z)" can be build like this:


let expr = Times (Var "x", Plus (Var "y", Var "z")) ;;

which results in a tree structure with each operator and our variables x, y and z being held in a node of type expr_t and represented by a circle in this diagram:

with the variable expr being the Times node at the top of the diagram.

The really nice thing about variants is that each instance knows which variant it is. That means that its not possible (by mistake or on purpose) to access a node of one variant as another variant. The Ocaml compiler simply won't let that happen.

Compare this to a C version using unions where the programmer has be make sure he/she accesses each instance correctly, or the acres of code required to do the same thing with objects in C++ or Java.

Pattern Matching on Variants.

So once we can construct a mathematical expression we would also want to print it out. Thats where Ocaml's pattern matching comes in. Here's a function to convert any expression tree into a string representation that can be printed.

let rec str_of_expr expr =
    match expr with
        |   Var v -> v
        |   Plus (a, b) ->
                "(" ^ (str_of_expr a) ^ "+" ^ (str_of_expr b) ^ ")"
        |   Times (a, b) ->
                (str_of_expr a) ^ "*" ^ (str_of_expr b)
        |   Divide (a, b) ->
                (str_of_expr a) ^ " / " ^ (str_of_expr b)

The function is called str_of_expr and the "rec" just before the function name means that the function is recursive. The function takes a single parameter of type expr_t and returns a string.

The "match expr with" on the next line is a bit like a switch statement in C, C++, Java and other languages. On the lines following the match there are four options, one for each of the variants of the expr_t type. So for instance, if the expr is a Var variant the function just returns the string that is held by Var and if its a Plus node with two sub expressions, a and b, then the function is called on each of the sub expressions and a string is built using Ocaml's string concatenation operator "^".

The above usage of pattern matching is pretty simple and can done almost as easily in other languages. So lets look at something a little more complicated.

More Advanced Pattern Matching.

One of the many things one might want to do in a CAS is applying mathematical transforms on an expression. For instance, we might want to be able to expand out our expressions above "x * (y + z)" to give "(x * y + x * z)". Fortunately, using Ocaml's advanced pattern matching this is really easy. Here's an example:

let rec simplify expr =
    match expr with
        |   Times (a, Plus (b, c)) ->
                Plus (simplify (Times (a, b)), simplify (Times (a, c)))
        |   Divide (Divide (a, b), Divide (c, d)) ->
                Divide (simplify (Times (a, d)), simplify (Times (b, c)))
        |   anything -> anything (* Comment : default case *)

The function simplify has two transformations and a default case which does nothing. Again, the function is recursive, but the first two match cases match on much more complex expression trees. In fact, the first match case will exactly match our expression and generate the expression we're after, "(x * y + x * z)".

Obviously, to make a real Computer Algebra System requires quite a bit more than what I have here. However, as you can see, Ocaml's variant types and pattern matching are a perfect fit for the problems a programmer writing a CAS would face. In fact, few other languages, with the possible exception of Haskell, would have fit this problem as well.

Ocaml : Fold.

Tue, 11 Jul 2006 10:43 GMT

In a previous post, I blogged about Ocaml's iter and map functions and how they can be applied to arrays and lists. In some circumstances, these functions can be used as a replacement for a for loop. However, there are some other situations where iter and map are can only provide a non-optimal solution. For example, here's a small program which uses Ocaml's imperative features to calculate the sum of the elements of an integer array:

let _ =
    let a = [| 1 ; 2 ; 5 ; 7 ; 11 |] in
    let sum = ref 0 in
    for i = 0 to Array.length a - 1 do
        sum := !sum + a.(i)
    done ;
    Printf.printf "Sum : %d\n" !sum

The value sum is a reference to an integer which is initialized to zero and the referenced sum is then updated inside the for loop. Operating on references is a little different to operating on values; it requires the use of the de-reference operator "!" to access the referenced value and requires the use of the de-reference assignment operator ":=" to update the referenced value.

Like the previous iter and map examples, there are a number of places this can go wrong, even though its only a very small chunk of code. Here's how to use iter to acheive the same result:

let _ =
    let a = [| 1 ; 2 ; 5 ; 7 ; 11 |] in
    let sum = ref 0 in
    Array.iter (fun x -> sum := !sum + x) a ;
    Printf.printf "Sum : %d\n" !sum

bit thats only a small win.

Fortunately, there is a significantly better Higher Order Function solution to this problem, a concept called fold and implemented as functions fold_left and fold_right. The following example program uses both and reduces the for loop in the first example to a single line, including the initialization of the accumulator used to calculate the sum:

let _ =
    let a = [| 1 ; 2 ; 5 ; 7 ; 11 |] in
    let fold_left_sum = Array.fold_left (fun x y -> x + y) 0 a in
    Printf.printf "Fold_left sum  : %d\n" fold_left_sum ;
    let fold_right_sum = Array.fold_right (fun x y -> x + y) a 0 in
    Printf.printf "Fold_right sum : %d\n" fold_right_sum

So lets have a look at a single fold_left:

Array.fold_left (fun x y -> x + y) 0 a

Obviously, the first parameter passed to fold_left is an anonymous function which takes two parameters x and y and returns their sum and the last parameter is simply the array the fold is being applied. The second parameter, 0 in this case, is where all the magic happens. The value 0 is the value that will be passed to the anonymous function as the x parameter, the first time it is called. For subsequent calls, the value of the x parameter will be the return value of the previous call of the anonymous function.

Obviously the easiest way to visualize this is with an example that prints out the values. Here it is:

let _ =
    let a = [| 1 ; 2 ; 5 ; 7 ; 11 |] in
    let fold_left_sum = Array.fold_left
    (   fun x y ->
            Printf.printf "%4d %4d\n" x y ;
            x + y
            )
        0 a
    in
    Printf.printf "\nFold_left sum  : %d\n" fold_left_sum

For those of you too lazy to try this yourselves , here is the output:

   0    1
   1    2
   3    5
   8    7
  15   11

Fold_left sum  : 26

There, just as I explained it. So what about fold_right? Well there are two differences and they are a little subtle so here's the program:

let _ =
    let a = [| 1 ; 2 ; 5 ; 7 ; 11 |] in
    let fold_right_sum = Array.fold_right
    (   fun x y ->
            Printf.printf "%4d %4d\n" x y ;
            x + y
            )
        a 0
    in
    Printf.printf "\nFold_right sum : %d\n" fold_right_sum

and here's the output:

  11    0
   7   11
   5   18
   2   23
   1   25

Fold_right sum : 26

The two differences are:

With fold_left the function is applied from first entry to last entry while with fold_right the function is applied in reverse order.
With fold_left the initializer/previous result is passed as the first parameter to the applied function while with fold_right the initializer/previous result is passed as the second parameter.

Like iter and map, Ocaml's fold functions can reduce the number of points of possible error in a program. More importantly, for the code reader who understands and is comfortable with these techniques, reading and understanding code using these functions is quicker than reading and understanding the equvalient for loop.

Ocaml rocks!

Ocaml : Iter and Map.

Sat, 08 Jul 2006 06:39 GMT

Ocaml, like most other languages has arrays for storing multiple values of the same type. Ocaml also has built in lists; lists that behave like the singly linked list that people write in lower level languages like C but without the pain.

To operate on arrays in Ocaml it is certainly possible to use a for loop to work on each element in the array in turn just like one would do in other languages. Here's a simple example program:

let _ =
    let int_array = [| 1 ; 2 ; 3 ; 7 ; 11 |] in
    for i = 0 to Array.length int_array - 1 do
        Printf.printf "%5d\n" int_array.(i)
    done

Note : This program can be run (assuming you have ocaml installed) by simply saving the above code in a text file with a ".ml" extension and then running:

    ocaml

The astute reader will notice that the for loop approach used above has a couple of things that the programmer must get right. Even in this really simple example, the programmer has to know that array indices start at zero, has to know or find out the length of the array, has to subtract one from that length to avoid accessing elements beyond the last element and then has to explicitly access each element of the array.

Fortunately, in functional languages like Ocaml, it is possible to avoid all of these potential causes of error by using more functional idioms like Higher Order Functions (HOF).

The idea behind using HOF on lists and arrays is that the programmer defines a function that operates on a single element and then applies that function to each entry one at a time. Here's a simple example program which defines a list of integers and then prints it.

let printline_int x =
    Printf.printf "%5d\n" x

let _ =
    let int_array = [| 1 ; 2 ; 3 ; 7 ; 11 |] in
    Array.iter printline_int int_array

Array.iter is a higher order function named iter in the Array module. It takes two paramaters, a function and an array and applies the function to each element of the array. In this particular case, the function printline_int will be applied to each element in int_array from the first to the last.

However, the printline_int function very simple and is only used once in this very small program. It therefore makes just as much sense to make it a closure like this:

let _ =
    let int_array = [| 1 ; 2 ; 3 ; 7 ; 11 |] in
    Array.iter (fun x -> Printf.printf "%5d\n" x) int_array

The bit enclosed within the parentheses here defines an anonymous function which takes a single parameter x and then defines the body of the function on the right hand side of the arrow.

However, even for this most recent example there's a more compact way to write it. Because the anonymous function's parameter is the last token in the function body it can also be written like this:

let _ =
    let int_array = [| 1 ; 2 ; 3 ; 7 ; 11 |] in
    Array.iter (Printf.printf "%5d\n") int_array

In this example, the bit within the parentheses is a partially applied function; a function which needs one or more extra parameters to complete the call. In this case the bit within the parentheses behaves just like the printline_int function in the earlier example.

All of the Array.iter examples above apply a function to the elements of the array and that function has no return value. In Ocaml, a function without a return value is said to return unit (much like void in C, C++, and Java etc).

So here's another example where we take the original array, add one to each element to create a new array and then print out the new array:

let _ =
    let int_array = [| 1 ; 2 ; 3 ; 7 ; 11 |] in
    let plus_one = Array.map (fun x -> x + 1) int_array in
    Array.iter (Printf.printf "%5d\n") plus_one

Array.map is much like Array.iter. The big difference is that the function that is passed to Array.map and applied to each element in the array must return a value. In this case, the anonymous function simply takes an int parameter and returns an int. However, it could just as easily take an int and return some other type. Here's an example where the anonymous function takes an int and returns a float:

let _ =
    let int_array = [| 1 ; 2 ; 3 ; 7 ; 11 |] in
    let exp_val = Array.map (fun x -> exp (float_of_int x)) int_array in
    Array.iter (Printf.printf "%10.2f\n") exp_val

There are equivalent functions to the ones above which work on lists, called unsurprisingly List.iter and List.map. There are also other interesting functions like iteri (arrays only), filter (lists only), fold_left and fold_right (both arrays and lists) and a couple of others. They are all powerful tools which allow algorithms to be implemented in Ocaml more succinctly and with less chance of error than with imperative programming languages.

Functional Triangles.

Thu, 06 Jul 2006 14:09 GMT

Our friend Mark Greenaway has been playing around with Ocaml. He's written a program that draws n-sided polygons (for odd n) and then draws lines between the midpoint of each line segment and the opposite vertex to show that all of those lines will coincide at a point which is the middle of the polygon.

Mark's solution to this problem is pretty typical of a programmer who comes to Ocaml from imperative languages. Typically it uses a bunch of for loops when better, more functional techniques exist .

Here's my solution to the problem:

(* To compile this: ocamlc graphics.cma triangles2.ml -o triangles2 *)
open Graphics

let pi = 3.1415926535897932384626433832795
let window_width = 640
let window_height = 480

let gen_points count =
  let cx = window_width / 2 in
  let cy = window_height / 2 in
  let ary = Array.create count 0 in
  let radius = 0.9 *. float_of_int cy in
  let delta_angle = 2.0 *. pi /. (float_of_int count) in
  Array.mapi (fun i x ->
    ( cx + truncate (radius *. sin (float_of_int i *. delta_angle)),
      cy + truncate (radius *. cos (float_of_int i *. delta_angle))
    )) ary

let gen_outer_lines points =
  set_color black ;
  let (x, y) =  points.(Array.length points - 1) in
  moveto x y ;
  Array.iter (fun (x, y) -> lineto x y) points

let gen_center_lines points =
  set_color red ;
  let do_line sx sy ex ey =
    moveto sx sy ;
    lineto ex ey
  in
  let len = Array.length points in
  let mid = len / 2 in
  Array.iteri (fun i (sx, sy) ->
    let (first, second) = ((i + mid) mod len, (i + mid + 1) mod len) in
    let (ex1,  ey1) = points.(first) in
    let (ex2,  ey2) = points.(second) in
    do_line sx sy ((ex1 + ex2) / 2) ((ey1 + ey2) / 2)
    ) points

let _ =
  let points = gen_points 5 in
  open_graph " 640x480" ;
  gen_outer_lines points ;
  gen_center_lines points ;
  set_color blue ;
  Array.iter (fun (x, y) -> fill_circle x y 10) points ;
  ignore (read_line ())

There's a few algorithmic tweaks here and there but the big difference is the complete lack of for loops.

GTK+ Callbacks in OCaml.

Tue, 25 Apr 2006 05:15 GMT

GUI programming is one of those areas thats always a bit of a pain regardless of language used. Ocaml however has some language features (not present in C, C++ Java, Python etc) which make GUI programming in Ocaml somewhat more elegant.

Regardless of the language used, GUI programming means writing a bunch of small functions (or class methods in OO systems) that get called when the GUI elements are manipulated by the user of the application. These small event handling functions are often called callbacks and are usually associated with the GUI widget (button / menu / whatever) when the widget is created. For example if C/GTK+, to catch the mouse button click even in some widget w the programmer has to define a button press handler with a function signature like:

  static gboolean
  button_press_callback (GtkWidget *w, GdkEventButton *ev, gpointer data) ;

When the program's user clicks the mouse button in the widget, this function will be called with a pointer to the widget as the first parameter and a pointer to the button event data (cursor x/y position, time etc) as the second parameter. The third parameter is a generic pointer. At the time the callback is registered, the programmer can set this pointer to be a pointer to anything he/she wishes and then cast it to the right type in the callback. One common usage is to set this as a pointer to a struct containing the current state of the application, so that this state can be modified within the callback function.

One common problem I have always run up against when doing the above is when I have more than one instance of a particular widget type and want to handle all of them using a single callback. If the last parameter is a pointer to the state, then how do I differentiate between the different widgets that generate the event. Yes, it can be done using the widget pointer, but thats just a pain. Usually, the best solution is to make the state data a global variable (yuck) and then make the last parameter a value which identifies which widget generated the event.

Ocaml has a really neat solution to the above problem; a programming trick called closures. In this particular context, closures allow something that behaves like a partial application of a function.

Consider the function :

  let add_2 a b =
      a + b

For those who don't read Ocaml, this is a function that takes two integer parameters and returns their sum. Using the above function we can define a new function like this:

  let add_to_x = add_2 x

As you can see, this uses our function from above, but calls it with only one parameter. So what the hell is add_to_x? Well its a closure (a type of function) that takes a single integer parameter and adds it to the value the variable x contained when the closure was created.

People with a background in OO languages can look at a closure like add_to_x as an object with a single method, but without all the overhead of defining a class and instantiating an object of that class.

So, when doing GUI programming in Ocaml we can define a callback with any number of parameters and with the last parameter as button event data:

  let button_press_callback a b c d event =
      (* Code goes here, and then return true. *)
      true

and when we register the callback with create a closure of the callback function with all but the last parameter. This is a much neater way of doing things than any widget callback handling I've ever seen in C, C++, C#, Java or Python.

Ocaml and Gtk.

Fri, 21 Apr 2006 12:44 GMT

I'm currently working on an application that, when its (nearly ) ready, will be released under the GNU GPL.

This app will require a lot of list and text manipulation so I really wanted to write this in Ocaml where list handling is so nice and text handling is so hard to screw up. I also need a GUI for this app and since I had already done a couple of GUIs in C/GTK+ that may have been an option. Unfortunately, manipulating lists and strings in C is just way too painful to contemplate. I therefore decided to go with Ocaml.

The Ocaml bindings to GTK+ is called lablgtk2 and there is also an Ocaml version of the GTK+ 2 tutorial which includes Ocaml versions of all the example programs for the original C/GTK+ tutorial.

I always find doing GUIs rather painful and just because I'm writing a GUI in Ocaml doesn't make it any less painful. The GUI basics are always pretty easy; the pain arises when you want to do anything a little unusual and the transition is not at all smooth. One minute you're rocking along thinking, damn, this is easy, and then you hit a brick wall trying to get one tiny little detail working. These problems are even worse in Ocaml because the documentation is nowhere near as good as it could be. In fact, getting the GUI I wanted was proving so difficult that I decided the best way forward might be to write a custom GTK+ widget in C and then wrap that in Ocaml.

The first step along this path is to write the widget and a test application in C. This was surprising easy after my battles with Ocaml and lablgtk. The GTK+ developer documentation is pretty good and sticking any GTK+ function name into Google will usually turn up a number of mailing list posts or whatever with example code.

I now had a damn good start on a widget in C that I needed to wrap in Ocaml. Or so I thought. It turned out that wrapping the widget and getting data back and forth across the Ocaml/C boundary was going to make this a very difficult exercise.

However, I learnt a lot while hacking together the widget in C, so I went back to Ocaml and lablgtk2 and came across a really good method for learning how to do things.

First figure out how to do it in C (reading docs / Google / whatever).
Grep for same concepts in the lablgtk2 include files (much like C header files) to find out what they are called.
Google for "concept lablgtk2" which usually turns up a code snippet that works.

With the above information discovery system the Ocaml version of the GUI is getting pretty close to being as full featured as the C version. I'm also finding that lablgtk2 is really nice to work with. Its not just a plain wrapper around the GTK+ functionality. In many places it deviates significantly from the way the GTK+/C interface does things, but where it does, that is a good thing.

Once the GUI is done, I need to hook it up to the back end code which is already at least partially working and then I'll be ready for the first pre-alpha release. After hacking in Ocaml for over a year now, I really look forward to releasing my first Ocaml app to the public.