Tue, 24 Feb 2009

Calling from C into Ocaml.

I've got a project where it would be nice to be able to call Ocaml code from a C program. Although interfacing Ocaml and C is covered in the official manual and the O'Reilly Ocaml book, neither of these sources has a complete example.

As a firm believer in the idea that 100 lines of working code is worth a thousand lines of explanatory text in a book or on the web, I thought I'd put together a small but complete example. First off, here is the Ocaml code (download ocaml-called-from-c.ml):

  let ocaml_puts name =
      Printf.printf "Program name is '%s'.\n" name ;
      (* Must flush stdout before returning to C. *)
      flush stdout

  let ocaml_string_join join arr =
      (* Create and return a string. *)
      String.concat join (Array.to_list arr)

  (* On program initialisation, register functions to be called from C. *)
  let () =
      Callback.register "ocaml_puts" ocaml_puts ;
      Callback.register "ocaml_string_join" ocaml_string_join

There are two functions that will be called from C, ocaml_puts and ocaml_string_join, and both must be registered as callbacks with the Ocaml runtime using Callback.register. To find the signatures of these functions we can use the ocamlc program:

  prompt > ocamlc -i ocaml-called-from-c.ml
  val ocaml_puts : string -> unit
  val ocaml_string_join : string -> string array -> string

The first function has a single string parameter and returns nothing (unit), while the second takes two parameters, a string and an array of strings, and returns a string.

The C program which calls these two functions looks like this (download c-main-calls-ocaml.c):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <caml/alloc.h>
  #include <caml/mlvalues.h>
  #include <caml/memory.h>
  #include <caml/callback.h>

  static void
  call_ocaml_void (const char * name)
  {   CAMLparam0 () ;
      CAMLlocal1 (ostr) ;
      ostr = caml_copy_string (name);
      value * func = caml_named_value ("ocaml_puts") ;
      if (func == NULL)
          puts ("caml_named_value failed!") ;
      else
          caml_callback (*func, ostr) ;
      CAMLreturn0 ;
  } /* call_ocaml_void */

  static void
  call_ocaml_string (char * join, char const ** argv)
  {   CAMLparam0 () ;
      CAMLlocal3 (ojoin, oargv, ores) ;
      ojoin = caml_copy_string (join);
      oargv = caml_alloc_array (caml_copy_string, argv) ;
      value * func = caml_named_value ("ocaml_string_join") ;
      if (func == NULL)
          puts ("caml_named_value failed!") ;
      else
      {   /* Only read the result if the callback was actually made. */
          ores = caml_callback2 (*func, ojoin, oargv) ;
          printf ("Ocaml returned : '%s'\n", String_val (ores)) ;
          } ;
      CAMLreturn0 ;
  } /* call_ocaml_string */

  int
  main (int argc, char ** argv)
  {   const char * progname ;
      int k, count ;
      progname = argv [0] ;
      if (strstr (progname, "./") == progname)
          progname += 2 ;
      if (argc < 2)
      {   puts ("Need at least 1 command line argument.") ;
          exit (1) ;
          } ;
      count = argc >= 2 ? atoi (argv [1]) : 1 ;
      count = count < 1 ? 1 : count ;
      printf ("Count : %d\n", count) ;

      /* Must call this before calling any Ocaml code. */
      caml_startup (argv) ;
      for (k = 0 ; k < count ; k++)
          call_ocaml_void (progname) ;
      for (k = 0 ; k < count ; k++)
          call_ocaml_string (" ", (char const **) (argv + 1)) ;
      return 0 ;
  } /* main */

The main function is mostly self-explanatory; the only thing to note is that if we want to call any Ocaml code from C, we must call caml_startup first. Looking at the functions that call into Ocaml, note that each begins with a call to CAMLparam0 and ends with a call to CAMLreturn0. These are both macros, the first of which registers the function's Ocaml values with the garbage collector and the second of which undoes that registration before returning. The '0' at the end of CAMLparam0 indicates that none of the C function's parameters are Ocaml managed values, and the '0' at the end of CAMLreturn0 indicates that the C function returns nothing (void).

For values to be passed to Ocaml, we use local Ocaml managed variables set up with CAMLlocal1 if we only have one, or CAMLlocal3 if we have 3. Data can be copied into these local Ocaml variables using the caml_copy_* and caml_alloc_* families of functions.
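
One detail worth spelling out: caml_alloc_array is not given a length, so it relies on the input array being terminated by a NULL pointer, which argv always is. Here is a minimal plain C sketch of that convention; count_strings is a hypothetical helper made up for illustration and is not part of the Ocaml runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Ocaml runtime: count the entries
   in a NULL-terminated array of C strings. This is the same termination
   convention that caml_alloc_array relies on to find the end of its input. */
static size_t
count_strings (char const * const * arr)
{   size_t n = 0 ;

    assert (arr != NULL) ;
    /* Walk the array until the terminating NULL pointer is found. */
    while (arr [n] != NULL)
        n ++ ;
    return n ;
} /* count_strings */
```

In the example program, count_strings ((char const **) (argv + 1)) would be argc - 1, which is the number of elements the Ocaml array ends up with.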

The Ocaml functions we want to call can be looked up by name using caml_named_value and the function actually called using caml_callback if we only have one parameter to pass or caml_callback2 for two parameters.

For the call to ocaml_string_join, which returns a string, we can extract the return value from the Ocaml wrapper using String_val. There are also other accessors to retrieve other data types, the only real caveat being that if the type isn't atomic (e.g. int or double) and you want to return it from the C function, it will be necessary to allocate memory for it and copy it, because the memory returned from Ocaml is no longer protected from the garbage collector after the call to CAMLreturn0.
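
As a rough sketch of the copying step, assuming the caller is happy to own a malloc'd buffer, something like the following would do. The name copy_bytes is hypothetical and only here for illustration; it is plain C and knows nothing about Ocaml.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Ocaml runtime: return a malloc'd,
   NUL-terminated copy of the len bytes starting at src, or NULL if the
   allocation fails. The caller owns the result and must free () it. */
static char *
copy_bytes (const char * src, size_t len)
{   char * dst ;

    assert (src != NULL) ;
    dst = malloc (len + 1) ;
    if (dst == NULL)
        return NULL ;
    memcpy (dst, src, len) ;
    dst [len] = 0 ;
    return dst ;
} /* copy_bytes */
```

Inside call_ocaml_string this would be called as copy_bytes (String_val (ores), caml_string_length (ores)) while ores is still a registered local, and the function would then return the copy using CAMLreturnT (char *, copy) instead of CAMLreturn0, assuming a runtime version that provides that macro.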

Finally, building this simple example can be done as follows (using version 3.10.2 of the Ocaml compiler):

  ocamlopt -c ocaml-called-from-c.ml -o ocaml-called-from-c.cmx
  ocamlopt -output-obj -o camlcode.o ocaml-called-from-c.cmx
  gcc -g -Wall -Wextra -c c-main-calls-ocaml.c -o c-main-calls-ocaml.o
  gcc camlcode.o c-main-calls-ocaml.o -ldl -lm -L /usr/lib/ocaml/3.10.2 \
         -lasmrun -o c-main-calls-ocaml

The first line compiles the Ocaml file into an Ocaml object (*.cmx) using the native code compiler; the second takes the Ocaml object and all the other Ocaml objects needed and generates an object file (camlcode.o) that can be linked to C code. The last two lines compile the C code into an object file and then link all the C objects and required libraries into an executable. Running the result looks like this:

  prompt > ./c-main-calls-ocaml 4 abc wxyz
  Count : 4
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'

At this point it's probably a good idea to run the program under valgrind and vary the first parameter to prove to oneself that the un-freed memory when the program terminates is a constant (due to the Ocaml runtime) and does not vary in proportion to the number of times the Ocaml code is called (which would indicate a memory leak in the interface code).

Posted at: 22:31 | Category: CodeHacking/Ocaml