Sat, 19 Jul 2008

Ocaml and Unix.select.

At the June meeting of FP-Syd, Tim Docker gave a presentation about his Tuple Space Server written in Haskell. This presentation rather intrigued me because I have had a long term interest in numerical analysis and numerical optimisation problems which lend themselves very well to parallel and distributed computing. I decided I should write a Tuple Space Server myself, in Ocaml.

Tim's Tuple Space server used threads and Software Transactional Memory (STM) to handle the connection of multiple masters and workers to the server itself. Although the Ocaml CoThreads library does have an STM module I thought there was probably an easier way.

In my day job I'm working on some C++ code that handles multiple network sockets and open file descriptors using the POSIX select system call. On Linux at least, there is a select tutorial man page which gives a example of using select written in C.

The beauty of select is that it allows a single process to multiplex multiple sockets and/or file descriptors without resorting to threads. However, the C example in the tutorial clearly demonstrates that this system call is a bit of a pain to use directly. Fortunately, for the project at work, I had some really great C++ base classes written by my colleague Peter to build on top of. These base classes hide all the nastiness of dealing with the system call itself by wrapping the select call into a daemon class and providing a simple base class which clients of the select call can inherit from.

For Ocaml there is a thin wrapper around the C library function in the Unix module and it has the following signature:


  val select :
    file_descr list -> file_descr list -> file_descr list -> float ->
      file_descr list * file_descr list * file_descr list

It takes three lists of file descriptors (one descriptor list for each of read, write and exceptions), a float value for a timeout and returns a tuple of three lists; one each for the file descriptors ready for reading, writing and exception handling.

Whereas the C++ solution had a daemon class, the Ocaml version instead has a daemon function. The daemon function operates on a set of tasks, with one file descriptor per task. Each file descriptor was embedded in a struct which I named task_t:


  type task_t =
  {   fd : Unix.file_descr ;
  
      mutable wake_time : float option ;
  
      mutable select_on : bool ;

      mutable process_read : task_t -> bool * task_t list ;
  
      mutable process_wake : task_t -> bool * task_t list ;
  
      finalize : task_t -> unit ;
      }

The fields of the struct are as follows:

The first thing to note in the above is the careful use of an immutable field for the file descriptor and mutable fields for process_read, process_wake and wake_time. The file descriptor is immutable so that any client code does not change its value behind the back of the daemon.

The others fields of the struct are purposely made to be mutable so that they can be changed on the fly. The functions process_read and process_wake both return their results in the same manner, a tuple containing two items:

The actual daemon run loop keeps the tasks in a hash table where the key is the file descriptor. Once the initial set of tasks is in the hash table, the loop basically does the following:

  1. Find the file descriptors of all the tasks in the hash table which their select_on field set to true (uses Hashtbl.fold).
  2. Find the minimum wake_time timeout of all the tasks (this is actually done on the same pass over all items in the hash tables as step 1.).
  3. Pass the file descriptors from step 1. to the select with the timeout value found in 2. (The lists for writable and exception file descriptors are empty.)
  4. When select returns a list or file descriptors ready to be read, map the file descriptor to a task using the hash table and then run the process_read function of each readable task.
  5. For each task whose wake_time is exceeded, run its process_wake function.
  6. For steps 4. and 5., if a task's process function returns false as the first element of the tuple it returns, remove the task from the hash table and run the task's finalize function. Also if the second element in the tuple is a non-empty list, then add the tasks to the hash table.

The above code was placed in a module named Daemon. Using this module, I've whipped up a simple demo program, an echo server the source code of which is available here. The tarball contains four files:


Makefile The project's Makefile.
daemon.ml The Daemon module.
echo-server.ml The Echo server.
tcp.ml A module of TCP/IP helper functions.

To compile this you will need the Ocaml native compiler which can be installed on Debian or Ubuntu using:


  sudo apt-get install ocaml-nox

The server can be built using make and when run, you can connect to the server using:


  telnet localhost 9301

All lines sent to the server will be immediately echoed back to you.

Posted at: 21:10 | Category: CodeHacking/Ocaml | Permalink