Mon, 18 Aug 2008
Nemiver : A GUI debugger for GNOME.
The many years, Linux has lacked a good GUI debugger for C and C++ programs. Yes, everyone knows about GNU GDB, but that is a command line debugger and really not very useful for stepping through a program. There was also the Data Display Debugger (DDD) which uses the Motif widget set, usually supplied by the Lesstif Project. Unfortunately, Lesstif development has basically been abandoned and OpenMotif is not really an option because its license fails to meet term 8 of the Open Software Definition.
This means that for many years, developers on Linux have tended to avoid the "stepping through code with a debugger" approach to debugging. While I think that single stepping is not the most applicable to every debugging problem, there are times when single stepping is useful and possibly also the fastest way to track down a problem.
However, I was recently made aware of a new GUI debugger for the GNOME (ie really the Linux) desktop, Nemiver.
The only problem with nemiver is that the version in Ubuntu Hardy is a little old and was giving me a few troubles. However, after building and installing a version 2.2.0-2 package of libgtksourceviewmm-2.0 from source I was able to build nemiver from SVN and so far its working way better than DDD ever did.
So, here it is, a good looking, stable and capable GUI debugger for Linux.
Posted at: 19:39 | Category: CodeHacking | Permalink
Sat, 19 Jul 2008
Ocaml and Unix.select.
At the June meeting of FP-Syd, Tim Docker gave a presentation about his Tuple Space Server written in Haskell. This presentation rather intrigued me because I have had a long term interest in numerical analysis and numerical optimisation problems which lend themselves very well to parallel and distributed computing. I decided I should write a Tuple Space Server myself, in Ocaml.
Tim's Tuple Space server used threads and Software Transactional Memory (STM) to handle the connection of multiple masters and workers to the server itself. Although the Ocaml CoThreads library does have an STM module I thought there was probably an easier way.
In my day job I'm working on some C++ code that handles multiple network sockets and open file descriptors using the POSIX select system call. On Linux at least, there is a select tutorial man page which gives a example of using select written in C.
The beauty of select is that it allows a single process to multiplex multiple sockets and/or file descriptors without resorting to threads. However, the C example in the tutorial clearly demonstrates that this system call is a bit of a pain to use directly. Fortunately, for the project at work, I had some really great C++ base classes written by my colleague Peter to build on top of. These base classes hide all the nastiness of dealing with the system call itself by wrapping the select call into a daemon class and providing a simple base class which clients of the select call can inherit from.
For Ocaml there is a thin wrapper around the C library function in the Unix module and it has the following signature:
val select :
file_descr list -> file_descr list -> file_descr list -> float ->
file_descr list * file_descr list * file_descr list
It takes three lists of file descriptors (one descriptor list for each of read, write and exceptions), a float value for a timeout and returns a tuple of three lists; one each for the file descriptors ready for reading, writing and exception handling.
Whereas the C++ solution had a daemon class, the Ocaml version instead has a daemon function. The daemon function operates on a set of tasks, with one file descriptor per task. Each file descriptor was embedded in a struct which I named task_t:
type task_t =
{ fd : Unix.file_descr ;
mutable wake_time : float option ;
mutable select_on : bool ;
mutable process_read : task_t -> bool * task_t list ;
mutable process_wake : task_t -> bool * task_t list ;
finalize : task_t -> unit ;
}
The fields of the struct are as follows:
- fd : This is the file descriptor or socket that this task is operating on.
- wake_time : This is an optional, per task timeout value. It is specified as either None for no timeout or Some with a value in seconds (ie Some 10.0 specifies a ten second timeout). A typical use might be when writing an network daemon where you want to close/drop any client connections which have been silent for more than a specified amount of time.
- select_on : If this is true, the daemon will add this task's file descriptor to the list of files waiting for read. This field (set to false) in conjunction with the process_wake can be used to implement deferred actions on a descriptor that ignore readiness for read.
- process_read : This is the function that is called when the file descriptor has data ready to be read. The function takes the task_t struct as a parameter and returns a tuple containing a bool and a task_t list, which may be empty. There's more on what happens with the return values of this function below.
- process_wake : This is the function that is called when a wake_time value has been set and there has be no activity on the socket for the specified amount of time. This function's return type is the same as for process_read.
- finalize : This function will be called when either of the above two functions notifies the daemon that the file descriptor should be closed.
The first thing to note in the above is the careful use of an immutable field for the file descriptor and mutable fields for process_read, process_wake and wake_time. The file descriptor is immutable so that any client code does not change its value behind the back of the daemon.
The others fields of the struct are purposely made to be mutable so that they can be changed on the fly. The functions process_read and process_wake both return their results in the same manner, a tuple containing two items:
- A bool which indicates whether the daemon should hold onto the task (if true) or if false, the daemon should run the task's finalize function and then drop it to be cleaned up by the garbage collector.
- A task_t list (possibly empty) of new tasks for the daemon to manage. Typically the daemon's server socket would be managed by the daemon function just like any other socket. When the server socket's process_read function accepts a new client connection it would return the new client task in this list.
The actual daemon run loop keeps the tasks in a hash table where the key is the file descriptor. Once the initial set of tasks is in the hash table, the loop basically does the following:
- Find the file descriptors of all the tasks in the hash table which their select_on field set to true (uses Hashtbl.fold).
- Find the minimum wake_time timeout of all the tasks (this is actually done on the same pass over all items in the hash tables as step 1.).
- Pass the file descriptors from step 1. to the select with the timeout value found in 2. (The lists for writable and exception file descriptors are empty.)
- When select returns a list or file descriptors ready to be read, map the file descriptor to a task using the hash table and then run the process_read function of each readable task.
- For each task whose wake_time is exceeded, run its process_wake function.
- For steps 4. and 5., if a task's process function returns false as the first element of the tuple it returns, remove the task from the hash table and run the task's finalize function. Also if the second element in the tuple is a non-empty list, then add the tasks to the hash table.
The above code was placed in a module named Daemon. Using this module, I've whipped up a simple demo program, an echo server the source code of which is available here. The tarball contains four files:
| Makefile | The project's Makefile. |
| daemon.ml | The Daemon module. |
| echo-server.ml | The Echo server. |
| tcp.ml | A module of TCP/IP helper functions. |
To compile this you will need the Ocaml native compiler which can be installed on Debian or Ubuntu using:
sudo apt-get install ocaml-nox
The server can be built using make and when run, you can connect to the server using:
telnet localhost 9301
All lines sent to the server will be immediately echoed back to you.
Posted at: 21:10 | Category: CodeHacking/Ocaml | Permalink
Sat, 24 May 2008
Objects vs Modules.
Although I've been using Ocaml for a several years now, I've not yet been in a situation where I've needed to write an Ocaml class to define a C++/Java/Python/Smalltalk/OO style object. I've found that most of the problems I encountered could be easily solved using functional code and that Ocaml's objects didn't provide an obviously better solution. Until now (or so I thought).
The problem was one of moving around the filesystem keeping track of the old directories so they were easy to return to. The obvious model for this was the pushd and popd built-ins in command shells like GNU Bash. This functionality can be easily wrapped up in an Ocaml object as in the following example and demo code (which needs to be linked to the Unix module):
class dirstack = object
val mutable stack = []
method push dirname =
(* Find the current working directory. *)
let cwd = Unix.getcwd () in
(* Change to the new directory. *)
Unix.chdir dirname ;
(* If successful, push old cwd onto the stack. *)
stack <- cwd :: stack
method pop () =
match stack with
| [] -> failwith "Directory stack is empty."
| head :: tail ->
Unix.chdir head
end
let () =
print_endline (Unix.getcwd ()) ;
let dstack = new dirstack in
dstack#push "/tmp" ;
print_endline (Unix.getcwd ()) ;
dstack#push "/bin" ;
print_endline (Unix.getcwd ()) ;
dstack#pop () ;
print_endline (Unix.getcwd ()) ;
dstack#pop () ;
print_endline (Unix.getcwd ())
However, there are some problems with the above code. Firstly, if the push and pop methods need to be used throughout the program, the dstack object needs to be made more widely accessible using one of the following three methods:
- Being placed in the global scope.
- Being made into a Singleton objecct.
- Being passed around as a parameter to whatever function may need it.
Yuck! Yuck! Double yuck! Suddenly, this object oriented solution didn't look like such a great idea.
Then it struck me. This object can be easily transformed into an Ocaml module like this:
module Dirstack = struct
let stack = ref []
let push dirname =
(* Find the current working directory. *)
let cwd = Unix.getcwd () in
(* Change to the new directory. *)
Unix.chdir dirname ;
(* If successful, push old cwd onto the stack. *)
stack := cwd :: !stack
let pop () =
match !stack with
| [] -> failwith "Directory stack is empty."
| head :: tail ->
stack := tail ;
Unix.chdir head
end
let () =
print_endline (Unix.getcwd ()) ;
Dirstack.push "/tmp" ;
print_endline (Unix.getcwd ()) ;
Dirstack.push "/bin" ;
print_endline (Unix.getcwd ()) ;
Dirstack.pop () ;
print_endline (Unix.getcwd ()) ;
Dirstack.pop () ;
print_endline (Unix.getcwd ())
This solution using a module is much better than the one using an object. The Dirstack module itself is globally accessible and is already a singleton while the stack used to hold past directories is implemented as a list whose scope is limited to the module itself. (Furthermore, if Dirstack is implemented in its own file instead of using a module defined within a larger file, then the stack variable can be hidden completely by not listing it in the Dirstack interface file.)
So while I'm pleased with this solution, it does mean that I'll have to continue my hunt for a problem where an object provides a better solution than any other feature of the Ocaml language. This is particularly ironic because when choosing between two strict statically typed languages, Haskell and Ocaml, I chose Ocaml because I thought I needed objects. However, I stuck with Ocaml because of its pragmatism.
Posted at: 07:45 | Category: CodeHacking/Ocaml | Permalink
Sun, 20 Apr 2008
Cross Compiling for Legacy Win32 Systems (Part 2).
Cross compiling from Linux to Windows requires the installation of a couple of packages. On a Debian or Ubuntu system this can be done using:
sudo apt-get install build-essential sudo apt-get install mingw32 mingw32-binutils mingw32-runtime wine
I'm running Ubuntu's Hardy Heron pre-release and the following is known to work with these versions:
mingw32 4.2.1.dfsg-1ubuntu1 mingw32-binutils 2.17.50-20070129.1-1 mingw32-runtime 3.13-1 wine 0.9.59-0ubuntu5
For an example of a project which can be successfully cross-compiled, I have chosen libogg which is one of the two libraries required to encode and decode Ogg/Vorbis files. I also happen to know that the current libogg sources in the Xiph Foundation's SVN repository cross-compile from Linux to Windows correctly because I committed the patch to make it possible.
However, we need to look ahead a little. After we have cross compiled libogg we will also want to cross compile the associated libvorbis library which relies on libogg. We therefore need to configure libogg so that when we install it, it can be found by the libvorbis configure script.
For me that meant creating a MinGW32 directory in my home directory:
mkdir $HOME/MinGW32
The next step to to grab the libogg source code from the Xiph SVN server. This can be achieved using the command:
svn co http://svn.xiph.org/trunk/ogg libogg
Changing into the libogg directory, we are now ready to configure, test and install the library. That can be done using:
./autogen.sh
./configure --host=i586-mingw32msvc --target=i586-mingw32msvc \
--build=i586-linux --prefix=$HOME/MinGW32
make
make check
make install
The first command above, runs the auto tools to generate that configure script. The second command, configure is broken across two lines. It sets up the generated Makefiles to compile Windows binaries from a Linux host, with the install directory we set up before. The third line builds the windows version of libogg, the fourth line runs the test suite, with the windows executables being run under WINE and the final line installs everything in the MinGW32 directory created earlier.
All of the above commands should pass without errors. If they don't, check your versions of of the mingw cross compiler tools and/or WINE.
Posted at: 20:51 | Category: CodeHacking/MinGWCross | Permalink
Wed, 16 Apr 2008
Cross Compiling for Legacy Win32 Systems (Part 1).
My main two FOSS projects, libsndfile and libsamplerate have significant numbers of users that are tied to that particularly odious legacy system, Microsoft Windows. Since I don't normally use Windows myself, maintaining support for that OS has always been a huge pain in the neck.
Originally I shipped Microsoft project files for libsndfile, but that became unworkable because the different versions of the Microsoft tools (Visual C++ 5, Visual C++ 6, Visual Studio 2003, Visual Studio 2005 etc) used different and incompatible project file formats. I solved this by shipping a simple Makefile that used Microsoft's nmake and the command line compilers to build libsndfile. However, by about 2004, the Microsoft compiler's complete lack of support for the 1999 ISO C Standard made maintaining support too much trouble, so it was dropped.
Instead, I started using Cygwin and MinGW to compile libsndfile on Windows. Both of these tool-sets use a version of the GNU GCC compiler just like Linux and building libsndfile using these two tool-sets was trivial:
./configure make make check
Of course there were howls of protest from Windows users, but since they (with a small number of exceptions) had contributed so little, I didn't fell like I owed them anything. I also started releasing pre-compiled Windows binaries at the same time as the source code tarballs were released.
However, while the MinGW compiler was a huge improvement over the Microsoft one it was still a huge pain in the neck. I had to keep a Windows machine and keep it updated and patched against vulnerabilities. Furthermore, installing and updating MinGW was a painful manual process. Oh how I longed for a Debian/Ubuntu style apt-get command to look for and install updates. Finally, copying source code back and forth between Linux and Windows while debugging Windows issues was another pain point because version control systems like GNU Arch and bzr simply didn't work very well on Windows.
In about 2004, I tried the MinGW Linux to Windows cross compiler, a compiler that runs on Linux but generates binaries for Windows. This compiler worked, but left one rather large problem; how do I run libsndfile's rather large and comprehensive test suite? Compiling libsndfile without running the test suite is a waste of time. I did try to run the tests under WINE (the Windows emulator), but at the time tests were failing under WINE that didn't fail on Windows.
From that time on, I would try running the cross-compiled test suite under WINE once or twice a year. Then, some time in the last year or so, the number of problems with the test suite dropped to one, which was only a FIXME message. A little hacking on the WINE sources resulted in a patch that was sent to the WINE mailing list and has since been applied to the main WINE source tree.
With that bug fixed, I can now cross compile from Linux to Windows and run the full libsndfile test suite under WINE. That means that Windows has just become that little bit less relevant that it was before.
A future post will explain how to set up the cross compiler and WINE and walk through compiling and testing of a standard FOSS project.
Posted at: 23:12 | Category: CodeHacking/MinGWCross | Permalink
Fri, 11 Apr 2008
You Stupid Git!
As far as I can tell, the absolute, canonical, got-to-first documentation for the git distributed version control system (DVCS) can be found here:
This documentation seems comprehensive and well laid out. It explains commits, manipulating-branches, merging, collaborative development and the pretty damn interesting rebase and bisect commands. This documentation is called a user manual but it contains sufficient examples to make it a pretty damn fine tutorial.
Normally something like "here's a link to the documentation" would not be worthy of a blog post. However, failure to find the canonical user manual could lead a person (ie me) to post messages to mailing lists saying things like:
"I'm sure git is very clever and all, but its UI and documentation is probably the most user hateful thing I have seen [since] sendmail's cf files."
or, on finding a one hour long video screen-cast tutorial (apparently aimed at all those Ruby on Rails writing Mac OSX users):
"This makes me wonder, how fscked up does a DVCS have to be that you need tens of megabytes of video to show how it works when Bzr and many others can do it with less than ten kilobytes of html text?"
So while I was wrong about the documentation I still have huge reservations about git's user interface and stand by this statement:
"I am currently trying to learn git and I can see very clearly that git is designed by kernel programmers whose normal approach to a user interface is something like a Unix system call."
I'm sure git is a powerful tool and the rebase feature is something I've been wishing for in other systems for some time, but git's UI is already starting to grate.
Posted at: 20:02 | Category: CodeHacking | Permalink
Sun, 06 Apr 2008
Ocaml : Exception Back Traces in Native Code.
Some time ago I wrote a blog post about exception back traces which at the time of that post only existed for the Ocaml byte code compiler.
However, version 3.10 of the Ocaml compiler which was released about a year ago, included exception back traces for native code as well as byte code. With the imminent release of Ubuntu's Hardy Heron, version 3.10 of the compiler is about to become much more widely available .
Enabling exception back traces is as simple as adding the "-g" option to the ocamlopt command line and then setting a single environment variable as follows.
export OCAMLRUNPARAM="b1"
Posted at: 12:48 | Category: CodeHacking/Ocaml | Permalink
Sun, 30 Mar 2008
libsamplerate 0.1.3.
About a week ago I released a new version of SecretRabbitCode (aka libsamplerate).
The major change was that the new improved SINC based converters I blogged about here are now the default. There were also a couple of minor bug fixes.
The fine people at Infinitewave have now updated their test results to include the new converter and it shows Secret Rabbit Code comes very close to the best of the commercial converters in terms of quality.
Posted at: 15:11 | Category: CodeHacking/SecretRabbitCode | Permalink
Mon, 24 Mar 2008
Cross Compiling with pkg-config.
I'm currently playing with the MinGW cross compiler versions of the GNU C and C++ compilers available via apt-get on Debian and Ubuntu systems. These cross compilers generate windows binaries from a Linux host system which is potentially a much less painful way turning FOSS code into binaries for that particularly odious legacy platform.
Most of the software I'm compiling uses the GNU tools; autoconf, automake, libtool and pkg-config for configuring the software before compiling. Autoconf already has good support for cross compiling and automake and libtool just do what autoconf tells them to do. Pkg-config however is the odd one out.
Pkg-config's job is to retrieve information about installed libraries so that the compiler can find the required header files for inclusion and libraries for linking. For instance, if you wanted compile a program that uses the gconf-2.0 library you could find out the required CFLAGS to be passed to the C compiler and required libraries for linking, by doing something like the following in the Makefile.
GCONF_CFLAGS = $(shell pkg-config --cflags gconf-2.0) GCONF_LIBS = $(shell pkg-config --libs gconf-2.0)
In the above example, when pkg-config is run, it looks in the directory /usr/lib/pkg-config/ and reads information from the file gconf-2.0.pc (each installed library should have one or more of these pkg-config files) which then gets printed out. While the information given by pkg-config would be correct for a native build, it is unlikely to be correct for the cross compiling case.
This issue came up as early as 2003 and there is even a wiki page which suggests some quite extensive changes to pkg-config. Unfortunately I think these suggestions are somewhat fragile and pkg-config itself (I'm using version 0.22) already has features for a better solution.
Like many Unix programs, pkg-config's behaviour can be modified by manipulating certain environment variables. The pkg-config man page explains these variables very well. The first one is PKG_CONFIG_LIBDIR which modifies the default location where pkg-config looks for its per installed library config file. Secondly, the PKG_CONFIG_PATH variable can be set to allow additional pkg-config search paths.
Overriding these two variables results in a MinGW cross pkg-config bash script which I have named i586-mingw32msvc-pkg-config and which looks like this:
#!/bin/bash # This file has no copyright assigned and is placed in the Public Domain. # No warranty is given. # When using the mingw32msvc cross compiler tools, the native Linux # pkg-config executable works fine as long as the default PKG_CONFIG_LIBDIR # is overridden. export PKG_CONFIG_LIBDIR=/usr/i586-mingw32msvc/lib/pkgconfig # Also want to override the standard user defined PKG_CONFIG_PATH with # a mingw32msvc specific one. export PKG_CONFIG_PATH=$PKG_CONFIG_PATH_MINGW32MSVC # Now just execute pkg-config with the given command line args. pkg-config $@
Now autoconf generated configure scripts that realise that the i586-mingw32msvc-gcc cross compiler is being used will run the above script and get suitable information for the cross compiler rather than the native compiler.
The only downside to this solution is that a separate script is required for each cross compiler which uses pkg-config. This however is a minor price to pay and it is unlikely that people will end up with huge numbers of XXXX-pkg-config scripts like was common before the widespread use of pkg-config.
Until a better solution becomes available, this is what I will be using.
Posted at: 13:24 | Category: CodeHacking/MinGWCross | Permalink
Sat, 08 Mar 2008
Progress on the Rabbit.
For over three years now, I have been working on (on and off, but mostly off) a new algorithm for doing audio sample rate conversion in Secret Rabbit Code. The idea for the new algorithm has been rattling around in my head for most of that time, but the problem was always the implementation. While I am making progress it has been slow.
However, a public comparison between a large collection of converters showed that while the conversion quality of Secret Rabbit Code was good, it was nowhere near state of the art.
In order to see if I could get Secret Rabbit Code closer to state of the art quickly, I decided to revisit the existing converter during the xmas/new-year break.
The existing converter had a set of digital filters whose coefficients were generated by a small program written in GNU Octave. My first task was to convert that program to Ocaml which has become my favourite language for technical computing. I then spent quite a bit of time finding and analyzing where the filter design program was loosing precision and finding work arounds. Finally, I spent even more time looking at how the different filter design parameters interact with one another and with the conversion algorithm itself.
Fortunately, all this work has paid off. The result is new versions of the SRC_SINC_MEDIUM_QUALITY and SRC_SINC_BEST_QUALITY converters. The old versions of these converters have been renamed to SRC_OLD_SINC_MEDIUM_QUALITY and SRC_OLD_SINC_BEST_QUALITY. The old versions will be removed once the new versions have been fully validated.
So far, the new converters seem to have significantly improved signal to noise ratio as can be seen from the following to spectrograms (using the methodology described here). It should be obvious from these plots that the new versions of the converters have significantly less artifacts (the purple and blue bits) than the old converters.
Obviously, conversion quality is not the only criterion to evaluate sample rate converters; conversion speed can also be important in some situations. In my preliminary testing, the updated Best SINC converter runs up to 25% slower than the old one. The new best converter also uses significantly more memory than the old one. Storage of filter coefficients has gone up by a factor of 20, which is now over a megabyte for best quality converter alone.
In the tables below I've listed the SNR, throughput speeds and bandwidths as measured by the test suite (the snr_bw_test and throughput_test programs) distributed with the code for a couple of different CPU types.
1.1 GHz Intel Pentium M (32 bit) with 2048 KB cache
| Converter Name | SNR | Throughput | Bandwidth |
|---|---|---|---|
| SRC_OLD_SINC_MEDIUM_QUALITY | |||
| SRC_SINC_MEDIUM_QUALITY | |||
| SRC_OLD_SINC_BEST_QUALITY | |||
| SRC_SINC_BEST_QUALITY |
|
1.8 GHz AMD Opteron 265 (64 bit) with 1024 KB cache
| Converter Name | SNR | Throughput | Bandwidth |
|---|---|---|---|
| SRC_OLD_SINC_MEDIUM_QUALITY | |||
| SRC_SINC_MEDIUM_QUALITY | |||
| SRC_OLD_SINC_BEST_QUALITY | |||
| SRC_SINC_BEST_QUALITY |
|
1.86GHz Intel Core Duo (32 bit) with 2048 KB cache
| Converter Name | SNR | Throughput | Bandwidth |
|---|---|---|---|
| SRC_OLD_SINC_MEDIUM_QUALITY | |||
| SRC_SINC_MEDIUM_QUALITY | |||
| SRC_OLD_SINC_BEST_QUALITY | |||
| SRC_SINC_BEST_QUALITY |
|
A pre-release containing these updated converters is available for download here. Once they have been tested a little more widely I intend to replace the old versions of the converters with the new, higher specification ones.
Anybody who wants to discuss this further should join the SRC mailing list and discuss it there.
Finally, once a version of Secret Rabbit Code with these new converters has been officially released I can get back to the new converter algorithm which should at least match the what I have here in terms of quality but run significantly faster and use at least an order of magnitude less RAM.
Posted at: 14:50 | Category: CodeHacking/SecretRabbitCode | Permalink
Sun, 24 Feb 2008
Functional Programming and Testing.
I read quite a lot of programming related blogs, but its rare for me to find one as muddle headed as this one titled "Quality Begs for Object-Orientation" on the O'Reilly network.
The author, Michael Feathers, starts the post by mentioning that he is dabbling in Ocaml and then makes the assertion that:
"I think that most functional programming languages are fundamentally broken with respect to the software lifecycle."
Now I'm not too sure why he brings up software lifecycle, because all he talks about is testing. However, he does give an example in Java involving testing and wraps up his post by saying that his Java solution is difficult to do in Ocaml, Haskell and Erlang.
Feathers gets two things wrong. Firstly he seems to be writing Java code using Ocaml's syntax and then complains that Ocaml is not enough like Java. His conclusion is hardly surprising. Ocaml is simply not designed for writing Java-like object oriented code.
The second problem is his claim that testing in functional languages is more difficult than with Java. While this may be true when writing Java code with Ocaml's syntax, it is not true for the more general case of writing idiomatic Ocaml or functional code.
So lets look at the testing of Object Oriented code in comparison to Functional code.
With the object orientated approach, a bunch of data fields are bundled up together in an object and methods defined some of which may mutate the state of the object's data fields. When testing objects with mutable fields, its important to test that the state transitions are correct under mutation.
By way of contrast, when doing functional programming, one attempts to write pure functions; functions which have no internal state and where outputs depend only on inputs and constants.
The really nice thing about pure functions is that they are so easy to test. The absence of internal state means that there are no state transitions to test. The only testing left is to collect a bunch of inputs that test for all the boundary conditions, pass each through the function under test and validate the output.
Since testing pure functions is easier that testing objects with mutable state, I would suggest that assuring quality using automated testing is easier for functional code than for object oriented code. This conclusion directly contradicts the title of Feathers' blog post: "Quality Begs for Object-Orientation".
The lesson to be learned here is that if anyone with a purely Java background wants to learn Ocaml or any other functional language, they have to be prepared for a rather large paradigm shift. Old habits and ways of thinking need to be discarded. For Ocaml, that means ignoring Ocaml's object oriented and imperative programming features for as long as possible and attempting to write nothing but pure stateless functions.
Update : 2008-02-26 17:04
Conrad Parker posted this to to reddit and the ensuing discussion was quite interesting.
Posted at: 23:26 | Category: CodeHacking/Ocaml | Permalink
Sat, 24 Nov 2007
Ocaml Snippet : Sqlite3.
One of the really nice things about using Ocaml on Debian and Ubuntu is the large number of really well packaged third party libraries.
Most of these libraries are also well documented from doc strings extracted from the source code files using ocamldoc. However, the documentation for most ocaml libraries is purely reference documentation and its not always obvious how to use the library simply from reading the reference docs. What's really needed is example code to be read in conjunction with the reference docs.
I'm working on a program where I needed a small, fast easy to administer database. With those requitements, Sqlite is really hard to beat and best of all, someone has already written Ocaml bindings. On Debian or Ubuntu, the Ocaml Sqlite bindings can be installed using:
sudo apt-get install libsqlite3-ocaml-dev
In order to get a feel for using it and take my first steps into the world of SQL (which I'd had very minimal exposure to before now), I wrote a small program to test out the features provided by the library.
The following stand alone program should be taken as an example of how to access a Sqlite database from Ocaml. Since I am not an SQL expert, the actual SQL usage should be taken with a grain of salt.
exception E of string
let create_tables db =
(* Create two tables in the database. *)
let tables =
[ "people", "pkey INTEGER PRIMARY KEY, first TEXT, last TEXT, age INTEGER" ;
"cars", "pkey INTEGER PRIMARY KEY, make TEXT, model TEXT" ;
]
in
let make_table (name, layout) =
let stmt = Printf.sprintf "CREATE TABLE %s (%s);" name layout in
match Sqlite3.exec db stmt with
| Sqlite3.Rc.OK -> Printf.printf "Table '%s' created.\n" name
| x -> raise (E (Sqlite3.Rc.to_string x))
in
List.iter make_table tables
let insert_data db =
(* Insert data in both the tables. *)
let people_data =
[ "John", "Smith", 23;
"Helen", "Jones", 29 ;
"Adam", "Von Schmitt", 32 ;
]
in
let car_data =
[ "bugatti", "veyron" ;
"porsche", "911" ;
]
in
let insert_people (first, last, age) =
(* Use NULL for primary key and Sqlite will generate a unique key. *)
let stmt = Printf.sprintf "INSERT INTO people values (NULL, '%s', '%s', %d);"
first last age
in
match Sqlite3.exec db stmt with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
in
let insert_car (make, model) =
let stmt = Printf.sprintf "INSERT INTO cars values (NULL, '%s', '%s');"
make model
in
match Sqlite3.exec db stmt with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
in
List.iter insert_people people_data ;
List.iter insert_car car_data ;
print_endline "Data inserted."
let list_tables db =
(* List the table names of the given database. *)
let lister row headers =
Printf.printf " %s : '%s'\n" headers.(0) row.(0)
in
print_endline "Tables :" ;
let code = Sqlite3.exec_not_null db ~cb:lister
"SELECT name FROM sqlite_master;"
in
( match code with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
) ;
print_endline "------------------------------------------------"
let search_callback db =
(* Perform a simple search using a callback. *)
let print_headers = ref true in
let lister row headers =
if !print_headers then
( Array.iter (fun s -> Printf.printf " %-12s" s) headers ;
print_newline () ;
print_headers := false
) ;
Array.iter (Printf.printf " %-12s") row ;
print_newline ()
in
print_endline "People under 30 years of age :" ;
let code = Sqlite3.exec_not_null db ~cb:lister
"SELECT * FROM people WHERE age < 30;"
in
match code with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
let search_iterator db =
(* Perform a simple search. *)
let str_of_rc rc =
match rc with
| Sqlite3.Data.NONE -> "none"
| Sqlite3.Data.NULL -> "null"
| Sqlite3.Data.INT i -> Int64.to_string i
| Sqlite3.Data.FLOAT f -> string_of_float f
| Sqlite3.Data.TEXT s -> s
| Sqlite3.Data.BLOB _ -> "blob"
in
let dump_output s =
Printf.printf " Row Col ColName Type Value\n%!" ;
let row = ref 0 in
while Sqlite3.step s = Sqlite3.Rc.ROW do
for col = 0 to Sqlite3.data_count s - 1 do
let type_name = Sqlite3.column_decltype s col in
let val_str = str_of_rc (Sqlite3.column s col) in
let col_name = Sqlite3.column_name s col in
Printf.printf " %2d %4d %-10s %-8s %s\n%!"
!row col col_name type_name val_str ;
done ;
row := succ !row ;
done
in
print_endline "People over 25 years of age :" ;
let stmt = Sqlite3.prepare db "SELECT * FROM people WHERE age > 25;" in
dump_output stmt ;
match Sqlite3.finalize stmt with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
let update db =
print_endline "Helen Jones has just turned 30, so update table." ;
print_endline "Should now only be one person under 30." ;
let stmt = "UPDATE people SET age = 30 WHERE " ^
"first = 'Helen' AND last = 'Jones';"
in
( match Sqlite3.exec db stmt with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
) ;
search_callback db
let delete_from db =
print_endline "Bugattis are too expensive, so drop that entry." ;
let stmt = "DELETE FROM cars WHERE make = 'bugatti';" in
match Sqlite3.exec db stmt with
| Sqlite3.Rc.OK -> ()
| x -> raise (E (Sqlite3.Rc.to_string x))
let play_with_database db =
print_endline "" ;
create_tables db ;
print_endline "------------------------------------------------" ;
list_tables db ;
insert_data db ;
print_endline "------------------------------------------------" ;
search_callback db ;
print_endline "------------------------------------------------" ;
search_iterator db ;
print_endline "------------------------------------------------" ;
update db ;
print_endline "------------------------------------------------" ;
delete_from db ;
print_endline "------------------------------------------------"
(* Program main. *)
let () =
(* The database is called test.db. Delete it if it already exists. *)
let db_filename = "test.db" in
( try Unix.unlink db_filename
with _ -> ()
) ;
(* Create a new database. *)
let db = Sqlite3.db_open db_filename in
play_with_database db ;
(* Close database when done. *)
if Sqlite3.db_close db then print_endline "All done.\n"
else print_endline "Cannot close database.\n"
The above code can be run as a script using:
ocaml -I +sqlite3 sqlite3.cma unix.cma sqlite_test.ml
or compiled to a native binary using:
ocamlopt -I +sqlite3 sqlite3.cmxa unix.cmxa sqlite_test.ml -o sqlite_test
When run, the output should look like this:
Table 'people' created.
Table 'cars' created.
------------------------------------------------
Tables :
name : 'people'
name : 'cars'
------------------------------------------------
Data inserted.
------------------------------------------------
People under 30 years of age :
pkey first last age
1 John Smith 23
2 Helen Jones 29
------------------------------------------------
People over 25 years of age :
Row Col ColName Type Value
0 0 pkey INTEGER 2
0 1 first TEXT Helen
0 2 last TEXT Jones
0 3 age INTEGER 29
1 0 pkey INTEGER 3
1 1 first TEXT Adam
1 2 last TEXT Von Schmitt
1 3 age INTEGER 32
------------------------------------------------
Helen Jones has just turned 30, so update table.
Should now only be one person under 30.
People under 30 years of age :
pkey first last age
1 John Smith 23
------------------------------------------------
Bugattis are too expensive, so drop that entry.
------------------------------------------------
All done.
Posted at: 14:20 | Category: CodeHacking/Ocaml | Permalink
Thu, 13 Sep 2007
GNU gcc and -Wmissing-prototypes.
Many people who code in C consider warning messages optional or if they do enable warnings, use gcc's -Wall warning flag and leave it at that. However, there are a number of problems that gcc can warn about but doesn't unless it is specifically told to do so.
For example, consider a rather trivial example consisting of a main program file (main.c) like this:
#include <stdio.h>
#include "other.h"
int
main (void)
{
printf ("two cubed : %f\n", int_power (2.0, 3)) ;
return 0 ;
}
a second C file (other.c) like this:
double
int_power (int pow, double value)
{
double output = value ;
for ( ; pow > 1 ; pow --)
output *= value ;
return output ;
}
and the header file for the above C file (other.h) like this:
double int_power (double value, int pow) ;
Simple.
Compiling this code at the command line can be done like this:
gcc -Wall -Wextra main.c other.c -o program
which gives no warnings. However, when the resulting executable is run, it gives an obviously wrong result:
two cubed : 0.000000
What the ..... ?
Looking at the code to this rather trivial example, its pretty easy to figure out that the error is caused by the main program and the implementation of the function int_power disagreeing on the order of the two parameters.
In a more complicated real world situation, this can lead to seriously difficult to debug problems. The solution of course is to add the -Wmissing-prototype flag to the gcc command line:
gcc -Wall -Wextra -Wmissing-prototypes main.c other.c -o program
Now the compiler gives us a warning message:
other.c:3: warning: no previous prototype for 'int_power'
To get rid of this warning, the file other.c should include other.h. When we do that, we get a compile error telling us that there is a conflict between the function implementation in other.c and the function prototype in other.h:
other.c:6: error: conflicting types for 'int_power' other.h:1: error: previous declaration of 'int_power' was here
The fix of course is to make the implementation of int_power in other.c match the function prototype. Once that is done, the program compiles and even gives the correct result.
But we're not quite done yet. The behavior of the original broken code is slightly different when compiled with a C++ compiler. Compiling with g++:
g++ -Wall -Wextra main.c other.c -o program
results in an error message:
/tmp/cccTLc2H.o: In function `main': main.c:(.text+0x23): undefined reference to `int_power(double, int)' collect2: ld returned 1 exit status
So how does the C++ compiler know that something is wrong here when the C compiler didn't?
The most important thing to notice is that the error is produced by the linker. Secondly, one needs to remember that C++ (unlike C) allows function name overloading; that is, two (or more) functions can have the same name as long as they all have a unique (ordered) set of function argument types.
In the case above, the C++ linker (which may be the same as the C linker but behaves differently when linking C++ object files) knows the function called from main.c takes two parameters, a double followed by an int. However, the file other.c has a function of the same name, but with the order of the parameters reversed and hence can't be used. Since there is no other function of that name the linker gives an error.
Interestingly, the C++ compiler does not accept the -Wmissing-prototypes warning flag. Personally, I think it should, because obvious warnings from the parser stage of the compiler are an order of magnitude better than obscure error messages from the linker.
Finally, some C++ fan-boys might give this as an example of why C++ is a safer language than C. The question I would ask of those people is, "if you are so concerned with programming safety, why are you using C++ instead of Ocaml or Haskell?". I would also suggest that using a good C compiler like GNU gcc with every warning message you can find turned on is just as safe as running the same code through a C++ compiler.
Posted at: 07:03 | Category: CodeHacking | Permalink
Mon, 30 Jul 2007
A Simple Introduction to Parsing with Flex and Bison.
On Friday night I gave a presentation at SLUG with title above. Unfortunately the SLUG video recording people weren't there on the night so no video was captured. I am however making the slides and code available for download here. The code examples demonstrate a simple email date header parser written in both C and Ocaml. The C code is in five different stages so people can see how the parser was developed.
If anyone has any questions about the code, or more generally with the techniques of parsing, I'd be happy to discuss them on the SLUG coders mailing list.
Posted at: 20:35 | Category: CodeHacking | Permalink
Sun, 08 Apr 2007
Horses For Courses.
In my day job, I work with a hardware engineer named Joe. A couple of months ago, he had to do some C coding to talk to a serial port and came to me for pointers. He was basically on the right track, but was using too many global variables, not checking the return values of system calls etc. These weren't horrible problems, but I explained why practices like this can lead to problems later, showed him better solutions to the same problems and introduced him to the gcc warning flags and Valgrind. He was very grateful for my help and was a quick to pick up all the tips I'd given him.
More recently Joe came to me with another programming problem; he had to parse some numerical data out of a plain text log file. He already had about 60 lines of C code that opened a file based on a hard coded filename and he was starting to fiddle around with the fgetc function but was a little stuck on how to go further.
I had a look at his code and since Joe's a nice Irish chap, his predicament brought to mind a joke I once heard:
It is said that there was once an English motorist in Ireland who stopped his car to ask the way to Kilkenny. "Sure and to goodness," replied the Irishman., "If I wanted to go to Kilkenny, I wouldn't be starting from here."
The problem is that C is not a very good language for parsing text data. I told Joe that writing his log file parser in C could certainly be done, but that it would be painful, time consuming and error prone in comparison to other programming languages. So I told him that for this particular task Python would be a much better fit and asked him if he'd like me to teach him the basics of Python. Joe's no dummy; he agreed without hesitation.
First up I showed him the Python Tutorial, the Python Module index and how to used Google Groups advanced search to find Python specific answers from the comp.lang.python Usenet group. I then showed him the basic hello world program in Python:
#!/usr/bin/python print "Hello world!"
Over the next hour we built up a good portion of his program. We used used the sys module to get the file name of the log file to parse, the built in Python file handling functions and the regular expression module. We even used a list comprehension to remove outliers from his data set.
In the end we had about 30 lines of Python code that was very much closer to Joe's end goal than his original 60 lines of C code. Joe was really, really impressed with how easy Python was in comparison to C. It was at this point that I warned Joe about the Blub Paradox. He was well aware that when he only knew C, C was his first choice for this programming task. However, now that he knows Python as well, he'll be able to pick between C and Python depending on the task. I also told him that many Python programmers see Python as the ultimate programming language and are really Blub programmers even if they don't know it.
In my own programming I'm currently using:
- C : The first language I really became proficient in. Its great for low level hacking, good for libraries and wherever speed of execution is an important aspect. I still love C even if it does seem rather archaic in comparison to the others.
- C++ : I'm not real keen on C++ even though I do use it for most of the coding I do at my day job. As a language its incredibly complex and unforgiving. There are a million ways to shoot yourself in the foot and no one lives long enough to experience them all. Its a language that can harbor the most obscure bugs that can be exceedingly difficult to track down. This is a language that as far as I am concerned is long past its use by date.
- Python : Python is a great language for teaching and a great language for small scripts. Some people also use it for bigger projects like Bzr but for me personally, I find its dynamic type system too problematic for use on larger projects.
- Ocaml : This is my current favorite for general purpose programming. It can be run as a script just like Python, but can also be compiled to a really fast native binary. It uses strict static type checking to find coding errors at compile time and run time array bounds checking. It has variant types and pattern matching which are powerful constructs that programmers who have only used mainstream languages like C, C++, Java, Perl, Python etc could only dream about.
- Erlang : Lately I've been learning Erlang, both for a project of my own and for a project at work. I'm learning it because it does parallel, concurrent and distributed programming better than any of the above, probably better than any other language in existence. Its also been pretty easy to pick up because, like Ocaml, its a functional programming language. Like Python, it uses dynamic type checking, but also has quite a lot of static checking in the compiler.
So, with the above languages at my disposal, I can match a programming language to the task at hand. For numerical and mathematical programming I use Ocaml, for low level programming I use C and from now on, for multi-threaded and concurrent programming I will chose Erlang.
More importantly, correctly matching the language to the problem should make the task of developing a solution to the problem far easier than using an inappropriate language.
Posted at: 20:16 | Category: CodeHacking | Permalink
Wed, 04 Apr 2007
Learning Erlang.
The decision has been made, I'm going to learn the Erlang programming language. The main reason for this decision is that Erlang does one thing better than any other programming language I am aware of; parallel, concurrent and distributed processing.
The big problem with parallel and concurrent processing in other languages is that the standard method of communication between threads in most languages is shared data protected by mutexes or semaphores which are difficult to get right when there are a lot of threads or a lot of data to be protected. The standard solution to the problems of dealing with parallelism simply doesn't scale well.
Erlang excels at parallel processing because it forgoes the use of semaphores, mutexes and other synchronisation primitives. It replaces these shared data synchronisation methods with message passing; a much simpler mechanism which is much easier to reason about and much harder, maybe even impossible, to get wrong.
When learning, a new language, my usual approach is to write lots of small demo programs, with each one demonstrating a different feature. These programs come in really useful later as an easy to reference catalogue of language features.
Here's my first complete Erlang program, which takes any number of integer parameters on the command line and prints the factorial of each one:
#!/usr/bin/env escript
-export ([main/1]).
% Naive factorial function.
fac (0) -> 1 ;
fac (N) when N > 0 -> N * fac (N - 1).
% Function to print the factorial of each list element.
print_fact_list ([]) -> ok ;
print_fact_list ([Head | Tail]) ->
% Convert the Head from a string to an int.
Int = list_to_integer (Head),
% Calculate the factorial.
Fact = fac (Int),
% Print the result.
io:format ("fac ~w : ~w~n", [Int, Fact]),
% Call the function recursively with the tail of the list.
print_fact_list (Tail).
% Main function, accepts a list of strings contain argv [1], argv [2] etc.
main (List) ->
case length (List) of
0 -> io:format ("Usage : factorial.erl <number>\n") ;
_ -> print_fact_list (List)
end.
To me, this Erlang code looks a little like Ocaml and a little like Prolog which I used briefly at university over a decade ago. A couple of things to note:
- Comments begin with the percent character.
- All variable names have a leading upper case letter.
- Functions can be defined multiple times and pattern matching is used to decide which function variant is called.
- Strings are stored as lists of characters and converted to integers using the list_to_integer function.
To run this program requires Erlang, which on Debian and Ubuntu means the packages erlang, erlang-base, erlang-dev and erlang-manpages. It also uses escript, which comes standard with Erlang R11b4 and can be obtained here for earlier versions (Ubuntu Feisty has R11b2). Escript allows Erlang code to be run as a script, just like Python or Ruby.
The output of this program when passed the numbers 10, 20 and 30 results in the following output:
fac 10 : 3628800 fac 20 : 2432902008176640000 fac 30 : 265252859812191058636308480000000
Yep, Erlang uses arbitrary precision integers by default. Thats pretty cool.
Posted at: 23:07 | Category: CodeHacking/Erlang | Permalink
Wed, 28 Mar 2007
Tridge Was Right.
At Linux.conf.au 2005, Tridge gave a keynote talk about some of the issues the Samba team had run into when designing Samba4. While discussing the problems of writing a complex server which has to serve multiple simultaneous requests he put up a series of three slides. The first said:
Threads suck!
Having used OS level threads in the past, I was in complete agreement with this. The problems of sharing data across threads and locking/unlocking of that data to make sure the accesses are safe is simply too difficult for mere mortals to get right in anything other than trivial cases.
Tridges' second slide said:
Processes suck!
Splitting multi threaded code into multiple processes fixes the locking problems by removing the ability of the processes to share data (ignoring IPC shared memory of course). Obviously for a server program like Samba, this is not a solution.
The third slide in the series said:
State machines suck!
At the time of Tridge's keynote, I didn't really appreciate what he was saying.
The idea is really quite simple; everything is done in a single process so no locking is required. All I/O is multiplexed using the Unix select system call and a state machine keeps track of state of all of the I/O channels.
The problem with this is that any blocking I/O operation must be replaced with a non-blocking operation. Failure to do this will mean that a single I/O call that blocks will prevent the servicing of all other I/O operations until the blocked operation decides to complete and return control to the state machine.
However, the state machine model does work relatively well for simple examples. Unfortunately, non-blocking I/O leads to a second problem; writing code to do non-blocking I/O is significantly more difficult than for regular blocking I/O.
In my day job I've been working on some C++ classes which talk to a web server using HTTP POST operations over a keep-alive connection. This code had a couple of requirements:
- Must be non-blocking to fit in with the rest of the code.
- Must be capable of HTTPS connections using OpenSSL (which is a particularly nasty to get working in non-blocking mode).
- Must be able to connect via a HTTP proxy in both HTTP and HTTPS modes.
- Must be able to detect a connection that gets broken and gracefully re-establish it.
I now have code that fits these requirements and a pretty comprehensive test suite. With this experience behind me I have to say that getting this working was a royal pain in the neck. I also agree with Tridge; state machines suck almost as much as threads.
Maybe its time for me to learn Erlang.
Posted at: 22:36 | Category: CodeHacking | Permalink
Thu, 22 Mar 2007
Lazy Lists.
Lazy evaluation is a default feature of the Haskell programming language and an optional feature of Ocaml. Most programming languages (Ocaml, C, C++, Perl, Python, Java etc) use eager evaluation; where a result specified by a line of code is calculated as soon as the program gets to that line. Lazy evaluation on the other hand, defers the calculation of a result until that result is needed.
The real beauty of lazy evaluation is that a result that is never used is never
evaluated.
Lazy evaluation also allows the specification of lists which are effectively
infinite, as long as the programmer doesn't actually try to access every element
in the list.
Obviously, attempting to do so would take infinite time and and require infinite
memory to actually hold the list
.
While searching for information on Ocaml's lazy programming features I came across a post at the enchanted mind blog. That post is ok, but the code is just snippets and when put together as it is, doesn't actually work.
After a bit of fiddling around, I managed to get it working. However, once I understood it, I didn't think the example was as good as it could be. Firstly, the input to the lazy list is just a standard finite length Ocaml list, but more importantly it doesn't give any idea of how to do a potentially infinite list which is a much more interesting case.
That left the field open for a nice blog post demonstrating lazy lists in Ocaml. Read on.
Anybody who has done high school or higher mathematics would probably have come across recurrence relations the most well know of which is the Fibonacci sequence.
The Fibonacci sequence is often used as example for teaching the concept of recursion in computer science (even if some people think there are better examples). The Fibonacci sequence can be expressed recursively in Ocaml like this:
let rec fibonacci n =
match n with
| 1 -> 1
| 2 -> 1
| x -> (fibonacci (n - 1)) + (fibonacci (n - 2))
If one wanted to generate a list containing say the first 20 Fibonacci numbers using the above recursive function, the 19th number in the sequence would be calculated twice, the 18th number three times so on. Its simply not efficient.
A better solution is to use a lazy list, which calculates new values of the sequence as they are needed, based on entries already in the list. Here's an example that creates a lazy list of the fibonacci numbers:
type lazy_fib_t =
Node of int * lazy_fib_t Lazy.t
let create_fib_list () =
let rec fib_n minus_2 minus_1 =
let n = minus_1 + minus_2 in
Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
Node (n, lazy (fib_n minus_1 n))
in
lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))
let print_fib_list depth lst =
let rec sub_print current remaining =
if current > depth then ()
else
match Lazy.force remaining with
| Node (head, tail) ->
Printf.printf "%3d : %d\n" current head ;
sub_print (current + 1) tail
in
sub_print 0 lst
let _ =
let fib_list = create_fib_list () in
print_fib_list 4 fib_list ;
print_endline "------------" ;
print_fib_list 6 fib_list ;
This is a complete working Ocaml program. To run it, just save the text to a file, say "lazy_fib.ml" and then do:
ocaml lazy_fib.ml
We'll look at the output in detail later. First lets break it down; looking at the program, from top to bottom we have:
type lazy_fib_t =
Node of int * lazy_fib_t Lazy.t
The above two lines define a recursive type called lazy_fib_t, which has a single variant called Node which contains a tuple of an integer and the head of a lazy list.
let create_fib_list () =
let rec fib_n minus_2 minus_1 =
let n = minus_1 + minus_2 in
Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
Node (n, lazy (fib_n minus_1 n))
in
lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))
The function above, create_fib_list, creates a lazy list. It also contains an internal function, fib_n, which we'll look at later. The last line of the function is where all the magic is; it creates three nodes of a lazy list, the first two containing the first two integers of the Fibonacci sequence and a third node which is a closure, containing a call to the internal function fib_n with the correct parameters to generate the next number in the sequence.
The internal function fib_n takes two parameters, the values of the sequence for n - 1 and n - 2. From these two values, it generates the value for n, prints a message and then constructs a new Node containing the value for n and a lazy evaluation for the next value.
The next function is the function which prints the first n elements of a lazy list. It looks like this:
let print_fib_list depth lst =
let rec sub_print current remaining =
if current > depth then ()
else
match Lazy.force remaining with
| Node (head, tail) ->
Printf.printf "%3d : %d\n" current head ;
sub_print (current + 1) tail
in
sub_print 0 lst
The print_fib_list function contains an internal function sub_print which is called with a current depth of zero and the head of the lazy list to be printed. The internal function recursively moves down the list until current is greater than depth, which cause the recursion to complete and unwind.
At each node of the lazy list where current is less than or equal to depth, the function forces the evaluation of the node. The forcing will only evaluate a node if it hasn't already been evaluated. Once the node has been force evaluated, the value is printed and the function is called recursively.
Finally, the main function of the program is this:
let _ =
let fib_list = create_fib_list () in
print_fib_list 4 fib_list ;
print_endline "------------" ;
print_fib_list 6 fib_list ;
All it does is call the function create_fib_list, and then print the first four Fibonacci numbers of the list, prints a dashed line and then prints the first six Fibonacci numbers of the list. Its important to note that the print function is called with the same list on both occasions.
When the program is run, the output should look like this:
0 : 1
1 : 1
fib_n 1 1 -> 2
2 : 2
fib_n 1 2 -> 3
3 : 3
fib_n 2 3 -> 5
4 : 5
------------
0 : 1
1 : 1
2 : 2
3 : 3
4 : 5
fib_n 3 5 -> 8
5 : 8
fib_n 5 8 -> 13
6 : 13
As can be seen above, the first time the print function is called, the fib_n closure is called for all values of n greater than one. Each time fib_n is called a new node is generated in the list. When the print function is called the second time, it fib_n is only called for values that weren't evaluated on the first call to the print function just as was expected.
One of the few problems with the above implementation is that it uses integers which in Ocaml on 32 bit CPU platforms is only a 31 bit integer. It would however be relatively easy to use Ocaml's Big_int module which provides arbitrary length integers.
Posted at: 21:43 | Category: CodeHacking/Ocaml | Permalink
Wed, 21 Mar 2007
Xtreme Numerical Accuracy.
I'm working on a digital filter design program in Ocaml which was suffering from some numerical issues with Ocaml's native 64 bit floats. The problem was that the algorithm operates on both large floating point numbers and small floating point numbers. These numbers eventually end up in a matrix, and I then use Gaussian elimination to solve a set of simultaneous equations.
Anyone who has done any numerical computation will know that adding large floating point numbers to small floating numbers is a recipe for numerical inaccuracy. For me, the numerical issues were screwing things up badly.
When faced with a problem like this there are two possible solutions:
- Do all the computations symbolically, and only substitute numbers at the very last stage and then being careful to process the numerical parts in a way to minimize rounding and truncation error.
- Replace the floating point operations with operations on a number type which can provide higher (and maybe even arbitrary) precision.
The first option, doing all the computations symbolically was not practical due to the complexity of the computation. That left only the second option.
Looking around for what was available for Ocaml, I found the contfrac project on Sourceforge. As all the math geeks (hi Mark) have probably guessed by now, contfrac expresses numbers in terms of a really cool mathematical concept called continued fractions.
The idea is that any number can be represented by a (potentially infinite) list of integers [ a0 ; a1, a2, a3, ...]. Given the list of integers, the number itself can be calculated using:
All rational numbers have a finite length continued fraction expansion. For example, the rational number 75/99 is expressed as [ 0 ; 1, 3, 8 ].
Not surprisingly, all the irrational numbers have infinite length continued fraction expansions. The surprising thing (for me at least) is that many of the irrational numbers have CF expansions that are surprisingly regular. The square root of two is expressed as [ 1 ; 2, 2, 2, ...] with an infinitely repeating list of 2s. The natural logarithm e is expressed as [ 2 ; 1, 2, 1, 1, 4, 1, 1, 6, ...] which again has a regular pattern, as does the golden ratio, [ 1 ; 1, 1, 1, ...]. While all the previous CF expansions have a degree of regularity, the expansion of pi, is [ 3 ; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13,...], which looks completely random.
With numbers expressed as continued fractions, the Ocaml contfrac module then implements addition, subtraction, multiplication and division. Once the four arithmetic operations are defined, contfrac then implements a number of trigonometric and transcendental functions in terms of the same continued fractions.
Unfortunately, the module doesn't implement everything I need so I'm going to have to hack on some extra functionality. The actual Ocaml implementation uses Ocaml's lazy lists which is an aspect of Ocaml I hadn't played with yet. Time for some fiddling with lazy lists.
Posted at: 20:49 | Category: CodeHacking/Ocaml | Permalink
Wed, 14 Feb 2007
GNU gcc Stack Protection.
Wow, this is new. Version 4.1 of GNU gcc compiler shipped with Ubuntu Feisty includes stack smashing protection by default!
Consider the following code containing a buffer overflow of a stack based buffer :
#include <stdio.h>
static void
kill_my_stack (void)
{
char buffer [10] ;
int k ;
for (k = 0 ; k < 20 ; k++)
buffer [k] = 'a' + k ;
} /* kill_my_stack */
int
main (void)
{
kill_my_stack () ;
return 0 ;
} /* main */
Compiling this with the default gcc compiler in Feisty produces an executable which when run gives the following error:
*** stack smashing detected ***: /home/erikd/stack-protect-demo terminated
Aborted
Obviously, for an error as simple as this even basic static analysis should find it, but we know that the vast majority of people don't use static analysis. In fact many don't even compile with a sensible set of compiler flags turned on. Well now, those people are protected from themselves.
Posted at: 19:13 | Category: CodeHacking | Permalink
Thu, 01 Feb 2007
Spectrogram Fun!
Inspired by the spectrograms used in the SRC Comparison I decided to write a program that generates similar spectrograms from any given sound file. The program is now basically working and when run over a full song (the song "Vehicle" by the band "Golden Section") it produced this, which I think is quite beautiful:
The program is written in C and uses libsndfile (of course) for reading the sound file, FFTW for generating the spectrum data and the wonderful Cairo library for the image generation back-end.
I intend to release the code for this under the GPL as soon as I can clean it up a bit, add handling for multi-channel files and improve the command line option handling.
Posted at: 22:03 | Category: CodeHacking | Permalink
Tue, 30 Jan 2007
SRC Comparison.
One of my Free Software projects is Secret Rabbit Code, aka libsamplerate, aka the Rabbit, a library for performing sample rate conversion (Wikipedia) on audio signals. Recently, a company in Canada did a comparison of a number sample rate converters in professional audio software and also included the Rabbit in that test.
The tests were carried out by generating a input signal at a sampling rate of 96 kHz, configuring each sample rate converter to to do a conversion from 96 kHz input sample rate to 44.1 kHz output sample rate and passing the input signal through each converter and capturing each converter's output. The input test signal was a sine wave which sweeps from a low frequency of about 100 Hz at the start to a frequency of 44.1 kHz at the end. Finally, a spectrogram is then generated from each output signal.
The spectrogram of the output of Secret Rabbit Code's Best Sinc converter looks like this:
The spectrogram shows time in seconds along the x-axis and frequency in Hertz along the y-axis. The colour indicates the signal strength at each point in time and frequency, with white being the strongest signal (0 decibels) and black being the weakest signal (-180 decibels).
The tricky thing about the sample rate conversion process is that for any given sample rate fs, the highest frequency signal that can be correctly represented is at fs/2. When sample rate converting from 96 kHz to 44.1 kHz, all frequencies above half of the destination sample rate must be removed during the conversion process. Failure to do so will result in audio distortion and noise in the output signal.
Looking at the spectrogram of the Rabbit's output, its easy to see that the the main sweep (in bright white) clearly goes from some low frequency at the start to 22.05 kHz (half of the output sample rate) at 5 seconds. After about 5 seconds, the input signal's sine wave frequency goes above half the destination sample rate and the Rabbit does the correct thing and almost completely removes it.
The rest of the colour in the spectrogram is an artifact of the conversion process but by referencing the colour scale, its possible to confirm that all of these artifacts are 100 decibels below the level of the main signal. Ideally they shouldn't be there at all, but if they are the should be as low as possible.
Anyone who has read this far can now go to the comparison page pick any two converters and compare them. They can also confirm for themselves that although the Rabbit (Best Sinc) wasn't the best converter among the ones tested (that award would have to go to r8brain and iZotope), it certainly didn't disgrace itself either. A number of the commercial converters in expensive software packages (like Sony Vegas and Digital Performer) didn't perform all that well in comparison.
The good news is that the existence of commercial closed source converters that are better than the Rabbit gives me some incentive to come up with a better converter for inclusion in the Rabbit.
Posted at: 23:18 | Category: CodeHacking/SecretRabbitCode | Permalink
Thu, 21 Dec 2006
The Size of 'cp' (Update).
André Pang read my blog post about the size of the compiled Haskell 'cp' executable and suggested that something was wrong. So, I looked at it again.
My laptop is running Ubuntu Edgy and for some reason Edgy installs version 6.4.2 of the Glasgow Haskell Compiler. I also have a desktop machine running Debian Testing which has version 6.6 of of ghc.
Sure enough, ghc 6.6 generates a 255 kilobyte executable which is a huge improvement over the 1.5 megabyte executable produced by version 6.4.2.
Posted at: 21:18 | Category: CodeHacking | Permalink
Tue, 19 Dec 2006
The Size of 'cp'.
Conrad Parker blogged recently showing some simple examples in Haskell. I've been wanting to learn Haskell for a while so I took special interest in Conrad's post. For instance, the program implementing the basic functionality of the Unix cp in Haskell is small and extremely elegant:
import System.Environment
main = do
[infile, outfile] <- getArgs
s <- readFile infile
writeFile outfile s
However, on my machine (i686 laptop running Ubuntu Edgy), the generated executable is 1.5 megabytes in size even after being stripped. By way of contrast, the /bin/cp executable written in C is 56 kilobytes. WTF?
So lets look at the Ocaml version:
let _ =
let srcfile = open_in Sys.argv.(1) in
let destfile = open_out Sys.argv.(2) in
let maxlen = 8192 in
let str = String.create maxlen in
let count = ref 1 in
while !count > 0 do
count := input srcfile str 0 maxlen ;
output destfile str 0 !count ;
done
This is pure imperative code and doesn't use any of the functional language features of Ocaml, but it compiles to a 79 kilobyte stripped executable. Compared to the C executable, the Ocaml executable is 40% bigger and the Haskell one is 2500% bigger.
Obviously, the size of the executable is not the only determining factor in choice of programming language, but Haskell's executables do seem unreasonably large.
Posted at: 21:15 | Category: CodeHacking | Permalink
Thu, 16 Nov 2006
Non-recursive Automake.
A lot of people (yeah, you know who you are) bitch about Automake and the associated tools like autoconf and libtool. While I do agree that these tools do have problems and limitations, they are also a better soultion to the problem than any of the alternatives I have looked at.
The thing I really like about automake is that it does automatic dependency checking so that if the file foo.cc includes foo.h which includes bar.h which includes baz.h and baz.h changes, automake knows that foo.c needs to be recompiled. Manually keeping track of dependencies like these is a royal pain in the neck and getting it wrong can lead to really obscure Heisenbugs; for example, two C++ object files disagreeing on the parameter list of a method of a class.
I've had a number of projects that have used automake for years. However, all of these projects used the traditional recursive make scheme where there is a Makefile.am in each directory of the source tree. I continued to do it this way with automake even after reading Peter Miller's excellent paper Recursive Make Considered Harmful, but with hand written Makefiles, I usually took Peter's advice.
My first test of a non-recursive automake solution was for a project I'm doing at work. The project started out with a standard single top level non-recursive Makefile which handled the compiling of about 150 C++ source files which compiled to a couple of static convenience libraries, a main executable and a couple of test programs.
The big problem with the existing standard Makefile was that it didn't properly encode dependancies and hence I often had to do a "make clean" followed by a make to get the thing built correctly. Fixing this issue was the prime motivator for moving to automake.
One slightly unusual aspect of the project was the way project specific internal include files were referenced within the project. As a result of the project having been developed with a single top level Makefile from the beginning, all hash includes within the project are of the form:
#include <path/to/header.h>
with "path" being a directory in the same top level directory as the Makefile. What this means is that no source file which includes a header from within the project should need any extra project internal include path other than "-I.". This means that the resulting compile lines produced by automake (and libtool if it is in the picture) are considerably shorter than they would have been otherwise.
So, what does the Makefile.am look like?
Here's an example non-recursive Makefile.am which is basically a stripped down version of the one I'm using on my project but with some extra comments. Anyone who has hacked a Makefile.am before should be able to understand what is going on.
# Tell automake to put the object file for apple/apple.c in dir apple/
AUTOMAKE_OPTIONS := subdir-objects
# The installable executable.
bin_PROGRAMS = apple/apple
# Couple of python scripts used using build.
EXTRA_DIST = apple/version_create.py apple/tests/test_wrapper.py
# Convienience libraries required during build.
noinst_LTLIBRARIES = lib/libcore.la apple/libapple.la
# All the project related headers required for building.
noinst_HEADER = $(libcore_includes) $(libapple_includes)
# Test programs that will not be installed.
noinst_PROGRAMS = apple/tests/skin_test lib/pip/test/pip_test
# A couple of autogenerted header files.
nodist_include_HEADERS = apple/version.h
DISTCLEANFILES = apple/version.h
#=========================================================
# libcore : The core library routines.
lib_libcore_includes = \
lib/red.h lib/green.h lib/blue.h
lib_libcore_la_SOURCES = $(lib_libcore_includes) \
lib/red.cc lib/green.cc lib/blue.cc
#=========================================================
# libpip : All the pips.
apple_pip_libpip_includes = \
apple/pip/cat.h apple/pip/dog.h apple/pip/mouse.h
apple_pip_libpip_la_SOURCES = $(apple_pip_libpip_includes) \
apple/pip/cat.cc apple/pip/dog.cc apple/pip/mouse.cc
#=========================================================
# libapple : Everything in the application except main.cc.
libapple_includes = \
apple/granny.h apple/smith.h apple/johnathon.h
libapple_la_SOURCES = $(libapple_includes) \
apple/granny.cc apple/smith.cc apple/johnathon.cc
#=========================================================
# apple : The application.
apple_apple_SOURCES = apple/main.cc apple/version.h
apple_apple_LDADD = apple/libapple.la lib/libcore.la \
apple/pip/libpip.la $(EXT_A_LIBS) $(EXT_B_LIBS)
#=========================================================
# Test programs.
apple_tests_skin_test_SOURCES = apple/tests/skin_test.cc
apple_tests_skin_test_LDADD = lib/libcore.la apple/libapple.la
lib_pip_test_pip_test_SOURCES = lib/pip/test/pip_test.cc
lib_pip_test_pip_test_LDADD = lib/libcore.la apple/pip/libpip.la \
$(EXT_A_LIBS) $(EXT_B_LIBS)
check : $(noinst_PROGRAMS)
./apple/tests/skin_test
./lib/pip/test/pip_test
$(top_srcdir)/apple/tests/test_wrapper.py
@echo
@echo "All tests passed."
@echo
#=========================================================
# Autogenerated files and their dependancies.
apple/main.o : apple/version.h
apple/version.h : this_file_does_not_exist
$(top_srcdir)/apple/version_create.py $@
.PHONY : this_file_does_not_exist
The thing that surprised me most about converting this project to automake was how easy it was and how well it worked. I also immediately noticed that the autogenerted non-recursive make seemed to run a lot faster than I was used to with recursive make, but that is one of the benefits mentioned in Peter's paper.
Since this was such a success I'm going to look into applying this to some of my other projects.
Posted at: 21:49 | Category: CodeHacking | Permalink
Tue, 14 Nov 2006
Autoconf and #ifdef Considered Harmful.
I recently got an email from someone suggesting that my example for detecting the presence of libsamplerate was somewhat problematic. The crux of the complaint is that my configure.ac snippet ends up setting the HAVE_SAMPLERATE variable to 0 or 1 and that most people tend to use something like this in their C or C++ code:
#ifdef HAVE_SAMPLERATE
/* Some code which uses libsamplerate here. */
#endif
Obviously with HAVE_SAMPLERATE defined to 0 or 1, this code is not going to work as the developer expected. Instead, they should be using something like:
#if HAVE_SAMPLERATE
/* Some code which uses libsamplerate here. */
#endif
I know that the former idiom is more common, but I chose the second method for a reason; I believe that it is more robust. The problem of course is that autoconf and its related tools are fragile, but the standard idiom certainly doesn't help.
Consider the following code (from a bug I found in XMMS ):
#ifndef WORDS_BIGENDIAN
/* Some little endian machine specific code. */
#endif
The bug resulted from an error in configure.ac where the author had forgotten to invoke the autoconf macro which sets the WORDS_BIGENDIAN variable. The result was that this code compiled perfectly and ran perfectly on little endian machines. On big endian machines, it compiled perfectly and under certain circumstances failed badly causing really horrible noises to come out of my headphones.
Now consider my version where WORDS_BIGENDIAN gets set to either 0 or 1. In this case, if the author forgot to invoke the autoconf macro, WORDS_BIGENDIAN would have been undefined and this code:
#if (WORDS_BIGENDIAN == 0)
/* Some little endian machine specific code. */
#endif
would have failed to compile on both big and little endian machines.
It also doesn't take a genius to see that this is a symptom of a larger and much more common problem. In fact my solution is a classic example of moving something up Rusty's spectrum of interface simplicity. The original code was at position 11 on the scale (follow common convention and you'll get it wrong) and my alternative is at position 1 on the scale (compiler/linker won't let you get it wrong).
Posted at: 20:27 | Category: CodeHacking | Permalink
Wed, 01 Nov 2006
Monads Made Easy (and Hard).
For some time I've been rather keen on learning the Haskell programming language. The big problem for me was that when I started out trying to solve particular problems I quickly ran into a strange catch-22. Haskell uses a concept called Monads but it seemed to me that in order to understand Haskell one needs to understand Monads and in order to understand Monads, one needs to understand Haskell.
There are however numerous tutorials and explanations on Monads. For instance:
- The Monadic Way.
- What are Monads.
- All about Monads.
- Monads as Containers.
- Crash Monad Tutorial.
- Monads for the Working Haskell Programmer.
- You Could have Invented Monads.
I've looked at many of these tutorials but never managed to get Monads. Its not that I can't understand difficult concepts or that I can't handle weird ass programming languages. The problem was that all these tutorials explained monads from the point of view of people who already understood Monads.
Anyway, my difficulty with Monads is at an end. I've just found an explanation of Monads called Of monads and spacesuits written by Eric Kow. It explains Monads using astronauts, space stations and space suits. I finally get it!
And now that I do, I can play with Monads in Ocaml and also go ahead and learn Haskell.
However, I will not be using Monads in C++ because the C++ implementation is just too damn weird. It makes a concept that can already be difficult even harder as well as making it unreadable. A pox on C++.
Posted at: 20:24 | Category: CodeHacking | Permalink
Sun, 22 Oct 2006
Ocaml : Exception Backtraces.
There's a paper dated December 2002 by Kevin Murphy where he explains why he was looking at Ocaml. That article was recently linked on programming.reddit.com and there was a comment complaining that Ocaml couldn't print out backtraces on exceptions. Someone posted later that this was not right, but I've heard this complaint often enough that I thought I should blog about how to do it.
First off, Ocaml has two compilers, one which produces bytecode and one which produces native binaries. The native code compiler is not currently able to produce exception backtraces and this is where the Reddit commenter got the idea. However, there is a patch in the Ocaml bug tracker which adds backtrace capabilities. I'm hoping that this goes into the compiler proper in the next release or two.
For a project that is currently compiling with ocamlopt (the native code compiler), changing the to bytecode compiler is as simple as editing the Makefile and replacing all invocations of "ocamlopt" with "ocamlc -g" where the "-g" turns on exception backtraces. You can then rebuild the application. The final step is to turn on backtraces in the bytecode run time environment which is done by setting an environment variable:
export OCAMLRUNPARAM="b1"
Once compiled to bytecode and with the environment variable set, the application can be run and should produce the required backtrace. The following is an example of a backtrace from something I'm working on at the moment (I hacked the code to make sure I could get one).
Fatal error: exception Invalid_argument("index out of bounds")
Raised by primitive operation at unknown location
Called from file "meyers_diff.ml", line 93, characters 1-31
Called from file "meyers_diff.ml", line 200, characters 10-52
Called from file "meyers_diff.ml", line 221, characters 16-60
Called from file "meyers_diff.ml", line 264, characters 11-148
Called from file "meyers_diff.ml", line 305, characters 17-50
Called from file "array.ml", line 130, characters 31-51
Called from file "meyers_diff.ml", line 323, characters 1-316
Obviously it would be nicer if function names were included here, but this is more than sufficient for debugging purposes.
Posted at: 10:39 | Category: CodeHacking/Ocaml | Permalink
Wed, 04 Oct 2006
On Design Patterns.
The book "Design Patterns : Elements of Reusable Object-Oriented Software" by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides is one of the most well known books about the concept of design patterns ; the idea of codifying generic solutions to recurring programming problems. The book is so well known that the authors came to be dubbed the Gang Of Four. When programmers talk of the Gang of Four they are usually talking about this book and these authors.
Unfortunately, even though I do own a copy I simply haven't managed to find the time to actually read it. As a coder whose two favorite languages are Ocaml and C this book is not directly applicable to my coding in those languages, especially since I rarely feel the need to use Ocaml's OO features. However, I do code in C++ and Python, both in my own time and at work and I suspect that reading and understanding this book would help my coding in both those languages.
So while I may not have read the actual book I do like to read articles and blog entries about the book. One that recently came to my notice was an article by Mark Dominus which spawned a response from one of the Gang of Four authors which in turn spawned another article from Mark Dominus.
Mark Dominus' original article and his follow up suggest that design patterns arise to make up for weakness in the languages they are implemented in. In particular, patterns that exist in one language may disappear completely in a more advanced language which replaces the pattern with something far more simple and not worthy of being called a "pattern".
I really do hope that I find time to read the book because it will be interesting to make up my own mind on this issue. It will be especially interesting to see how many of the patterns in this book apply in Ocaml.
Posted at: 22:18 | Category: CodeHacking | Permalink
Thu, 14 Sep 2006
N Squared Problems.
Important note to self : When working on an algorithm which displays N squared (or cubed or worse) behavior in Big Oh Notation fiddling with the algorithm is useless if N is large. The only ways to improve performance is to reduce the complexity to O(NlogN) or O(N) or alternatively, find a way to drastically reduce the size of the N.
This problem came up while working on my current main non-day-job application which will be released under the GPL when its ready. Even on reasonable input data sets, the value of N was well into the tens of thousands. If any significant processing needed to be done, the total time blows out really badly. This was simply not acceptable.
With many problems, processing can be traded off against memory usage. Unfortunately, for this problem, memory usage was also O(N2) and I didn't consider it acceptable for my application to chew up hundreds of megabytes of memory for reasonably small data sets.
What did finally work for this problem was a divide and conquer trick that allowed me to reduce N by two orders of magnitude and do more work on the smaller chunks of data instead.
I'll elaborate on this issue and its solution in the paper I'm writing for OSDC 2006. I'll also be submitting a paper proposal for Linux.conf.au 2007.
Posted at: 16:43 | Category: CodeHacking | Permalink
Mon, 11 Sep 2006
Ocaml : Code for Variant Types and Pattern Matching.
Since my blog post on Ocaml's Variant Types and Pattern Matching I've had two requests for the code, so here it is:
type expr_t =
| Var of string
| Plus of expr_t * expr_t
| Times of expr_t * expr_t
| Divide of expr_t * expr_t
let rec simplify expr =
match expr with
| Times (a, Plus (b, c)) ->
Plus (simplify (Times (a, b)), simplify (Times (a, c)))
| Divide (Divide (a, b), Divide (c, d)) ->
Divide (simplify (Times (a, d)), simplify (Times (b, c)))
| anything -> anything (* Comment : default case *)
let rec str_of_expr expr =
match expr with
| Var v -> v
| Plus (a, b) ->
"(" ^ (str_of_expr a) ^ "+" ^ (str_of_expr b) ^ ")"
| Times (a, b) ->
(str_of_expr a) ^ "*" ^ (str_of_expr b)
| Divide (a, b) ->
(str_of_expr a) ^ " / " ^ (str_of_expr b)
let _ =
let expr = Times (Var "x", Plus (Var "y", Var "z")) in
Printf.printf " orig : %s\n" (str_of_expr expr) ;
let expr = simplify expr in
Printf.printf " new : %s\n" (str_of_expr expr)
The above code is a single self contained program; to run that program, save it to to a file named say "cas.ml" then run it (assuming you have ocaml installed) using the command "ocaml cas.ml" which should result in the following output:
orig : x*(y+z) new : (x*y+x*z)
Obviously, this is just just a demo, but it should be pretty clear that this code could easily be extended to something more useful.
Posted at: 18:41 | Category: CodeHacking/Ocaml | Permalink
Sun, 03 Sep 2006
Ocaml : Variant Types and Pattern Matching.
At last month's SLUG meeting, Mark Greenaway asked if anybody knew of any good Computer Algebra Systems (CAS) available under Linux. I spoke up and told him that I looked around for the same thing some time ago, couldn't find anything that I liked so I ended up writing something that fit my particular needs from scratch.
Later that night I was talking to Robert Collins and Andrew Cowie about stretch languages; languages that differ radically from the languages a programmer already knows so that learning the new language teaches them new programming concepts and paradigms.
For me, my last stretch language was Ocaml which I started using around mid 2004. I discovered Ocaml when I went looking for a language to implement my Computer Algebra System (CAS) in. I did do a trial implementation in C, but that was simply too much of a pain in the neck. I also knew that C++ was not a sufficiently big step away from C to be useful for this project. My other main language at the time was Python, but I knew Python's dynamic typing would make my life difficult.
It was at about this time that I asked my friend André Pang to suggest a language. André had recently given a talk at SLUG titled "Beyond C, C++, Python and Perl" and seemed to know about a whole bunch of different languages. I told him that I was looking for something that was strongly typed, statically typed, had garbage collection for memory management and had Object Oriented (OO) features.
One of André's suggestions was Java which I was already familiar with. However, I disliked the fact that Java does not allow one to write code outside of a class and Java was also a little too verbose for my tastes. He also tried to push Haskell, but Haskell doesn't have traditional OO features. In retrospect this wouldn't have been a problem, but at the time I rejected Haskell for this reason. However, his final suggestion was Ocaml which seemed to fit all of my requirements. While investigating Ocaml I found a small example on the Ocaml Tutorial that implemented a bare bones CAS.
The two things that makes Ocaml really great for CAS are variant types and pattern matching on these variants. I'll look at these separately.
Variant Types in Ocaml.
Ocaml's variant types are a little like a type safe, bullet proof version of unions in C and C++. In Ocaml one defines a variant type like this:
type expr_t =
| Var of string
| Plus of expr_t * expr_t
| Times of expr_t * expr_t
| Divide of expr_t * expr_t
So, here we have a type named expr_t (a mathematical expression) that can hold one of four things:
- A variable name which is just a string.
- A plus node with a left sub expression and a right sub expression.
- A times node with a left sub expression and a right sub expression.
- A divide node with a numerator sub expression and a denominator sub expression.
All of the sub expressions are of the same type, expr_t, which makes this a recursive type. Using this recursive variant type, an expression like "x * (y + z)" can be build like this:
let expr = Times (Var "x", Plus (Var "y", Var "z")) ;;
which results in a tree structure with each operator and our variables x, y and z being held in a node of type expr_t and represented by a circle in this diagram:
with the variable expr being the Times node at the top of the diagram.
The really nice thing about variants is that each instance knows which variant it is. That means that its not possible (by mistake or on purpose) to access a node of one variant as another variant. The Ocaml compiler simply won't let that happen.
Compare this to a C version using unions where the programmer has be make sure he/she accesses each instance correctly, or the acres of code required to do the same thing with objects in C++ or Java.
Pattern Matching on Variants.
So once we can construct a mathematical expression we would also want to print it out. Thats where Ocaml's pattern matching comes in. Here's a function to convert any expression tree into a string representation that can be printed.
let rec str_of_expr expr =
match expr with
| Var v -> v
| Plus (a, b) ->
"(" ^ (str_of_expr a) ^ "+" ^ (str_of_expr b) ^ ")"
| Times (a, b) ->
(str_of_expr a) ^ "*" ^ (str_of_expr b)
| Divide (a, b) ->
(str_of_expr a) ^ " / " ^ (str_of_expr b)
The function is called str_of_expr and the "rec" just before the function name means that the function is recursive. The function takes a single parameter of type expr_t and returns a string.
The "match expr with" on the next line is a bit like a switch statement in C, C++, Java and other languages. On the lines following the match there are four options, one for each of the variants of the expr_t type. So for instance, if the expr is a Var variant the function just returns the string that is held by Var and if its a Plus node with two sub expressions, a and b, then the function is called on each of the sub expressions and a string is built using Ocaml's string concatenation operator "^".
The above usage of pattern matching is pretty simple and can done almost as easily in other languages. So lets look at something a little more complicated.
More Advanced Pattern Matching.
One of the many things one might want to do in a CAS is applying mathematical transforms on an expression. For instance, we might want to be able to expand out our expressions above "x * (y + z)" to give "(x * y + x * z)". Fortunately, using Ocaml's advanced pattern matching this is really easy. Here's an example:
let rec simplify expr =
match expr with
| Times (a, Plus (b, c)) ->
Plus (simplify (Times (a, b)), simplify (Times (a, c)))
| Divide (Divide (a, b), Divide (c, d)) ->
Divide (simplify (Times (a, d)), simplify (Times (b, c)))
| anything -> anything (* Comment : default case *)
The function simplify has two transformations and a default case which does nothing. Again, the function is recursive, but the first two match cases match on much more complex expression trees. In fact, the first match case will exactly match our expression and generate the expression we're after, "(x * y + x * z)".
Obviously, to make a real Computer Algebra System requires quite a bit more than what I have here. However, as you can see, Ocaml's variant types and pattern matching are a perfect fit for the problems a programmer writing a CAS would face. In fact, few other languages, with the possible exception of Haskell, would have fit this problem as well.
Posted at: 12:53 | Category: CodeHacking/Ocaml | Permalink
Thu, 31 Aug 2006
libsndfile 1.0.17.
I've just released a new version of libsndfile. The changes are:
- Add a C++ header file which acts as a wrapper around libsndfile's C API.
- Fix the documentation which describes using the precompiled windows DLL with evil ware compilers.
- Minor bug fixes and cleanups.
Posted at: 22:19 | Category: CodeHacking/libsndfile | Permalink
Wed, 16 Aug 2006
Why I Don't Like Dynamic Typing.
I've just spent the last week or so coding in Python, a language I first picked up about eight years ago. For a number of years, I used Python quite extensively but have found in the last year or so that I'm tending to favor Ocaml for many of the tasks where I used to use Python.
A large part of the reason for this switch has to do with type checking; the process where the compiler or run time environment checks whether operations on data held in variables is valid for the type of that data. For instance, the concatenation operation is probably valid for strings and lists, but simply doesn't make sense for integers.
The big difference between Ocaml and Python is that Ocaml uses static type checking while Python uses dynamic type checking. For those too lazy to follow the links, static typing means that type and related correct-ness checking of the program is done by the compiler at compile time wh