Tue, 28 Oct 2008

Zytouch Driver.

At bCODE, we're using some projected capacitive touchscreens by a UK company called Zytronic for our Ubuntu Dapper based embedded devices.

When I was first looking for touchscreens back in 2006, we chose the ones from Zytronic because they connected via USB and they had Linux drivers. Unfortunately, these drivers turned out to be binary only and were compiled for Redhat 9. Of course the big difference between Redhat 9 and Dapper was that Redhat used Xfree86 and Dapper was using Xorg. In spite of that, the binary drivers did sort of work, but were flaky and at times chewed up way too much CPU for no apparent reason.

In order to get something working quickly, I snooped the USB bus of a touchscreen connected to a Windows machine to get a basic idea of the USB communication. It turned out that under normal conditions, all data traffic was from the touchscreen to the host using USB bulk transfers. The odd thing was that the amount of data transferred per unit time was much more than one would expect from something like a touchscreen.

In order to explore the data more fully, I then used libusb to whip together a Linux program that could find the device (Vendor:Product identifier of 14c8:0002) on the USB bus and then sit in a loop reading the USB data and printing it to the screen in hexadecimal.

Watching the hex data scroll by, it quickly became obvious that the data was the raw X/Y capacitive sensor readings for the screen. One complete read from the screen would consist of 32 bytes, one for each of 16 x-direction columns and one for each of 16 y-direction rows. Getting good mouse pointer performance out of this data was a little difficult due to a number of factors:

Fortunately I had quite a bit of Digital Signal Processing knowledge and already had all the conceptual tools I needed to deal with this raw data. In the end I came up with a solution [Note 1] that consisted of:
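
Purely as an illustration of the 32-byte frame layout described above (this is not the algorithm the driver uses; the details of that are in the driver source), here is a minimal Ocaml sketch. It assumes the 16 column readings come first in the frame, followed by the 16 row readings, and that each byte is an unsigned sensor value. The names split_frame and centroid are hypothetical, and the weighted centroid is just the simplest possible position estimate.

```ocaml
(* Hypothetical helper: split one 32-byte frame into 16 column (X) and
   16 row (Y) readings. The byte order is an assumption, not something
   taken from the real driver. *)
let split_frame (frame : string) =
    if String.length frame <> 32 then
        invalid_arg "split_frame: expected 32 bytes" ;
    let reading offset i = float_of_int (Char.code frame.[offset + i]) in
    (Array.init 16 (reading 0), Array.init 16 (reading 16))

(* Weighted centroid of one axis, in units of sensor index (0.0 to 15.0).
   Returns None when nothing on that axis is registering a touch. *)
let centroid readings =
    let total = Array.fold_left (+.) 0.0 readings in
    if total <= 0.0 then None
    else begin
        let weighted = ref 0.0 in
        Array.iteri
            (fun i v -> weighted := !weighted +. float_of_int i *. v)
            readings ;
        Some (!weighted /. total)
        end
```

A real driver needs a good deal more than this (filtering and calibration at the very least), which is where the signal processing comes in.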

Once we had the X/Y position data we needed to get it into the Xserver. The easiest and quickest way to do this was using the XTest extensions.

bCODE has now been using this touchscreen driver for two years, even updating it to use a later model from Zytronic (Vendor:Product identifier of 14c8:0003) which did its own filtering and spat out high quality X/Y values.

The reason I am blogging all this now is that bCODE has decided to release the code for this driver under the terms of the GNU GPL V3. The code is copyright bCODE Pty Ltd, I am listed as the author and maintainer, and the code will live on my web site. Most of the nitty gritty details are in the Readme.txt file. The driver source code also includes a calibration utility written in Ocaml using the Ocaml Cairo bindings.

Note 1: It should be noted that I did most of the reverse engineering work during a period when I was insanely busy. After that week, I had a very rough proof of concept driver that still needed quite a bit of work. In the following three weeks, my manager Sean, in his spare time, hacked on the code, fixed bugs and even improved the X/Y positioning algorithm.

Posted at: 22:03 | Category: CodeHacking | Permalink

Sat, 18 Oct 2008

Foobar 2000 and the Rabbit.

Foobar 2000 (see also the Wikipedia entry) is a media player for a legacy operating system commonly known as Microsoft Windows. Secret Rabbit Code is an audio sample rate converter that I wrote and released under the terms of the GNU GPL in 2002.

As the sole author and copyright owner of Secret Rabbit Code I have also made it available under a commercial use license (PDF) that is currently earning me a small income. However, developing Secret Rabbit Code was difficult, took a huge amount of research and the development of many prototypes which were thrown away. Now, after 6 years, that income is coming close to covering the cost of developing that initial version. It still has some way to go to cover the cost of the subsequent maintenance and the improvements I have made.

In 2005 I became aware that someone had released a binary only plugin for Foobar 2000 that used Secret Rabbit Code to do sample rate conversion. The fact that this was a binary only release was not the only problem. There was also the problem of Foobar being under a license that was not GPL compatible.

First off, I emailed the ISP in France where the binary was being hosted and asked for the download to be taken down. I also tried to track down the author of the plugin since authorship was not obvious from the download. Within a day or two I was able to track him down via the Hydrogen Audio forums.

The final result of the discussion on that forum was that I decided I would write and release a Secret Rabbit Code based plugin for Foobar. This of course involved development for Windows on Windows, a platform I rarely if ever use and which I actually prefer not to work on. However, after some 10 or 12 hours of work I had a working plugin that I posted on my website. I also added a Paypal donation button on the page, hoping to get paid for the time and effort I put into creating the plugin.

Unfortunately, the donations have been few and far between. Since 2005 when the plugin was released I have only had about 10 people pay the measly US$10 I am asking for. That is despite the fact that I have proven interest in this plugin. The page has had over 30000 hits since 2005, and I get an email every couple of weeks asking if there is a more recent version using later versions of Secret Rabbit Code or a version for other versions of Foobar 2000.

So here's how it stands:

Since most users of this plugin don't think I should be paid for my work creating it for use with Foobar on Windows, I currently have no intention of releasing a new version of this plugin. When and if I get a decent stream of payments for the work I have put in so far I will roll out a new version and maybe even an updated plugin for other versions of Foobar.

I will answer Foobar related emails from people who have donated or work for companies that paid for a commercial use license. The vast majority of other emails regarding the Foobar plugin will be directed to this blog post.

Posted at: 09:21 | Category: CodeHacking/SecretRabbitCode | Permalink

Wed, 20 Aug 2008

Just Drawing Stuff on the Screen.

Richard Jones laments that drawing stuff on the screen is harder than it should be. I haven't seen his code, but it looks like he might be trying to do it with Ocaml and GTK which probably is more difficult than it should be. GTK isn't really meant for that sort of stuff.

Fortunately, there is a really well designed and thoroughly thought out library for doing graphics called Cairo, which even has a really great set of Ocaml bindings. On Debian/Ubuntu, the Cairo bindings can be installed using:


   sudo apt-get install libcairo-ocaml-dev

I messed about with Ocaml and Cairo about a year ago and came up with this little demo.


  (*
  **    http://www.e-dsp.com/what-are-fourier-coefficients-and-how-to-calculate-them/
  **
  **    http://en.wikipedia.org/wiki/Fourier_series#Definition
  *)

  type fourier_series_t =
  {   a0 : float ;
      an : float array ;
      bn : float array ;
      }


  let initial_size = 200

  let two_pi = 8.0 *. atan 1.0

  let sum_float_array ary =
      Array.fold_left (fun x y -> x +. y) 0.0 ary


  let calc_series max_n ary =
      (*
      **    This uses a rough numerical approximation to integration.
      **    As long as the array is long enough (say 1000 or more elements), the
      **    results should be reasonable.
      *)
      let len = float_of_int (Array.length ary) in
      let calc_Xn trig_func n =
          let n = n + 1 in
          let ary = Array.mapi (
                fun i x -> x *.
                trig_func ((float_of_int (n * i)) *. two_pi /. (len -. 1.0))
                ) ary
          in
          2.0 *. (sum_float_array ary) /. len
      in
      let a0 = (sum_float_array ary) /. len in
      let an = Array.init max_n (calc_Xn cos) in
      let bn = Array.init max_n (calc_Xn sin) in
      { a0 = a0 ; an = an ; bn = bn }


  let waveform_of_series outlen series =
      (*
      **  Given a fourier series, calculate a single cycle waveform of the
      **  specified length.
      *)
      let calc_point i =
          let x = two_pi *. (float_of_int i) /. (float_of_int (outlen - 1)) in
          let asum = sum_float_array (Array.mapi (
                    fun i an -> an *. (cos (float_of_int (i + 1) *. x))) series.an
                    )
          in
          let bsum = sum_float_array (Array.mapi (
                    fun i bn -> bn *. (sin (float_of_int (i + 1) *. x))) series.bn
                    )
          in
          series.a0 +. asum +. bsum
      in
      Array.init outlen calc_point


  let fold_over_clipped_sine gain len =
      let point i =
          let x = gain *. sin (two_pi *. (float_of_int i) /. (float_of_int len)) in
          if x > 1.0 then x -. 2.0
          else if x < -1.0 then x +. 2.0
          else x
      in
      Array.init len point


  let redraw w series _ =
      let cr = Cairo_lablgtk.create w#misc#window in
      let { Gtk.width = width ; Gtk.height = height } = w#misc#allocation in
      Cairo.save cr ;
      (   Cairo.identity_matrix cr ;
          let border = 20.0 in
          Cairo.move_to cr border border ;
          Cairo.line_to cr border (float_of_int height -. border) ;
          Cairo.stroke cr ;

          let wave_width = width - 100 - (int_of_float border) in
          let middle = float_of_int height /. 2.0 in
          let wave_height = 0.7 *. (middle -. border) in

          Cairo.move_to cr border middle ;
          Cairo.line_to cr (border +. float_of_int wave_width) middle ;
          Cairo.stroke cr ;

          Cairo.move_to cr (border +. float_of_int wave_width) border ;
          Cairo.line_to cr (border +. float_of_int wave_width)
                                      (float_of_int height -. border) ;
          Cairo.stroke cr ;

          Cairo.set_source_rgb cr 1.0 0.0 0.0 ;
          let wave_data = waveform_of_series wave_width series in
          Cairo.move_to cr border (float_of_int height /. 2.0) ;
          Array.iteri (fun i x ->
                        Cairo.line_to cr (border +. float i)
                              (middle -. wave_height *. x))
                        wave_data ;
          Cairo.stroke cr ;
          ) ;
      Cairo.restore cr ;
      true


  let () =
      if Array.length Sys.argv <> 2 then
      (   Printf.printf "Usage : %s <series length>\n\n" Sys.argv.(0) ;
          exit 0 ;
          ) ;

      let series_len = int_of_string (Sys.argv.(1)) in

      let w = GWindow.window ~title:"Fourier Series Demo" ~width:600 ~height:400 () in
      ignore (w#connect#destroy GMain.quit) ;

      let b = GPack.vbox ~spacing:6 ~border_width:12  ~packing:w#add () in
      let f = GBin.frame ~shadow_type:`IN ~packing:(b#pack ~expand:true ~fill:true) () in
      let area = GMisc.drawing_area ~width:initial_size ~height:initial_size
                    ~packing:f#add ()
      in
      let array_len = 1000 in
      let wave = fold_over_clipped_sine 1.2 array_len in
      let series = calc_series series_len wave in

      ignore (area#event#connect#expose (redraw area series)) ;

      w#show () ;
      GMain.main ()

The above code can be compiled using:


    ocamlopt -I +cairo -I +lablgtk2 cairo.cmxa lablgtk.cmxa cairo_lablgtk.cmxa \
        gtkInit.cmx fsdemo.ml -o fsdemo

and the output looks like this:


[Image: Fourier Series Demo screen shot]

So while I agree that the 140 lines of code here is about 30 times as much as Richard's code from his ZX80 days, I also think the results are at least 30 times as good.
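
For anyone who wants to convince themselves that the rough numerical integration in calc_series is good enough, here is a small stand-alone sanity check. It is only a sketch: sine_coeff is a hypothetical, simplified cousin of the calc_Xn function above (it divides by the array length throughout rather than by the length minus one) and it needs neither Cairo nor GTK. For one cycle of a pure sine wave, the first sine coefficient should come out very close to 1 and the second very close to 0.

```ocaml
let two_pi = 8.0 *. atan 1.0

(* Rectangle-rule estimate of the n-th sine coefficient of a waveform
   that holds exactly one cycle. Hypothetical helper, not part of the
   demo program above. *)
let sine_coeff n samples =
    let len = Array.length samples in
    let sum = ref 0.0 in
    Array.iteri
        (fun i x ->
            let phase = two_pi *. float_of_int (n * i) /. float_of_int len in
            sum := !sum +. x *. sin phase)
        samples ;
    2.0 *. !sum /. float_of_int len

let () =
    let len = 1000 in
    (* One cycle of a unit amplitude sine wave. *)
    let wave =
        Array.init len
            (fun i -> sin (two_pi *. float_of_int i /. float_of_int len))
    in
    Printf.printf "b1 = %f, b2 = %f\n" (sine_coeff 1 wave) (sine_coeff 2 wave)
```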

Posted at: 22:29 | Category: CodeHacking/Ocaml | Permalink

Mon, 18 Aug 2008

Nemiver : A GUI debugger for GNOME.

For many years, Linux has lacked a good GUI debugger for C and C++ programs. Yes, everyone knows about GNU GDB, but that is a command line debugger and really not very useful for stepping through a program. There was also the Data Display Debugger (DDD) which uses the Motif widget set, usually supplied by the Lesstif Project. Unfortunately, Lesstif development has basically been abandoned and OpenMotif is not really an option because its license fails to meet term 8 of the Open Source Definition.

This means that for many years, developers on Linux have tended to avoid the "stepping through code with a debugger" approach to debugging. While I think that single stepping is not the best approach to every debugging problem, there are times when single stepping is useful and possibly also the fastest way to track down a problem.

However, I was recently made aware of a new GUI debugger for the GNOME (ie really the Linux) desktop, Nemiver.


[Image: nemiver screen shot]

The only problem with nemiver is that the version in Ubuntu Hardy is a little old and was giving me a few troubles. However, after building and installing a version 2.2.0-2 package of libgtksourceviewmm-2.0 from source I was able to build nemiver from SVN and so far it's working way better than DDD ever did.

So, here it is, a good looking, stable and capable GUI debugger for Linux.

Posted at: 19:39 | Category: CodeHacking | Permalink

Sat, 19 Jul 2008

Ocaml and Unix.select.

At the June meeting of FP-Syd, Tim Docker gave a presentation about his Tuple Space Server written in Haskell. This presentation rather intrigued me because I have had a long term interest in numerical analysis and numerical optimisation problems which lend themselves very well to parallel and distributed computing. I decided I should write a Tuple Space Server myself, in Ocaml.

Tim's Tuple Space server used threads and Software Transactional Memory (STM) to handle the connection of multiple masters and workers to the server itself. Although the Ocaml CoThreads library does have an STM module I thought there was probably an easier way.

In my day job I'm working on some C++ code that handles multiple network sockets and open file descriptors using the POSIX select system call. On Linux at least, there is a select tutorial man page which gives an example of using select written in C.

The beauty of select is that it allows a single process to multiplex multiple sockets and/or file descriptors without resorting to threads. However, the C example in the tutorial clearly demonstrates that this system call is a bit of a pain to use directly. Fortunately, for the project at work, I had some really great C++ base classes written by my colleague Peter to build on top of. These base classes hide all the nastiness of dealing with the system call itself by wrapping the select call into a daemon class and providing a simple base class which clients of the select call can inherit from.

For Ocaml there is a thin wrapper around the C library function in the Unix module and it has the following signature:


  val select :
    file_descr list -> file_descr list -> file_descr list -> float ->
      file_descr list * file_descr list * file_descr list

It takes three lists of file descriptors (one descriptor list for each of read, write and exceptions), a float value for a timeout and returns a tuple of three lists; one each for the file descriptors ready for reading, writing and exception handling.
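
To make that signature a little more concrete, here is a minimal sketch of calling Unix.select directly (the program needs to be linked with the Unix library). The wait_readable helper is a hypothetical name of my own; it passes empty lists for the write and exception sets and simply returns whichever descriptors are ready for reading. A pipe stands in for a socket so that the example is self-contained.

```ocaml
(* Hypothetical helper: wait up to [timeout] seconds and return the
   descriptors from [fds] that are ready to be read. *)
let wait_readable fds timeout =
    let readable, _writable, _exceptional = Unix.select fds [] [] timeout in
    readable

let () =
    let read_end, write_end = Unix.pipe () in
    (* Nothing has been written yet, so a zero timeout poll finds nothing. *)
    Printf.printf "ready before write : %d\n"
        (List.length (wait_readable [read_end] 0.0)) ;
    ignore (Unix.write_substring write_end "hello" 0 5) ;
    (* Now the read end of the pipe has data waiting. *)
    Printf.printf "ready after write  : %d\n"
        (List.length (wait_readable [read_end] 1.0)) ;
    Unix.close read_end ;
    Unix.close write_end
```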

Whereas the C++ solution had a daemon class, the Ocaml version instead has a daemon function. The daemon function operates on a set of tasks, with one file descriptor per task. Each file descriptor was embedded in a struct which I named task_t:


  type task_t =
  {   fd : Unix.file_descr ;
  
      mutable wake_time : float option ;
  
      mutable select_on : bool ;

      mutable process_read : task_t -> bool * task_t list ;
  
      mutable process_wake : task_t -> bool * task_t list ;
  
      finalize : task_t -> unit ;
      }

The fields of the struct are as follows:

The first thing to note in the above is the careful use of an immutable field for the file descriptor and mutable fields for process_read, process_wake and wake_time. The file descriptor is immutable so that any client code does not change its value behind the back of the daemon.

The other fields of the struct are purposely made to be mutable so that they can be changed on the fly. The functions process_read and process_wake both return their results in the same manner, a tuple containing two items:

The actual daemon run loop keeps the tasks in a hash table where the key is the file descriptor. Once the initial set of tasks is in the hash table, the loop basically does the following:

  1. Find the file descriptors of all the tasks in the hash table which have their select_on field set to true (uses Hashtbl.fold).
  2. Find the minimum wake_time timeout of all the tasks (this is actually done on the same pass over all items in the hash table as step 1.).
  3. Pass the file descriptors from step 1. to the select with the timeout value found in 2. (The lists for writable and exception file descriptors are empty.)
  4. When select returns a list of file descriptors ready to be read, map the file descriptor to a task using the hash table and then run the process_read function of each readable task.
  5. For each task whose wake_time is exceeded, run its process_wake function.
  6. For steps 4. and 5., if a task's process function returns false as the first element of the tuple it returns, remove the task from the hash table and run the task's finalize function. Also if the second element in the tuple is a non-empty list, then add the tasks to the hash table.
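
The steps above can be sketched roughly as follows. This is not the code from the Daemon module in the tarball, just a minimal illustration written against the task_t type shown earlier (repeated here so the sketch stands alone). It assumes that wake_time holds an absolute time as returned by Unix.gettimeofday and that a task's process_wake function updates or clears its own wake_time. The names run_once and run are hypothetical.

```ocaml
type task_t =
{   fd : Unix.file_descr ;
    mutable wake_time : float option ;
    mutable select_on : bool ;
    mutable process_read : task_t -> bool * task_t list ;
    mutable process_wake : task_t -> bool * task_t list ;
    finalize : task_t -> unit ;
    }

(* One pass of the loop over a hash table keyed on file descriptor. *)
let run_once tasks =
    let now = Unix.gettimeofday () in
    (* Steps 1 and 2 : gather descriptors and the nearest wake time in a
       single fold. A negative timeout means "no task wants waking". *)
    let fds, timeout =
        Hashtbl.fold
            (fun fd task (fds, timeout) ->
                let fds = if task.select_on then fd :: fds else fds in
                let timeout =
                    match task.wake_time with
                    |   None -> timeout
                    |   Some t ->
                            let dt = max 0.0 (t -. now) in
                            if timeout < 0.0 || dt < timeout then dt else timeout
                in
                (fds, timeout))
            tasks ([], -1.0)
    in
    (* Step 3 : Unix.select treats a negative timeout as "wait forever". *)
    let readable, _, _ = Unix.select fds [] [] timeout in
    (* Step 6 : act on the tuple returned by a process function. *)
    let handle task (keep, new_tasks) =
        if not keep then
        (   Hashtbl.remove tasks task.fd ;
            task.finalize task ;
            ) ;
        List.iter (fun t -> Hashtbl.replace tasks t.fd t) new_tasks
    in
    (* Step 4 : run process_read for each readable task. *)
    List.iter
        (fun fd ->
            match Hashtbl.find_opt tasks fd with
            |   None -> ()
            |   Some task -> handle task (task.process_read task))
        readable ;
    (* Step 5 : run process_wake for each task whose wake time has passed. *)
    let now = Unix.gettimeofday () in
    let due =
        Hashtbl.fold
            (fun _ task acc ->
                match task.wake_time with
                |   Some t when t <= now -> task :: acc
                |   _ -> acc)
            tasks []
    in
    List.iter (fun task -> handle task (task.process_wake task)) due

(* Keep going for as long as there are tasks left. *)
let rec run tasks =
    if Hashtbl.length tasks > 0 then
    (   run_once tasks ;
        run tasks
        )
```

Collecting the due tasks into a list before running them means the hash table is never modified while it is being folded over.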

The above code was placed in a module named Daemon. Using this module, I've whipped up a simple demo program, an echo server, the source code of which is available here. The tarball contains four files:


  Makefile          The project's Makefile.
  daemon.ml         The Daemon module.
  echo-server.ml    The Echo server.
  tcp.ml            A module of TCP/IP helper functions.

To compile this you will need the Ocaml native compiler which can be installed on Debian or Ubuntu using:


  sudo apt-get install ocaml-nox

The server can be built using make and when run, you can connect to the server using:


  telnet localhost 9301

All lines sent to the server will be immediately echoed back to you.

Posted at: 21:10 | Category: CodeHacking/Ocaml | Permalink

Sat, 24 May 2008

Objects vs Modules.

Although I've been using Ocaml for several years now, I've not yet been in a situation where I've needed to write an Ocaml class to define a C++/Java/Python/Smalltalk/OO style object. I've found that most of the problems I encountered could be easily solved using functional code and that Ocaml's objects didn't provide an obviously better solution. Until now (or so I thought).

The problem was one of moving around the filesystem keeping track of the old directories so they were easy to return to. The obvious model for this was the pushd and popd built-ins in command shells like GNU Bash. This functionality can be easily wrapped up in an Ocaml object as in the following example and demo code (which needs to be linked to the Unix module):


  class dirstack = object
      val mutable stack = []

      method push dirname =
          (* Find the current working directory. *)
          let cwd = Unix.getcwd () in
          (* Change to the new directory. *)
          Unix.chdir dirname ;
          (* If successful, push old cwd onto the stack. *)
          stack <- cwd :: stack

      method pop () =
          match stack with
          |    [] -> failwith "Directory stack is empty."
          |    head :: tail ->
                  (* Drop the top entry, then return to that directory. *)
                  stack <- tail ;
                  Unix.chdir head

  end

  let () =
      print_endline (Unix.getcwd ()) ;
      let dstack = new dirstack in
      dstack#push "/tmp" ;
      print_endline (Unix.getcwd ()) ;
      dstack#push "/bin" ;
      print_endline (Unix.getcwd ()) ;
      dstack#pop () ;
      print_endline (Unix.getcwd ()) ;
      dstack#pop () ;
      print_endline (Unix.getcwd ())


However, there are some problems with the above code. Firstly, if the push and pop methods need to be used throughout the program, the dstack object needs to be made more widely accessible using one of the following three methods:

  1. Being placed in the global scope.
  2. Being made into a Singleton object.
  3. Being passed around as a parameter to whatever function may need it.

Yuck! Yuck! Double yuck! Suddenly, this object oriented solution didn't look like such a great idea.

Then it struck me. This object can be easily transformed into an Ocaml module like this:


  module Dirstack = struct
      let stack = ref []

      let push dirname =
          (* Find the current working directory. *)
          let cwd = Unix.getcwd () in
          (* Change to the new directory. *)
          Unix.chdir dirname ;
          (* If successful, push old cwd onto the stack. *)
          stack := cwd :: !stack

      let pop () =
          match !stack with
          |    [] -> failwith "Directory stack is empty."
          |    head :: tail ->
                  stack := tail ;
                  Unix.chdir head

     end

  let () =
      print_endline (Unix.getcwd ()) ;
      Dirstack.push "/tmp" ;
      print_endline (Unix.getcwd ()) ;
      Dirstack.push "/bin" ;
      print_endline (Unix.getcwd ()) ;
      Dirstack.pop () ;
      print_endline (Unix.getcwd ()) ;
      Dirstack.pop () ;
      print_endline (Unix.getcwd ())

This solution using a module is much better than the one using an object. The Dirstack module itself is globally accessible and is already a singleton while the stack used to hold past directories is implemented as a list whose scope is limited to the module itself. (Furthermore, if Dirstack is implemented in its own file instead of using a module defined within a larger file, then the stack variable can be hidden completely by not listing it in the Dirstack interface file.)
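
The same hiding can be had without a separate file by giving the module an explicit signature. The following is a minimal sketch of that idea; the signature is my own choice for this illustration and, like the code above, it needs to be linked to the Unix module.

```ocaml
module Dirstack : sig
    (* Only push and pop are exported; the stack itself is not. *)
    val push : string -> unit
    val pop : unit -> unit
end = struct
    let stack = ref []

    let push dirname =
        let cwd = Unix.getcwd () in
        Unix.chdir dirname ;
        (* Only reached if the chdir succeeded. *)
        stack := cwd :: !stack

    let pop () =
        match !stack with
        |    [] -> failwith "Directory stack is empty."
        |    head :: tail ->
                stack := tail ;
                Unix.chdir head
end
```

With this in place, code outside the module that tries to touch Dirstack.stack is rejected by the compiler, so the only way to change the stack is via push and pop.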

So while I'm pleased with this solution, it does mean that I'll have to continue my hunt for a problem where an object provides a better solution than any other feature of the Ocaml language. This is particularly ironic because when choosing between two strict statically typed languages, Haskell and Ocaml, I chose Ocaml because I thought I needed objects. However, I stuck with Ocaml because of its pragmatism.

Posted at: 07:45 | Category: CodeHacking/Ocaml | Permalink

Sun, 20 Apr 2008

Cross Compiling for Legacy Win32 Systems (Part 2).

Cross compiling from Linux to Windows requires the installation of a couple of packages. On a Debian or Ubuntu system this can be done using:


  sudo apt-get install build-essential
  sudo apt-get install mingw32 mingw32-binutils mingw32-runtime wine

I'm running Ubuntu's Hardy Heron pre-release and the following is known to work with these versions:


  mingw32               4.2.1.dfsg-1ubuntu1
  mingw32-binutils      2.17.50-20070129.1-1
  mingw32-runtime       3.13-1
  wine                  0.9.59-0ubuntu5

For an example of a project which can be successfully cross-compiled, I have chosen libogg which is one of the two libraries required to encode and decode Ogg/Vorbis files. I also happen to know that the current libogg sources in the Xiph Foundation's SVN repository cross-compile from Linux to Windows correctly because I committed the patch to make it possible.

However, we need to look ahead a little. After we have cross compiled libogg we will also want to cross compile the associated libvorbis library which relies on libogg. We therefore need to configure libogg so that when we install it, it can be found by the libvorbis configure script.

For me that meant creating a MinGW32 directory in my home directory:


  mkdir $HOME/MinGW32

The next step is to grab the libogg source code from the Xiph SVN server. This can be achieved using the command:


  svn co http://svn.xiph.org/trunk/ogg libogg

Changing into the libogg directory, we are now ready to configure, test and install the library. That can be done using:


  ./autogen.sh
  ./configure --host=i586-mingw32msvc --target=i586-mingw32msvc \
      --build=i586-linux --prefix=$HOME/MinGW32
  make
  make check
  make install

The first command above runs the auto tools to generate the configure script. The second command, configure, is broken across two lines. It sets up the generated Makefiles to compile Windows binaries from a Linux host, with the install directory we set up before. The third line builds the Windows version of libogg, the fourth line runs the test suite, with the Windows executables being run under WINE, and the final line installs everything in the MinGW32 directory created earlier.

All of the above commands should pass without errors. If they don't, check your versions of the mingw cross compiler tools and/or WINE.

Posted at: 20:51 | Category: CodeHacking/MinGWCross | Permalink

Wed, 16 Apr 2008

Cross Compiling for Legacy Win32 Systems (Part 1).

My main two FOSS projects, libsndfile and libsamplerate have significant numbers of users that are tied to that particularly odious legacy system, Microsoft Windows. Since I don't normally use Windows myself, maintaining support for that OS has always been a huge pain in the neck.

Originally I shipped Microsoft project files for libsndfile, but that became unworkable because the different versions of the Microsoft tools (Visual C++ 5, Visual C++ 6, Visual Studio 2003, Visual Studio 2005 etc) used different and incompatible project file formats. I solved this by shipping a simple Makefile that used Microsoft's nmake and the command line compilers to build libsndfile. However, by about 2004, the Microsoft compiler's complete lack of support for the 1999 ISO C Standard made maintaining support too much trouble, so it was dropped.

Instead, I started using Cygwin and MinGW to compile libsndfile on Windows. Both of these tool-sets use a version of the GNU GCC compiler just like Linux and building libsndfile using these two tool-sets was trivial:


  ./configure
  make
  make check

Of course there were howls of protest from Windows users, but since they (with a small number of exceptions) had contributed so little, I didn't feel like I owed them anything. I also started releasing pre-compiled Windows binaries at the same time as the source code tarballs were released.

However, while the MinGW compiler was a huge improvement over the Microsoft one it was still a huge pain in the neck. I had to keep a Windows machine and keep it updated and patched against vulnerabilities. Furthermore, installing and updating MinGW was a painful manual process. Oh how I longed for a Debian/Ubuntu style apt-get command to look for and install updates. Finally, copying source code back and forth between Linux and Windows while debugging Windows issues was another pain point because version control systems like GNU Arch and bzr simply didn't work very well on Windows.

In about 2004, I tried the MinGW Linux to Windows cross compiler, a compiler that runs on Linux but generates binaries for Windows. This compiler worked, but left one rather large problem; how do I run libsndfile's rather large and comprehensive test suite? Compiling libsndfile without running the test suite is a waste of time. I did try to run the tests under WINE (the Windows emulator), but at the time tests were failing under WINE that didn't fail on Windows.

From that time on, I would try running the cross-compiled test suite under WINE once or twice a year. Then, some time in the last year or so, the number of problems with the test suite dropped to one, which was only a FIXME message. A little hacking on the WINE sources resulted in a patch that was sent to the WINE mailing list and has since been applied to the main WINE source tree.

With that bug fixed, I can now cross compile from Linux to Windows and run the full libsndfile test suite under WINE. That means that Windows has just become that little bit less relevant than it was before.

A future post will explain how to set up the cross compiler and WINE and walk through compiling and testing of a standard FOSS project.

Posted at: 23:12 | Category: CodeHacking/MinGWCross | Permalink

Fri, 11 Apr 2008

You Stupid Git!

As far as I can tell, the absolute, canonical, go-to-first documentation for the git distributed version control system (DVCS) can be found here:

http://www.kernel.org/pub/software/scm/git/docs/user-manual.html

This documentation seems comprehensive and well laid out. It explains commits, manipulating branches, merging, collaborative development and the pretty damn interesting rebase and bisect commands. This documentation is called a user manual but it contains sufficient examples to make it a pretty damn fine tutorial.

Normally something like "here's a link to the documentation" would not be worthy of a blog post. However, failure to find the canonical user manual could lead a person (ie me) to post messages to mailing lists saying things like:

"I'm sure git is very clever and all, but its UI and documentation is probably the most user hateful thing I have seen [since] sendmail's cf files."

or, on finding a one hour long video screen-cast tutorial (apparently aimed at all those Ruby on Rails writing Mac OSX users):

"This makes me wonder, how fscked up does a DVCS have to be that you need tens of megabytes of video to show how it works when Bzr and many others can do it with less than ten kilobytes of html text?"

So while I was wrong about the documentation I still have huge reservations about git's user interface and stand by this statement:

"I am currently trying to learn git and I can see very clearly that git is designed by kernel programmers whose normal approach to a user interface is something like a Unix system call."

I'm sure git is a powerful tool and the rebase feature is something I've been wishing for in other systems for some time, but git's UI is already starting to grate.

Posted at: 20:02 | Category: CodeHacking | Permalink

Sun, 06 Apr 2008

Ocaml : Exception Back Traces in Native Code.

Some time ago I wrote a blog post about exception back traces which at the time of that post only existed for the Ocaml byte code compiler.

However, version 3.10 of the Ocaml compiler, which was released about a year ago, included exception back traces for native code as well as byte code. With the imminent release of Ubuntu's Hardy Heron, version 3.10 of the compiler is about to become much more widely available.

Enabling exception back traces is as simple as adding the "-g" option to the ocamlopt command line and then setting a single environment variable as follows.


  export OCAMLRUNPARAM="b1"
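
For something to try this out on, here is a tiny test program. It is only a sketch and it assumes a compiler somewhat newer than the 3.10 discussed here, since it uses Printexc.record_backtrace and Printexc.get_backtrace from later Ocaml releases to turn back traces on and print one from inside the program, without relying on the environment variable. The function find_big is just a hypothetical example of something that can raise. The program still has to be compiled with "-g" for the trace to contain anything useful.

```ocaml
(* Turn on back trace recording from within the program. *)
let () = Printexc.record_backtrace true

(* Hypothetical function that raises Not_found when nothing matches. *)
let find_big lst =
    List.find (fun x -> x > 10) lst

let () =
    try
        Printf.printf "Found %d\n" (find_big [ 1 ; 2 ; 3 ])
    with Not_found ->
        prerr_endline "Nothing bigger than 10 found. Back trace :" ;
        prerr_string (Printexc.get_backtrace ())
```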

Posted at: 12:48 | Category: CodeHacking/Ocaml | Permalink

Sun, 30 Mar 2008

libsamplerate 0.1.3.

About a week ago I released a new version of SecretRabbitCode (aka libsamplerate).

The major change was that the new improved SINC based converters I blogged about here are now the default. There were also a couple of minor bug fixes.

The fine people at Infinitewave have now updated their test results to include the new converter and it shows Secret Rabbit Code comes very close to the best of the commercial converters in terms of quality.

Posted at: 15:11 | Category: CodeHacking/SecretRabbitCode | Permalink

Mon, 24 Mar 2008

Cross Compiling with pkg-config.

I'm currently playing with the MinGW cross compiler versions of the GNU C and C++ compilers available via apt-get on Debian and Ubuntu systems. These cross compilers generate Windows binaries from a Linux host system, which is potentially a much less painful way of turning FOSS code into binaries for that particularly odious legacy platform.

Most of the software I'm compiling uses the GNU tools; autoconf, automake, libtool and pkg-config for configuring the software before compiling. Autoconf already has good support for cross compiling and automake and libtool just do what autoconf tells them to do. Pkg-config however is the odd one out.

Pkg-config's job is to retrieve information about installed libraries so that the compiler can find the required header files for inclusion and libraries for linking. For instance, if you wanted to compile a program that uses the gconf-2.0 library you could find out the required CFLAGS to be passed to the C compiler and required libraries for linking, by doing something like the following in the Makefile.


  GCONF_CFLAGS = $(shell pkg-config --cflags gconf-2.0)
  GCONF_LIBS = $(shell pkg-config --libs gconf-2.0)

In the above example, when pkg-config is run, it looks in the directory /usr/lib/pkgconfig/ and reads information from the file gconf-2.0.pc (each installed library should have one or more of these pkg-config files) which then gets printed out. While the information given by pkg-config would be correct for a native build, it is unlikely to be correct for the cross compiling case.

This issue came up as early as 2003 and there is even a wiki page which suggests some quite extensive changes to pkg-config. Unfortunately I think these suggestions are somewhat fragile and pkg-config itself (I'm using version 0.22) already has features for a better solution.

Like many Unix programs, pkg-config's behaviour can be modified by manipulating certain environment variables. The pkg-config man page explains these variables very well. The first one is PKG_CONFIG_LIBDIR, which replaces the default location where pkg-config looks for its per-library config files. Secondly, the PKG_CONFIG_PATH variable can be set to add additional pkg-config search paths.

Overriding these two variables results in a MinGW cross pkg-config bash script which I have named i586-mingw32msvc-pkg-config and which looks like this:


  #!/bin/bash

  # This file has no copyright assigned and is placed in the Public Domain.
  # No warranty is given.

  # When using the mingw32msvc cross compiler tools, the native Linux
  # pkg-config executable works fine as long as the default PKG_CONFIG_LIBDIR
  # is overridden.
  export PKG_CONFIG_LIBDIR=/usr/i586-mingw32msvc/lib/pkgconfig

  # Also want to override the standard user defined PKG_CONFIG_PATH with
  # a mingw32msvc specific one.
  export PKG_CONFIG_PATH=$PKG_CONFIG_PATH_MINGW32MSVC

  # Now just execute pkg-config with the given command line args. Quoting
  # "$@" keeps arguments containing spaces intact.
  pkg-config "$@"

Now autoconf generated configure scripts that realise that the i586-mingw32msvc-gcc cross compiler is being used will run the above script and get suitable information for the cross compiler rather than the native compiler.

The only downside to this solution is that a separate script is required for each cross compiler which uses pkg-config. This however is a minor price to pay and it is unlikely that people will end up with huge numbers of XXXX-pkg-config scripts, as was common with XXXX-config scripts before the widespread use of pkg-config.

Until a better solution becomes available, this is what I will be using.

Posted at: 13:24 | Category: CodeHacking/MinGWCross | Permalink

Sat, 08 Mar 2008

Progress on the Rabbit.

For over three years now, I have been working on (on and off, but mostly off) a new algorithm for doing audio sample rate conversion in Secret Rabbit Code. The idea for the new algorithm has been rattling around in my head for most of that time, but the problem was always the implementation. While I am making progress it has been slow.

However, a public comparison between a large collection of converters showed that while the conversion quality of Secret Rabbit Code was good, it was nowhere near state of the art.

In order to see if I could get Secret Rabbit Code closer to state of the art quickly, I decided to revisit the existing converter during the xmas/new-year break.

The existing converter had a set of digital filters whose coefficients were generated by a small program written in GNU Octave. My first task was to convert that program to Ocaml, which has become my favourite language for technical computing. I then spent quite a bit of time finding and analyzing where the filter design program was losing precision and finding workarounds. Finally, I spent even more time looking at how the different filter design parameters interact with one another and with the conversion algorithm itself.

Fortunately, all this work has paid off. The result is new versions of the SRC_SINC_MEDIUM_QUALITY and SRC_SINC_BEST_QUALITY converters. The old versions of these converters have been renamed to SRC_OLD_SINC_MEDIUM_QUALITY and SRC_OLD_SINC_BEST_QUALITY. The old versions will be removed once the new versions have been fully validated.

So far, the new converters seem to have a significantly improved signal to noise ratio, as can be seen from the following spectrograms (using the methodology described here). It should be obvious from these plots that the new versions of the converters have significantly fewer artifacts (the purple and blue bits) than the old converters.


[Sweep test for old mid quality converter]


[Sweep test for new mid quality converter]


[Sweep test for old high quality converter]


[Sweep test for new high quality converter]

Obviously, conversion quality is not the only criterion for evaluating sample rate converters; conversion speed can also be important in some situations. In my preliminary testing, the updated Best SINC converter runs up to about 25% slower than the old one. The new best converter also uses significantly more memory than the old one. Storage of filter coefficients has gone up by a factor of 20, and is now over a megabyte for the best quality converter alone.

In the tables below I've listed the SNR, throughput speeds and bandwidths as measured by the test suite (the snr_bw_test and throughput_test programs) distributed with the code for a couple of different CPU types.

1.1 GHz Intel Pentium M (32 bit) with 2048 KB cache

  Converter Name                 SNR          Throughput             Bandwidth
  SRC_OLD_SINC_MEDIUM_QUALITY    97.46 dB     648800 samples/sec     90.68 %
  SRC_SINC_MEDIUM_QUALITY        121.33 dB    593673 samples/sec     90.55 %
  SRC_OLD_SINC_BEST_QUALITY      97.35 dB     223025 samples/sec     96.96 %
  SRC_SINC_BEST_QUALITY          145.68 dB    163735 samples/sec     96.08 %

1.8 GHz AMD Opteron 265 (64 bit) with 1024 KB cache

  Converter Name                 SNR          Throughput             Bandwidth
  SRC_OLD_SINC_MEDIUM_QUALITY    97.46 dB     1088447 samples/sec    90.68 %
  SRC_SINC_MEDIUM_QUALITY        121.33 dB    1088447 samples/sec    90.55 %
  SRC_OLD_SINC_BEST_QUALITY      97.35 dB     179116 samples/sec     96.96 %
  SRC_SINC_BEST_QUALITY          145.68 dB    187755 samples/sec     96.08 %

1.86GHz Intel Core Duo (32 bit) with 2048 KB cache

  Converter Name                 SNR          Throughput             Bandwidth
  SRC_OLD_SINC_MEDIUM_QUALITY    97.46 dB     1167840 samples/sec    90.68 %
  SRC_SINC_MEDIUM_QUALITY        121.33 dB    1042334 samples/sec    90.55 %
  SRC_OLD_SINC_BEST_QUALITY      97.35 dB     395102 samples/sec     96.96 %
  SRC_SINC_BEST_QUALITY          145.68 dB    302773 samples/sec     96.08 %
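
As a rough cross-check on the speed comparison, the relative change in throughput between the old and new converters can be worked out directly from the three tables above. The following is a small sketch in Python, used purely for convenience; it is not part of the Secret Rabbit Code test suite and the names in it are made up for illustration.

```python
# Throughput figures (samples/sec) copied from the three tables above.
# Each entry is (old converter, new converter).
throughput = {
    "Pentium M": {"medium": (648800, 593673), "best": (223025, 163735)},
    "Opteron 265": {"medium": (1088447, 1088447), "best": (179116, 187755)},
    "Core Duo": {"medium": (1167840, 1042334), "best": (395102, 302773)},
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means slower)."""
    return 100.0 * (new - old) / old


for cpu, converters in throughput.items():
    for quality, (old, new) in converters.items():
        print("%-12s %-6s %+6.1f %%" % (cpu, quality, percent_change(old, new)))
```

On the two Intel CPUs the new best quality converter comes out roughly a quarter slower than the old one, while on the Opteron it measures slightly faster.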

A pre-release containing these updated converters is available for download here. Once they have been tested a little more widely I intend to replace the old versions of the converters with the new, higher specification ones.

Anybody who wants to discuss this further should join the SRC mailing list and discuss it there.

Finally, once a version of Secret Rabbit Code with these new converters has been officially released I can get back to the new converter algorithm, which should at least match what I have here in terms of quality but run significantly faster and use at least an order of magnitude less RAM.

Posted at: 14:50 | Category: CodeHacking/SecretRabbitCode | Permalink

Sun, 24 Feb 2008

Functional Programming and Testing.

I read quite a lot of programming related blogs, but it's rare for me to find one as muddle-headed as this one titled "Quality Begs for Object-Orientation" on the O'Reilly network.

The author, Michael Feathers, starts the post by mentioning that he is dabbling in Ocaml and then makes the assertion that:

"I think that most functional programming languages are fundamentally broken with respect to the software lifecycle."

Now I'm not too sure why he brings up software lifecycle, because all he talks about is testing. However, he does give an example in Java involving testing and wraps up his post by saying that his Java solution is difficult to do in Ocaml, Haskell and Erlang.

Feathers gets two things wrong. Firstly he seems to be writing Java code using Ocaml's syntax and then complains that Ocaml is not enough like Java. His conclusion is hardly surprising. Ocaml is simply not designed for writing Java-like object oriented code.

The second problem is his claim that testing in functional languages is more difficult than with Java. While this may be true when writing Java code with Ocaml's syntax, it is not true for the more general case of writing idiomatic Ocaml or functional code.

So let's look at the testing of Object Oriented code in comparison to Functional code.

With the object oriented approach, a bunch of data fields are bundled up together in an object and methods are defined, some of which may mutate the state of the object's data fields. When testing objects with mutable fields, it's important to test that the state transitions are correct under mutation.

By way of contrast, when doing functional programming, one attempts to write pure functions; functions which have no internal state and where outputs depend only on inputs and constants.

The really nice thing about pure functions is that they are so easy to test. The absence of internal state means that there are no state transitions to test. The only testing left is to collect a bunch of inputs that test for all the boundary conditions, pass each through the function under test and validate the output.
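
To make that concrete, here is a minimal sketch of this style of testing. It is written in Python rather than Ocaml simply to keep it short, and clamp is a made-up example function, not something from Feathers' post.

```python
def clamp(value, low, high):
    """Pure function: the result depends only on the three arguments."""
    return max(low, min(value, high))


# Table of (inputs, expected output) pairs covering the boundary conditions.
cases = [
    ((5, 0, 10), 5),     # inside the range
    ((0, 0, 10), 0),     # exactly on the lower bound
    ((10, 0, 10), 10),   # exactly on the upper bound
    ((-1, 0, 10), 0),    # just below the range
    ((11, 0, 10), 10),   # just above the range
]

for args, expected in cases:
    assert clamp(*args) == expected, (args, expected)

print("all %d cases passed" % len(cases))
```

There is no object to construct and no sequence of calls to get into the right state first; each case is independent of all the others.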

Since testing pure functions is easier than testing objects with mutable state, I would suggest that assuring quality using automated testing is easier for functional code than for object oriented code. This conclusion directly contradicts the title of Feathers' blog post: "Quality Begs for Object-Orientation".

The lesson to be learned here is that if anyone with a purely Java background wants to learn Ocaml or any other functional language, they have to be prepared for a rather large paradigm shift. Old habits and ways of thinking need to be discarded. For Ocaml, that means ignoring Ocaml's object oriented and imperative programming features for as long as possible and attempting to write nothing but pure stateless functions.

Update : 2008-02-26 17:04

Conrad Parker posted this to reddit and the ensuing discussion was quite interesting.

Posted at: 23:26 | Category: CodeHacking/Ocaml | Permalink

Sat, 24 Nov 2007

Ocaml Snippet : Sqlite3.

One of the really nice things about using Ocaml on Debian and Ubuntu is the large number of really well packaged third party libraries.

Most of these libraries are also well documented using doc strings extracted from the source code files with ocamldoc. However, the documentation for most Ocaml libraries is purely reference documentation and it's not always obvious how to use the library simply from reading the reference docs. What's really needed is example code to be read in conjunction with the reference docs.

I'm working on a program where I needed a small, fast, easy to administer database. With those requirements, Sqlite is really hard to beat and best of all, someone has already written Ocaml bindings. On Debian or Ubuntu, the Ocaml Sqlite bindings can be installed using:


  sudo apt-get install libsqlite3-ocaml-dev

In order to get a feel for using it and take my first steps into the world of SQL (which I'd had very minimal exposure to before now), I wrote a small program to test out the features provided by the library.

The following stand alone program should be taken as an example of how to access a Sqlite database from Ocaml. Since I am not an SQL expert, the actual SQL usage should be taken with a grain of salt.


  exception E of string

  let create_tables db =
      (* Create two tables in the database. *)
      let tables =
      [    "people", "pkey INTEGER PRIMARY KEY, first TEXT, last TEXT, age INTEGER" ;
          "cars", "pkey INTEGER PRIMARY KEY, make TEXT, model TEXT" ;
          ]
      in
      let make_table (name, layout) =
          let stmt = Printf.sprintf "CREATE TABLE %s (%s);" name layout in
          match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> Printf.printf "Table '%s' created.\n" name
          |    x -> raise (E (Sqlite3.Rc.to_string x))
      in
      List.iter make_table tables


  let insert_data db =
      (* Insert data in both the tables. *)
      let people_data =
      [    "John", "Smith", 23;
          "Helen", "Jones", 29 ;
          "Adam", "Von Schmitt", 32 ;
          ]
      in
      let car_data =
      [    "bugatti", "veyron" ;
          "porsche", "911" ;
          ]
      in
      let insert_people (first, last, age) =
          (* Use NULL for primary key and Sqlite will generate a unique key. *)
          let stmt = Printf.sprintf "INSERT INTO people values (NULL, '%s', '%s', %d);"
                                     first last age
          in
          match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
      in
      let insert_car (make, model) =
          let stmt = Printf.sprintf "INSERT INTO cars values (NULL, '%s', '%s');"
                                     make model
          in
          match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
      in
      List.iter insert_people people_data ;
      List.iter insert_car car_data ;
      print_endline "Data inserted."


  let list_tables db =
      (* List the table names of the given database. *)
      let lister row headers =
          Printf.printf "    %s : '%s'\n" headers.(0) row.(0)
      in
      print_endline "Tables :" ;
      let code = Sqlite3.exec_not_null db ~cb:lister
                          "SELECT name FROM sqlite_master;"
      in
      (    match code with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
          ) ;
      print_endline "------------------------------------------------"


  let search_callback db =
      (* Perform a simple search using a callback. *)
      let print_headers = ref true in
      let lister row headers =
          if !print_headers then
          (    Array.iter (fun s -> Printf.printf "  %-12s" s) headers ;
              print_newline () ;
              print_headers := false
              ) ;
          Array.iter (Printf.printf "  %-12s") row ;
          print_newline ()
      in
      print_endline "People under 30 years of age :" ;
      let code = Sqlite3.exec_not_null db ~cb:lister
                                 "SELECT * FROM people WHERE age < 30;"
      in
      match code with
      |    Sqlite3.Rc.OK -> ()
      |    x -> raise (E (Sqlite3.Rc.to_string x))



  let search_iterator db =
      (* Perform a simple search. *)
      let str_of_rc rc =
          match rc with
          |    Sqlite3.Data.NONE -> "none"
          |    Sqlite3.Data.NULL -> "null"
          |    Sqlite3.Data.INT i -> Int64.to_string i
          |    Sqlite3.Data.FLOAT f -> string_of_float f
          |    Sqlite3.Data.TEXT s -> s
          |    Sqlite3.Data.BLOB _ -> "blob"
      in
      let dump_output s =
          Printf.printf "  Row   Col   ColName    Type       Value\n%!"  ;
          let row = ref 0 in
          while Sqlite3.step s = Sqlite3.Rc.ROW do
              for col = 0 to Sqlite3.data_count s - 1 do
                  let type_name = Sqlite3.column_decltype s col in
                  let val_str = str_of_rc (Sqlite3.column s col) in
                  let col_name = Sqlite3.column_name s col in
                  Printf.printf "  %2d  %4d    %-10s %-8s   %s\n%!"
                                 !row col col_name type_name val_str ;
                  done ;
              row := succ !row ;
              done
      in
      print_endline "People over 25 years of age :" ;
      let stmt = Sqlite3.prepare db "SELECT * FROM people WHERE age > 25;" in
      dump_output stmt    ;
      match Sqlite3.finalize stmt with
      |    Sqlite3.Rc.OK -> ()
      |    x -> raise (E (Sqlite3.Rc.to_string x))


  let update db =
      print_endline "Helen Jones has just turned 30, so update table." ;
      print_endline "Should now only be one person under 30." ;
      let stmt = "UPDATE people SET age = 30 WHERE " ^
                      "first = 'Helen' AND last = 'Jones';"
      in
      (    match Sqlite3.exec db stmt with
          |    Sqlite3.Rc.OK -> ()
          |    x -> raise (E (Sqlite3.Rc.to_string x))
          ) ;
      search_callback db


  let delete_from db =
      print_endline "Bugattis are too expensive, so drop that entry." ;
      let stmt = "DELETE FROM cars WHERE make = 'bugatti';" in
      match Sqlite3.exec db stmt with
      |    Sqlite3.Rc.OK -> ()
      |    x -> raise (E (Sqlite3.Rc.to_string x))


  let play_with_database db =
      print_endline "" ;
      create_tables db ;
      print_endline "------------------------------------------------" ;
      list_tables db ;
      insert_data db ;
      print_endline "------------------------------------------------" ;
      search_callback db ;
      print_endline "------------------------------------------------" ;
      search_iterator db ;
      print_endline "------------------------------------------------" ;
      update db ;
      print_endline "------------------------------------------------" ;
      delete_from db ;
      print_endline "------------------------------------------------"


  (* Program main. *)

  let () =
      (* The database is called test.db. Delete it if it already exists. *)
      let db_filename = "test.db" in
      (    try Unix.unlink db_filename
          with _ -> ()
          ) ;

      (* Create a new database. *)
      let db = Sqlite3.db_open db_filename in

      play_with_database db ;

      (* Close database when done. *)
      if Sqlite3.db_close db then print_endline "All done.\n"
      else print_endline "Cannot close database.\n"

The above code can be run as a script using:


  ocaml -I +sqlite3 sqlite3.cma unix.cma sqlite_test.ml

or compiled to a native binary using:


  ocamlopt -I +sqlite3 sqlite3.cmxa unix.cmxa sqlite_test.ml -o sqlite_test

When run, the output should look like this:


  Table 'people' created.
  Table 'cars' created.
  ------------------------------------------------
  Tables :
      name : 'people'
      name : 'cars'
  ------------------------------------------------
  Data inserted.
  ------------------------------------------------
  People under 30 years of age :
    pkey          first         last          age
    1             John          Smith         23
    2             Helen         Jones         29
  ------------------------------------------------
  People over 25 years of age :
    Row   Col   ColName    Type       Value
     0     0    pkey       INTEGER    2
     0     1    first      TEXT       Helen
     0     2    last       TEXT       Jones
     0     3    age        INTEGER    29
     1     0    pkey       INTEGER    3
     1     1    first      TEXT       Adam
     1     2    last       TEXT       Von Schmitt
     1     3    age        INTEGER    32
  ------------------------------------------------
  Helen Jones has just turned 30, so update table.
  Should now only be one person under 30.
  People under 30 years of age :
    pkey          first         last          age
    1             John          Smith         23
  ------------------------------------------------
  Bugattis are too expensive, so drop that entry.
  ------------------------------------------------
  All done.

Posted at: 14:20 | Category: CodeHacking/Ocaml | Permalink

Thu, 13 Sep 2007

GNU gcc and -Wmissing-prototypes.

Many people who code in C consider warning messages optional or if they do enable warnings, use gcc's -Wall warning flag and leave it at that. However, there are a number of problems that gcc can warn about but doesn't unless it is specifically told to do so.

For example, consider a rather trivial example consisting of a main program file (main.c) like this:


  #include <stdio.h>
  
  #include "other.h"
  
  int
  main (void)
  {
      printf ("two cubed : %f\n", int_power (2.0, 3)) ;
      return 0 ;
  }

a second C file (other.c) like this:

  double
  int_power (int pow, double value)
  {
      double output = value ;
  
      for ( ; pow > 1 ; pow --)
          output *= value ;
  
      return output ;
  }

and the header file for the above C file (other.h) like this:

  double int_power (double value, int pow) ;

Simple.

Compiling this code at the command line can be done like this:


  gcc -Wall -Wextra main.c other.c -o program

which gives no warnings. However, when the resulting executable is run, it gives an obviously wrong result:


  two cubed : 0.000000

What the ..... ?

Looking at the code for this rather trivial example, it's pretty easy to figure out that the error is caused by the main program and the implementation of the function int_power disagreeing on the order of the two parameters.

In a more complicated real world situation, this can lead to seriously difficult to debug problems. The solution of course is to add the -Wmissing-prototypes flag to the gcc command line:


  gcc -Wall -Wextra -Wmissing-prototypes main.c other.c -o program

Now the compiler gives us a warning message:


  other.c:3: warning: no previous prototype for 'int_power'

To get rid of this warning, the file other.c should include other.h. When we do that, we get a compile error telling us that there is a conflict between the function implementation in other.c and the function prototype in other.h:


  other.c:6: error: conflicting types for 'int_power'
  other.h:1: error: previous declaration of 'int_power' was here

The fix of course is to make the implementation of int_power in other.c match the function prototype. Once that is done, the program compiles and even gives the correct result.

But we're not quite done yet. The behavior of the original broken code is slightly different when compiled with a C++ compiler. Compiling with g++:


  g++ -Wall -Wextra main.c other.c -o program

results in an error message:


  /tmp/cccTLc2H.o: In function `main':
  main.c:(.text+0x23): undefined reference to `int_power(double, int)'
  collect2: ld returned 1 exit status

So how does the C++ compiler know that something is wrong here when the C compiler didn't?

The most important thing to notice is that the error is produced by the linker. Secondly, one needs to remember that C++ (unlike C) allows function name overloading; that is, two (or more) functions can have the same name as long as they all have a unique (ordered) set of function argument types.

In the case above, the C++ linker (which may be the same as the C linker but behaves differently when linking C++ object files) knows the function called from main.c takes two parameters, a double followed by an int. However, the file other.c has a function of the same name, but with the order of the parameters reversed and hence can't be used. Since there is no other function of that name the linker gives an error.

Interestingly, the C++ compiler does not accept the -Wmissing-prototypes warning flag. Personally, I think it should, because obvious warnings from the parser stage of the compiler are an order of magnitude better than obscure error messages from the linker.

Finally, some C++ fan-boys might give this as an example of why C++ is a safer language than C. The question I would ask of those people is, "if you are so concerned with programming safety, why are you using C++ instead of Ocaml or Haskell?". I would also suggest that using a good C compiler like GNU gcc with every warning message you can find turned on is just as safe as running the same code through a C++ compiler.

Posted at: 07:03 | Category: CodeHacking | Permalink

Mon, 30 Jul 2007

A Simple Introduction to Parsing with Flex and Bison.

On Friday night I gave a presentation at SLUG with the title above. Unfortunately the SLUG video recording people weren't there on the night so no video was captured. I am however making the slides and code available for download here. The code examples demonstrate a simple email date header parser written in both C and Ocaml. The C code is in five different stages so people can see how the parser was developed.

If anyone has any questions about the code, or more generally with the techniques of parsing, I'd be happy to discuss them on the SLUG coders mailing list.

Posted at: 20:35 | Category: CodeHacking | Permalink

Sun, 08 Apr 2007

Horses For Courses.

[A race horse]

In my day job, I work with a hardware engineer named Joe. A couple of months ago, he had to do some C coding to talk to a serial port and came to me for pointers. He was basically on the right track, but was using too many global variables, not checking the return values of system calls etc. These weren't horrible problems, but I explained why practices like this can lead to problems later, showed him better solutions to the same problems and introduced him to the gcc warning flags and Valgrind. He was very grateful for my help and was quick to pick up all the tips I'd given him.

More recently Joe came to me with another programming problem; he had to parse some numerical data out of a plain text log file. He already had about 60 lines of C code that opened a file based on a hard coded filename and he was starting to fiddle around with the fgetc function but was a little stuck on how to go further.

I had a look at his code and since Joe's a nice Irish chap, his predicament brought to mind a joke I once heard:

It is said that there was once an English motorist in Ireland who stopped his car to ask the way to Kilkenny. "Sure and to goodness," replied the Irishman, "If I wanted to go to Kilkenny, I wouldn't be starting from here."

The problem is that C is not a very good language for parsing text data. I told Joe that writing his log file parser in C could certainly be done, but that it would be painful, time consuming and error prone in comparison to other programming languages. So I told him that for this particular task Python would be a much better fit and asked him if he'd like me to teach him the basics of Python. Joe's no dummy; he agreed without hesitation.

First up I showed him the Python Tutorial, the Python Module index and how to use Google Groups advanced search to find Python specific answers from the comp.lang.python Usenet group. I then showed him the basic hello world program in Python:


  #!/usr/bin/python

  print "Hello world!"

Over the next hour we built up a good portion of his program. We used the sys module to get the file name of the log file to parse, the built in Python file handling functions and the regular expression module. We even used a list comprehension to remove outliers from his data set.
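
To give a flavour of how those pieces fit together, here is a rough sketch along the same lines. It is not Joe's actual program: the log line format, the regular expression, the function names and the outlier limits are all assumptions made up for illustration, and it is written for Python 3 rather than the Python 2 of the hello world example.

```python
import re
import sys

# Hypothetical log line format: "<timestamp> temp=<number>".
READING = re.compile(r"temp=(-?\d+(?:\.\d+)?)")


def parse_readings(lines):
    """Return the numeric readings found in an iterable of log lines."""
    matches = (READING.search(line) for line in lines)
    return [float(m.group(1)) for m in matches if m]


def remove_outliers(values, low, high):
    """Keep only the values inside the closed range [low, high]."""
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # The log file name comes from the command line via the sys module.
        with open(sys.argv[1]) as logfile:
            readings = parse_readings(logfile)
        # The 0..100 limits are arbitrary, chosen for illustration only.
        print(remove_outliers(readings, 0.0, 100.0))
    else:
        print("Usage : parse_log.py <logfile>")
```

Lines that do not match the pattern are simply skipped, which is usually what you want when a log file contains a mixture of data and chatter.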

In the end we had about 30 lines of Python code that was very much closer to Joe's end goal than his original 60 lines of C code. Joe was really, really impressed with how easy Python was in comparison to C. It was at this point that I warned Joe about the Blub Paradox. He was well aware that when he only knew C, C was his first choice for this programming task. However, now that he knows Python as well, he'll be able to pick between C and Python depending on the task. I also told him that many Python programmers see Python as the ultimate programming language and are really Blub programmers even if they don't know it.

In my own programming I'm currently using:

So, with the above languages at my disposal, I can match a programming language to the task at hand. For numerical and mathematical programming I use Ocaml, for low level programming I use C and from now on, for multi-threaded and concurrent programming I will choose Erlang.

More importantly, correctly matching the language to the problem should make the task of developing a solution to the problem far easier than using an inappropriate language.

Posted at: 20:16 | Category: CodeHacking | Permalink

Wed, 04 Apr 2007

Learning Erlang.

The decision has been made: I'm going to learn the Erlang programming language. The main reason for this decision is that Erlang does one thing better than any other programming language I am aware of: parallel, concurrent and distributed processing.

The big problem with parallel and concurrent processing in other languages is that the standard method of communication between threads in most languages is shared data protected by mutexes or semaphores which are difficult to get right when there are a lot of threads or a lot of data to be protected. The standard solution to the problems of dealing with parallelism simply doesn't scale well.

Erlang excels at parallel processing because it forgoes the use of semaphores, mutexes and other synchronisation primitives. It replaces these shared data synchronisation methods with message passing; a much simpler mechanism which is much easier to reason about and much harder, maybe even impossible, to get wrong.
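
The flavour of message passing can be sketched even without Erlang. The following Python example is only an analogy for the programming model: the worker shares no variables with the main thread and communicates purely through two queues. Python's queue.Queue does use locks internally and Python threads do share memory, so this says nothing about how Erlang itself is implemented; the function and variable names are made up for illustration.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers from the inbox and send their squares to the outbox."""
    while True:
        message = inbox.get()
        if message is None:      # None is used here as a "stop" message.
            break
        outbox.put((message, message * message))


inbox = queue.Queue()
outbox = queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

# Send three requests followed by the stop message, then wait for the worker.
for n in [1, 2, 3]:
    inbox.put(n)
inbox.put(None)
thread.join()

results = [outbox.get() for _ in range(3)]
print(results)
```

Because the only interaction is sending and receiving messages, there is no shared data structure in the program's own code that needs a mutex around it.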

When learning a new language, my usual approach is to write lots of small demo programs, with each one demonstrating a different feature. These programs come in really useful later as an easy to reference catalogue of language features.

Here's my first complete Erlang program, which takes any number of integer parameters on the command line and prints the factorial of each one:


  #!/usr/bin/env escript

  -export ([main/1]).

  % Naive factorial function.
  fac (0) -> 1 ;
  fac (N) when N > 0 -> N * fac (N - 1).

  % Function to print the factorial of each list element.
  print_fact_list ([]) -> ok ;

  print_fact_list ([Head | Tail]) ->
      % Convert the Head from a string to an int.
      Int = list_to_integer (Head),
      % Calculate the factorial.
      Fact = fac (Int),
      % Print the result.
      io:format ("fac ~w : ~w~n", [Int, Fact]),
      % Call the function recursively with the tail of the list.
      print_fact_list (Tail).

  % Main function, accepts a list of strings containing argv [1], argv [2] etc.
  main (List) ->
      case length (List) of
          0 -> io:format ("Usage : factorial.erl <number>\n") ;
          _ -> print_fact_list (List)
      end.

To me, this Erlang code looks a little like Ocaml and a little like Prolog which I used briefly at university over a decade ago. A couple of things to note:

To run this program requires Erlang, which on Debian and Ubuntu means the packages erlang, erlang-base, erlang-dev and erlang-manpages. It also uses escript, which comes standard with Erlang R11b4 and can be obtained here for earlier versions (Ubuntu Feisty has R11b2). Escript allows Erlang code to be run as a script, just like Python or Ruby.

When passed the numbers 10, 20 and 30, the program produces the following output:


  fac 10 : 3628800
  fac 20 : 2432902008176640000
  fac 30 : 265252859812191058636308480000000

Yep, Erlang uses arbitrary precision integers by default. That's pretty cool.

Posted at: 23:07 | Category: CodeHacking/Erlang | Permalink

Wed, 28 Mar 2007

Tridge Was Right.

At Linux.conf.au 2005, Tridge gave a keynote talk about some of the issues the Samba team had run into when designing Samba4. While discussing the problems of writing a complex server which has to serve multiple simultaneous requests he put up a series of three slides. The first said:


Threads suck!

Having used OS level threads in the past, I was in complete agreement with this. The problem of sharing data across threads and locking/unlocking that data to make sure the accesses are safe is simply too difficult for mere mortals to get right in anything other than trivial cases.

Tridge's second slide said:


Processes suck!

Splitting multi threaded code into multiple processes fixes the locking problems by removing the ability of the processes to share data (ignoring IPC shared memory of course). Obviously for a server program like Samba, this is not a solution.

The third slide in the series said:


State machines suck!

At the time of Tridge's keynote, I didn't really appreciate what he was saying.

The idea is really quite simple; everything is done in a single process so no locking is required. All I/O is multiplexed using the Unix select system call and a state machine keeps track of the state of all of the I/O channels.

The problem with this is that any blocking I/O operation must be replaced with a non-blocking operation. Failure to do this will mean that a single I/O call that blocks will prevent the servicing of all other I/O operations until the blocked operation decides to complete and return control to the state machine.

However, the state machine model does work relatively well for simple examples. Unfortunately, non-blocking I/O leads to a second problem; writing code to do non-blocking I/O is significantly more difficult than for regular blocking I/O.
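A bare-bones C sketch of the select-based multiplexing described above. The channel states and the function name are invented for illustration and are not taken from Samba; a real server would wrap this in a loop and advance each channel's state after servicing it.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical per-channel states for a simple request/response exchange. */
typedef enum { CH_READING, CH_WRITING, CH_DONE } chan_state_t;

typedef struct {
    int fd;
    chan_state_t state;
} channel_t;

/* Wait up to timeout_ms for any channel to become ready for the operation
   its state calls for. Returns the index of the first ready channel, or -1
   on timeout or error. */
int next_ready_channel(const channel_t *chans, int count, int timeout_ms)
{
    fd_set rfds, wfds;
    struct timeval tv;
    int k, maxfd = -1;

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    for (k = 0; k < count; k++) {
        if (chans[k].state == CH_READING)
            FD_SET(chans[k].fd, &rfds);
        else if (chans[k].state == CH_WRITING)
            FD_SET(chans[k].fd, &wfds);
        else
            continue;               /* finished channels are not watched */
        if (chans[k].fd > maxfd)
            maxfd = chans[k].fd;
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The only place the process is allowed to block. */
    if (select(maxfd + 1, &rfds, &wfds, NULL, &tv) <= 0)
        return -1;

    for (k = 0; k < count; k++) {
        if (chans[k].state == CH_READING && FD_ISSET(chans[k].fd, &rfds))
            return k;
        if (chans[k].state == CH_WRITING && FD_ISSET(chans[k].fd, &wfds))
            return k;
    }
    return -1;
}
```

Every read or write done on the channel this returns still has to be non-blocking, which is exactly where the difficulty starts.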

In my day job I've been working on some C++ classes which talk to a web server using HTTP POST operations over a keep-alive connection. This code had a couple of requirements:

I now have code that fits these requirements and a pretty comprehensive test suite. With this experience behind me I have to say that getting this working was a royal pain in the neck. I also agree with Tridge; state machines suck almost as much as threads.

Maybe it's time for me to learn Erlang.

Posted at: 22:36 | Category: CodeHacking | Permalink

Thu, 22 Mar 2007

Lazy Lists.

Lazy evaluation is a default feature of the Haskell programming language and an optional feature of Ocaml. Most programming languages (Ocaml, C, C++, Perl, Python, Java etc) use eager evaluation, where a result specified by a line of code is calculated as soon as the program gets to that line. Lazy evaluation, on the other hand, defers the calculation of a result until that result is needed.

The real beauty of lazy evaluation is that a result that is never used is never evaluated. Lazy evaluation also allows the specification of lists which are effectively infinite, as long as the programmer doesn't actually try to access every element in the list. Obviously, attempting to do so would take infinite time and require infinite memory to actually hold the list :-).

While searching for information on Ocaml's lazy programming features I came across a post at the enchanted mind blog. That post is ok, but the code is just snippets and when put together as it is, doesn't actually work.

After a bit of fiddling around, I managed to get it working. However, once I understood it, I didn't think the example was as good as it could be. Firstly, the input to the lazy list is just a standard finite length Ocaml list, but more importantly it doesn't give any idea of how to do a potentially infinite list which is a much more interesting case.

That left the field open for a nice blog post demonstrating lazy lists in Ocaml. Read on.

Anybody who has done high school or higher mathematics would probably have come across recurrence relations, the best known of which is the Fibonacci sequence.

The Fibonacci sequence is often used as an example for teaching the concept of recursion in computer science (even if some people think there are better examples). The Fibonacci sequence can be expressed recursively in Ocaml like this:


  let rec fibonacci n =
      match n with
      |    1 -> 1
      |    2 -> 1
      |    x -> (fibonacci (n - 1)) + (fibonacci (n - 2))

If one wanted to generate a list containing, say, the first 20 Fibonacci numbers using the above recursive function, the same values would be calculated over and over: each call for a given n recalculates every earlier number in the sequence from scratch, many of them several times. It's simply not efficient.
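To put a rough number on that inefficiency, here is a small C sketch (C rather than Ocaml purely for illustration, and the function name is made up) that mirrors the recursive function above and counts how many times it gets called.

```c
/* Naive recursive Fibonacci with fib(1) = fib(2) = 1, as in the Ocaml
   version above. Every invocation bumps *calls so the total amount of
   work can be inspected afterwards. */
long fib_counted(int n, long *calls)
{
    (*calls)++;
    if (n <= 2)
        return 1;
    return fib_counted(n - 1, calls) + fib_counted(n - 2, calls);
}
```

Asking for just the 20th number this way takes 13529 calls to arrive at the answer 6765, and building a whole list would repeat most of that work for every entry.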

A better solution is to use a lazy list, which calculates new values of the sequence as they are needed, based on entries already in the list. Here's an example that creates a lazy list of the fibonacci numbers:


  type lazy_fib_t =
      Node of int * lazy_fib_t Lazy.t

  let create_fib_list () =
      let rec fib_n minus_2 minus_1 =
          let n = minus_1 + minus_2 in
          Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
          Node (n, lazy (fib_n minus_1 n))
      in
      lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))

  let print_fib_list depth lst =
      let rec sub_print current remaining =
          if current > depth then ()
          else
          match Lazy.force remaining with
          |    Node (head, tail) ->
                  Printf.printf "%3d : %d\n" current head ;
                  sub_print (current + 1) tail
      in
      sub_print 0 lst

  let _ =
      let fib_list = create_fib_list () in
      print_fib_list 4 fib_list ;
      print_endline "------------" ;
      print_fib_list 6 fib_list ;

This is a complete working Ocaml program. To run it, just save the text to a file, say "lazy_fib.ml" and then do:


  ocaml lazy_fib.ml

We'll look at the output in detail later. First let's break it down; looking at the program, from top to bottom we have:


  type lazy_fib_t =
      Node of int * lazy_fib_t Lazy.t

The above two lines define a recursive type called lazy_fib_t, which has a single variant called Node which contains a tuple of an integer and the head of a lazy list.


  let create_fib_list () =
      let rec fib_n minus_2 minus_1 =
          let n = minus_1 + minus_2 in
          Printf.printf "fib_n %d %d -> %d\n" minus_2 minus_1 n ;
          Node (n, lazy (fib_n minus_1 n))
      in
      lazy (Node (1, lazy (Node (1, lazy (fib_n 1 1)))))

The function above, create_fib_list, creates a lazy list. It also contains an internal function, fib_n, which we'll look at later. The last line of the function is where all the magic is; it creates three nodes of a lazy list, the first two containing the first two integers of the Fibonacci sequence and a third node which is a closure, containing a call to the internal function fib_n with the correct parameters to generate the next number in the sequence.

The internal function fib_n takes two parameters, the values of the sequence for n - 1 and n - 2. From these two values, it generates the value for n, prints a message and then constructs a new Node containing the value for n and a lazy evaluation for the next value.

The next function prints the elements of a lazy list down to a given depth. It looks like this:


  let print_fib_list depth lst =
      let rec sub_print current remaining =
          if current > depth then ()
          else
          match Lazy.force remaining with
          |    Node (head, tail) ->
                  Printf.printf "%3d : %d\n" current head ;
                  sub_print (current + 1) tail
      in
      sub_print 0 lst

The print_fib_list function contains an internal function sub_print which is called with a current depth of zero and the head of the lazy list to be printed. The internal function recursively moves down the list until current is greater than depth, which causes the recursion to complete and unwind.

At each node of the lazy list where current is less than or equal to depth, the function forces the evaluation of the node. The forcing will only evaluate a node if it hasn't already been evaluated. Once the node has been force evaluated, the value is printed and the function is called recursively.

Finally, the main function of the program is this:


  let _ =
      let fib_list = create_fib_list () in
      print_fib_list 4 fib_list ;
      print_endline "------------" ;
      print_fib_list 6 fib_list ;

All it does is call the function create_fib_list, print the list down to depth four, print a dashed line and then print the list down to depth six. Because the depth count starts at zero, that means five Fibonacci numbers on the first call and seven on the second. It's important to note that the print function is called with the same list on both occasions.

When the program is run, the output should look like this:


    0 : 1
    1 : 1
  fib_n 1 1 -> 2
    2 : 2
  fib_n 1 2 -> 3
    3 : 3
  fib_n 2 3 -> 5
    4 : 5
  ------------
    0 : 1
    1 : 1
    2 : 2
    3 : 3
    4 : 5
  fib_n 3 5 -> 8
    5 : 8
  fib_n 5 8 -> 13
    6 : 13

As can be seen above, the first time the print function is called, the fib_n closure is called for all values of n greater than one. Each time fib_n is called a new node is generated in the list. When the print function is called the second time, fib_n is only called for values that weren't evaluated on the first call to the print function, just as expected.
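For readers more at home in C, the same compute-once-then-remember behaviour can be sketched by hand. This is only an analogy under my own assumptions: a NULL next pointer stands in for an unforced lazy value, all the names are invented, and unlike the Ocaml version only the first cell is pre-built.

```c
#include <stdlib.h>

/* One cell of the list. 'next' stays NULL until somebody asks for it. */
typedef struct fib_cell {
    unsigned long value;        /* this Fibonacci number */
    unsigned long previous;     /* the one before it, used to build the next cell */
    struct fib_cell *next;      /* NULL means "not evaluated yet" */
} fib_cell_t;

/* Counts how many cells were really computed (illustration only). */
int cells_computed = 0;

/* Create the head of the list, holding the first Fibonacci number. */
fib_cell_t *fib_list_create(void)
{
    fib_cell_t *head = calloc(1, sizeof *head);
    if (head != NULL)
        head->value = 1;        /* previous stays 0, so the next value is also 1 */
    return head;
}

/* Force the tail of a cell: compute it on the first request, reuse it afterwards. */
fib_cell_t *fib_force_next(fib_cell_t *cell)
{
    if (cell->next == NULL) {
        fib_cell_t *fresh = calloc(1, sizeof *fresh);
        if (fresh == NULL)
            return NULL;
        fresh->value = cell->value + cell->previous;
        fresh->previous = cell->value;
        cell->next = fresh;
        cells_computed++;
    }
    return cell->next;
}

/* Return the number at 'index' (counting from zero), forcing cells as needed.
   Cells are never freed in this sketch. */
unsigned long fib_list_nth(fib_cell_t *head, int index)
{
    fib_cell_t *cell = head;
    while (index-- > 0 && cell != NULL)
        cell = fib_force_next(cell);
    return cell ? cell->value : 0;
}
```

Walking to position 4 computes four cells; walking to position 6 afterwards computes only two more, and asking for position 4 again computes nothing at all.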

One of the few problems with the above implementation is that it uses Ocaml's native integers, which on 32 bit CPU platforms are only 31 bits wide. It would however be relatively easy to use Ocaml's Big_int module which provides arbitrary length integers.
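To get a feel for how soon that limit bites, here is a small C sketch (the function name is made up for illustration) that finds the first Fibonacci number too big for a given maximum value.

```c
/* Return n such that the n-th Fibonacci number (with F(1) = F(2) = 1) is
   the first one strictly greater than limit. Assumes limit >= 1 and that
   the answer still fits in an unsigned long long. */
int first_fib_exceeding(unsigned long long limit)
{
    unsigned long long prev = 1, cur = 1;   /* F(1) and F(2) */
    int n = 2;

    while (cur <= limit) {
        unsigned long long next = prev + cur;
        prev = cur;
        cur = next;
        n++;
    }
    return n;
}
```

With a limit of 1073741823 (the largest value of a 31 bit signed integer) it returns 45, so counting from zero as the program above does, the entry at position 44 would be the first to overflow.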

Posted at: 21:43 | Category: CodeHacking/Ocaml | Permalink

Wed, 21 Mar 2007

Xtreme Numerical Accuracy.

I'm working on a digital filter design program in Ocaml which was suffering from some numerical issues with Ocaml's native 64 bit floats. The problem was that the algorithm operates on both large floating point numbers and small floating point numbers. These numbers eventually end up in a matrix, and I then use Gaussian elimination to solve a set of simultaneous equations.

Anyone who has done any numerical computation will know that adding large floating point numbers to small floating point numbers is a recipe for numerical inaccuracy. For me, the numerical issues were screwing things up badly.
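A tiny C sketch of the effect, using the same 64 bit double format as Ocaml's floats. The function name and the test values are mine, picked only to make the problem obvious.

```c
/* Sum the values strictly left to right in double precision. */
double sum_in_order(const double *values, int count)
{
    /* volatile forces every partial sum to be rounded to a 64 bit double,
       even on CPUs that keep intermediates in wider registers. */
    volatile double total = 0.0;
    int k;

    for (k = 0; k < count; k++)
        total += values[k];
    return total;
}
```

Summing 1e17, 1.0 and -1e17 in that order gives 0.0 because the 1.0 is smaller than the gap between adjacent doubles near 1e17 and simply vanishes. Summing the same three numbers as 1e17, -1e17, 1.0 gives the correct 1.0.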

When faced with a problem like this there are two possible solutions:

The first option, doing all the computations symbolically, was not practical due to the complexity of the computation. That left only the second option.

Looking around for what was available for Ocaml, I found the contfrac project on Sourceforge. As all the math geeks (hi Mark) have probably guessed by now, contfrac expresses numbers in terms of a really cool mathematical concept called continued fractions.

The idea is that any number can be represented by a (potentially infinite) list of integers [ a0 ; a1, a2, a3, ...]. Given the list of integers, the number itself can be calculated using:

\[ x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} \]

All rational numbers have a finite length continued fraction expansion. For example, the rational number 75/99 is expressed as [ 0 ; 1, 3, 8 ].
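The expansion of a rational number falls out of repeated integer division, much like Euclid's algorithm. The following C sketch is my own illustration and has nothing to do with how contfrac is implemented; it assumes a non-negative numerator and a positive denominator.

```c
/* Write the continued fraction terms of num/den into terms[], at most
   max_terms of them. Returns the number of terms written. */
int cf_expand(long num, long den, long *terms, int max_terms)
{
    int count = 0;

    while (den != 0 && count < max_terms) {
        long quotient = num / den;
        long remainder = num % den;

        terms[count++] = quotient;
        num = den;              /* carry on with the reciprocal of the */
        den = remainder;        /* fractional part that is left over   */
    }
    return count;
}

/* Turn a finite list of terms back into a number, working from the
   innermost fraction outwards. Assumes count >= 1. */
double cf_value(const long *terms, int count)
{
    double value = (double) terms[count - 1];
    int k;

    for (k = count - 2; k >= 0; k--)
        value = (double) terms[k] + 1.0 / value;
    return value;
}
```

Feeding in 75 and 99 produces the four terms 0, 1, 3, 8, and evaluating those terms gives back 0.757575..., which is 75/99.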

Not surprisingly, all the irrational numbers have infinite length continued fraction expansions. The surprising thing (for me at least) is that many of the irrational numbers have CF expansions that are remarkably regular. The square root of two is expressed as [ 1 ; 2, 2, 2, ...] with an infinitely repeating list of 2s. The base of the natural logarithm, e, is expressed as [ 2 ; 1, 2, 1, 1, 4, 1, 1, 6, ...] which again has a regular pattern, as does the golden ratio, [ 1 ; 1, 1, 1, ...]. While all the previous CF expansions have a degree of regularity, the expansion of pi is [ 3 ; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3, 13,...], which looks completely random.

With numbers expressed as continued fractions, the Ocaml contfrac module then implements addition, subtraction, multiplication and division. Once the four arithmetic operations are defined, contfrac then implements a number of trigonometric and transcendental functions in terms of the same continued fractions.

Unfortunately, the module doesn't implement everything I need so I'm going to have to hack on some extra functionality. The actual Ocaml implementation uses Ocaml's lazy lists which is an aspect of Ocaml I hadn't played with yet. Time for some fiddling with lazy lists.

Posted at: 20:49 | Category: CodeHacking/Ocaml | Permalink

Wed, 14 Feb 2007

GNU gcc Stack Protection.

Wow, this is new. Version 4.1 of the GNU gcc compiler shipped with Ubuntu Feisty includes stack smashing protection by default!

Consider the following code containing a buffer overflow of a stack based buffer :


    #include <stdio.h>

    static void
    kill_my_stack (void)
    {
        char buffer [10] ;
        int k ;

        for (k = 0 ; k < 20 ; k++)
            buffer [k] = 'a' + k ;
    } /* kill_my_stack */

    int
    main (void)
    {
        kill_my_stack () ;
        return 0 ;
    } /* main */

Compiling this with the default gcc compiler in Feisty produces an executable which when run gives the following error:


    *** stack smashing detected ***: /home/erikd/stack-protect-demo terminated
    Aborted

Obviously, for an error as simple as this even basic static analysis should find it, but we know that the vast majority of people don't use static analysis. In fact many don't even compile with a sensible set of compiler flags turned on. Well now, those people are protected from themselves.

Posted at: 19:13 | Category: CodeHacking | Permalink

Thu, 01 Feb 2007

Spectrogram Fun!

Inspired by the spectrograms used in the SRC Comparison I decided to write a program that generates similar spectrograms from any given sound file. The program is now basically working and when run over a full song (the song "Vehicle" by the band "Golden Section") it produced this, which I think is quite beautiful:


[Secret Rabbit Code sweep test]

The program is written in C and uses libsndfile (of course) for reading the sound file, FFTW for generating the spectrum data and the wonderful Cairo library for the image generation back-end.

I intend to release the code for this under the GPL as soon as I can clean it up a bit, add handling for multi-channel files and improve the command line option handling.

Posted at: 22:03 | Category: CodeHacking | Permalink

Tue, 30 Jan 2007

SRC Comparison.

One of my Free Software projects is Secret Rabbit Code, aka libsamplerate, aka the Rabbit, a library for performing sample rate conversion (Wikipedia) on audio signals. Recently, a company in Canada did a comparison of a number of sample rate converters in professional audio software and also included the Rabbit in that test.

The tests were carried out by generating an input signal at a sampling rate of 96 kHz, configuring each sample rate converter to do a conversion from 96 kHz input sample rate to 44.1 kHz output sample rate, passing the input signal through each converter and capturing each converter's output. The input test signal was a sine wave which sweeps from a low frequency of about 100 Hz at the start to a frequency of 44.1 kHz at the end. Finally, a spectrogram was generated from each output signal.

The spectrogram of the output of Secret Rabbit Code's Best Sinc converter looks like this:


[Secret Rabbit Code sweep test] [Color key]

The spectrogram shows time in seconds along the x-axis and frequency in Hertz along the y-axis. The colour indicates the signal strength at each point in time and frequency, with white being the strongest signal (0 decibels) and black being the weakest signal (-180 decibels).

The tricky thing about the sample rate conversion process is that for any given sample rate fs, the highest frequency signal that can be correctly represented is at fs/2. When sample rate converting from 96 kHz to 44.1 kHz, all frequencies above half of the destination sample rate must be removed during the conversion process. Failure to do so will result in audio distortion and noise in the output signal.
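To see where a frequency above fs/2 ends up if it is not removed, here is a small C sketch of the folding that ideal sampling produces. It is a textbook model with a made-up function name, not anything taken from the Rabbit, and it works in whole Hertz with non-negative frequencies to keep things simple.

```c
/* Frequency (in Hz) at which a tone of freq_hz appears after being
   sampled at rate_hz with no filtering at all. */
long alias_hz(long freq_hz, long rate_hz)
{
    long folded = freq_hz % rate_hz;    /* the spectrum repeats every rate_hz */

    if (folded > rate_hz / 2)           /* above fs/2 it mirrors back down */
        folded = rate_hz - folded;
    return folded;
}
```

For example, a 30 kHz tone that survives into a 44.1 kHz output shows up at 14.1 kHz, well inside the audible range, which is why it has to be filtered out during the conversion.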

Looking at the spectrogram of the Rabbit's output, it's easy to see that the main sweep (in bright white) clearly goes from some low frequency at the start to 22.05 kHz (half of the output sample rate) at 5 seconds. After about 5 seconds, the input signal's sine wave frequency goes above half the destination sample rate and the Rabbit does the correct thing and almost completely removes it.

The rest of the colour in the spectrogram is an artifact of the conversion process but by referencing the colour scale, it's possible to confirm that all of these artifacts are 100 decibels below the level of the main signal. Ideally they shouldn't be there at all, but if they are they should be as low as possible.

Anyone who has read this far can now go to the comparison page, pick any two converters and compare them. They can also confirm for themselves that although the Rabbit (Best Sinc) wasn't the best converter among the ones tested (that award would have to go to r8brain and iZotope), it certainly didn't disgrace itself either. A number of the commercial converters in expensive software packages (like Sony Vegas and Digital Performer) didn't perform all that well in comparison.

The good news is that the existence of commercial closed source converters that are better than the Rabbit gives me some incentive to come up with a better converter for inclusion in the Rabbit.

Posted at: 23:18 | Category: CodeHacking/SecretRabbitCode | Permalink

Thu, 21 Dec 2006

The Size of 'cp' (Update).

André Pang read my blog post about the size of the compiled Haskell 'cp' executable and suggested that something was wrong. So, I looked at it again.

My laptop is running Ubuntu Edgy and for some reason Edgy installs version 6.4.2 of the Glasgow Haskell Compiler. I also have a desktop machine running Debian Testing which has version 6.6 of ghc.

Sure enough, ghc 6.6 generates a 255 kilobyte executable which is a huge improvement over the 1.5 megabyte executable produced by version 6.4.2.

Posted at: 21:18 | Category: CodeHacking | Permalink

Tue, 19 Dec 2006

The Size of 'cp'.

Conrad Parker blogged recently showing some simple examples in Haskell. I've been wanting to learn Haskell for a while so I took special interest in Conrad's post. For instance, the program implementing the basic functionality of the Unix cp in Haskell is small and extremely elegant:


  import System.Environment

  main = do
      [infile, outfile] <- getArgs
      s <- readFile infile 
      writeFile outfile s

However, on my machine (i686 laptop running Ubuntu Edgy), the generated executable is 1.5 megabytes in size even after being stripped. By way of contrast, the /bin/cp executable written in C is 56 kilobytes. WTF?

So let's look at the Ocaml version:


  let _ =
      let srcfile = open_in Sys.argv.(1) in
      let destfile = open_out Sys.argv.(2) in
      let maxlen = 8192 in
      let str = String.create maxlen in
      let count = ref 1 in
      while !count > 0 do
          count := input srcfile str 0 maxlen ;
          output destfile str 0 !count ;
          done

This is pure imperative code and doesn't use any of the functional language features of Ocaml, but it compiles to a 79 kilobyte stripped executable. Compared to the C executable, the Ocaml executable is 40% bigger and the Haskell one is 2500% bigger.
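For comparison, the same read/write loop is only a few lines in C as well. This is emphatically not the source of /bin/cp, just a sketch using the same 8192 byte buffer as the Ocaml version; the function name is made up.

```c
#include <stdio.h>

/* Copy src_name to dest_name in 8192 byte chunks.
   Returns 0 on success and -1 on any error. */
int copy_file(const char *src_name, const char *dest_name)
{
    char buffer[8192];
    size_t count;
    int status = 0;
    FILE *src, *dest;

    if ((src = fopen(src_name, "rb")) == NULL)
        return -1;
    if ((dest = fopen(dest_name, "wb")) == NULL) {
        fclose(src);
        return -1;
    }

    /* Keep going until a read returns nothing, just like the Ocaml loop. */
    while ((count = fread(buffer, 1, sizeof(buffer), src)) > 0) {
        if (fwrite(buffer, 1, count, dest) != count) {
            status = -1;
            break;
        }
    }
    if (ferror(src))
        status = -1;

    fclose(src);
    if (fclose(dest) != 0)      /* a failed flush only shows up here */
        status = -1;
    return status;
}
```

Unlike the Haskell and Ocaml versions, all of the error handling has to be spelled out by hand.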

Obviously, the size of the executable is not the only determining factor in choice of programming language, but Haskell's executables do seem unreasonably large.

Update here.

Posted at: 21:15 | Category: CodeHacking | Permalink

Thu, 16 Nov 2006

Non-recursive Automake.

A lot of people (yeah, you know who you are) bitch about Automake and the associated tools like autoconf and libtool. While I do agree that these tools do have problems and limitations, they are also a better solution to the problem than any of the alternatives I have looked at.

The thing I really like about automake is that it does automatic dependency checking so that if the file foo.cc includes foo.h which includes bar.h which includes baz.h and baz.h changes, automake knows that foo.cc needs to be recompiled. Manually keeping track of dependencies like these is a royal pain in the neck and getting it wrong can lead to really obscure Heisenbugs; for example, two C++ object files disagreeing on the parameter list of a method of a class.

I've had a number of projects that have used automake for years. However, all of these projects used the traditional recursive make scheme where there is a Makefile.am in each directory of the source tree. I continued to do it this way with automake even after reading Peter Miller's excellent paper Recursive Make Considered Harmful, but with hand written Makefiles, I usually took Peter's advice.

My first test of a non-recursive automake solution was for a project I'm doing at work. The project started out with a standard single top level non-recursive Makefile which handled the compiling of about 150 C++ source files which compiled to a couple of static convenience libraries, a main executable and a couple of test programs.

The big problem with the existing standard Makefile was that it didn't properly encode dependencies and hence I often had to do a "make clean" followed by a make to get the thing built correctly. Fixing this issue was the prime motivator for moving to automake.

One slightly unusual aspect of the project was the way project specific internal include files were referenced within the project. As a result of the project having been developed with a single top level Makefile from the beginning, all hash includes within the project are of the form:


  #include <path/to/header.h>

with "path" being a directory in the same top level directory as the Makefile. What this means is that no source file which includes a header from within the project should need any extra project internal include path other than "-I.". This means that the resulting compile lines produced by automake (and libtool if it is in the picture) are considerably shorter than they would have been otherwise.

So, what does the Makefile.am look like?

Here's an example non-recursive Makefile.am which is basically a stripped down version of the one I'm using on my project but with some extra comments. Anyone who has hacked a Makefile.am before should be able to understand what is going on.


  # Tell automake to put the object file for apple/apple.c in dir apple/
  AUTOMAKE_OPTIONS := subdir-objects

  # The installable executable.
  bin_PROGRAMS = apple/apple

  # Couple of python scripts used during build.
  EXTRA_DIST = apple/version_create.py apple/tests/test_wrapper.py

  # Convenience libraries required during build.
  noinst_LTLIBRARIES = lib/libcore.la apple/libapple.la apple/pip/libpip.la

  # All the project related headers required for building.
  noinst_HEADERS = $(lib_libcore_includes) $(apple_pip_libpip_includes) \
      $(libapple_includes)

  # Test programs that will not be installed.
  noinst_PROGRAMS = apple/tests/skin_test lib/pip/test/pip_test

  # A couple of autogenerated header files.
  nodist_include_HEADERS = apple/version.h

  DISTCLEANFILES = apple/version.h

  #=========================================================
  # libcore : The core library routines.

  lib_libcore_includes = \
      lib/red.h lib/green.h lib/blue.h

  lib_libcore_la_SOURCES = $(lib_libcore_includes) \
      lib/red.cc lib/green.cc lib/blue.cc

  #=========================================================
  # libpip : All the pips.

  apple_pip_libpip_includes = \
      apple/pip/cat.h apple/pip/dog.h apple/pip/mouse.h

  apple_pip_libpip_la_SOURCES = $(apple_pip_libpip_includes) \
      apple/pip/cat.cc apple/pip/dog.cc apple/pip/mouse.cc

  #=========================================================
  # libapple : Everything in the application except main.cc.

  libapple_includes = \
      apple/granny.h apple/smith.h apple/johnathon.h

  apple_libapple_la_SOURCES = $(libapple_includes) \
      apple/granny.cc apple/smith.cc apple/johnathon.cc

  #=========================================================
  # apple : The application.

  apple_apple_SOURCES = apple/main.cc apple/version.h

  apple_apple_LDADD = apple/libapple.la lib/libcore.la \
      apple/pip/libpip.la $(EXT_A_LIBS) $(EXT_B_LIBS)

  #=========================================================
  # Test programs.

  apple_tests_skin_test_SOURCES = apple/tests/skin_test.cc
  apple_tests_skin_test_LDADD = lib/libcore.la apple/libapple.la

  lib_pip_test_pip_test_SOURCES = lib/pip/test/pip_test.cc
  lib_pip_test_pip_test_LDADD = lib/libcore.la apple/pip/libpip.la \
      $(EXT_A_LIBS) $(EXT_B_LIBS)

  check : $(noinst_PROGRAMS)
      ./apple/tests/skin_test
      ./lib/pip/test/pip_test
      $(top_srcdir)/apple/tests/test_wrapper.py
      @echo
      @echo "All tests passed."
      @echo

  #=========================================================
  # Autogenerated files and their dependencies.

  apple/main.o : apple/version.h
  apple/version.h : this_file_does_not_exist
      $(top_srcdir)/apple/version_create.py $@

  .PHONY : this_file_does_not_exist

The thing that surprised me most about converting this project to automake was how easy it was and how well it worked. I also immediately noticed that the autogenerated non-recursive make seemed to run a lot faster than I was used to with recursive make, but that is one of the benefits mentioned in Peter's paper.

Since this was such a success I'm going to look into applying this to some of my other projects.

Posted at: 21:49 | Category: CodeHacking | Permalink

Tue, 14 Nov 2006

Autoconf and #ifdef Considered Harmful.

I recently got an email from someone suggesting that my example for detecting the presence of libsamplerate was somewhat problematic. The crux of the complaint is that my configure.ac snippet ends up setting the HAVE_SAMPLERATE variable to 0 or 1 and that most people tend to use something like this in their C or C++ code:


    #ifdef  HAVE_SAMPLERATE
         /* Some code which uses libsamplerate here. */
    #endif

Obviously with HAVE_SAMPLERATE defined to 0 or 1, this code is not going to work as the developer expected. Instead, they should be using something like:


    #if  HAVE_SAMPLERATE
         /* Some code which uses libsamplerate here. */
    #endif

I know that the former idiom is more common, but I chose the second method for a reason; I believe that it is more robust. The problem of course is that autoconf and its related tools are fragile, but the standard idiom certainly doesn't help.

Consider the following code (from a bug I found in XMMS):


    #ifndef  WORDS_BIGENDIAN
         /* Some little endian machine specific code. */
    #endif

The bug resulted from an error in configure.ac where the author had forgotten to invoke the autoconf macro which sets the WORDS_BIGENDIAN variable. The result was that this code compiled perfectly and ran perfectly on little endian machines. On big endian machines, it compiled perfectly and under certain circumstances failed badly causing really horrible noises to come out of my headphones.

Now consider my version where WORDS_BIGENDIAN gets set to either 0 or 1. In this case, if the author forgot to invoke the autoconf macro, WORDS_BIGENDIAN would have been undefined and this code:


    #if  (WORDS_BIGENDIAN == 0)
         /* Some little endian machine specific code. */
    #endif

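would have been flagged on both big and little endian machines, at least when compiling with gcc's -Wundef (plus -Werror to turn the warning into a hard failure). That caveat matters because the C preprocessor otherwise quietly treats an undefined identifier in an #if expression as 0.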
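A belt-and-braces variant that doesn't depend on any compiler flags is to test for the definition explicitly and stop with #error. In this sketch the #define at the top only stands in for what config.h would normally provide, and the HOST_IS_LITTLE_ENDIAN macro and the helper function are made up for illustration.

```c
/* Normally supplied by config.h; defined here only so that the sketch
   compiles on its own. */
#define WORDS_BIGENDIAN 0

#if !defined (WORDS_BIGENDIAN)
#error "WORDS_BIGENDIAN is not defined; check configure.ac."
#elif (WORDS_BIGENDIAN == 0)
#define HOST_IS_LITTLE_ENDIAN 1
#else
#define HOST_IS_LITTLE_ENDIAN 0
#endif

/* Returns 1 on little endian builds and 0 on big endian builds. */
int host_is_little_endian(void)
{
    return HOST_IS_LITTLE_ENDIAN;
}
```

Remove the #define at the top and the build stops at the #error line on every platform, regardless of which warnings are turned on.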

It also doesn't take a genius to see that this is a symptom of a larger and much more common problem. In fact my solution is a classic example of moving something up Rusty's spectrum of interface simplicity. The original code was at position 11 on the scale (follow common convention and you'll get it wrong) and my alternative is at position 1 on the scale (compiler/linker won't let you get it wrong).

Posted at: 20:27 | Category: CodeHacking | Permalink

Wed, 01 Nov 2006

Monads Made Easy (and Hard).

For some time I've been rather keen on learning the Haskell programming language. The big problem for me was that when I started out trying to solve particular problems I quickly ran into a strange catch-22. Haskell uses a concept called Monads but it seemed to me that in order to understand Haskell one needs to understand Monads and in order to understand Monads, one needs to understand Haskell.

There are however numerous tutorials and explanations on Monads. For instance:

I've looked at many of these tutorials but never managed to get Monads. It's not that I can't understand difficult concepts or that I can't handle weird ass programming languages. The problem was that all these tutorials explained Monads from the point of view of people who already understood Monads.

Anyway, my difficulty with Monads is at an end. I've just found an explanation of Monads called Of monads and spacesuits written by Eric Kow. It explains Monads using astronauts, space stations and space suits. I finally get it!

And now that I do, I can play with Monads in Ocaml and also go ahead and learn Haskell.

However, I will not be using Monads in C++ because the C++ implementation is just too damn weird. It makes a concept that can already be difficult even harder as well as making it unreadable. A pox on C++.

Posted at: 20:24 | Category: CodeHacking | Permalink

Sun, 22 Oct 2006

Ocaml : Exception Backtraces.

There's a paper dated December 2002 by Kevin Murphy where he explains why he was looking at Ocaml. That article was recently linked on programming.reddit.com and there was a comment complaining that Ocaml couldn't print out backtraces on exceptions. Someone posted later that this was not right, but I've heard this complaint often enough that I thought I should blog about how to do it.

First off, Ocaml has two compilers, one which produces bytecode and one which produces native binaries. The native code compiler is not currently able to produce exception backtraces and this is where the Reddit commenter got the idea. However, there is a patch in the Ocaml bug tracker which adds backtrace capabilities. I'm hoping that this goes into the compiler proper in the next release or two.

For a project that is currently compiling with ocamlopt (the native code compiler), changing to the bytecode compiler is as simple as editing the Makefile and replacing all invocations of "ocamlopt" with "ocamlc -g" where the "-g" turns on exception backtraces. You can then rebuild the application. The final step is to turn on backtraces in the bytecode run time environment which is done by setting an environment variable:


  export OCAMLRUNPARAM="b1"

Once compiled to bytecode and with the environment variable set, the application can be run and should produce the required backtrace. The following is an example of a backtrace from something I'm working on at the moment (I hacked the code to make sure I could get one).


  Fatal error: exception Invalid_argument("index out of bounds")
  Raised by primitive operation at unknown location
  Called from file "meyers_diff.ml", line 93, characters 1-31
  Called from file "meyers_diff.ml", line 200, characters 10-52
  Called from file "meyers_diff.ml", line 221, characters 16-60
  Called from file "meyers_diff.ml", line 264, characters 11-148
  Called from file "meyers_diff.ml", line 305, characters 17-50
  Called from file "array.ml", line 130, characters 31-51
  Called from file "meyers_diff.ml", line 323, characters 1-316

Obviously it would be nicer if function names were included here, but this is more than sufficient for debugging purposes.

Posted at: 10:39 | Category: CodeHacking/Ocaml | Permalink

Wed, 04 Oct 2006

On Design Patterns.

picture of book

The book "Design Patterns : Elements of Reusable Object-Oriented Software" by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides is one of the most well known books about the concept of design patterns ; the idea of codifying generic solutions to recurring programming problems. The book is so well known that the authors came t