Wed, 01 Jul 2009

Three More for the Debian New Queue.

Over the last couple of weeks I've managed to get three new packages into the Debian NEW queue :

Thanks to Simon Horman for sponsoring/uploading the first two of and Matt Palmer for sponsoring/uploading haskell-json.

Posted at: 18:32 | Category: CodeHacking/Debian | Permalink

Sun, 21 Jun 2009

FP-Syd #16.

On Thursday June 18th we held the 16th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 25 people attending to hear our two presentations.

This month we tried something a little different, 5 minute lightning talks. We had four lightning presenters :

Interestingly, none of the presenters managed to stay within their allotted 5 minutes but the time limits were not enforced.

Our main speaker of the evening was Eric Willigers who gave us an excellent introduction to the Clojure language, a LISP like language for the Java Virtual Machine. Eric started with the main data structures; lists, vectors, maps and sets, moved on to functions and macros before explaining Clojure's concurrency primitives.

All in all this was another most inspiring and enjoyable meeting. Thanks to all our speakers as well as Shane Stephens and Google for providing the venue.

Posted at: 13:03 | Category: FP-Syd | Permalink

Mon, 15 Jun 2009

Two More for the Debian New Queue.

I've got two new packages in the Debian NEW queue :

These two libraries are obviously useful in their own right, but I was particularly interested in packaging these because they are build dependencies for hoogle which I'm currently working on packaging.

Thanks to Simon Horman for sponsoring these uploads and the previous ones.

Posted at: 22:31 | Category: CodeHacking/Debian | Permalink

Tue, 09 Jun 2009

Debian Maintainer.

With the closing of Debian bug #531811 I am now officially a Debian Maintainer.

Although I've been using Debian and related distributions for nearly a decade, for most of that time I've been pretty happy just being an upstream developer and a downstream user.

That changed when I started learning and using Haskell some 6 months ago. After using Ocaml on Debian for about four years I had grown spoilt by the work of the Ocaml Debian Maintainers who do a stellar job in keeping the everything Ocaml related in Debian up to date. Any time I needed a new Ocaml library, it was nearly always just an apt-get install away.

Unfortunately Haskell on Debian is in a poor state in comparison to Ocaml. My main aim in becoming a Debian Maintainer is to improve Haskell on Debian. First of all, I will be packaging Haskell libraries. So far I have make Debian packages of the following three Hackage packages :

I have also joined the Debian Haskell Mailing List and hope to help out on packaging and improving the compiler version transition process.

Posted at: 19:52 | Category: CodeHacking/Debian | Permalink

Fri, 05 Jun 2009

Jim Henson Co.

I received an email this morning:


  From: David XXX <xxx@creatureshop.com>
  To: <erikd@xxx.com>
  Subject: libsndfile usage
  Date: Thu, 4 Jun 2009 10:52:02 -0700
  User-Agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

  Erik, just thought I would mention we have been using libsndfile
  internally at the Jim Henson Co. for a while, we recently got approval
  to open source some of our utilities which mostly use libsndfile to read
  in wav files for encoding into a quicktime file, using libquicktime. We
  released them as usage examples and extras for libquicktime.

  http://libquicktime.cvs.sourceforge.net/viewvc/libquicktime/lqt_utils/

  --
  David XXX
  Pipeline Tools Programmer
  Jim Henson Creature Shop
  xxx@creatureshop.com

This is so cool.

Of course Jim Henson and Company are probably best known for the Muppets and Sesame Street but a couple of years ago my then four year old daughter absolutely loved watching Bear in the Big Blue House.

In a subsequent email David explained that the Bear program was before his time, and that libsndfile was being used as part of the render process in the production of Sid the Science Kid which I don't think has reached this country yet.

Posted at: 21:06 | Category: CodeHacking/libsndfile | Permalink

Sun, 24 May 2009

FP-Syd #15.

On Thursday May 21st we held the 15th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 22 people attending to hear our two presentations, an Erlang double header.

The first presentations was by Ben Johnston, a PhD candidate at University of Technology, Sydney. Ben's talk was titled "How to build a robotic bear in two weeks (or less)." and in it he explained how the need to get some code running on a new robot quickly led him to try Erlang. He explained how he used C for low level control of hardware devices, SWI Prolog for the AI and Erlang for all the glue to hold it all together.

Next up we had Matt Palmer talk about the Erlang database called Mnesia. Matt showed the close correspondence between Erlang queries and their equivalent in SQL. He also talked about the ease of replications, sharding and in-memory databases in Mnesia.

A big thanks goes to Ben and Matt for their presentations Thanks also to Shane Stephens and Google's Sydney office for making their facilities available.

Posted at: 21:23 | Category: FP-Syd | Permalink

Mon, 18 May 2009

FP-Syd #14.

Some time ago, on Thursday April 16th, we held the 14th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 25 people attending to hear our two presenters.

The first presentations was by Tom Davies titled "CAL/OpenQuark and Java" about CAL and Haskell 98 like language for the Java Virtual machine. CL was released as open source under a BSD-like license as OpenQuark in 2007. Tom gives a quite comprehensive list of the differences between between CAL and Haskell in his slides and a PDF file of the development history of Quark in a supplementary PDF. The main problem with CAL/OpenQuark is that it is effectively abandon-ware and that no-one is actively maintaining it.

The second presentation was by Simon Winwood and was titled "Program extraction from the Coq Theorem Prover". I've been trying unsuccessfully to contact Simon to get hold of his slides. I'll update this when I do finally have a link.

Update 2009/06/05 : Simon's slides are now available here.

A big thanks goes to Shane Stephens and Google's Sydney office for making their facilities available. Thanks also to Tom and Simon for their presentations.

Posted at: 19:15 | Category: FP-Syd | Permalink

Sun, 17 May 2009

Suspend to Disk in Jaunty.

I upgraded my laptop to Ubuntu 9.04 early in the beta release phase and my only disappointment with this release was that suspend to disk, which had worked so well in Hardy and Intrepid was broken in Jaunty. I even raised a Launchpad bug about it.

Unfortunately it seemed that nothing was being done about this bug until another user Mathew Mueller posted a solution to the bug report.

I am now a happy camper. Thanks Mathew and thanks Ubuntu.

Posted at: 17:38 | Category: Tech | Permalink

Thu, 14 May 2009

libsndfile 1.0.20.

There's a new release of libsndfile available in the usual place. This is a security bug fix release which fixes a potential heap overflow in VOC files found and reported by Tobias Klein ( http://www.trapkit.de/ ) and another in the AIFF file parser found by me.

I am also making available patches for older versions of libsndfile:

Hopefully the next release will contain new features instead of just bug fixes.

Posted at: 20:53 | Category: CodeHacking/libsndfile | Permalink

Thu, 02 Apr 2009

FP-Syd #13.

Just today, Richard Collins reminded me that I forgot to blog the last FP-Syd meeting. I'm correcting that now.

Last Thursday (March 26th) was the 13th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's new offices in Sydney and we had about 35 people attending to hear our two presenters.

The first presentation was by Simon (Horms) Horman, who volunteered to give a presentation the morning of meeting after our scheduled speaker pulled out due to illness. Horms' presentation was titled "Using Haskell to Write a Very Small Portion of Xen". He started out by explaining how he is relatively new to Haskell and that most of his coding is actually in C, for the Xen hypervisor. Xen hypervisor Advanced Configuration and Power Interface (APCI) which is programmed in a very simple C like language called ACPI Source Language (ASL). Programming ASL is a rather repetitive and error prone task, so Horms' solution was to write a simple Haskell program to generate the ASL code. Horms' slides, Haskell code and output ASL code is available in this tarball .

The second presentation was by Mark Wotton on his attempts to improve predictive text entry on mobile phones. Mark explained how he wanted to improve predictive text by using the previous three words instead of just the current word. He chose Haskell for its rapid prototyping capabilities and explained some of the difficulties he had with performance, especially with large corpora and how he got around these problems.

Mark has put up his slides up here and even has a demo available on the web.

A big thanks goes to Shane Stephens and Google's Sydney office for making their facilities available. Thanks also to Horms and Mark for their presentations.

Posted at: 21:36 | Category: FP-Syd | Permalink

Tue, 03 Mar 2009

libsndfile 1.0.19.

There's a new release of libsndfile available in the usual place. There is also a back story about this release.

At about the same time as my blog post entitled "Security Hyperventilating" I released version 1.0.18 of libsndfile. A couple of days later I received an email from Alin Rad Pop of Secunia Research (the company who got it right last time) informing me of a real, potentially exploitable bug in my newly released code. I've been told that this security vulnerability will be made public over the next couple of days as CVE-2009-0186.

While I certainly didn't believe my code was bug free I have worked very hard to reduce the bugs as much as possible. I have a full and rather comprehensive test suite, I run valgrind over the code regularly, I have learnt from past mistakes and I when I find one bug I nearly always take the time to search for other instances of the same class of bug in the code.

I even wrote a program to do automated Fuzz Testing. This program takes an existing sound file, modifies it, writes it to disk and then runs it through libsndfile. If libsndfile segfaults or does anything else wildly wrong, the problem file is saved for later review. That review usually results in a bug fix.

In spite of all this testing, there was still a security vulnerability. Thats mainly because libsndfile is one of those projects that needs to parse untrusted binary data. Remember all those exploitable bugs against libjpeg and libgif in the late 1990s? Well both of those projects only parse one file type; libsndfile parses more than a dozen. The only other project I can think of with large numbers of different things to parse is the Samba Project which needs to parse dozens of different kinds of messages in the CIFS protocol.

Of course there are other tools to for finding bugs; static analysis tools. In this field there are FOSS products like Sparse and Splint. The first is rather new and developed specifically to find bugs in Linux kernel source code and the second is basically unmaintained. Both are rather intrusive in that they require special annotations to help them ignore valid code and find buggy code.

Neither of these FOSS programs compare well against commercial offerings like Coverity's Prevent but Coverity is widely regarded as being rather expensive. However, for FOSS projects, Coverity does have a program where it scans FOSS projects and feeds the scan results back to the projects so they can fix bugs.

libsndfile was added to Coverity scan of FOSS projects well over a year ago. I fixed all of the issues (mostly minor) pretty much immediately and then asked to go on to the next rung of the ladder. Unfortunately, I was told that I had to wait for all the projects that were currently on rung 0 to fix their issues before that could happen. Eventually, that did happen, but I don't remember anyone contacting me about it. This slow progress was frustrating.

However, on hearing that there was a CVE about to be published against libsndfile I was fortunate enough to have someone offer to run libsndfile through one of the commercial static analysis tools. The result of that was a report containing 68 warnings, split into four roughly equally sized groups:

The interesting thing about the above test was that the bug that resulted in the CVE was still present in the code analysed by static analysis tool, but was not found. So while this tool did find a bunch of errors it is still not able to find every error. Obviously there is room for improvement here, both in my code and in the static analysis tool.

Posted at: 19:53 | Category: CodeHacking/libsndfile | Permalink

Tue, 24 Feb 2009

Calling from C into Ocaml.

I've got a project where it would be nice to be able to call Ocaml code from a C program. Although interfacing Ocaml and C is covered in the official manual and the O'Reilly Ocaml book, neither of these sources have a complete example.

As a firm believer in the idea that a 100 lines of working code is worth a thousand lines of explanatory text in a book or on the web, I thought I'd put together a small but complete example. First off, here is the Ocaml code (download ocaml-called-from-c.ml):


  let ocaml_puts name =
      Printf.printf "Program name is '%s'.\n" name ;
      (* Must flush stdout before returning to C. *)
      flush stdout

  let ocaml_string_join join arr =
      (* Create and return a string. *)
      String.concat join (Array.to_list arr)

  (* On program initialisation, register functions to be called from C. *)
  let () =
      Callback.register "ocaml_puts" ocaml_puts ;
      Callback.register "ocaml_string_join" ocaml_string_join

There are two functions that will be called from C, ocaml_puts and ocaml_string_join and both functions must be registered as callbacks with the Ocaml runtime using Callback.register. To find the function signatures of these functions we can use the ocamlc program:


  prompt > ocamlc -i ocaml-called-from-c.ml
  val ocaml_puts : string -> unit
  val ocaml_string_join : string -> string array -> string

The first function has a single string parameter and returns nothing, while the second takes two parameters, a string and an array of strings and returns a string.

The C program which calls these two functions looks like this (download c-main-calls-ocaml.c):


  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  
  #include <caml/alloc.h>
  #include <caml/mlvalues.h>
  #include <caml/memory.h>
  #include <caml/callback.h>
  
  
  static void
  call_ocaml_void (const char * name)
  {   CAMLparam0 () ;
      CAMLlocal1 (ostr) ;
  
      ostr = caml_copy_string (name);
  
      value * func = caml_named_value ("ocaml_puts") ;
  
      if (func == NULL)
          puts ("caml_named_value failed!") ;
      else
          caml_callback (*func, ostr) ;
  
      CAMLreturn0 ;
  } /* call_ocaml_void */
  
  
  static void
  call_ocaml_string (char * join, char const ** argv)
  {   CAMLparam0 () ;
      CAMLlocal3 (ojoin, oargv, ores) ;
  
      ojoin = caml_copy_string (join);
      oargv = caml_alloc_array (caml_copy_string, argv) ;
  
      value * func = caml_named_value ("ocaml_string_join") ;
  
      if (func == NULL)
          puts ("caml_named_value failed!") ;
      else
          ores = caml_callback2 (*func, ojoin, oargv) ;
  
      printf ("Ocaml returned : '%s'\n", String_val (ores)) ;
  
      CAMLreturn0 ;
  } /* call_ocaml_string */
  
  
  int
  main (int argc, char ** argv)
  {   const char * progname ;
      int k, count ;
      
      progname = argv [0] ;
      if (strstr (progname, "./") == progname)
          progname += 2 ;
  
      if (argc < 2)
      {   puts ("Need at least 1 command line argument.") ;
          exit (1) ;
          } ;
  
      count = argc >= 2 ? atoi (argv [1]) : 1 ;
      count = count < 1 ? 1 : count ;
      printf ("Count : %d\n", count) ;

      /* Must call this before calling any Ocaml code. */
      caml_startup (argv) ;
  
      for (k = 0 ; k < count ; k++)
          call_ocaml_void (progname) ;
  
      for (k = 0 ; k < count ; k++)
          call_ocaml_string (" ", (char const **) (argv + 1)) ;
  
      return 0 ;
  } /* main */


The main function is mostly self explanatory; the only thing to note is that if we want to call any Ocaml code from C, we must call caml_startup first. Looking at the functions that call into Ocaml, note that these functions begin with a call to CAMLparam0 and ends with a call to CAMLreturn0. These are both macros, the first of which sets up the Ocaml specific stack requirements and the second of which cleans up after the first. The '0' at the end of their names indicates that there are zero Ocaml managed data objects passed into and returned from the C function respectively.

For values to be passed to Ocaml, we use local Ocaml managed variables set up with CAMLlocal1 if we only have one, or CAMLlocal3 if we have 3. Data can be copied into these local Ocaml variables using the caml_copy_* and caml_alloc_* families of functions.

The Ocaml functions we want to call can be looked up by name using caml_named_value and the function actually called using caml_callback if we only have one parameter to pass or caml_callback2 for two parameters.

For the call to ocaml_string_join which returns a string, we can extract the return value from the Ocaml wrapper using String_val. There are also other functions to retrieve other data types, the only real caveat being that if the type isn't atomic (eg int or double) and you want to return it from the C function it will be necessary allocate memory for it and copy it because the memory area returned from Ocaml will be invalid after the call to CAMLreturn0.

Finally, building this simple example can be done as follows (using version 3.10.2 of the Ocaml compiler):


  ocamlopt -c ocaml-called-from-c.ml -o ocaml-called-from-c.cmx
  ocamlopt -output-obj -o camlcode.o ocaml-called-from-c.cmx 
  gcc -g -Wall -Wextra  -c c-main-calls-ocaml.c -o c-main-calls-ocaml.o
  gcc camlcode.o c-main-calls-ocaml.o -ldl -lm -L /usr/lib/ocaml/3.10.2 \
         -lasmrun -o c-main-calls-ocaml

The first line compiles to Ocaml file into an Ocaml object (*.cmx) using the native code compiler, the second takes the Ocaml object and all the other Ocaml objects needed and generates a object file (camlcode.o) that can be linked to C code. The last two lines compile the C code into an object file and then links all the C objects and required libraries into an executable.


  prompt > ./c-main-calls-ocaml 4 abc wxyz
  Count : 4
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Program name is 'c-main-calls-ocaml'.
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'
  Ocaml returned : '4 abc wxyz'

At this point its probably a good idea to run the program under valgrind and vary the first parameter to prove to oneself that un-freed memory when the program terminates is a constant (due to the Ocaml runtime) and does not vary in proportion to the number of times the Ocaml code is called (which would indicate a memory leak in the interface code).

Posted at: 22:31 | Category: CodeHacking/Ocaml | Permalink

Sun, 22 Feb 2009

FP-Syd #12.

Last Thursday night was the 12th meeting of FP-Syd, the Sydney Functional Programming group and we had just under 20 people show up at the Sydney offices of Thoughtworks to hear our presenter.

Numbers were down a bit this month, probably because we only had one speaker. While we had a decent number of people volunteering to present when FP-Syd started it looks like we have, to a certain extent, pumped the well dry. The presentations committee is going to look into ways of encouraging more people to come forward and tell us what they're working on.

Anyway, our presenter for the evening was Tim Docker, talking about his Haskell graphing library Graphics.Rendering.Chart which is also on Hackage.

Tim started off with a demo of what the library could do and explained how it was built on top of hscairo the Haskell bindings to the very wonderful Cairo library as well as Gtk2Hs, the Haskell bindings to GTK. He then went on to discuss the difficulties of building a generic graphing library and of creating an API that exposes all the various features in an as-easy-as-possible to use way.

Internally, Tim's library uses Haskell data types and records, often with ten or more separate fields, in addition to having types and records nested within types and records. He went on to explain that accessing these deep and wide structures in Haskell was painful. In particular, the problems are:

Tim's solution to this problem was the use of the data-accessor library and Template Haskell for automatic generation of accessors. While this was an adequate solution, Tim felt that it was working around a limitation inherent to the Haskell language.

Finally, Tim spoke about making his Charts library extensible using Higher Order Functions.

A big thanks goes to Nick Carroll and Thoughtworks for making their facilities available for FP-Syd for the last 4 or 5 meetings. The next FP-Syd meeting will be a Google's new head quarters in Pyrmont. Thanks also to Tim for the presentation.

Posted at: 08:54 | Category: FP-Syd | Permalink

Sun, 15 Feb 2009

Ten Years of libsndfile.

Today, February 15th 2009, is the ten year anniversary of the first release of libsndfile.

Like most FOSS projects, libsndfile started off as an urge to scratch an itch. I was interested in Digital Signal Processing (DSP) and wanted an easy way to get digital audio into and out of software I was writing to try out various DSP algorithms. Secondly, I wanted to a sound file editor and one important part of such an editor is an ability to read and write various sound file formats. I did however look at a couple of existing sound file editors and found that most of them available at the time had buggy and incorrect WAV file handling. So I started out getting that part fixed. Nowadays, most sound file editors on Linux and many on other platforms use libsndfile for file I/O.

In its 10 years of existence, libsndfile has grown from some 5000 lines of code to over 45000 lines of code (not counting the test suite and the example programs). The earliest versions could read WAV, AIFF and AU file formats while the latest version supports 25 formats and is still growing.

It was originally written to run on Linux and other UNIX-like systems but soon ended up running on windows and Mac OS9 (the old non-Unix Apple Macintosh) operating system. Fortunately Mac OS9 has been assigned to the dustbin of history leaving windows as the only operating system that was difficult or painful to support. Recently, the windows development has moved to a system where the only way I support building of libsndfile for that OS is to cross compile from Linux, with the test suite being run under Wine. This has made my life significantly easier since I also release pre-compiled windows binaries.

One surprise for me was that a Wikipedia entry as added in 2006. The page says:

"libsndfile is a widely-used [citation needed] C library"

and I think that the ten year anniversary of the first release may be a good time to look at where libsndfile is actually being used. With a little research and some help from the libsndfile mailing lists, this is what I found (in no particular order):

On top of that there are language bindings for Fortran 77, Free Pascal, Haskell, Perl, Python ( PySndfile, AudioLab, libsndfile-python and possibly others ), Ruby and probably many others.

Overall its been a fun ten years. I've learnt a lot about writing reliable and portable cross platform code and become a much better coder for it. Having libsndfile as a hobby project has definitely helped my employment prospects and my career as a professional software engineer.

The next ten years of libsndfile will mainly be maintenance, but new file formats (I'm currently working on Ogg/Speex) and features will be added as needed.

Posted at: 11:31 | Category: CodeHacking/libsndfile | Permalink

Sat, 14 Feb 2009

Security Hyperventilating.

Just today, David Cournapeau and Lev Givon found a bug in all versions of libsamplerate up to and including 0.1.6. The bug causes a segfault and is basically the same as a bug I thought I fixed last year that resulted in the following entry in the ChangeLog:


  2008-07-02  Erik de Castro Lopo  <erikd AT mega-nerd DOT com>

     * src/src_sinc.c
     Fix buffer overrrun bug at extreme low conversion ratios. Thanks to Russell
     O'Connor for the report.

That bug fix went into version 0.1.4 of libsamplerate which was released on July 2nd, 2008. As a result, I was contacted by computer security firm Secunia Research on July 3rd and asked the following questions:

I replied with a reasonably full explanation and stated that while this bug, if triggered, would crash the program, it was not, in my opinion, exploitable. Secunia Research was happy with my explanation and analysis and I thought that was that.

However, in November of 2008 there was a flurry of security advisories which quoted the above ChangeLog entry but got the security implications completely wrong. The curious thing was that no-one, not a single person contacted me or the project mailing list to ask about the severity of the problem.

The earliest of these advisories I can find was dated November 3rd and appeared on the Openwall OSS-Security mailing list. However, by December 2nd, the Gentoo security team was saying:

"A remote attacker could entice a user or automated system to process a specially crafted audio file possibly leading to the execution of arbitrary code with the privileges of the user running the application."

I on the other hand, remain convinced that this bug is not exploitable under anything other than purely theoretical circumstances.

Since I have now found another variant of the same bug I decided I should document the problem more fully than last time. The problem occurs deep in a rather complicated piece of code that looks like this:


  static void
  prepare_data (SINC_FILTER *filter, SRC_DATA *data, int half_filter_chan_len)
  {   int len = 0 ;
  
      /* Calculation of value of len elided. */
  
      len = MIN (filter->in_count - filter->in_used, len) ;
      len -= (len % filter->channels) ;
  
      memcpy (filter->buffer + filter->b_end, data->data_in + filter->in_used,
                          len * sizeof (float)) ;

The problem is that under conversion ratios near the lower bound of allowed values, the variable len can end up as a small negative value (I saw values in the range [-20, -1]). That small negative number then gets multiplied by 4 (sizeof float) and then passed as the third parameter to memcpy.

However, the third parameter to memcpy is of type size_t which is an unsigned value the same size as sizeof (void*). In the C programming language, when a small signed value is converted to an unsigned value, it gets converted into a very large positive value very near the maximum representable positive unsigned value, ie 4Gig on 32 bit systems or 2 to the power of 64 on 64 bit systems.

Due to the conversion, the memcpy tries to write to a huge chunk of memory (nearly 4 Gig on 32 bit systems, and a hell of a lot more on 64 bit systems). Under a protected memory system (Linux, Windows NT or later and Mac OSX), the only way this bug could possibly exploitable would be if the memcpy returns.

However, since the memcpy is trying to overwrite the vast majority of the CPU's memory space, the chances of the memcpy returning are essentially zero for any sane operating system. Instead, the process containing the call into libsamplerate will attempt to write outside its allocated memory space and will therefore be terminated by the operating system with a segmentation fault.

I have now fixed the function containing the memcpy so that it checks the value of len and returns from the function if it is outside a sane range.

Now I do think that security researchers and security specialists at Linux distributions do a very important and rather thankless job. I also think that in this case some of them fell down rather badly. Secunia Research did the right thing and asked me about the implications of the bug. The others, extrapolated incorrectly, from very little information and got it wrong, without ever contacting me. I'm hoping that the ChangeLog entry (below)


  2009-02-14  Erik de Castro Lopo  <erikd AT mega-nerd DOT com>

      * src/src_sinc.c
      Fix a segfault which occurs when memcpy is passed a bad length parameter.
      This bug has zero security implications beyond the ability to cause a
      program hitting this bug to exit immediately with a segfault.
      See : http://www.mega-nerd.com/erikd/Blog/2009/Feb/14/index.html
      Thanks to David Cournapeau.

will prevent the kind of security hyperventilating I saw last time. The new version of libsamplerate, version 0.1.7, has just been released.

Update : 2009-02-15 11:19

David Cournapeau informs me that Lev Givon, as a user of David's libsamplerate Python bindings, reported this bug to him and that it was actually Lev that found the bug.

Posted at: 22:58 | Category: CodeHacking/SecretRabbitCode | Permalink

Sat, 07 Feb 2009

libsndfile 1.0.18.

Over two years since the last one I've finally managed to do a new release of libsndfile. The changes are:

The Win32 installer and binaries have been tested under both Wine and Windows XP. The Win64 installer and binaries have received much less testing and should be considered alpha quality.

For those compiling from source, it should be noted that the configure script requires libvorbis version 1.2.1 or greater which is currently only available from Xiph SVN.

Posted at: 17:01 | Category: CodeHacking/libsndfile | Permalink

Sun, 01 Feb 2009

FP-Syd #11.

Last Thursday night was the 11th meeting of FP-Syd, the Sydney Functional Programming group and we had just over 20 people show up at the Sydney offices of Thoughtworks to hear two speakers.

First up was your correspondent with a short presentation on the use Functional Programming languages (Ocaml and a bit of Haskell) at bCODE Pty Ltd a small venture capital funded startup in Sydney Australia.

The second presentation was by Paul Steckler who gave a 10000 meter view of PLT Scheme and its associated tools. Included in this presentation was a quick demo of Typed Scheme which adds static type checking (ie at compile time) to an otherwise dynamically typed language.

A big thanks goes to Nick Carroll and Thoughtworks for making their facilities available for FP-Syd. Thanks also to Paul for the scheme presentation.

Posted at: 11:14 | Category: FP-Syd | Permalink

Sat, 31 Jan 2009

Leonard Cohen Concert (28/01/2009).


[Leonard Cohen]
Photo : Jon Reid (SMH)

I was introduced to the magic of Leonard Cohen's music when I was fifteen years old. I have now been listening to that music for two thirds of my life and on Wednesday night I got to see him in concert at the Sydney Entertainment Center.

Although he is 74 years old, Cohen gave a amazing performance backed by a nine piece band of exceptional musicians. He even played acoustic guitar on a few tracks himself. The first set was an hour long and the second, including encores, was one hour and 40 minutes.

As long time fan, I knew the vast majority of the songs they played and more importantly the favourites were all there; from earlier songs like "Suzanne" through to a large slab of the album "Various Positions" the one I consider to be his best.

Highlights of the evening for me were "Suzanne", "Tower of Song", "Bird on a Wire", "Everybody Knows", "A Thousand Kisses Deep", "Take This Waltz" and "So Long Marianne". The most moving song of the night was a personal favourite of mine, "If It Be Your Will". Cohen started it by reciting the prayer-like lyrics and then handed it over to two of his backing vocalists, the Webb Sisters who completed it. The two women sang beautifully with a sparse backing of acoustic guitar, harp and organ. You can get an idea of how spellbinding this was from the video of a very similar performance at London's Royal Albert Hall.

The day after the concert I googled a bit and found that Cohen has been touring since mid 2008 and that the shows have been receiving close to universally rapturous reviews ( Dublin, Manchester, Italy, Berlin and many many more). Personally I am not surprised. This was a truly inspired and exhilarating performance.

Posted at: 16:34 | Category: Music | Permalink

Fri, 23 Jan 2009

The Setting SUN.

Back in 2003, when the SCO Group launched their crazy ass lawsuit against companies using Linux, I was working at Sun Microsystems. About a after the lawsuit was announced, the SUN CEO at the time, Scott McNealy was quoted [Note] in this article:

"Don't touch open source software unless you have a team of intellectual property lawyers prepared to scour every single piece [of the open source code]. We offer indemnification, but many suppliers do not. A lot of companies are going to get very disappointed as we move forward. It will become a very challenging intellectual property issue."

I also clearly remember the reaction of Alan, one of my SUN co-workers when he read this. Alan jumped up out of his seat and said quite loudly "Go Scott!". In fact, it was so loud that I'm sure most of the people working on the same floor heard it. I also remember saying to Alan that SCO had started a lawsuit, that they still had to win it and that until they do win, Scott McNealy "should shut the f#@k up".

I bring all this up now, some 6 years later, because SCO Group is on life support and has been reduced to a litigation only company while Redhat, one of the many companies SCO Group had in its sights is on the verge of surpassing the market capitalisation of Sun Microsystems.

Note: This quote was extremely hard to track down. I did a bunch of searches that finally resulted in a blog post with a dead link to a page on Infoconomy. Fortunately, the very wonderful Wayback Machine had a copy of that one single Infoconomy article that contained the damning quote.

Posted at: 19:14 | Category: | Permalink

Sun, 11 Jan 2009

libsamplerate 0.1.5.

There is a new release of libsamplerate available here. This release contains some improvements to the optimisations that were mentioned in this blog post.

After announcing the previous optimisation work on the Secret Rabbit Code mailing list someone asked for a special case optimisation of 6 channels for doing the 5.1 surround sound format. Then there was also a request for an 8 channel special case for doing 7.1 surround.

I added the 6 channel special case immediately and then found another optimisation for the multi-channel case which uses a horrible hack called Duff's Device in the inner most loop. The results for these optimisations can be seen on the following throughput graphs:


[Throughput graphs]

Obviously, the special cases for 1, 2, 4, and 6 channels have significant improvements, especially for 4 and 6 channels. The general multi channel code path is used for 3, 5, 7 and more channels. Interestingly, if one draws a line through the 1, 2, 4 and 6 channel through-puts for and then a line through the general multi channel through-puts, they intersect at about 8 channels, suggesting that there is very little to be gained by adding a special case code path for 8 channels.

The only downside of this new release is that it uses a C construct that is part of the 1999 ISO C Standard which the Microsoft C compiler does not support. Windows users that want to compile libsamplerate should either use GNU GCC or the Intel compiler with the -c99 command line option.

Posted at: 20:47 | Category: CodeHacking/SecretRabbitCode | Permalink

Wed, 31 Dec 2008

Parsec and Expression Parsing.

Haskell's Parsec parsing library (distributed with the GHC compiler since at least version 6.8.2) contains a very powerful expression parsing module Text.ParserCombinators.Parsec.Expr. This module makes it easy to correctly parse arithmetic expressions like the following according to the usual programming language precedence rules.


  2 + a * 4 - b

Once parsed, the abstract syntax tree for that expression should look like this:


hackfest living room

The original Parsec paper by Daan Leijen even gives a small example of using the expression parser. As can be seen below, the operators are defined in a table (ie a list of lists) where the outer list is a set precedence levels (from highest precedence to lowest) and each inner list specifies all the operators for that precedence level.


  module Parser
  where
  
  import Text.ParserCombinators.Parsec
  import Text.ParserCombinators.Parsec.Expr
  
  expr :: Parser Integer
  expr = buildExpressionParser table factor <?> "expression"

  table :: [[ Operator Char st Integer ]]
  table = [
      [ op "*" (*) AssocLeft, op "/" div AssocLeft ],
      [ op "+" (+) AssocLeft, op "-" (-) AssocLeft ]
      ]
    where
      op s f assoc = Infix (do { string s ; return f }) assoc
  
  factor = do { char '(' ; x <- expr ; char ')' ; return x }
     <|> number
     <?> "simple expression"
  
  number :: Parser Integer
  number = do { ds <- many1 digit; return (read ds) } <?> "number"

The above simple example works fine but there is a potential for a rather subtle problem if the expression parser is used in conjunction with the Text.ParserCombinators.Parsec.Token module.

The problem arises when trying to parse expressions in C-like languages which have bitwise OR (|) as well as logical OR (||) and where the bitwise operation has higher precedence than the logical. The symptom was that code containing the logical OR would fail because the expression parser was finding two bitwise ORs. After banging my head against this problem for a considerable time, I posted a problem description with a request for clues to the Haskell Cafe mailing list with the following code snippet:


  import qualified Text.ParserCombinators.Parsec.Expr as E

  opTable :: [[ E.Operator Char st Expression ]]
  opTable = [
      -- Operators listed from highest precedence to lowest precedence.

      {- snip, snip -}

      [    binaryOp "&" BinOpBinAnd E.AssocLeft ],
      [    binaryOp "^"  BinOpBinXor E.AssocLeft ],
      [    binaryOp "|"  BinOpBinOr E.AssocLeft ],

      [    binaryOp "&&" BinOpLogAnd E.AssocLeft ],
      [    binaryOp "||" BinOpLogOr E.AssocLeft ]
      ]

  binaryOp :: String -> (SourcePos -> a -> a -> a)
                     -> E.Assoc -> E.Operator Char st a
  binaryOp name con assoc =
      E.Infix (reservedOp name >>
          getPosition >>=
          return . con) assoc

Unfortunately, no real answer was forthcoming and while I still held out hope that an answer might eventuate, I continued to work on the problem.

Eventually, after reasoning about the problem and looking at the source code to the Token module I realised that the problem was with the behaviour of the reservedOp combinator. This combinator was simply matching the given string at the first precedence level it found so that even if the code contained a logical OR (||) the higher precedence bitwise OR would match leaving the second vertical bar character un-parsed.

My first attempt at a solution to this problem was to define my own combinator reservedOpNf using Parsec's notFollowedBy combinator and use that in place of the problematic reservedOp.


  opChar :: String
  opChar = "+-/%*=!<>|&^~"

  reservedOpNf :: String -> CharParser st ()
  reservedOpNf name = try (reservedOp name >> notFollowedBy (oneOf opChar))

This solved the immediate problem of distinguishing between bitwise and logical OR, but I soon ran into another problem parsing expressions like this:


  if (whatever == -1)
     .....

which were failing at the unary minus.

A bit of experimenting suggested that again, the problem was with the behaviour of the reservedOp combinator. In this case it seemed the combinator was matching the given string, consuming any trailing whitespace and then the notFollowedBy was failing on the unary minus.

Once the problem was understood, the solution was easy, replace the reservedOp combinator with the string combinator (which matches the string exactly and doesn't consume trailing whitespace), followed by the notFollowedBy and then finally use the whiteSpace combinator to chew up trailing white space.


    reservedOpNf :: String -> CharParser st ()
    reservedOpNf name =
        try (string name >> notFollowedBy (oneOf opChar) >> whiteSpace)

Despite minor problems like the one above, I am still incredibly impressed with the ease of use and power of Parsec. Over the years I have written numerous parsers, in multiple host languages, using a number of parser generators and I have never before used anything that comes close to matching Haskell and Parsec.

Posted at: 07:28 | Category: CodeHacking/Haskell | Permalink

Tue, 30 Dec 2008

Hackfest in the House that Hack Bought.

Just before Newtonmas (also known as xmas to some) I had a small hackfest in the new house that my wife and I are calling "The House that Hack Bought". We had horms, AfC, Scott, raster, kfish, pmiller and myself hacking and chatting from about 10am.

There were 5 of us in the living room:


hackfest living room

while kfish and pmiller hacked in the sun room:


hackfest sun room

and Scott even managed to get a bit of hardware hacking done in the kitchen:


hackfest sun room

It was a great day. Thanks to all who participated.

Posted at: 02:40 | Category: NewHouse | Permalink

Mon, 29 Dec 2008

learnHaskell :: Problem -> IO Solution

Over the last week or so I've been learning Haskell. Unlike a previous attempt to learn Erlang I think this one will actually end up bearing fruit. The main thing that slowed down my attempt to learning Erlang was that I didn't have a task to apply it to that exploited the strengths of the language.

For Haskell I have a task that is ideally suited to the language and plays to its strengths; writing a parser for an Javascript-like language. Just like when I learned Ocaml I am jumping in at the deep end with a difficult problem and learning the language on the way (ie as a side-effect of tackling the task itself).

Normally my tool of choice for this task would be Ocaml and I did in fact start out writing a parser in Ocaml using the Menhir parser generator. Menhir creates a parser from an LR(1) grammar specification where the LR(1) means that the grammar is left recursive and must need no more than one token look-ahead to resolve ambiguities. Over two days of intense hacking I made quite a lot of progress on the grammar and then hit a wall; numerous rather incomprehensible shift/reduce and reduce/reduce warnings plus a part of the grammar (raw XML embedded directly into the source code) which was not context free and for which I could not see any possible way of parsing with only one token look-ahead.

Since I already knew Ocaml, I looked around at some of the other parser generators for Ocaml like Aurochs and PCL. Aurochs uses Parsing Expression Grammar (or Packrat parsing) and seemed interesting, but was rejected because it seemed to gave very little control over the Abstract Syntax Tree it generated. PCL on the other hand was a monadic parser combinator library which is basically a port of Haskell's Parsec library to Ocaml. The main problem I saw with PCL was that it used Monads and lazy evaluation, neither of which are very common in standard Ocaml code. PCL was also rather new and I wasn't sure if it was stable and mature enough for quite an advanced parsing task.

Of course I had been hearing good things about Haskell's Parsec library for a couple of years and decided that this was a good time to give it a try. The interesting thing about Parsec is that it uses lazy evaluation to provide what is effectively infinite token look-ahead. Obviously, a Parsec grammar should be written so that it only uses more than one look-ahead when that is absolutely necessary.

With a bit of googling I found a bunch of examples using Parsec for simple things and that was enough code to get me started. A week later, after an IRC question answered by Don Stewart, a bit of help from Conrad Parker and a question answered on the Haskell Cafe mailing list, I am almost as far along as I got with the Ocaml version of the parser.

My impressions of Haskell and Parsec so far are as follows:

All in all, I'm really enjoying my foray into the world of Haskell.

Posted at: 09:53 | Category: CodeHacking/Haskell | Permalink

Mon, 15 Dec 2008

Stress Relief Kit.

There are some bits of hardware which are such a pain in the neck to work with that they require a special stress relief kit to go with them. Here's mine, for a certain touch screen controller.


[Throughput graphs for old and new Best Sinc converter]

Thanks Shaun!

Posted at: 22:32 | Category: | Permalink

Sun, 14 Dec 2008

Rabbit Optimisation.

For some time I have known that the sinc based converters in my sample rate conversion library Secret Rabbit Code might benefit from a particular optimisation which reuses the filter coefficients where possible, rather than recalculating them. The good news is that I finally managed to get around to performing this optimisation work and the results are reasonably impressive.

The old converter used what was basically a single channel converter on each channel in turn. Since part of this single channel operation involved calculating the filter coefficients on the fly it was pretty obvious that if more than one channel was being processed, it would be possible to calculate the coefficients for the first channel and reuse them for all the rest. Once that was done, the inner loop can be special cased for commonly used channel counts like 1, 2 and 4 while having other channel counts handled by a generic function that can handle any number of channels. These special cases contain straight line code where the generic function contains a loop.

As can bee seen in the throughput graph for the SRC_SINC_BEST_QUALITY converter, this optimisation resulted in modest throughput improvements for 1 channel, a big improvement for 2 channels and an outrageous more than doubling of throughput for the 4 channel case. Even the generic multi-channel case, represented here by 3 and 5 channels showed a modest improvement. It should be noted that the filter coefficients are the same for the old and new versions so the only changes in performance are due to the optimisation.


[Throughput graphs for old and new Best Sinc converter]

These results were obtained on a 3 plus year old 1.1 GHz Pentium M laptop. More recent machines should show much better absolute results but similar proportional improvements. The other sinc-based converters will also show similar proportional improvements.

A new version of Secret Rabbit Code containing this optimisation should probably be released some time during the next couple of days.

Posted at: 08:16 | Category: CodeHacking/SecretRabbitCode | Permalink