Sat, 16 Oct 2010

libsndfile Malware on Windows.

I just found a very suspicious BitTorrent download available here:

http://www.torrentzap.com/torrent/1581031/Libsndfile+%2864-Bit%29+1.0.23

The file being shared is intended to look like the Windows 64 bit installer for libsndfile-1.0.23 and seems to be widely available on this and a number of other torrent sites.

However, the file on the torrent sites is called libsndfile-64-bit-1.0.23.exe while the one I distribute is called libsndfile-1.0.23-w64-setup.exe.

I haven't analyzed the torrent version of the file; I simply don't have the tools or the knowledge to investigate it. I don't even have access to a machine that runs 64 bit Windows. The setup file on my website was cross-compiled from Linux to 64 bit Windows using the very wonderful MinGW-w64 tools, and the installer was created using Inno Setup running under Wine. However, the torrent file is named differently and has a different md5sum. That in itself is more than enough reason to be suspicious.

The valid file that I distribute has the following md5 and sha256 sums:


    md5sum    : efe73b7cb52724e7db7bb7d6ce145929
    sha256sum : 30896dac1002a7b509b6f4620317dad730d8ad761e4ff0402db5a94b0d4c09a2

I'm not really aware of how problems like this are addressed on Windows. Is there a safe, secure, verifiable way of distributing Windows software packages? If so, I'd appreciate it if someone could let me know how it's done.

For Linux this is much easier. Firstly, the vast majority of people on Linux install libsndfile via their Linux distribution. The person who packages libsndfile for any given distribution grabs the source code tarball from my web site. At the same time they should also grab the GPG signature file and verify that the source code tarball is correct and valid.

I don't know what happens in all distributions, but in Debian, the person doing the packaging GPG signs the package before uploading to the Debian servers. Once the GPG signed package is uploaded, the packager's GPG signature is checked before it goes into the unstable distribution. From there the validity of the package is tracked all the way to where an end user installs it on a machine via the process documented here. This process means that it's very difficult to get malware onto a Linux machine via the distribution's package manager.

I suppose this is one more reason why people should be running Linux rather than Windows.

Posted at: 12:11 | Category: CodeHacking/libsndfile | Permalink

Thu, 07 Oct 2010

The (Problems with the) RF64 File Specification.

One of the very common sound file formats that libsndfile reads and writes is the WAV format. This format uses unsigned 32 bit integers internally to specify chunk lengths, which limits the total size of a well-formed file to about 4 gigabytes. On modern systems with high bit widths, multiple channels and high sample rates, this 4 gigabyte limit can be reached very easily. For instance, at a sample rate of 96kHz with 24 bit samples, a 5.1 surround sound recording will hit the 4 gigabyte limit after about 41 minutes.
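The 41 minute figure is easy to check: divide 2^32 bytes by the number of bytes the recording produces per second. A small sketch of that arithmetic (the function name is my own, purely for illustration):

```cpp
// Rough estimate of how long a recording can run before the unsigned
// 32 bit size fields in a RIFF/WAV header overflow. The header bytes are
// ignored, which makes no practical difference to the answer.
double seconds_until_4gig (double sample_rate, int bytes_per_sample, int channels)
{
    const double limit = 4294967296.0 ;   // 2^32 bytes
    const double bytes_per_second = sample_rate * bytes_per_sample * channels ;
    return limit / bytes_per_second ;
}
```

For 96000 Hz, 3 byte (24 bit) samples and 6 channels, that is 2^32 / 1,728,000, roughly 2485 seconds or a little over 41 minutes.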

In order to overcome the limitations of WAV, the European Broadcasting Union began work in 2006 on the specification of an extended WAV file format capable of handling 64 bit file offsets. The resulting document was first released in 2006, was last updated in 2009, and is available here. I have a number of problems with this specification document.

First and foremost, in section 3.5, the document states:

In spite of higher sampling frequencies and multi-channel audio, some production audio files will inevitably be smaller than 4 Gbyte and they should therefore stay in Broadcast Wave Format.

The problem arises that a recording application cannot know in advance whether the recorded audio it is compiling will exceed 4 Gbyte or not at end of recording (i.e. whether it needs to use RF64 or not).

The solution is to enable the recording application to switch from BWF to RF64 on the fly at the 4 Gbyte size-limit, while the recording is still going on.

This is achieved by reserving additional space in the BWF by inserting a 'JUNK' chunk that is of the same size as a 'ds64' chunk. This reserved space has no meaning for Broadcast Wave, but will become the 'ds64' chunk, if a transition to RF64 is necessary.

In short, the suggestion above for writing a file boils down to:

  1. Open the file and write a RIFF/WAV file header with a JUNK section big enough to allow the header to be replaced with an RF64 header if needed.
  2. If the file ends up bigger than 4 gigabytes, go back and replace the existing header with an RF64 header.
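To make those two steps concrete, here is a rough sketch of what the switch-over involves, working on an in-memory header rather than a real file. The helper names and the simplified layout (RIFF header, JUNK, 'fmt ' and 'data' with no other chunks) are assumptions for illustration only. The 28 byte ds64 body (64 bit RIFF size, data size and sample count, then a 32 bit table length) follows my reading of the EBU document, and padding of odd-length data is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of the EBU "reserve a JUNK chunk, switch to RF64 later"
// scheme. All multi-byte fields are little endian.
using Bytes = std::vector<uint8_t> ;

static void put_u16 (Bytes & b, std::size_t pos, uint16_t v)
{   b [pos] = static_cast<uint8_t> (v) ;
    b [pos + 1] = static_cast<uint8_t> (v >> 8) ;
}

static void put_u32 (Bytes & b, std::size_t pos, uint32_t v)
{   for (int k = 0 ; k < 4 ; k++)
        b [pos + k] = static_cast<uint8_t> (v >> (8 * k)) ;
}

static void put_u64 (Bytes & b, std::size_t pos, uint64_t v)
{   put_u32 (b, pos, static_cast<uint32_t> (v)) ;            // low word first
    put_u32 (b, pos + 4, static_cast<uint32_t> (v >> 32)) ;   // then high word
}

static void put_id (Bytes & b, std::size_t pos, const char * id)
{   std::memcpy (&b [pos], id, 4) ;
}

// Layout: RIFF header (12) | JUNK/ds64 (8 + 28) | fmt (8 + 16) | data header (8)
const std::size_t JUNK_POS = 12, FMT_POS = 48, DATA_POS = 72, HEADER_LEN = 80 ;

// Step 1: a plain 16 bit PCM WAV header with a 28 byte JUNK chunk
// reserving room for the body of a ds64 chunk.
Bytes write_wav_header (uint16_t channels, uint32_t rate)
{   Bytes h (HEADER_LEN, 0) ;
    put_id (h, 0, "RIFF") ;
    put_id (h, 8, "WAVE") ;
    put_id (h, JUNK_POS, "JUNK") ;
    put_u32 (h, JUNK_POS + 4, 28) ;
    put_id (h, FMT_POS, "fmt ") ;
    put_u32 (h, FMT_POS + 4, 16) ;
    put_u16 (h, FMT_POS + 8, 1) ;                                      // WAVE_FORMAT_PCM
    put_u16 (h, FMT_POS + 10, channels) ;
    put_u32 (h, FMT_POS + 12, rate) ;
    put_u32 (h, FMT_POS + 16, rate * channels * 2) ;                   // bytes per second
    put_u16 (h, FMT_POS + 20, static_cast<uint16_t> (channels * 2)) ;  // block align
    put_u16 (h, FMT_POS + 22, 16) ;                                    // bits per sample
    put_id (h, DATA_POS, "data") ;
    return h ;
}

// Step 2: once the amount of audio is known, fill in the sizes. If the
// file has outgrown 32 bit sizes, rewrite the header as RF64 in place.
void finalize_header (Bytes & h, uint64_t data_bytes, uint64_t frames)
{   const uint64_t riff_size = HEADER_LEN - 8 + data_bytes ;
    if (riff_size <= UINT32_MAX)
    {   put_u32 (h, 4, static_cast<uint32_t> (riff_size)) ;
        put_u32 (h, DATA_POS + 4, static_cast<uint32_t> (data_bytes)) ;
        return ;
    }
    put_id (h, 0, "RF64") ;
    put_u32 (h, 4, 0xFFFFFFFF) ;              // real size lives in ds64
    put_id (h, JUNK_POS, "ds64") ;            // reserved JUNK space becomes ds64
    put_u64 (h, JUNK_POS + 8, riff_size) ;
    put_u64 (h, JUNK_POS + 16, data_bytes) ;
    put_u64 (h, JUNK_POS + 24, frames) ;
    put_u32 (h, JUNK_POS + 32, 0) ;           // table length: no extra entries
    put_u32 (h, DATA_POS + 4, 0xFFFFFFFF) ;   // real size lives in ds64
}
```

Note that the RF64 branch of finalize_header only runs once a file passes 4 gigabytes, which is exactly what makes it hard to exercise in a test.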

There are two problems with this suggestion: it makes testing difficult, and it makes the software more complex, which means it's more likely to contain bugs. The testing problem arises because testing that the RF64 header is written correctly can only be done by writing a 4 gigabyte file. Programmers can then either choose not to test this (which means the software is highly unlikely to work as specified) or write a full 4 gigabyte test file. However, programmers also want their tests to run quickly (so that they can be run often), and writing 4 gigabytes of data to disk is definitely not going to be quick. Of course, a smaller unit test might be able to bypass the requirement of writing 4 gigabytes, but it would still be prudent to do a real test at the WAV to RF64 switch-over point. The complexity problem is simply that writing a WAV file header first and then overwriting it with an RF64 header later is far more complicated than just writing an RF64 header to begin with. Complexity breeds bugs.

The libsndfile project has had, from the very beginning, a pretty comprehensive test suite and the running of that test suite takes about 30 seconds on current hardware. In order to comprehensively test the reading and writing of RF64 files, libsndfile disregards the rather silly suggestion of the EBU to convert on the fly between WAV and RF64 files. If the software calling libsndfile specifies that an RF64 file be generated, libsndfile will write an RF64 file, even if that file only contains 100 bytes.

A second problem with the RF64 specification is that the specification is ambiguous in a very subtle way. The problem is with how the binary chunks within the file are specified. For WAV files, chunks are specified in this document as:


  typedef unsigned long DWORD ;
  typedef unsigned char BYTE ;

  typedef DWORD FOURCC ;            // Four-character code
  typedef FOURCC CKID ;             // Four-character-code chunk identifier
  typedef DWORD CKSIZE ;            // 32-bit unsigned size value

  typedef struct {                  // Chunk structure
      CKID        ckID ;                   // Chunk type identifier
      CKSIZE      ckSize ;                 // Chunk size field (size of ckData)
      BYTE        ckData [ckSize] ;        // Chunk data
  } CK;

This specifies that a chunk has a 4 byte identifier, followed by a 4 byte chunk size, followed by the chunk data. The important thing to note here is that the chunk size does not include the 4 byte chunk identifier and the 4 byte chunk size field. Inspecting real WAV files found in the wild will confirm that this is the case for all common chunks found in WAV files.
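Because ckSize excludes the 8 byte header, code that walks the chunks of a WAV file steps forward by 8 + ckSize bytes, plus one pad byte when ckSize is odd, since RIFF chunks are padded to an even length. A minimal sketch of such a walker, assuming the whole file is already in memory (the struct and function names are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ChunkInfo
{   std::string id ;           // four character chunk identifier
    uint32_t size ;            // ckSize: length of ckData only
    std::size_t data_offset ;  // where ckData starts in the file
} ;

static uint32_t get_u32le (const uint8_t * p)
{   return uint32_t (p [0]) | (uint32_t (p [1]) << 8)
         | (uint32_t (p [2]) << 16) | (uint32_t (p [3]) << 24) ;
}

// List the chunks that follow the 12 byte "RIFF" <size> "WAVE" header.
std::vector<ChunkInfo> list_chunks (const std::vector<uint8_t> & file)
{   std::vector<ChunkInfo> chunks ;
    std::size_t pos = 12 ;
    while (pos + 8 <= file.size ())
    {   ChunkInfo ck ;
        ck.id.assign (reinterpret_cast<const char *> (&file [pos]), 4) ;
        ck.size = get_u32le (&file [pos + 4]) ;
        ck.data_offset = pos + 8 ;
        chunks.push_back (ck) ;
        // Skip the 8 byte header, the data and any pad byte. 64 bit
        // arithmetic stops a huge (or hostile) ckSize from wrapping around.
        uint64_t next = uint64_t (pos) + 8 + ck.size + (ck.size & 1) ;
        if (next > file.size ())
            break ;
        pos = static_cast<std::size_t> (next) ;
    }
    return chunks ;
}
```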

Now contrast the above with how the chunks are specified in the EBU document. For instance, the 'fmt ' chunk (which is common to both WAV and RF64) is specified as:


  struct FormatChunk5                // declare FormatChunk structure
  {
      char           chunkId[4];     // 'fmt '
      unsigned int32 chunkSize;      // 4 byte size of the 'fmt ' chunk
      unsigned int16 formatType;     // WAVE_FORMAT_PCM = 0x0001, etc.
      unsigned int16 channelCount;   // 1 = mono, 2 = stereo, etc.
      unsigned int32 sampleRate;     // 32000, 44100, 48000, etc.
      unsigned int32 bytesPerSecond; // only important for compressed formats
      unsigned int16 blockAlignment; // container size (in bytes) of one set of samples
      unsigned int16 bitsPerSample;  // valid bits per sample 16, 20 or 24
      unsigned int16 cbSize;         // extra information (after cbSize) to store
      char           extraData[22];  // extra data of WAVE_FORMAT_EXTENSIBLE when necessary
  };

Here, the chunkSize field is simply the "size of the 'fmt ' chunk", and nowhere in the EBU document does it specify exactly how that chunkSize field should be calculated. If you give the EBU documentation to any experienced software engineer with no previous knowledge of RIFF/WAV files, they would almost certainly assume that the chunkSize field should be the size of the whole chunk, including the chunkId and chunkSize fields. On the other hand, someone who knows about RIFF/WAV files will be less likely to follow that path.

This leaves the programmer implementing code to read and write this format with a couple of possibilities:

However, the last part of section 3.5 of the EBU/RF64 document describes how a WAV file is to be upgraded to an RF64 file, and that description makes no mention of the 'fmt ' chunk being modified during that upgrade. One can only assume from this that the 'fmt ' chunk in an RF64 file should be identical to that of a WAV file and that the EBU/RF64 specification is misleading.

For libsndfile, I have decided to assume that the specification is indeed misleading. Unfortunately, I'm pretty sure that at some point I will be asked to at least read files which strictly adhere to the literal interpretation of the document. I'm also pretty sure that implementing code to read files written to conform to both interpretations of the spec will be a very painful exercise.
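To illustrate why reading both interpretations is painful, here is one hypothetical heuristic: try the RIFF/WAV convention first, and if that doesn't land on something that looks like a chunk ID, try the literal reading where chunkSize includes the 8 byte header. This is only a sketch of the problem, not what libsndfile does; the function names and the "four printable ASCII bytes" test are my own assumptions.

```cpp
#include <cstdint>
#include <vector>

// True if there is room for a chunk header at 'pos' and its four
// identifier bytes are printable ASCII.
static bool plausible_id (const std::vector<uint8_t> & f, uint64_t pos)
{   if (pos + 8 > f.size ())
        return false ;
    for (int k = 0 ; k < 4 ; k++)
        if (f [pos + k] < 0x20 || f [pos + k] > 0x7E)
            return false ;
    return true ;
}

// Guess where the chunk after the one at 'pos' starts. Returns 0 when
// neither interpretation of ck_size gives a plausible next chunk.
uint64_t next_chunk_offset (const std::vector<uint8_t> & f, uint64_t pos, uint32_t ck_size)
{   // RIFF/WAV convention: ck_size excludes the 8 byte header, pad to even.
    uint64_t wav_style = pos + 8 + ck_size + (ck_size & 1) ;
    if (plausible_id (f, wav_style))
        return wav_style ;
    // Literal EBU reading: ck_size is the size of the whole chunk.
    uint64_t literal_style = pos + ck_size ;
    if (ck_size >= 8 && plausible_id (f, literal_style))
        return literal_style ;
    return 0 ;
}
```

Any heuristic like this can guess wrong, for instance when audio data happens to contain four printable bytes at the offset the wrong interpretation points to.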

Posted at: 21:36 | Category: CodeHacking/libsndfile | Permalink

Sun, 03 Oct 2010

Distros and Test Suites.

libsndfile is cross platform and is expected to run on 32 and 64 bit CPUs on any system that is reasonably POSIX compliant (i.e. even Windows). It also has a lot of low level code that does things like endian swapping and bit shifting. Although I compile and test the code on all the systems I have access to, I don't have access to everything. That's why libsndfile has a test suite.
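As an example of the kind of low level code involved, the usual portable approach is to build values up from individual bytes with shifts, so the result doesn't depend on the byte order (or alignment rules) of the host CPU. A minimal sketch with hypothetical function names; the final conversion to a signed type wraps around on every common compiler and is guaranteed to from C++20:

```cpp
#include <cstdint>

// 16 bit little endian sample, as stored in WAV files.
int16_t read_le16 (const uint8_t * p)
{   return static_cast<int16_t> (uint16_t (p [0]) | (uint16_t (p [1]) << 8)) ;
}

// 16 bit big endian sample, as stored in AIFF files.
int16_t read_be16 (const uint8_t * p)
{   return static_cast<int16_t> ((uint16_t (p [0]) << 8) | uint16_t (p [1])) ;
}

// 24 bit little endian sample, sign extended into a 32 bit int.
int32_t read_le24 (const uint8_t * p)
{   uint32_t u = uint32_t (p [0]) | (uint32_t (p [1]) << 8) | (uint32_t (p [2]) << 16) ;
    if (u & 0x800000)
        u |= 0xFF000000u ;   // copy the sign bit into the top byte
    return static_cast<int32_t> (u) ;
}
```

Code like this is exactly where a test suite earns its keep: a wrong shift or a missing sign extension only shows up on some inputs.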

The libsndfile test suite is as comprehensive as I can make it. It's taken a lot of work over many years to get to where it is, but it has saved me many times that amount of work tracking down obscure bugs.

The test suite is important. That's why I suggest that anyone building libsndfile from source should run the test suite before using the library. This is especially true for people packaging libsndfile for distributions. That's why it is so disappointing to see something like this Gentoo bug.

Gentoo managed to mess up their build meta-data, resulting in a libsndfile binary that was horribly broken on 32 bit systems. It was broken in such a way that just about every single test in the libsndfile test suite would have failed. Unfortunately, since Gentoo didn't run the test suite, they distributed their broken build meta-data to users, and the users started emailing me with weird bug reports.

Fortunately, other distributions like Debian get it right. Debian even keeps build logs for all releases of all packages on all architectures and makes them available on the web. For instance, the build log for libsndfile version 1.0.21-3 on the MIPS architecture can be found here.

If anyone is using a distro which does not routinely run the test suite when building packages which supply a test suite, I recommend that they switch to a distro that does.

Posted at: 22:58 | Category: CodeHacking/libsndfile | Permalink

Fri, 05 Jun 2009

Jim Henson Co.

I received an email this morning:


  From: David XXX <xxx@creatureshop.com>
  To: <erikd@xxx.com>
  Subject: libsndfile usage
  Date: Thu, 4 Jun 2009 10:52:02 -0700
  User-Agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

  Erik, just thought I would mention we have been using libsndfile
  internally at the Jim Henson Co. for a while, we recently got approval
  to open source some of our utilities which mostly use libsndfile to read
  in wav files for encoding into a quicktime file, using libquicktime. We
  released them as usage examples and extras for libquicktime.

  http://libquicktime.cvs.sourceforge.net/viewvc/libquicktime/lqt_utils/

  --
  David XXX
  Pipeline Tools Programmer
  Jim Henson Creature Shop
  xxx@creatureshop.com

This is so cool.

Of course Jim Henson and Company are probably best known for the Muppets and Sesame Street but a couple of years ago my then four year old daughter absolutely loved watching Bear in the Big Blue House.

In a subsequent email David explained that the Bear program was before his time, and that libsndfile was being used as part of the render process in the production of Sid the Science Kid which I don't think has reached this country yet.

Posted at: 21:06 | Category: CodeHacking/libsndfile | Permalink

Thu, 14 May 2009

libsndfile 1.0.20.

There's a new release of libsndfile available in the usual place. This is a security bug fix release which fixes a potential heap overflow in VOC files found and reported by Tobias Klein ( http://www.trapkit.de/ ) and another in the AIFF file parser found by me.

I am also making available patches for older versions of libsndfile:

Hopefully the next release will contain new features instead of just bug fixes.

Posted at: 20:53 | Category: CodeHacking/libsndfile | Permalink

Tue, 03 Mar 2009

libsndfile 1.0.19.

There's a new release of libsndfile available in the usual place. There is also a back story about this release.

At about the same time as my blog post entitled "Security Hyperventilating" I released version 1.0.18 of libsndfile. A couple of days later I received an email from Alin Rad Pop of Secunia Research (the company who got it right last time) informing me of a real, potentially exploitable bug in my newly released code. I've been told that this security vulnerability will be made public over the next couple of days as CVE-2009-0186.

While I certainly didn't believe my code was bug free, I have worked very hard to reduce the bugs as much as possible. I have a full and rather comprehensive test suite, I run valgrind over the code regularly, I have learnt from past mistakes, and when I find one bug I nearly always take the time to search for other instances of the same class of bug in the code.

I even wrote a program to do automated Fuzz Testing. This program takes an existing sound file, modifies it, writes it to disk and then runs it through libsndfile. If libsndfile segfaults or does anything else wildly wrong, the problem file is saved for later review. That review usually results in a bug fix.
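The core of a mutation fuzzer like this fits in a few lines. The sketch below is an in-process simplification: instead of writing files to disk and running them through libsndfile, it hands each mutated buffer to a caller-supplied check function and keeps any input that check rejects. A real harness would run each case in a separate process so that segfaults are caught as well. All names here are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

using Bytes = std::vector<uint8_t> ;

// Return a copy of 'seed' with between 1 and max_changes randomly chosen
// bytes overwritten by random values. max_changes must be at least 1.
Bytes mutate (const Bytes & seed, std::mt19937 & rng, int max_changes)
{   Bytes out = seed ;
    if (out.empty ())
        return out ;
    std::uniform_int_distribution<std::size_t> position (0, out.size () - 1) ;
    std::uniform_int_distribution<int> value (0, 255), count (1, max_changes) ;
    for (int k = count (rng) ; k > 0 ; k--)
        out [position (rng)] = static_cast<uint8_t> (value (rng)) ;
    return out ;
}

// Feed 'iterations' mutated copies of 'seed' to 'check', which returns false
// when the code under test misbehaves. Failing inputs are kept for review.
std::vector<Bytes> fuzz (const Bytes & seed,
                         const std::function<bool (const Bytes &)> & check,
                         int iterations, unsigned rng_seed = 12345)
{   std::mt19937 rng (rng_seed) ;   // fixed seed makes runs repeatable
    std::vector<Bytes> failures ;
    for (int i = 0 ; i < iterations ; i++)
    {   Bytes input = mutate (seed, rng, 8) ;
        if (! check (input))
            failures.push_back (input) ;
    }
    return failures ;
}
```

Seeding the generator with a fixed value means any failure found can be reproduced exactly on the next run.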

In spite of all this testing, there was still a security vulnerability. That's mainly because libsndfile is one of those projects that needs to parse untrusted binary data. Remember all those exploitable bugs against libjpeg and libgif in the late 1990s? Well, both of those projects only parse one file type; libsndfile parses more than a dozen. The only other project I can think of with large numbers of different things to parse is the Samba Project, which needs to parse dozens of different kinds of messages in the CIFS protocol.

Of course there are other tools for finding bugs: static analysis tools. In this field there are FOSS products like Sparse and Splint. The first is rather new and was developed specifically to find bugs in Linux kernel source code, and the second is basically unmaintained. Both are rather intrusive in that they require special annotations to help them ignore valid code and find buggy code.

Neither of these FOSS programs compares well against commercial offerings like Coverity's Prevent, but Coverity is widely regarded as being rather expensive. However, for FOSS projects, Coverity does have a program where it scans FOSS projects and feeds the scan results back to the projects so they can fix bugs.

libsndfile was added to Coverity scan of FOSS projects well over a year ago. I fixed all of the issues (mostly minor) pretty much immediately and then asked to go on to the next rung of the ladder. Unfortunately, I was told that I had to wait for all the projects that were currently on rung 0 to fix their issues before that could happen. Eventually, that did happen, but I don't remember anyone contacting me about it. This slow progress was frustrating.

However, on hearing that there was a CVE about to be published against libsndfile I was fortunate enough to have someone offer to run libsndfile through one of the commercial static analysis tools. The result of that was a report containing 68 warnings, split into four roughly equally sized groups:

The interesting thing about the above test was that the bug that resulted in the CVE was still present in the code analysed by the static analysis tool, but was not found. So while this tool did find a bunch of errors, it is still not able to find every error. Obviously there is room for improvement here, both in my code and in the static analysis tool.

Posted at: 19:53 | Category: CodeHacking/libsndfile | Permalink

Sun, 15 Feb 2009

Ten Years of libsndfile.

Today, February 15th 2009, is the ten year anniversary of the first release of libsndfile.

Like most FOSS projects, libsndfile started off as an urge to scratch an itch. I was interested in Digital Signal Processing (DSP) and wanted an easy way to get digital audio into and out of software I was writing to try out various DSP algorithms. Secondly, I wanted to write a sound file editor, and one important part of such an editor is the ability to read and write various sound file formats. I did, however, look at a couple of existing sound file editors and found that most of those available at the time had buggy and incorrect WAV file handling. So I started out by getting that part fixed. Nowadays, most sound file editors on Linux and many on other platforms use libsndfile for file I/O.

In its 10 years of existence, libsndfile has grown from some 5000 lines of code to over 45000 lines of code (not counting the test suite and the example programs). The earliest versions could read WAV, AIFF and AU file formats while the latest version supports 25 formats and is still growing.

It was originally written to run on Linux and other UNIX-like systems but soon ended up running on Windows and Mac OS9 (the old non-Unix Apple Macintosh operating system). Fortunately Mac OS9 has been consigned to the dustbin of history, leaving Windows as the only operating system that was difficult or painful to support. Recently, Windows development has moved to a system where the only way I support building libsndfile for that OS is to cross compile from Linux, with the test suite being run under Wine. This has made my life significantly easier since I also release pre-compiled Windows binaries.

One surprise for me was that a Wikipedia entry was added in 2006. The page says:

"libsndfile is a widely-used [citation needed] C library"

and I think that the ten year anniversary of the first release may be a good time to look at where libsndfile is actually being used. With a little research and some help from the libsndfile mailing lists, this is what I found (in no particular order):

On top of that there are language bindings for Fortran 77, Free Pascal, Haskell, Perl, Python ( PySndfile, AudioLab, libsndfile-python and possibly others ), Ruby and probably many others.

Overall it's been a fun ten years. I've learnt a lot about writing reliable and portable cross platform code and become a much better coder for it. Having libsndfile as a hobby project has definitely helped my employment prospects and my career as a professional software engineer.

The next ten years of libsndfile will mainly be maintenance, but new file formats (I'm currently working on Ogg/Speex) and features will be added as needed.

Posted at: 11:31 | Category: CodeHacking/libsndfile | Permalink

Sat, 07 Feb 2009

libsndfile 1.0.18.

More than two years after the last one, I've finally managed to do a new release of libsndfile. The changes are:

The Win32 installer and binaries have been tested under both Wine and Windows XP. The Win64 installer and binaries have received much less testing and should be considered alpha quality.

For those compiling from source, it should be noted that the configure script requires libvorbis version 1.2.1 or greater which is currently only available from Xiph SVN.

Posted at: 17:01 | Category: CodeHacking/libsndfile | Permalink

Thu, 31 Aug 2006

libsndfile 1.0.17.

I've just released a new version of libsndfile. The changes are:

Posted at: 22:19 | Category: CodeHacking/libsndfile | Permalink

Thu, 10 Aug 2006

C++ Wrapper for libsndfile, Part 4.

I got some email from Jaq who said:

"You don't actually give a good reason for not wanting a close() method on SndfileHandle; your reasons for not wanting open() are good, but surely a handle needs a way of closing itself? Or does the API overload object deletion with the closing of the handle too?"

So first of all, I should have stated that my current candidate header file is available here. The SndfileHandle object is designed to be used with the Resource Acquisition Is Initialization pattern. In particular, the object will close the file and release all allocations when it goes out of scope. For instance:

  {
      SndfileHandle file ("foo.wav") ;

      // Do something with file which gets closed automatically
      // when file goes out of scope.
  }
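For readers less familiar with RAII, the mechanism behind that automatic close can be sketched with std::shared_ptr and a custom deleter. The fake_open() and fake_close() functions below are hypothetical stand-ins for sf_open() and sf_close(), with a counter so the effect can be observed; this is not the actual SndfileHandle implementation.

```cpp
#include <memory>
#include <string>

// Hypothetical C style API standing in for sf_open()/sf_close().
struct FakeFile { std::string path ; } ;
static int open_files = 0 ;

FakeFile * fake_open (const char * path)
{   FakeFile * f = new FakeFile { path } ;
    open_files++ ;
    return f ;
}

void fake_close (FakeFile * f)
{   delete f ;
    open_files-- ;
}

// Minimal RAII handle: the constructor acquires the resource and the
// shared_ptr deleter releases it when the last handle goes out of scope.
class Handle
{
public:
    Handle () = default ;   // empty handle, nothing open
    explicit Handle (const char * path) : file_ (fake_open (path), fake_close) {}
    bool is_open () const { return file_ != nullptr ; }
private:
    std::shared_ptr<FakeFile> file_ ;
} ;
```

Used exactly like the block above, the file is open inside the braces and closed again as soon as execution leaves them, with no explicit close() call.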

So what's the problem with the close() method? Well, it's very similar to the problems with the open() method. Let's look at an example:

  SndfileHandle file1 ("foo.wav") ;

  // Make file2 == file1
  SndfileHandle file2 = file1 ;

  // Close file1
  file1.close () ;

Obviously, the handle associated with file1 should be closed, but what about file2? Should that be closed or remain open?

The fact that it's not obvious means that it's best left out. If anyone really wants to make sure that a SndfileHandle is closed they can do:

  SndfileHandle file1 ("foo.wav") ;

  // Make file2 == file1
  SndfileHandle file2 = file1 ;

  // Close file1
  file1 = SndfileHandle () ;

In this case it's a little more obvious that file1 and file2 now refer to different handles.

Posted at: 20:54 | Category: CodeHacking/libsndfile | Permalink

Wed, 09 Aug 2006

C++ Wrapper for libsndfile, Part 3.

Thinking about the C++ wrapper continues. In the current candidate version, a SndfileHandle contains a pointer to a private reference counted struct which contains the actual data. The SndfileHandle class also has a copy constructor and an assignment operator.

In addition to the above, a number of people on the mailing list have asked for the SndfileHandle class to have open() and close() methods. This seems reasonable on the face of it, but Daniel Schmitt points out that the combination of copy/assign and open/close results in a rather strange ambiguity.

Daniel gives the following example using copy/assign (i.e. no open/close methods):

  SndfileHandle file1 ("foo.wav") ;

  // Make file2 == file1
  SndfileHandle file2 = file1 ;

  // Now reuse file1
  file1 = SndfileHandle("bar.wav");

At the end of that block of code we now have file1 and file2 operating on different handles, which is exactly what any reasonable programmer would expect.
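That behaviour follows from the design described above, where a handle holds a pointer to a private reference counted struct. Here is a minimal hand-rolled sketch of the idea; it is not the real SndfileHandle code, and a path string stands in for the SNDFILE* a real wrapper would hold.

```cpp
#include <string>

class Handle
{
public:
    Handle () : p_ (nullptr) {}
    explicit Handle (const std::string & path) : p_ (new Shared { path, 1 }) {}

    // Copying shares the underlying struct and bumps its count.
    Handle (const Handle & other) : p_ (other.p_) { if (p_) p_->refs++ ; }

    // Assignment takes a reference to the incoming struct before releasing
    // the old one, which also makes self-assignment safe.
    Handle & operator= (const Handle & other)
    {   Shared * incoming = other.p_ ;
        if (incoming)
            incoming->refs++ ;
        release () ;
        p_ = incoming ;
        return *this ;
    }

    ~Handle () { release () ; }

    std::string path () const { return p_ ? p_->path : std::string () ; }
    int refs () const { return p_ ? p_->refs : 0 ; }

private:
    struct Shared { std::string path ; int refs ; } ;
    Shared * p_ ;

    void release ()
    {   if (p_ && --p_->refs == 0)
            delete p_ ;   // last reference gone: a real wrapper would sf_close () here
        p_ = nullptr ;
    }
} ;
```

With this class, Daniel's example leaves file1 on "bar.wav" and file2 on "foo.wav", each with a reference count of one, and assigning an empty Handle () to a variable drops only that variable's reference.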

Now look at what happens if we have open/close methods:

  SndfileHandle file1 ("foo.wav") ;

  // Make file2 == file1
  SndfileHandle file2 = file1 ;

  // Now reuse file1
  file1.open ("bar.wav") ;

The open method can be implemented in one of the following two ways :

After the block of code above, the two different implementations would result in the following state :

Obviously, the second implementation is completely wrong, but the first implementation is at least questionable. In terms of providing something which balances utility and consistency I'm tending to favor the idea of keeping the copy/assign operations and not providing open/close methods.

Posted at: 20:20 | Category: CodeHacking/libsndfile | Permalink

Mon, 31 Jul 2006

C++ Wrapper for libsndfile, Part 2.

After yesterday's blog post a guy in Germany, Daniel Schmitt, piped up on the libsndfile mailing lists and insisted I reconsider the C++ wrapper class' copy/assign issue. My big problem with copy/assign was that they would not behave the way people might reasonably expect them to.

Daniel's major contribution was renaming the class from Sndfile to SndfileHandle. Once that is done, having a copy constructor and an assignment operator using reference counting makes sense. With the class name containing the word "Handle", the name now fits the behavior. This is such a minor and seemingly trivial change but I simply didn't see it.

Thanks Daniel. Brilliant!

Posted at: 19:48 | Category: CodeHacking/libsndfile | Permalink

Sun, 30 Jul 2006

A C++ Wrapper for libsndfile.

Over the years I've received a bunch of emails saying stuff like "why did you write libsndfile in that old fashioned C language instead of nice modern shiny C++?". Obviously anyone who even thinks something like this is too ignorant of C to be a good C++ programmer. A competent C++ programmer needs to know and be comfortable with the whole of the C language as well as the whole of C++.

At the time I started work on libsndfile in 1998 I was writing far more C++ code than C code. However, back then, the GNU C++ compiler was nowhere near as good as it is today and I thought a C library interface was a safer bet than C++. In retrospect, I believe the decision of using C was spot on, for the following reasons:

However, some people do prefer C++ to C, and many of those would probably be writing their own C++ wrapper. Since the vast majority of these wrappers would be largely the same, it makes sense for me to distribute a C++ wrapper with libsndfile.

I decided on the following set of criteria for the wrapper:

It does however use templates for the read/write/readf/writef methods:

  template <typename T> sf_count_t read   (T *ptr, sf_count_t items) ;
  template <typename T> sf_count_t readf  (T *ptr, sf_count_t frames) ;
  template <typename T> sf_count_t write  (const T *ptr, sf_count_t items) ;
  template <typename T> sf_count_t writef (const T *ptr, sf_count_t frames) ;

with explicit specializations for types short, int, float and double.
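One way to get templated methods that only work for a fixed set of types is to declare the member template without a general definition and supply explicit specializations that forward to the type-specific C functions. A sketch of that pattern, with hypothetical c_read_*() functions in place of the real sf_read_short() and friends, and a hypothetical count_t in place of sf_count_t:

```cpp
#include <algorithm>
#include <cstdint>

typedef int64_t count_t ;   // hypothetical stand-in for sf_count_t

// Hypothetical C level readers. Each fills the buffer with a distinctive
// value so it is easy to see which one was called.
inline count_t c_read_short  (short * p, count_t n)  { std::fill (p, p + n, short (16)) ; return n ; }
inline count_t c_read_int    (int * p, count_t n)    { std::fill (p, p + n, 32) ; return n ; }
inline count_t c_read_float  (float * p, count_t n)  { std::fill (p, p + n, 0.5f) ; return n ; }
inline count_t c_read_double (double * p, count_t n) { std::fill (p, p + n, 0.25) ; return n ; }

class Wrapper
{
public:
    // Declared for every T, but only defined for the four types below.
    template <typename T> count_t read (T * ptr, count_t items) ;
} ;

// Explicit specializations forward to the matching C function. They are
// marked inline so they could live in a header included by several files.
template <> inline count_t Wrapper::read<short>  (short * p, count_t n)  { return c_read_short (p, n) ; }
template <> inline count_t Wrapper::read<int>    (int * p, count_t n)    { return c_read_int (p, n) ; }
template <> inline count_t Wrapper::read<float>  (float * p, count_t n)  { return c_read_float (p, n) ; }
template <> inline count_t Wrapper::read<double> (double * p, count_t n) { return c_read_double (p, n) ; }
```

Calling read () with, say, a long * still compiles because the primary template is declared, but the program fails to link, which is a reasonably clear signal that the type is unsupported.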

It also explicitly makes the copy constructor and assignment operator private. The problem with these two is that two objects wrapping the same SNDFILE* pointer will not give the expected behavior. With the C version:

  SNDFILE *file1 = sf_open (filename, ...);
  SNDFILE *file2 = file1 ;

anyone reading the code can see that file1 and file2 are two pointers pointing to the same object. The code reader knows what behavior to expect here.

Now contrast this with the C++ version:

  Sndfile file1 (filename, ...) ;
  Sndfile file2 (file1) ;

The objects file1 and file2 look like two independent objects and should behave like two independent objects, but instead they behave like they do in the C version above. I believe that this is inconsistent.

The only solution that would maintain consistency would be to make the copy constructor do a deep copy but that is simply too much of a pain in the neck to implement.
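For reference, making the copy operations private looks like this. The sketch uses a hypothetical FakeSndfile type in place of SNDFILE, and also shows the C++11 "= delete" spelling of the same idea:

```cpp
#include <type_traits>

struct FakeSndfile {} ;   // hypothetical stand-in for the opaque SNDFILE type

// Pre-C++11 idiom: declare the copy operations private and never define them.
class Sndfile
{
public:
    explicit Sndfile (FakeSndfile * p) : p_ (p) {}
    FakeSndfile * raw () const { return p_ ; }
private:
    Sndfile (const Sndfile &) ;              // not defined on purpose
    Sndfile & operator= (const Sndfile &) ;  // not defined on purpose
    FakeSndfile * p_ ;
} ;

// C++11 and later state the same intent directly.
class SndfileModern
{
public:
    explicit SndfileModern (FakeSndfile * p) : p_ (p) {}
    SndfileModern (const SndfileModern &) = delete ;
    SndfileModern & operator= (const SndfileModern &) = delete ;
    FakeSndfile * raw () const { return p_ ; }
private:
    FakeSndfile * p_ ;
} ;

static_assert (! std::is_copy_constructible<Sndfile>::value, "Sndfile must not be copyable") ;
static_assert (! std::is_copy_constructible<SndfileModern>::value, "SndfileModern must not be copyable") ;
```

With either class, a line like "Sndfile file2 (file1) ;" is rejected at compile time rather than silently producing two objects that share one SNDFILE*.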

The current version of the wrapper is available here or in version 1.0.17 or later of libsndfile.

Posted at: 18:26 | Category: CodeHacking/libsndfile | Permalink