Mon, 25 Jan 2010
Golden Section - Jumping the Gun (1993).
Almost two decades ago, when I was at university I played bass guitar in a couple of rock bands; Golden Section and Fishtank. Both bands playing an all original set although Golden Section did cover a song by Fishtank and Fishtank had a song without lyrics but found by accident that the lyrics to the Beatles' "Lucy in the Sky with Diamonds" worked, despite our melody being different.
Recently one of my bands mates in Golden Section posted a video of Golden Section playing live at the Palais Hotel in Newcastle (Australia) in 1993. The first real glimpse of me playing bass is at 1:10.
Playing in Golden Section was just so much fun! Working up to the 1991 Newcastle University Band Competition (which we won) we were rehearsing twice a week and as a result reached a state where the band synced like clockwork. We could play light and reasonably heavy (Fishtank was heavier) and we were even a bit funky on certain songs.
I absolutely loved playing in this band. Good times, good memories.
Posted at: 22:04 | Category: Music | Permalink
Thu, 07 Jan 2010
In Search of the Linux Laptop.
I'm shopping for a laptop to run Linux on and I'm finding it a really frustrating process. I would like a high end, small form factor machine like the Dell Studio XPS 13. I've seen one of these machines in the real life and held it in my hands. It seems to be excellent quality and have the right features at a reasonable price. Unfortunately, on the Dell Australia web site, there is no option to purchase this machine with Linux pre-installed.
My current laptop is a Dell Latitude X1, and my wife has a Dell too. At work I have a Dell Precision workstation. All of these machine run Linux 100% of the time and for every single one of these machines I had to pay for a Microsoft Windows license that was never used. As most people know, Microsoft uses a small portion of its revenue to fund attacks on Linux and Free/Open Source Software like the SCO debacle and the ramming of OOXML through the ISO standards process.
Considering Microsoft's malignant presence in the computer industry and the fact that I don't use their products makes me reluctant to buy a machine with Windows pre-installed. I want to buy a laptop where I get my operating system of choice and just as importantly, I know Microsoft doesn't get any part of the money I pay.
Looking around for laptops with Linux pre-installed in Australia I found VG Computing who have a range of laptops which can be shipped with Linux. Unfortunately, their order page says that they get the machines with Windows which they remove to install Linux. I do realise that as a small vendor, there is not much that VG can do about this, but as far as I am concerned, thats a fail.
Another company thats been around for ages is Pioneer Computers but their machines always seemed a bit old, a bit under powered and a bit over priced. Looking at that site just recently I found their DreamBook Light M73 which was close to what I was looking for. Ordering machines from Pioneer with Windows XP costs $89 more than the same machine with Ubuntu. Purchased with Linux, this is genuinely a Microsoft free machine.
Taking a trip out to Pioneer in Alexandria I was able to see one up close and I must say I was disappointed. Compared to the Dell XPS 13, the DreamBook felt flimsy and poorly constructed. On top of that the DreamBook had a smaller keyboard (much like the Compaq M300 I had years ago) than than the Dell and SiS graphics whereas I was really hoping for Nvidia. Another machine crossed off the list.
In the US, there are a number vendors that sell laptops with Linux. In late November 2009 I contacted two of them, System 76 and ZaReason Inc. System 76 wasn't willing to ship to Australia for warranty reasons. ZaReason will ship to Australia, but to take advantage of any warranty work, I'd have to ship the machine to and from the US which would be rather inconvenient.
Despite the concerns over warranty, by early December 2009 I had come to the decision to purchase a ZaReason Alto 3550 with a bunch of extras like faster CPU, more RAM and a bigger disk. Unfortunately, while trying to purchase it online I ran afoul of their payment system and by the time we'd figured that out they had run out of stock.
I contacted them and asked about the ETA for new stock. They said, Dec 19th, which came and went with no new stock on the website. I contacted them again and was told another week. A week passed with no new stock so I contacted them again when I was informed that their distributor doesn't have any more of the Alto 3550 available. I would go for the Alto 2550 but that has an Intel graphics chip whereas I was hoping for an Nvidia like in the Alto 3550.
So now I have to decide, do I go with the ZaReason Alto 2550 with the Intel chip and then worry about warranty issues or do I buy the Dell locally, where I know that the support and service will be excellent and then try to get a refund for the Windows license I don't need or want.
Posted at: 22:14 | Category: Tech | Permalink
Sun, 03 Jan 2010
Debugging Memory Leaks in a GTK+ House of Cards.
Recently I've been hacking on Conrad Parker's sound editing, audio mangling and DJing tool Sweep. As part of my bug fixing and clean up work I ran Sweep under the Linux world's favourite memory debugging tool Valgrind. Even after running valgrind with the officially sanctioned environment variables and gtk suppressions file, the resulting 500k gzipped output file was a little shocking.
Now I'm pretty sure a number of those leaks are in Sweep, but a significant number of them seem to be in GTK+ and Glib. Since trying to differentiate the leaks in Sweep from the leaks in GTK+ was proving to be a very difficult and frustrating task I decided to look at the behaviour of a simple GTK+ program under Valgrind. The program I chose was the helloworld example from the GTK+ tutorial.
Compiling that on Ubuntu 9.10 and running it under valgrind using the following commands:
export G_SLICE=always-malloc
export G_DEBUG=gc-friendly
valgrind --tool=memcheck --leak-check=full --leak-resolution=high \
--num-callers=50 --show-reachable=yes --suppressions=gtk.suppression \
helloworld > helloworld-vg.txt 2>&1
resulted in a memcheck summary of:
==22566== LEAK SUMMARY: ==22566== definitely lost: 1,449 bytes in 8 blocks ==22566== indirectly lost: 3,716 bytes in 189 blocks ==22566== possibly lost: 4,428 bytes in 107 blocks ==22566== still reachable: 380,505 bytes in 7,898 blocks ==22566== suppressed: 35,873 bytes in 182 blocks
The full memcheck report is available here.
The simplest GTK+ hello world program is 100 lines of code and results in a leak report of over 8000 leaked blocks even when using the recommended valgrind suppressions file and GTK+ debugging environment variables. If someone modifies that code and adds another leak, trying to find that leak needle in the GTK+ leak haystack is going to be a needlessly difficult task.
Researching this some more I find that GTK+ is known to do a large number of allocations that are done once per program run and are never released. Furthermore the GTK+ developers seem to think this is ok and from the point of view of a user running a GTK+ program this is true. However for developers coding against GTK+ and hoping to use Valgrind to find leaks in their own code, this is a royal PITA. Leaks in the developer's code can easily be swamped and hidden by GTK+ leaks. My guess is that most people don't even bother checking unless their program's memory footprint grows over time for no good reason.
Obviously, I'm not the first to realise how hard it is too debug memory leaks in a program when the library it links against throws up so many warnings. In fact, back in 2001 a bug was raised in the GTK+ bug tracker requesting the addition of a call to be used only during debugging that would release all memory so that client programs are easier to debug. That bug has remained open and without action for over 8 years.
As far as I am concerned, this is completely unacceptable.
If this was my code, I would be too ashamed to put my name on it.
Edit: Being able to valgrind GTK+ client code is worth the effort and
cost of changing the otherwise perfectly reasonable behaviour of not accounting
for lifetime-of-the-app data structures (thanks Andrew).
Note: Anyone who wishes to comment on this can do so on reddit.
Posted at: 11:38 | Category: CodeHacking | Permalink
Sat, 05 Dec 2009
FP-Syd #21.
Two weeks ago, on November 19th we held the 21st meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had 22 people show up to hear our two presenters.
First up we had repeat offender Mark Wotton give us a presentation titled "Hubris : A Trojan Horse for Haskell" on his project called Hubris, a bridge between Ruby (ie Rails) and Haskell. The specific use case Mark has in mind for this bridge is call fast compiled Haskell code from Ruby web code. The code this is is available at:
git clone git://github.com/mwotton/HaskellHubris.git
but requires the 6.12 release candidate of the GHC Haskell compiler.
Our second presenter for the evening was Tony Sloane who gave us an introduction to "Embedding a Rewriting DSL in Scala". Tony's DSL of interest is the Stratego Language which he is embedding in Scala using the Kiama library. Tony's particular interest seems to be in program transformation; desugaring of high level languages, evaluation by reduction rules, optimisation and source to target translation.
A big thanks to both our speakers and to Dan, Mark and Google for providing the meeting venue and the light refreshments.
FP-Syd will be having brief hiatus over the Xmas/New Year period and our next meeting will be in February.
Posted at: 16:19 | Category: FP-Syd | Permalink
Tue, 17 Nov 2009
DDC : Man or Boy?
Computer scientist Donald Knuth came up with something he called the Man or Boy Test as a way of evaluating implementations of the ALGOL60 language (standardized in 1963) to distinguish compilers that correctly implemented "recursion and non-local references" from those that did not. Knuth said:
"I have written the following simple routine, which may separate the 'man-compilers' from the 'boy-compilers'."
My first attempt at solving this problem in Disciple resulted in me raising bug #148 in the DDC bug tracker with the following code:
-- Compiler needs a little help inferring the types.
a :: Int -> a -> a -> a -> a -> a -> Int
a k x1 x2 x3 x4 x5
= do b () = do { k := k - 1 ; a k b x1 x2 x3 x4 }
if k <= 0 then x4 () + x5 () else b ()
fn n = \() -> n
main () -- Function 'a' should return -67
= do out = a 10 (fn 1) (fn -1) (fn -1) (fn 1) (fn 0)
if out /= -67
then println $ "Output was " % show out % ". Should have been -67."
else println "Passed!"
Fiddling around with the problem a bit, I suddenly realised that the Disciple language has call-by-reference semantics by default (by way of contrast, the C programming language has default call-by-value semantics with optional call-by-reference semantics using pointers).
While chatting with Ben on IRC he suggested using a copy to create a local copy of the function parameter that gets mutated so that mutation doesn't change the value outside call frame.
Here are two correct solutions to the Man or Boy problem:
a0 :: Int -> a -> a -> a -> a -> a -> Int
a0 k x1 x2 x3 x4 x5
= do b () = do { k := k - 1 ; a0 (copy k) b x1 x2 x3 x4 }
if k <= 0 then x4 () + x5 () else b ()
a1 :: Int -> a -> a -> a -> a -> a -> Int
a1 k x1 x2 x3 x4 x5
= do m = copy k
b () = do { m := m - 1 ; a1 m b x1 x2 x3 x4 }
if k <= 0 then x4 () + x5 () else b ()
fn n = \() -> n
main ()
= do out0 = a0 10 (fn 1) (fn -1) (fn -1) (fn 1) (fn 0)
out1 = a1 10 (fn 1) (fn -1) (fn -1) (fn 1) (fn 0)
println "All outputs below should be equal to -67."
println $ "Output 0 : " % show out0
println $ "Output 1 : " % show out1
Both of these Disciple solutions are significantly less complex than the equivalent Haskell solution.
While I have no problem with function parameters being passed by reference, I don't think its a good idea to have those parameters being mutable by default (ie with the values also changing in the calling function).
I need to play with this some more.
Posted at: 22:03 | Category: CodeHacking/DDC | Permalink
Sun, 15 Nov 2009
Hacking DDC.
Over the last couple of months I've been doing a bit of hacking on an experimental compiler called DDC. This has been some of the most interesting, gratifying and challenging hacking I have done in years. Having this much fun should probably be illegal!!
I was introduced to DDC at the April 2008 meeting of FP-Syd when Ben Lippmeier, its author, gave a presentation titled "The Disciplined Disciple Compiler". The two main reasons this compiler is interesting are:
- Its written in Haskell an advanced purely functional programming language.
- The language it compiles (Disciple) has some interesting solutions to the problems of side effects and mutability.
The Disciple language is very Haskell-like but has some extra features in the type system which allows the compiler to track mutability and side effects in the type system. The important differences between the Disciple language and the Haskell language are listed on the DDC web page as:
- Strict Evaluation Order is the default, laziness is introduced explicitly.
- Type directed Field Projections complement type classing.
- All data objects support Destructive Update.
- The Effect System tracks what computational effects are being used in a program, without the need for state monads.
- The Class System ensures that effects and destructive update play nicely with laziness.
- Closure Typing is used to track data sharing, and to preserve soundness in the presence of Polymorphic Update.
Obviously a compiler that is doing all this really clever stuff has to be pretty complicated, but it still only weighs in at about 50k lines of code.
The main challenge in working on this is that i am not a very experienced Haskell programmer. There are also large chunks of the compiler doing some very complicated stuff that I don't even have a hope of understanding without reading and understanding Ben's PhD thesis.
Despite that, Ben was willing to give me commit access to the Darcs repo and I have been able to significantly reduce the number of bugs in the DDC bugtracker. Since I was already pretty familiar with the concepts of lexing and parsing as well as being familiar with Parsec (probably the most widely used parsing tool in the Haskell community) I started off fixing some simple lexer and parser bugs like:
- #91 : Require module imports (and exports) to be at the start of the module.
- #95 : Parse error with lists.
- #96 : ellipsis in list generator expressions not comprehensive enough.
- #97 : Error in parsing end of {- -} comments.
- #103 : Not able to parse 'a, b, c :: Type' style type signatures.
I then managed to hack in support for Int64 and Float64 (#106) followed by some significant re-factoring of the Parsec parser which reduced the usage of the Parsec.try construct allowing Parsec to produce much better error messages.
Once I'd done all that, I ran into a very busy time at work and didn't mess with DDC for a couple of months. When I finally got back to looking at DDC, I realised that nearly all of the remaining bugs were much deeper than the bugs I had tackled so far. Tackling these deeper bugs required a new strategy as follows:
- Scan the bug list for reports that either had test cases already or give enough information for me to proceed.
- Create a new darcs branch for each bug. This allowed me to work on multiple different bugs at once so that if I got stuck on any one specific bug, I could just leave it and move on to another.
- Create a reproducible test case if one didn't exist already.
- Create a shell script in the root directory of each branch which did make and then ran the specific test case for this specific bug.
- Use Haskell's Debug.Trace module in conjunction with Haskell's very wonderful Show Type Class to add debug statements to the code.
- Use the Wolf Fencing debugging technique to narrow down the problem to specific area of the code.
Once the problem had been narrowed down to a piece of code, all that remained was to develop a fix. In many cases this resulted in me asking Ben how he'd like it fixed, either in email or on IRC. I also often came up with an ugly fix at first which was refined and cleaned up before being applied and pushed upstream.
With the above methodology I was able to fix a number of deeper and more complex bugs like the following:
- #33 : Check for conflicting projection functions.
- #39 : Emit an error if modules are recursive.
- #42 : Support unboxed CAFs
- #45 : Better error message for runtime pattern match failure.
- #53 : Check for name shadowing in forall quantifiers.
- #58 : Panic in type inferencer.
- #71 : Better error message for unimplemented class functions.
- #77 : crushProjClassT panics when there are type errors.
- #78 : Renamer problems in data type defs.
- #144 : Need better error message when source file does not exist.
I'm now getting a pretty good idea of how the compiler is put together and I'm stretching my hacking into feature enhancements.
My enthusiasm for DDC was recently validated by functional programming guru Oleg Kiselyov's comment on the haskell-cafe mailing list:
"One may view ML and Haskell as occupying two ends of the extreme. ML assumes any computation to be effectful and every function to have side effects. Therefore, an ML compiler cannot do optimizations like reordering (even apply commutative laws where exists), unless it can examine the source code and prove that computations to reorder are effect-free. .....
Haskell, on the other hand, assumes every expression pure. Lots algebraic properties become available and can be exploited, by compilers and people. ....
Hopefully a system like DDC will find the middle ground."
Anyway, back to hacking ....
Posted at: 21:28 | Category: CodeHacking/DDC | Permalink
Sun, 25 Oct 2009
FP-Syd #20.
On October 22th we held the 20th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 28 people show up to hear our two presenters.
First up we had Roman Leshchinskiy and his presentation "Loop Fusion in Haskell", work that is part of GHC's Data Parallel Haskell library. Loop fusion depends on the ability to convert operations on arrays into operations on streams. Then, when applying multiple stream operations, adjacent conversions to and from streams can be dropped, allowing further inlining. The real beauty of this approach is that stream operations and data parallelism can be written as a library outside of the GHC compiler and then depend on the compiler to do most of the heavy lifting. Romain then moved on to explain that this work was just as applicable to general parallel computation on multicore systems as it was to clusters and GPUs.
Our second presenter for the evening was Barry Jay who gave us an introduction to "Pattern Calculus". Barry's work on Pattern Calculus was inspired by the fact that while lambda calculus is able to adequately explain computation, it does not explain operations on data well. In particular, lambda calculus does not distinguish between variables and data constructors while in the pattern calculus constructors are treated as a separate class; matchable symbols. The ideas behind Pattern Calculus are explained more fully in Barry's book:
A big thanks to both our speakers and to Shane Stephens and Google for providing the meeting venue and the light refreshments.
Posted at: 17:49 | Category: FP-Syd | Permalink
Tue, 20 Oct 2009
The Stupidity of Fat Elf (was OSX Universal Binaries).
Early in 2008 I drafted a blog post about something on Mac OS X called Universal binaries. Now the same stupid idea is available for Linux, and planned for other ELF systems. Its called Fat Elf.
The original post follows.
As the main author and maintainer of libsndfile I've recently been involved in a number of email exchanges along the lines of this:
OS X User : Is it possible to build a Universal Binary of libsndfile? Me : No, please see the FAQ. OS X User : Autoconf sucks! Me : Even if you use Xcode, the problem of testing remains. OS X User : Other projects can build Universal Binaries, why can't yours?
to which I should probably respond "Have you stopped beating your wife/ girlfriend yet?".
For those lucky enough to be blissfully unaware of the issue, Universal Binaries are Apple's solution to the marketing problem that arose from their decision to ditch the PowerPC processors it had been using in its machines in favour of CPUs from Intel. Obviously, this change of CPU would have an impact on users and Apple chose a two pronged approach to ease the migration of its users from one CPU to another; Rosetta a PowerPC emulator which allows PowerPC code to run on Intel based Macs, and the idea of the Universal Binary (UB), a way of packaging binary code for PowerPC and Intel into a single monolithic file so that the OSX operating system can run which ever is appropriate for the CPU it is being run on.
Building Universal Binaries on Mac OS X using Apple's own development toolset Xcode is apparently quite trivial. Under the hood, Xcode uses a version of the GNU GCC compiler which Apple hacked to allow it to take in a single input source code file, pass it through both an Intel and a PowerPC back-end and then generate a single binary containing both Intel and PowerPC executable code.
Apple has obviously gone to a lot of trouble to make the process easy. So easy, that the average OSX developer expects it to just work, without realising exactly what it is they are doing and even more importantly, without realising that the process itself contains one very large flaw (which I'll get to shortly).
Building Universal Binaries of FLOSS (Free/Libre/Open Source ...) software is a whole different matter. A large majority of FLOSS software uses things like scons or autoconf and its related tools to detect characteristics of the target platform so that the source code can be compiled correctly for the target.
The problem is, that people who know that the Xcode version of GCC can generate a universal binary from one pass over a source code file expect to take a project like libsndfile, set the CFLAGS environment variable, run configure and build a working universal binary. Anyone who knows autoconf can guess that this is simply not going to work. The main output of the configure script generated by autoconf is a header file containing definitions to handle the detected features of the CPU and operating system. The problem is that the configure script is only run once and the characteristics detected on that run are then used to compile both the Intel and the PowerPC version of the executable code.
The result of the above is that people developing FLOSS like myself get emails from OSX users complaining that their UB version of libsndfile doesn't work. Since I first heard of Apple's Universal Binaries back in early 2006 I have personally spent over 50-100 hours trying to come up with a solution to an issue that only affects an operating system that I personally am not very interested in. Thats a bunch of hours that I could have spent in much more enjoyable pursuits.
Now, consider how many other FLOSS project would have been hit by similar problems. So lets say 10 hours per project across a thousand projects. Thats one hell of a lot of developer time that could have been spent doing more useful things. Why didn't Apple foresee this? Why didn't Apple spend some of its resources trying to mitigate the effect of UB on FLOSS projects. Why didn't Apple devote say 500 man hours of its developer's time to help ease the transition of FLOSS projects to UB; 500 man hours of work on autoconf and automake would likely have gone a long way.
The difficulty of generating UB for code using the autotools is not even the biggest problem with UB. What I see as the biggest problem is the huge green elephant standing in the corner that no one wants to talk about; testing (maybe that should have a blink tag).
For nearly a decade, testing, especially reliable, repeatable, automated testing has been considered an important part of the process of creating reliable software. Now Apple is encouraging the building of Universal Binaries, but I for one am very curious as to how these are being tested. I can think of the following possibilities:
- Ad-hoc and manual testing. Obviously this is a poor substitute for automated testing.
- Automated testing relying on copying test binaries to another machine and running the tests for the other architecture there. This requires easy access to the other machine and considerable effort to make the tests work.
- Automated testing relying on Rosetta. Like the previous option this requires considerable effort to make the tests work.
Given that all these options are difficult it is highly likely that testing falls by the wayside. In short Apple is effectively encouraging the release of untested binaries to users. When these untested binaries go wrong, who do users blame? The authors of that software even when those FLOSS authors are not Mac users and have little or no interest in Apple and its operating system.
Furthermore, Apple is about to add another spanner to the works; quad binaries. Quad binaries are like Universal Binaries that contain code for both 32 bit and 64 bit versions of PowerPC and Intel. Quad binaries will be even more difficult to test, more difficult to get right and still more pain for FLOSS developers.
Posted at: 20:53 | Category: CodeHacking | Permalink
Mon, 19 Oct 2009
FP-Syd #19.
On September 17th we held the 19th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 18 people show up to hear our two presenters.
First up we had Raphael Speyer and his presentation "Rhino, ES5 & GSoC". Raphael explained how he was accepted into the Google Summer of Code project to work on bring the Rhino, a Javascript engine that runs on the Java Virtual machine, into line with the upcoming EcmaScript 5 standard.
For the second presentation we had Shane Stephens one of the Google people working on Google Wave giving us an overview of Google's Operational Transform algorithm which is at the core Google Wave. He then walked us through a reimplementation of the algorithm in Haskell.
A big thanks to Shane Stephens and Google for providing the meeting venue and the light refreshments.
Posted at: 19:46 | Category: FP-Syd | Permalink
Tue, 13 Oct 2009
FP-Syd #18.
Way back on August 23rd we held the 18th meeting of FP-Syd, the Sydney Functional Programming group. As usual, the meeting was held at Google's Sydney offices and we had about 24 people attend to hear our two presenters.
First up we had Mark Wotton and his presentation "Testing By Convention and Flow". TBC ( on Hackage) is a harness for running tests written with HUnit or Quickcheck. The main idea is that if your tests are written to follow a set of conventions, a lot of the boiler can be skipped used TBC.
The second presentation was by Ben Lippmeier on his work on getting Haskell's GHC compiler working on SUN's OpenSparc T2 processor. The OpenSparc T2 is interesting because it has 8 cores per processor and 8 hardware threads per core and hence is an interesting target for GHC's parallel evaluation model.
A big thanks to Shane Stephens and Google for providing the meeting venue and some light refreshments.
Posted at: 19:55 | Category: FP-Syd | Permalink
Sun, 30 Aug 2009
Here's Alfie.
Here's the latest addition to our family, Alfie, a 2 year old long haired Chihuahua.
Alfie's previous owners had recently moved and were not able to keep him in their new apartment so he has come to live with us. He's already managed to find the spot on the floor which is nice and sunny in the morning. He seems to be thoroughly enjoying it in this picture.
Posted at: 10:38 | Category: NewHouse | Permalink
Tue, 18 Aug 2009
FP-Syd #17.
On Thursday July 23rd we held the 17th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 28 people attending to hear our two presentations.
First up we had Jonathan Lange with a presentation titled "Monads and Twisted" where he introduced the asynchronous event handling in Python's Twisted Framework. During a brief introduction to Twisted he showed how callbacks are registered for future events. He then went on to show how the function signatures for these callbacks were very similar to those for Monadic bind and return and that these callbacks obeyed all three of the monad laws.
Next up we had Paul Steckler demonstrating how he has been using Ocaml's camlp4 macro pre-processor in his work on the Goanna static analysis tool. Paul mentioned that documentation for camlp4 is rather poor and that to figure out some things he actually needed to read the camlp4 source code. It does however seem to be a powerful pre-processor system.
A big thanks to James Kozianski and Google for providing the meeting venue and some light refreshments.
The next meeting of FP-Syd happens this week.
Posted at: 18:57 | Category: FP-Syd | Permalink
Wed, 01 Jul 2009
Three More for the Debian New Queue.
Over the last couple of weeks I've managed to get three new packages into the Debian NEW queue :
- haskell-polyparse - A variety of alternative parser combinator libraries for Haskell (this is a dependency for later versions of HaXml).
- haskell-dataenc - A Haskell library of data encoders and decoders like Base64, uuencoding etc.
- haskell-json - Haskell library for serialising data to and from JSON.
Thanks to Simon Horman for sponsoring/uploading the first two of and Matt Palmer for sponsoring/uploading haskell-json.
Posted at: 18:32 | Category: CodeHacking/Debian | Permalink
Sun, 21 Jun 2009
FP-Syd #16.
On Thursday June 18th we held the 16th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 25 people attending to hear our two presentations.
This month we tried something a little different, 5 minute lightning talks. We had four lightning presenters :
- Jeremy Apthorp, "Closures in C" - Jeremy explained how the Clang frontend for LLVM had added lexical closures to the C programming language.
- Jeeva Suresh, "Functional Programming Warps your mind - Quantified"- Jeeva quantified exactly how programming in functional languages changes a programmer's approach when programming in other languages.
- Mark Bradley, "Cycling Josephus for Speed" - Mark show how a naive approach to the Josephus Problem in Haskell would, for some example parameters, exhaust all memory. Mark showed a better solution that satisfied the problem but still consumed a large amount of memory. Interestingly, Benjamin Johnston posted a solution to this problem in SWI Prolog that performed very well in comparison to the Haskell version.
- Ben Lippmeier, "The Poisoning Problem" - Ben showed us an example of the poisoning problem which had cropped up in his DDC compiler where a value might need to be mutable or immutable depending on the value of an if statement.
Interestingly, none of the presenters managed to stay within their allotted 5 minutes but the time limits were not enforced.
Our main speaker of the evening was Eric Willigers who gave us an excellent introduction to the Clojure language, a LISP like language for the Java Virtual Machine. Eric started with the main data structures; lists, vectors, maps and sets, moved on to functions and macros before explaining Clojure's concurrency primitives.
All in all this was another most inspiring and enjoyable meeting. Thanks to all our speakers as well as Shane Stephens and Google for providing the venue.
Posted at: 13:03 | Category: FP-Syd | Permalink
Mon, 15 Jun 2009
Two More for the Debian New Queue.
I've got two new packages in the Debian NEW queue :
- haskell-safe - Library for safe (pattern match free) functions.
- haskell-uniplate - A Haskell library for uniform type generic traversals.
These two libraries are obviously useful in their own right, but I was particularly interested in packaging these because they are build dependencies for hoogle which I'm currently working on packaging.
Thanks to Simon Horman for sponsoring these uploads and the previous ones.
Posted at: 22:31 | Category: CodeHacking/Debian | Permalink
Tue, 09 Jun 2009
Debian Maintainer.
With the closing of Debian bug #531811 I am now officially a Debian Maintainer.
Although I've been using Debian and related distributions for nearly a decade, for most of that time I've been pretty happy just being an upstream developer and a downstream user.
That changed when I started learning and using Haskell some 6 months ago. After using Ocaml on Debian for about four years I had grown spoilt by the work of the Ocaml Debian Maintainers who do a stellar job in keeping the everything Ocaml related in Debian up to date. Any time I needed a new Ocaml library, it was nearly always just an apt-get install away.
Unfortunately Haskell on Debian is in a poor state in comparison to Ocaml. My main aim in becoming a Debian Maintainer is to improve Haskell on Debian. First of all, I will be packaging Haskell libraries. So far I have make Debian packages of the following three Hackage packages :
- haskell-curl - Haskell bindings for libcurl (already in Debian unstable).
- haskell-bzlib - Haskell bindings for libbz2 (still in the new package queue).
- haskell-unixutils - An interface between Haskell and Unix-like operating systems (still in the new package queue).
I have also joined the Debian Haskell Mailing List and hope to help out on packaging and improving the compiler version transition process.
Posted at: 19:52 | Category: CodeHacking/Debian | Permalink
Fri, 05 Jun 2009
Jim Henson Co.
I received an email this morning:
From: David XXX <xxx@creatureshop.com> To: <erikd@xxx.com> Subject: libsndfile usage Date: Thu, 4 Jun 2009 10:52:02 -0700 User-Agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103) Erik, just thought I would mention we have been using libsndfile internally at the Jim Henson Co. for a while, we recently got approval to open source some of our utilities which mostly use libsndfile to read in wav files for encoding into a quicktime file, using libquicktime. We released them as usage examples and extras for libquicktime. http://libquicktime.cvs.sourceforge.net/viewvc/libquicktime/lqt_utils/ -- David XXX Pipeline Tools Programmer Jim Henson Creature Shop xxx@creatureshop.com
This is so cool.
Of course Jim Henson and Company are probably best known for the Muppets and Sesame Street but a couple of years ago my then four year old daughter absolutely loved watching Bear in the Big Blue House.
In a subsequent email David explained that the Bear program was before his time, and that libsndfile was being used as part of the render process in the production of Sid the Science Kid which I don't think has reached this country yet.
Posted at: 21:06 | Category: CodeHacking/libsndfile | Permalink
Sun, 24 May 2009
FP-Syd #15.
On Thursday May 21st we held the 15th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 22 people attending to hear our two presentations, an Erlang double header.
The first presentations was by Ben Johnston, a PhD candidate at University of Technology, Sydney. Ben's talk was titled "How to build a robotic bear in two weeks (or less)." and in it he explained how the need to get some code running on a new robot quickly led him to try Erlang. He explained how he used C for low level control of hardware devices, SWI Prolog for the AI and Erlang for all the glue to hold it all together.
Next up we had Matt Palmer talk about the Erlang database called Mnesia. Matt showed the close correspondence between Erlang queries and their equivalent in SQL. He also talked about the ease of replications, sharding and in-memory databases in Mnesia.
A big thanks goes to Ben and Matt for their presentations Thanks also to Shane Stephens and Google's Sydney office for making their facilities available.
Posted at: 21:23 | Category: FP-Syd | Permalink
Mon, 18 May 2009
FP-Syd #14.
Some time ago, on Thursday April 16th, we held the 14th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's Sydney offices and we had about 25 people attending to hear our two presenters.
The first presentations was by Tom Davies titled "CAL/OpenQuark and Java" about CAL and Haskell 98 like language for the Java Virtual machine. CL was released as open source under a BSD-like license as OpenQuark in 2007. Tom gives a quite comprehensive list of the differences between between CAL and Haskell in his slides and a PDF file of the development history of Quark in a supplementary PDF. The main problem with CAL/OpenQuark is that it is effectively abandon-ware and that no-one is actively maintaining it.
The second presentation was by Simon Winwood and was titled "Program extraction from the Coq Theorem Prover". I've been trying unsuccessfully to contact Simon to get hold of his slides. I'll update this when I do finally have a link.
Update 2009/06/05 : Simon's slides are now available here.
A big thanks goes to Shane Stephens and Google's Sydney office for making their facilities available. Thanks also to Tom and Simon for their presentations.
Posted at: 19:15 | Category: FP-Syd | Permalink
Sun, 17 May 2009
Suspend to Disk in Jaunty.
I upgraded my laptop to Ubuntu 9.04 early in the beta release phase and my only disappointment with this release was that suspend to disk, which had worked so well in Hardy and Intrepid was broken in Jaunty. I even raised a Launchpad bug about it.
Unfortunately it seemed that nothing was being done about this bug until another user Mathew Mueller posted a solution to the bug report.
I am now a happy camper. Thanks Mathew and thanks Ubuntu.
Posted at: 17:38 | Category: Tech | Permalink
Thu, 14 May 2009
libsndfile 1.0.20.
There's a new release of libsndfile available in the usual place. This is a security bug fix release which fixes a potential heap overflow in VOC files found and reported by Tobias Klein ( http://www.trapkit.de/ ) and another in the AIFF file parser found by me.
I am also making available patches for older versions of libsndfile:
Hopefully the next release will contain new features instead of just bug fixes.
Posted at: 20:53 | Category: CodeHacking/libsndfile | Permalink
Thu, 02 Apr 2009
FP-Syd #13.
Just today, Richard Collins reminded me that I forgot to blog the last FP-Syd meeting. I'm correcting that now.
Last Thursday (March 26th) was the 13th meeting of FP-Syd, the Sydney Functional Programming group. The meeting was held at Google's new offices in Sydney and we had about 35 people attending to hear our two presenters.
The first presentation was by Simon (Horms) Horman, who volunteered to give a presentation the morning of meeting after our scheduled speaker pulled out due to illness. Horms' presentation was titled "Using Haskell to Write a Very Small Portion of Xen". He started out by explaining how he is relatively new to Haskell and that most of his coding is actually in C, for the Xen hypervisor. Xen hypervisor Advanced Configuration and Power Interface (APCI) which is programmed in a very simple C like language called ACPI Source Language (ASL). Programming ASL is a rather repetitive and error prone task, so Horms' solution was to write a simple Haskell program to generate the ASL code. Horms' slides, Haskell code and output ASL code is available in this tarball .
The second presentation was by Mark Wotton on his attempts to improve predictive text entry on mobile phones. Mark explained how he wanted to improve predictive text by using the previous three words instead of just the current word. He chose Haskell for its rapid prototyping capabilities and explained some of the difficulties he had with performance, especially with large corpora and how he got around these problems.
Mark has put up his slides up here and even has a demo available on the web.
A big thanks goes to Shane Stephens and Google's Sydney office for making their facilities available. Thanks also to Horms and Mark for their presentations.
Posted at: 21:36 | Category: FP-Syd | Permalink
Tue, 03 Mar 2009
libsndfile 1.0.19.
There's a new release of libsndfile available in the usual place. There is also a back story about this release.
At about the same time as my blog post entitled "Security Hyperventilating" I released version 1.0.18 of libsndfile. A couple of days later I received an email from Alin Rad Pop of Secunia Research (the company who got it right last time) informing me of a real, potentially exploitable bug in my newly released code. I've been told that this security vulnerability will be made public over the next couple of days as CVE-2009-0186.
While I certainly didn't believe my code was bug free I have worked very hard to reduce the bugs as much as possible. I have a full and rather comprehensive test suite, I run valgrind over the code regularly, I have learnt from past mistakes and I when I find one bug I nearly always take the time to search for other instances of the same class of bug in the code.
I even wrote a program to do automated Fuzz Testing. This program takes an existing sound file, modifies it, writes it to disk and then runs it through libsndfile. If libsndfile segfaults or does anything else wildly wrong, the problem file is saved for later review. That review usually results in a bug fix.
In spite of all this testing, there was still a security vulnerability. Thats mainly because libsndfile is one of those projects that needs to parse untrusted binary data. Remember all those exploitable bugs against libjpeg and libgif in the late 1990s? Well both of those projects only parse one file type; libsndfile parses more than a dozen. The only other project I can think of with large numbers of different things to parse is the Samba Project which needs to parse dozens of different kinds of messages in the CIFS protocol.
Of course there are other tools to for finding bugs; static analysis tools. In this field there are FOSS products like Sparse and Splint. The first is rather new and developed specifically to find bugs in Linux kernel source code and the second is basically unmaintained. Both are rather intrusive in that they require special annotations to help them ignore valid code and find buggy code.
Neither of these FOSS programs compare well against commercial offerings like Coverity's Prevent but Coverity is widely regarded as being rather expensive. However, for FOSS projects, Coverity does have a program where it scans FOSS projects and feeds the scan results back to the projects so they can fix bugs.
libsndfile was added to Coverity scan of FOSS projects well over a year ago. I fixed all of the issues (mostly minor) pretty much immediately and then asked to go on to the next rung of the ladder. Unfortunately, I was told that I had to wait for all the projects that were currently on rung 0 to fix their issues before that could happen. Eventually, that did happen, but I don't remember anyone contacting me about it. This slow progress was frustrating.
However, on hearing that there was a CVE about to be published against libsndfile I was fortunate enough to have someone offer to run libsndfile through one of the commercial static analysis tools. The result of that was a report containing 68 warnings, split into four roughly equally sized groups:
- False positives.
- Warnings about header files being included un-necessarily.
- In test suite code not used in the library itself.
- Real bugs; resource leaks, dead code, reading beyond the end of arrays, off-by-one errors, not handling error return values, bad free calls etc.
The interesting thing about the above test was that the bug that resulted in the CVE was still present in the code analysed by static analysis tool, but was not found. So while this tool did find a bunch of errors it is still not able to find every error. Obviously there is room for improvement here, both in my code and in the static analysis tool.
Posted at: 19:53 | Category: CodeHacking/libsndfile | Permalink
Tue, 24 Feb 2009
Calling from C into Ocaml.
I've got a project where it would be nice to be able to call Ocaml code from a C program. Although interfacing Ocaml and C is covered in the official manual and the O'Reilly Ocaml book, neither of these sources have a complete example.
As a firm believer in the idea that a 100 lines of working code is worth a thousand lines of explanatory text in a book or on the web, I thought I'd put together a small but complete example. First off, here is the Ocaml code (download ocaml-called-from-c.ml):
let ocaml_puts name =
Printf.printf "Program name is '%s'.\n" name ;
(* Must flush stdout before returning to C. *)
flush stdout
let ocaml_string_join join arr =
(* Create and return a string. *)
String.concat join (Array.to_list arr)
(* On program initialisation, register functions to be called from C. *)
let () =
Callback.register "ocaml_puts" ocaml_puts ;
Callback.register "ocaml_string_join" ocaml_string_join
There are two functions that will be called from C, ocaml_puts and ocaml_string_join and both functions must be registered as callbacks with the Ocaml runtime using Callback.register. To find the function signatures of these functions we can use the ocamlc program:
prompt > ocamlc -i ocaml-called-from-c.ml val ocaml_puts : string -> unit val ocaml_string_join : string -> string array -> string
The first function has a single string parameter and returns nothing, while the second takes two parameters, a string and an array of strings and returns a string.
The C program which calls these two functions looks like this (download c-main-calls-ocaml.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <caml/alloc.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/callback.h>
static void
call_ocaml_void (const char * name)
{ CAMLparam0 () ;
CAMLlocal1 (ostr) ;
ostr = caml_copy_string (name);
value * func = caml_named_value ("ocaml_puts") ;
if (func == NULL)
puts ("caml_named_value failed!") ;
else
caml_callback (*func, ostr) ;
CAMLreturn0 ;
} /* call_ocaml_void */
static void
call_ocaml_string (char * join, char const ** argv)
{ CAMLparam0 () ;
CAMLlocal3 (ojoin, oargv, ores) ;
ojoin = caml_copy_string (join);
oargv = caml_alloc_array (caml_copy_string, argv) ;
value * func = caml_named_value ("ocaml_string_join") ;
if (func == NULL)
puts ("caml_named_value failed!") ;
else
ores = caml_callback2 (*func, ojoin, oargv) ;
printf ("Ocaml returned : '%s'\n", String_val (ores)) ;
CAMLreturn0 ;
} /* call_ocaml_string */
int
main (int argc, char ** argv)
{ const char * progname ;
int k, count ;
progname = argv [0] ;
if (strstr (progname, "./") == progname)
progname += 2 ;
if (argc < 2)
{ puts ("Need at least 1 command line argument.") ;
exit (1) ;
} ;
count = argc >= 2 ? atoi (argv [1]) : 1 ;
count = count < 1 ? 1 : count ;
printf ("Count : %d\n", count) ;
/* Must call this before calling any Ocaml code. */
caml_startup (argv) ;
for (k = 0 ; k < count ; k++)
call_ocaml_void (progname) ;
for (k = 0 ; k < count ; k++)
call_ocaml_string (" ", (char const **) (argv + 1)) ;
return 0 ;
} /* main */
The main function is mostly self explanatory; the only thing to note is that if we want to call any Ocaml code from C, we must call caml_startup first. Looking at the functions that call into Ocaml, note that these functions begin with a call to CAMLparam0 and ends with a call to CAMLreturn0. These are both macros, the first of which sets up the Ocaml specific stack requirements and the second of which cleans up after the first. The '0' at the end of their names indicates that there are zero Ocaml managed data objects passed into and returned from the C function respectively.
For values to be passed to Ocaml, we use local Ocaml managed variables set up with CAMLlocal1 if we only have one, or CAMLlocal3 if we have 3. Data can be copied into these local Ocaml variables using the caml_copy_* and caml_alloc_* families of functions.
The Ocaml functions we want to call can be looked up by name using caml_named_value and the function actually called using caml_callback if we only have one parameter to pass or caml_callback2 for two parameters.
For the call to ocaml_string_join which returns a string, we can extract the return value from the Ocaml wrapper using String_val. There are also other functions to retrieve other data types, the only real caveat being that if the type isn't atomic (eg int or double) and you want to return it from the C function it will be necessary allocate memory for it and copy it because the memory area returned from Ocaml will be invalid after the call to CAMLreturn0.
Finally, building this simple example can be done as follows (using version 3.10.2 of the Ocaml compiler):
ocamlopt -c ocaml-called-from-c.ml -o ocaml-called-from-c.cmx
ocamlopt -output-obj -o camlcode.o ocaml-called-from-c.cmx
gcc -g -Wall -Wextra -c c-main-calls-ocaml.c -o c-main-calls-ocaml.o
gcc camlcode.o c-main-calls-ocaml.o -ldl -lm -L /usr/lib/ocaml/3.10.2 \
-lasmrun -o c-main-calls-ocaml
The first line compiles to Ocaml file into an Ocaml object (*.cmx) using the native code compiler, the second takes the Ocaml object and all the other Ocaml objects needed and generates a object file (camlcode.o) that can be linked to C code. The last two lines compile the C code into an object file and then links all the C objects and required libraries into an executable.
prompt > ./c-main-calls-ocaml 4 abc wxyz Count : 4 Program name is 'c-main-calls-ocaml'. Program name is 'c-main-calls-ocaml'. Program name is 'c-main-calls-ocaml'. Program name is 'c-main-calls-ocaml'. Ocaml returned : '4 abc wxyz' Ocaml returned : '4 abc wxyz' Ocaml returned : '4 abc wxyz' Ocaml returned : '4 abc wxyz'
At this point its probably a good idea to run the program under valgrind and vary the first parameter to prove to oneself that un-freed memory when the program terminates is a constant (due to the Ocaml runtime) and does not vary in proportion to the number of times the Ocaml code is called (which would indicate a memory leak in the interface code).
Posted at: 22:31 | Category: CodeHacking/Ocaml | Permalink
Sun, 22 Feb 2009
FP-Syd #12.
Last Thursday night was the 12th meeting of FP-Syd, the Sydney Functional Programming group and we had just under 20 people show up at the Sydney offices of Thoughtworks to hear our presenter.
Numbers were down a bit this month, probably because we only had one speaker. While we had a decent number of people volunteering to present when FP-Syd started it looks like we have, to a certain extent, pumped the well dry. The presentations committee is going to look into ways of encouraging more people to come forward and tell us what they're working on.
Anyway, our presenter for the evening was Tim Docker, talking about his Haskell graphing library Graphics.Rendering.Chart which is also on Hackage.
Tim started off with a demo of what the library could do and explained how it was built on top of hscairo the Haskell bindings to the very wonderful Cairo library as well as Gtk2Hs, the Haskell bindings to GTK. He then went on to discuss the difficulties of building a generic graphing library and of creating an API that exposes all the various features in an as-easy-as-possible to use way.
Internally, Tim's library uses Haskell data types and records, often with ten or more separate fields, in addition to having types and records nested within types and records. He went on to explain that accessing these deep and wide structures in Haskell was painful. In particular, the problems are:
- Field names must be unique with a module.
- "Updates" (creating a new record from an old one with one or more fields changed) have a special syntax.
- The special syntax for these updates doesn't allow composition, making updating of one field deep in a nested structure difficult.
Tim's solution to this problem was the use of the data-accessor library and Template Haskell for automatic generation of accessors. While this was an adequate solution, Tim felt that it was working around a limitation inherent to the Haskell language.
Finally, Tim spoke about making his Charts library extensible using Higher Order Functions.
A big thanks goes to Nick Carroll and Thoughtworks for making their facilities available for FP-Syd for the last 4 or 5 meetings. The next FP-Syd meeting will be a Google's new head quarters in Pyrmont. Thanks also to Tim for the presentation.