m3ga blog

Mon, 29 Dec 2008

learnHaskell :: Problem -> IO Solution

Over the last week or so I've been learning Haskell. Unlike a previous attempt to learn Erlang I think this one will actually end up bearing fruit. The main thing that slowed down my attempt to learning Erlang was that I didn't have a task to apply it to that exploited the strengths of the language.

For Haskell I have a task that is ideally suited to the language and plays to its strengths; writing a parser for an Javascript-like language. Just like when I learned Ocaml I am jumping in at the deep end with a difficult problem and learning the language on the way (ie as a side-effect of tackling the task itself).

Normally my tool of choice for this task would be Ocaml and I did in fact start out writing a parser in Ocaml using the Menhir parser generator. Menhir creates a parser from an LR(1) grammar specification where the LR(1) means that the grammar is left recursive and must need no more than one token look-ahead to resolve ambiguities. Over two days of intense hacking I made quite a lot of progress on the grammar and then hit a wall; numerous rather incomprehensible shift/reduce and reduce/reduce warnings plus a part of the grammar (raw XML embedded directly into the source code) which was not context free and for which I could not see any possible way of parsing with only one token look-ahead.

Since I already knew Ocaml, I looked around at some of the other parser generators for Ocaml like Aurochs and PCL. Aurochs uses Parsing Expression Grammar (or Packrat parsing) and seemed interesting, but was rejected because it seemed to gave very little control over the Abstract Syntax Tree it generated. PCL on the other hand was a monadic parser combinator library which is basically a port of Haskell's Parsec library to Ocaml. The main problem I saw with PCL was that it used Monads and lazy evaluation, neither of which are very common in standard Ocaml code. PCL was also rather new and I wasn't sure if it was stable and mature enough for quite an advanced parsing task.

Of course I had been hearing good things about Haskell's Parsec library for a couple of years and decided that this was a good time to give it a try. The interesting thing about Parsec is that it uses lazy evaluation to provide what is effectively infinite token look-ahead. Obviously, a Parsec grammar should be written so that it only uses more than one look-ahead when that is absolutely necessary.

With a bit of googling I found a bunch of examples using Parsec for simple things and that was enough code to get me started. A week later, after an IRC question answered by Don Stewart, a bit of help from Conrad Parker and a question answered on the Haskell Cafe mailing list, I am almost as far along as I got with the Ocaml version of the parser.

My impressions of Haskell and Parsec so far are as follows:

Initially I disliked Haskell's syntax even more than I disliked that of Ocaml, but I am coming to terms with it. I'm even quite liking it.
The error messages from the Haskell compiler are very much more wordy than those from the Ocaml compiler but not a whole lot more enlightening.
You don't really need to be comfortable with monads to start cutting Haskell code.
The two unusual language features of Haskell, monads and lazy evaluation make things like Parsec possible. Without both of these features, Parsec would not be possible.
Hoogle, the Haskell API search engine is a fantastic resource.
Parsec is fantastic to use, amazingly powerful and very complete (whenever I find myself writing too much code its usually because there is a already a combinator to do what I need).
Writing the parser in the language itself (ie Haskell) is so much nicer than writing it in a separate Domain Specific Language like one does with yacc / bison / ocamlyacc / menhir etc.
Parsec error and warning messages are much easier to understand and debug than the typical shift/reduce and reduce/reduce messages from yacc style parser generators (Menhir has a conflicts output file which helps a little).
When the parser finds an error in the input its parsing, its much easier to generate good error messages than it is with yacc style parsers.

All in all, I'm really enjoying my foray into the world of Haskell.

Posted at: 09:53 | Category: CodeHacking/Haskell | Permalink

Mon, 29 Dec 2008

learnHaskell :: Problem -> IO Solution

Categories

Syndication

Planets

Links

File Validators