Wed, 28 Mar 2007
Tridge Was Right.
At Linux.conf.au 2005, Tridge gave a keynote talk about some of the issues the Samba team had run into when designing Samba4. While discussing the problems of writing a complex server that has to handle multiple simultaneous requests, he put up a series of three slides. The first said:
Threads suck!
Having used OS-level threads in the past, I was in complete agreement with this. Sharing data across threads, and locking and unlocking that data to make sure every access is safe, is simply too difficult for mere mortals to get right in anything other than trivial cases.
Tridge's second slide said:
Processes suck!
Splitting multi-threaded code into multiple processes fixes the locking problems by removing the ability of the processes to share data (ignoring IPC shared memory, of course). Obviously, for a server program like Samba, this is not a solution.
The third slide in the series said:
State machines suck!
At the time of Tridge's keynote, I didn't really appreciate what he was saying.
The idea behind the state machine approach is really quite simple: everything is done in a single process, so no locking is required. All I/O is multiplexed using the Unix select system call, and a state machine keeps track of the state of each I/O channel.
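The single-process model can be sketched in a few lines of Python, whose `select` module is a thin wrapper around the same Unix call. This is a minimal illustration under stated assumptions, not Samba's design: the `Channel` class, its three string states and the uppercase-echo "protocol" are all hypothetical. Each channel only advances when `select()` reports its socket ready, so one process serves every connection without threads or locks.

```python
import select
import socket


class Channel:
    """One I/O channel and where it is in its (hypothetical) protocol."""

    def __init__(self, sock):
        sock.setblocking(False)          # never let one channel stall the loop
        self.sock = sock
        self.state = "READING"           # READING -> WRITING -> DONE
        self.inbuf = b""
        self.outbuf = b""


def run_event_loop(channels):
    """Serve every channel from one process using select(): no threads, no locks."""
    by_sock = {ch.sock: ch for ch in channels}
    while any(ch.state != "DONE" for ch in channels):
        readers = [ch.sock for ch in channels if ch.state == "READING"]
        writers = [ch.sock for ch in channels if ch.state == "WRITING"]
        readable, writable, _ = select.select(readers, writers, [])
        for sock in readable:
            ch = by_sock[sock]
            data = sock.recv(4096)
            if not data:                 # peer closed before sending a full line
                ch.state = "DONE"
            else:
                ch.inbuf += data
                if b"\n" in ch.inbuf:    # a complete request has arrived
                    line = ch.inbuf.split(b"\n", 1)[0]
                    ch.outbuf = line.upper() + b"\n"
                    ch.state = "WRITING"
        for sock in writable:
            ch = by_sock[sock]
            sent = sock.send(ch.outbuf)  # may be a partial write
            ch.outbuf = ch.outbuf[sent:]
            if not ch.outbuf:
                ch.state = "DONE"


if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (_, client) in enumerate(pairs):
        client.sendall(b"hello %d\n" % i)
    run_event_loop([Channel(server) for server, _ in pairs])
    for _, client in pairs:
        print(client.recv(100))
```

All the "where was I?" information lives in each `Channel` object rather than on a thread's stack, which is exactly what makes the approach lock-free and also what makes it tedious to extend.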
The problem with this is that every blocking I/O operation must be replaced with a non-blocking one. If it isn't, a single I/O call that blocks will prevent all the other I/O channels from being serviced until the blocked call completes and returns control to the state machine.
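A short sketch of the difference, assuming a Unix socket pair: once a socket is switched to non-blocking mode, a `recv()` with no data available raises `BlockingIOError` (EAGAIN/EWOULDBLOCK) immediately instead of putting the whole process to sleep. The helper name `try_read` is hypothetical.

```python
import socket


def try_read(sock):
    """Read without ever stalling the caller.

    Returns the bytes read, b"" at end of file, or None if the read would
    have blocked (nothing has arrived yet).
    """
    try:
        return sock.recv(4096)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: hand control back to the select() loop.
        return None


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)
    print(try_read(a))   # None; a blocking recv() here would hang the process
    b.sendall(b"ping")
    print(try_read(a))   # b'ping'
    b.close()
    print(try_read(a))   # b'' (the peer closed the connection)
```

Note that the caller now has three outcomes to handle instead of one, and "no data yet" has to be turned back into a state the loop can resume from later.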
However, the state machine model does work relatively well for simple examples. Unfortunately, non-blocking I/O leads to a second problem: writing non-blocking I/O code is significantly more difficult than writing the equivalent blocking code.
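One concrete reason it is harder: a blocking `sendall()` quietly loops until every byte is written, but on a non-blocking socket `send()` may accept only part of the buffer, so the program has to remember the unsent tail and resume when `select()` next reports the socket writable. The sketch below assumes a Unix socket pair and a hypothetical `PendingWrite` helper; it pushes a 4 MiB payload, larger than a typical socket buffer, from one end of the pair to the other within a single process.

```python
import select
import socket


class PendingWrite:
    """The bookkeeping a blocking sendall() hides: how much is still unsent."""

    def __init__(self, sock, data):
        self.sock = sock
        self.view = memoryview(data)
        self.offset = 0
        self.sends = 0                   # number of send() calls that made progress

    def on_writable(self):
        """Call when select() reports the socket writable; True once finished."""
        try:
            self.offset += self.sock.send(self.view[self.offset:])
            self.sends += 1
        except BlockingIOError:
            pass                         # buffer full again; wait for the next select()
        return self.offset == len(self.view)


def transfer(payload):
    """Push `payload` through a socket pair from a single select() loop."""
    w, r = socket.socketpair()
    w.setblocking(False)
    r.setblocking(False)
    writer = PendingWrite(w, payload)
    received = bytearray()
    writing = True
    while len(received) < len(payload):
        readable, writable, _ = select.select([r], [w] if writing else [], [])
        if writable:
            writing = not writer.on_writable()
        if readable:
            received += r.recv(65536)
    w.close()
    r.close()
    return bytes(received), writer


if __name__ == "__main__":
    data = bytes(range(256)) * 16384     # 4 MiB
    out, writer = transfer(data)
    print(out == data, "send() calls needed:", writer.sends)
```

In blocking code this whole example collapses to one `sendall()` and one read loop; here the offset has to be carried across loop iterations by hand.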
In my day job I've been working on some C++ classes which talk to a web server using HTTP POST requests over a keep-alive connection. This code had several requirements:
- Must be non-blocking to fit in with the rest of the code.
- Must be capable of HTTPS connections using OpenSSL (which is particularly nasty to get working in non-blocking mode).
- Must be able to connect via an HTTP proxy in both HTTP and HTTPS modes.
- Must be able to detect a broken connection and gracefully re-establish it.
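To give a feel for how these requirements multiply, here is a hypothetical transition function for one such connection. The state and event names are invented for illustration and are not the C++ classes described here. It encodes two facts about HTTP proxies: an HTTPS connection through a proxy first needs a `CONNECT` tunnel before the TLS handshake, while plain HTTP through a proxy can send requests straight away (using an absolute URI). A real OpenSSL implementation would also have to stay in the handshake state across several `select()` wake-ups whenever `SSL_get_error()` reports `SSL_ERROR_WANT_READ` or `SSL_ERROR_WANT_WRITE`.

```python
from enum import Enum, auto


class State(Enum):
    DISCONNECTED = auto()
    TCP_CONNECTING = auto()      # non-blocking connect() in progress
    PROXY_TUNNEL = auto()        # HTTPS via proxy: waiting for the CONNECT reply
    TLS_HANDSHAKE = auto()       # may need many select() wake-ups
    IDLE = auto()                # keep-alive connection ready for the next POST
    AWAITING_RESPONSE = auto()   # POST sent, reading the reply


def next_state(state, event, use_https, use_proxy):
    """Return the state that follows `event`; raise on an impossible event."""
    if event == "broken":
        # Any detected failure drops back to the start so the caller can reconnect.
        return State.DISCONNECTED
    if state is State.DISCONNECTED and event == "connect":
        return State.TCP_CONNECTING
    if state is State.TCP_CONNECTING and event == "tcp_connected":
        if use_https and use_proxy:
            return State.PROXY_TUNNEL   # send "CONNECT host:port" to the proxy first
        if use_https:
            return State.TLS_HANDSHAKE
        return State.IDLE               # plain HTTP, with or without a proxy
    if state is State.PROXY_TUNNEL and event == "tunnel_open":
        return State.TLS_HANDSHAKE
    if state is State.TLS_HANDSHAKE and event == "handshake_done":
        return State.IDLE
    if state is State.IDLE and event == "post":
        return State.AWAITING_RESPONSE
    if state is State.AWAITING_RESPONSE and event == "response":
        return State.IDLE
    raise ValueError(f"unexpected event {event!r} in state {state.name}")


if __name__ == "__main__":
    s = State.DISCONNECTED
    for ev in ["connect", "tcp_connected", "tunnel_open", "handshake_done",
               "post", "broken", "connect"]:
        s = next_state(s, ev, use_https=True, use_proxy=True)
        print(f"{ev:>15} -> {s.name}")
```

Each extra requirement adds states and transitions, and every state that touches the socket needs its own non-blocking read or write handling.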
I now have code that meets these requirements and a pretty comprehensive test suite. With this experience behind me, I have to say that getting it working was a royal pain in the neck. I also agree with Tridge: state machines suck almost as much as threads.
Maybe it's time for me to learn Erlang.