Wed, 16 Aug 2006

Why I Don't Like Dynamic Typing.

I've just spent the last week or so coding in Python, a language I first picked up about eight years ago. For a number of years, I used Python quite extensively but have found in the last year or so that I'm tending to favor Ocaml for many of the tasks where I used to use Python.

A large part of the reason for this switch has to do with type checking: the process by which the compiler or run time environment checks whether operations on data held in variables are valid for the type of that data. For instance, the concatenation operation is probably valid for strings and lists, but simply doesn't make sense for integers.
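
As a small illustration of the concatenation example, here is a sketch in Python 3 (an assumption on my part; the snippet further down uses the older print statement). It relies on operator.concat from the standard library, and the helper name can_concatenate is made up for this example.

```python
import operator


def can_concatenate(a, b):
    """Return True if the run time accepts concatenating a and b."""
    try:
        operator.concat(a, b)
    except TypeError:
        # The type check happens here, while the program is running.
        return False
    return True


print(can_concatenate("ab", "cd"))   # two strings
print(can_concatenate([1, 2], [3]))  # two lists
print(can_concatenate(1, 2))         # two integers
```

The first two calls are accepted and the third is rejected, but only at the moment the operation is actually attempted.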

The big difference between Ocaml and Python is that Ocaml uses static type checking while Python uses dynamic type checking. For those too lazy to follow the links, static typing means that type checking and related correctness checking of the program is done by the compiler at compile time, while languages that use dynamic typing do a large proportion of this checking at run time.
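
The distinction can be seen inside Python itself. The following is a minimal sketch, assuming Python 3, using the built-in compile and eval functions; the variable names are invented for the illustration. The compile step accepts an expression that applies '&' to two strings, and the type error only appears when the compiled code is evaluated.

```python
# The expression applies '&' to two strings, which is a type error.
source = "message & name"

# Compiling succeeds: the compile step checks syntax, not operand types.
code = compile(source, "<example>", "eval")

namespace = {"message": "Read on '%s' failed", "name": "my_obj"}

try:
    eval(code, namespace)
    outcome = "no error"
except TypeError as exc:
    # Only now, when the expression is evaluated, are the types checked.
    outcome = "TypeError: %s" % exc

print(outcome)
```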

Here's a Python example that bit me just today:


  try:
      data = my_obj.read (1024)
  except:
      print "Read on '%s' failed" & my_obj.name ()

The error is that I had an ampersand (the '&' character) in the last line where I should have had a percent symbol. The above code has a fatal bug but ran perfectly well for hours until the first time that my object's read method threw an exception. It was only then that the print statement was type checked, and that's when the program exited with the following error message:


  TypeError: unsupported operand type(s) for &: 'str' and 'str'
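
The delay between writing the bug and seeing it can be reproduced with a short sketch. This assumes Python 3, and FlakyReader and read_chunk are hypothetical stand-ins for my object and the code around it; the handler catches IOError rather than everything, purely to keep the example focused.

```python
class FlakyReader:
    """Hypothetical stand-in for my_obj: raises on the Nth call to read()."""

    def __init__(self, fail_on):
        self.fail_on = fail_on
        self.calls = 0

    def name(self):
        return "flaky"

    def read(self, size):
        self.calls += 1
        if self.calls == self.fail_on:
            raise IOError("simulated read failure")
        return b"x" * size


def read_chunk(obj):
    try:
        return obj.read(1024)
    except IOError:
        # Same typo as above: '&' where '%' was intended.
        print("Read on '%s' failed" & obj.name())


reader = FlakyReader(fail_on=1000)

# 999 successful reads: the buggy line is never evaluated.
for _ in range(999):
    read_chunk(reader)

# The 1000th read fails, the handler runs, and the typo finally surfaces.
try:
    read_chunk(reader)
except TypeError as exc:
    print("TypeError:", exc)
```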

This particular error is typical of a whole class of errors that can exist in dynamically typed programs [0] but may never show up until the program is in the hands of a user. Personally, I think programs blowing up like this in the hands of users is unacceptable. Unfortunately, it's also extremely common; so common that most regular computer users will have experienced something like this at least once. To me, this is a failure of the discipline of software engineering.

Many defenders of dynamic typing say that typing errors can be picked up in a test suite. While I am a huge advocate of rigorous testing and test-driven development, I'm also way too lazy to write tests for every single code path, including the exception handler for every single try/except block. Especially when there is a better way.
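
For reference, a test for just one such handler might look something like the sketch below. It assumes Python 3; FailingReader and read_chunk are hypothetical names, and read_chunk uses the percent symbol as originally intended. The stub exists only to force the except branch to run, and every other handler in a program would need a similar test of its own.

```python
import io
from contextlib import redirect_stdout


class FailingReader:
    """Hypothetical stub whose read() always raises."""

    def name(self):
        return "stub"

    def read(self, size):
        raise IOError("simulated read failure")


def read_chunk(obj):
    """Hypothetical wrapper around the read call, with '%' as intended."""
    try:
        return obj.read(1024)
    except IOError:
        print("Read on '%s' failed" % obj.name())
        return None


def test_handler_reports_failure():
    # Capture stdout so the message printed by the handler can be checked.
    captured = io.StringIO()
    with redirect_stdout(captured):
        result = read_chunk(FailingReader())
    assert result is None
    assert captured.getvalue() == "Read on 'stub' failed\n"


test_handler_reports_failure()
```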

My work with Python coincided with the publishing of an article on the Register's Developer site titled Mathematical Approaches to Managing Defects. This article also included a section on Formal Methods where the programming environment and process uses mathematical proofs to prove that a piece of code conforms to its specification.

A question asked in the article was "is proof more effective than testing for industrial scale programs?". The answer according to a company specializing in high integrity/reliability software was:

"... that 'proof appears to be substantially more efficient at finding faults than the most efficient testing phase'. This implies, of course, that you use both proof and testing on the project, where each technique is appropriate (even though proof is more cost-effective at finding some errors than testing is at finding other errors, proof may not be able to find all errors)."

It then struck me that compile time type checking built into the programming language is, in effect, a certain level of proof-like correctness testing. Every error that is found at compile time is one less error that can occur at run time. What's more, the stronger the type system, the higher the level of correctness testing applied.

Unfortunately, not all statically typed languages are equal. Languages like C and C++ are statically typed, but both have loopholes (like pointers, casting and automatic type conversion) which allow the programmer to bypass the type checking system and introduce bugs. Languages like Ocaml, Haskell and Ada are statically typed but with fewer loopholes. For instance, in Ocaml, there are no pointers, no implicit type conversions and the only typing loophole is the Marshal module. If the programmer avoids the Marshal module, Ocaml's type system is bulletproof.

As programmers we have to decide whether the current situation where our programming languages allow us to shoot ourselves in the foot is OK or whether we need to aim higher. A first obvious step towards better software for users is choosing better programming languages; languages that protect us as programmers from our own weaknesses. Using such languages means that the programmer can spend less time manually checking for bugs (something computers are better at anyway) and more time thinking about algorithms, design and implementation issues; the stuff that computers can't do.



[0] Obviously, Python is not the only language in common usage that uses dynamic typing. Others include Actionscript (one of the most evil languages I've ever had the misfortune to use), Javascript, Perl, Tcl, Ruby and a host of others.
Java, on the other hand, is statically typed with some loopholes and a run time which does some run time checking for things like casting operations. It is therefore more type safe than C and C++, but not as type safe as Ocaml, Haskell and Ada.

Posted at: 21:43 | Category: CodeHacking