Sunday, January 2, 2011

State

State is one of those funny things in programming. It can be both wonderful and a curse, depending on the context. In my experience, it is more often a curse.

First, a quick definition. By state, I mean data that changes during the execution of a program, usually by assignment. A very simple example in C:

for (int x = 0; x < 10; x++) { /* empty body; only x changes */ }

This has state, because the value of x changes by executing this loop.

The imperative paradigm views state as almost the be-all and end-all. If you've ever worked in assembly language, which is essentially purely imperative, you know what I'm talking about. Technically, there are no functions, only subroutines, since a function, strictly speaking, returns a value. Machine languages typically have no way of "returning" anything. Instead, there is a programming convention that certain changes to state indicate a subroutine returning a result. In MIPS, the convention is to put return values in designated registers ($v0 and $v1) or on the stack.

Assuming the programmer knows EXACTLY what he or she is doing, this is great. This is typically the fastest way of doing something in terms of execution speed, and it can be efficient on memory as well. The problem is that most programmers don't completely know what they are doing. This isn't a bad thing, and I don't mean it offensively. In a good abstraction, the programmer shouldn't have to worry about everything. Do you really know and understand ALL of the software between a print statement and your monitor? I hope not. (If you do...well I tip my hat to you).

Things can get complicated fast in a purely imperative model. This is why there are all sorts of assembly conventions that attempt to limit the state changes a routine may make. One could probably write marginally faster code without them, but it would take an enormous amount of time just to read the resulting nightmare. Humans can only hold a few things in mind at once, and a completely imperative model works against that.

At the opposite end of the spectrum of the imperative paradigm is the purely functional paradigm. In a purely functional paradigm, there is no state. Everything is represented as a series of function calls. What may be surprising to the typically imperative programmer is that it's possible to achieve quite a bit in this model. Programs tend to be shorter, and easier to understand. Debugging time tends to be shorter, as well. Individual functions can be tested independently. Without state, the same function is guaranteed to return the same outputs on the same inputs under all conditions, barring some catastrophic hardware failure.

There is an added bonus to removing state: parallelism. In parallel code, the areas that need locks are typically the ones that modify shared state. These areas can introduce sequential dependencies. Taken together, such regions result in code that is harder to produce, less clear, more prone to deadlock, and slower. Without state, such regions don't exist, and it's much easier to write parallel code. In fact, since a purely functional program can be broken down into a single expression with a multitude of subexpressions, it's possible to parallelize it automatically!

There is a downside to completely removing state, however. For one, all I/O is based on state. A program without I/O capabilities cannot take in any input or produce any output, which makes it fairly worthless. Purely functional languages resort to tricks to perform I/O (Haskell, for example, threads it through the IO monad), since I/O is impure by its very nature.

There is another downside: algorithmic complexity. Without state, certain problems cannot be represented as efficiently. I've started to read Chris Okasaki's "Purely Functional Data Structures", which makes clear that, even on a purely theoretical basis, it's impossible for certain purely functional algorithms to be as efficient as their imperative equivalents. (Note that "certain" is an important word here; you can usually do just as well.)

State just plain makes certain things easier to represent. Arguably, this is the main reason why many functional languages are impure, meaning they do allow for state.

Between these two extremes of purely imperative and purely functional programming sit object-oriented programming and logic programming. In the object-oriented model, computation is represented in terms of objects. These objects have specific state and behavior. State is modified through well-defined channels, which are typically integrated into the behavior of objects.

This is an interesting middle ground. It allows for state, but in a way that hides most of the complexity. In a well-written object-oriented program, changing the state in one object will not affect the state or behavior of a completely unrelated object. The imperative model has no such enforcement. (Note that the aforementioned assembly conventions resemble the object-oriented model in that they allow only certain side effects to occur. However, they are merely conventions; nothing in the code enforces that they are followed.)

Like object-oriented programming, logic programming has an interesting, partially restricted way of allowing state. The entire database, which holds both the program and its data, can be modified. Clauses can be added or removed (assert and retract in Prolog), and the behavior of the program can change as a result. This is extremely powerful, but it can also be just as dangerous as state in the imperative paradigm. Typically, programs change the database as little as possible, but this is more a convention than anything else. With this model it's possible to have a program truly write and execute itself, in a way that is unparalleled in other paradigms.

A major downside of this is that changes are usually global. There is only one database, accessible to everything. I suspect there are logic programming systems that allow modifications to be restricted further, but in ISO-standard Prolog the database really is global.

So why did I just blab on about state? The code I'm looking at uses it...a lot. It's Java, but it's written in more of an imperative style. Oh boy...
