Exploring State Machines
In previous posts I described how to create a model to generate code for asynchronous operations. I started with asynchronous operations because it can be done with a smaller model and fewer blog posts to show the basics. What really motivated me to get started with learning alternative development techniques was hand coding a state machine. It seems that there are more variations on state machine implementations than asynchronous implementations. In addition, integrating a state machine with the surrounding code can also be a challenge.
I was working on what should have been a simple code change to a state machine. I came to the conclusion that I would have to reverse engineer the state machine entirely to be sure that my changes would not cause any bugs. There was no one place in the code where I could see how the state machine worked. The real shocker was when I found that the code could run in two different states at the same time on different threads. At this point, I was convinced there had to be a better way.
It is certainly possible to write state machines by hand and get them to work. When finished though, most are not easily interpreted and correlated with the mental model you had. The code for guard conditions and determining the current state often gets mixed into giant if statements that are hard to read. Some of these problems could be fixed by careful factoring, followed by new coding standards to help readability. In general though, it is not easy to enforce good state machine design by coding standard within a group of developers. It would really be nice to have some tool support, and that was my motivation.
Example State Machine
In order to evaluate potential implementations of state machines, I needed a problem that would have the same challenges production code has. Since state machines I work with interact with network protocols on one side and with publicly consumable APIs on the other, SCTP seemed to be a good choice. SCTP also is new to me and hopefully as well as others, so it would give a good learn as you go type experience without offending anyone if I deviate from the RFC accidentally. In addition, I should be able to show the problems that threading and asynchronous operations create for state machines in later posts.
SCTP is covered in RFC 4960, which is comfortably over 100 pages in length. If you want to read it, feel free. However, I will summarize the basics necessary to understand the state machine. Since my goal is not to implement the protocol perfectly, there will be some strategic omissions in my presentation and implementation so we don't get off track.
In SCTP, an association is the relationship between two network endpoints which is essentially a connection. Each association has one or more streams associated with it. A stream guarantees ordering of messages within it, except for messages specifically marked for out of order delivery. The number of streams is negotiated at association start up.
In SCTP, applications don't need to worry about receiving partial messages. The SCTP stack is responsible for fragmenting messages on the sending side and reassembling them on the receiving side. This is a great feature if you are used to writing directly on TCP. SCTP also allows for both graceful shutdown and aborting of the association.
Each SCTP packet consists of a common header, and one or more chunks of data. The common header essentially gives the source and destinations of the packet. The chunks can either be control chunks or data chunks. Data chunks are the possibly fragmented application data being sent. Control chunks handle association startup, shutdown, and reliable delivery of the data chunks. In many cases, control and data chunks can be combined and sent in one packet as an optimization.
There are a few internal components described in the RFC that work together to implement the protocol:
- Stream sequencing
- Data fragmentation
- Bundling of chunks
- Ack and congestion avoidance (not addressed)
- Path management (not addressed)
- Packet validation (minimally addressed)
As you can see, I will likely not address all components in my implementation of SCTP.
Now that you have some basic concepts, I can summarize how associations are created and shut down. These two functions should be sufficient to carry us for quite a few topics on state machines.
The startup procedure between A and B to create an association is as follows:
At this point, there is an established association between A and B
A shutdown initiated by A would proceed as follows:
Both sides are then gracefully closed.
Note: If A receives a SHUTDOWN from B after SHUTDOWN is sent, it then waits for a SHUTDOWN_COMPLETE from B in the same manner that B does for A as shown.
If either side requests an Abort, an ABORT chunk is sent to the remote, and the association is immediately closed.
States and Events
The states I have for the association are:
The key events handled are:
- Receive each packet type
- Associate (or connect) application request
- Abort application request
- Shutdown application request
- SendMessage (data)
In addition the state Active contains some common abort handling behavior for all states except Closed. I was able to factor out common behavior because I used the State pattern from the book Design Patterns by Gamma, Helm, Johnson, and Vlissides (aka. Gang of Four). This pattern has each state represented by a class and events represented by methods on that class. I have never used this pattern in implementing a state machine before so I decided to try it.
It would be hard to implement a state machine without seeing a state diagram. I've redrawn the one from the RFC here for convenience.
In this post I introduced a new set of topics about state machines and gave an overview of a sample problem that should allow us to do quite a bit of experimentation. In particular I would possibly like to cover the following in future posts:
- state machine design and implementation
- integrating asynchronous operations
- API design