Designing socket messaging format

Steve Holden sholden at
Tue Nov 13 14:40:32 CET 2001

"David Bolen" <db3l at> wrote ...
> shriek at (Stephen) writes:
> > Just read Dan Bernstein's description. Interesting note about
> > security considerations using CRLF, which follows on from an
> > earlier post in this thread.
> Well, but of course the bug here wasn't really using the CRLF
> terminator, but the laziness of the programmer in handling the general
> case of a buffer overflow.  And a lazy programmer can cause a security
> problem in oh so many ways.

> It's just as easy to discard overflows while watching for a terminator
> when you run out of buffer space as it is to discard invalid length
> messages after you've received a length that is too long.  And in
> fact, they have more in common than not, since even in the length
> counted case, it's going to be up to the receiver to "absorb" the data
> in the stream that it will be ignoring.  So it's really just a
> question as to when you decide that the message is invalid and needs
> to be skipped.
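
To illustrate the delimited case described above, here is a minimal sketch of
a receiver that watches for a CRLF terminator and throws away any line that
outgrows its buffer. The class name LineFramer and the max_len parameter are
made up for the example; nothing here is taken from a real library.

```python
class LineFramer:
    """Split a byte stream on CRLF, dropping any line longer than max_len."""

    def __init__(self, max_len=1024):
        self.max_len = max_len      # hypothetical limit chosen by the receiver
        self.buf = b""
        self.discarding = False     # True while skipping an oversized line

    def feed(self, data):
        """Accept one chunk (e.g. from recv()) and return the complete lines."""
        lines = []
        self.buf += data
        while True:
            i = self.buf.find(b"\r\n")
            if i < 0:
                break
            line, self.buf = self.buf[:i], self.buf[i + 2:]
            if self.discarding:
                self.discarding = False     # this was the tail of a bad line
            elif len(line) <= self.max_len:
                lines.append(line)
            # otherwise an oversized line arrived complete: drop it
        if len(self.buf) > self.max_len:
            # No terminator within the limit: stop buffering, keep watching.
            self.discarding = True
            # Keep a trailing CR so a CRLF split across chunks is still seen.
            self.buf = self.buf[-1:] if self.buf.endswith(b"\r") else b""
        return lines
```

The buffer never holds much more than max_len bytes plus one chunk, so an
endless unterminated line costs the receiver nothing but the time spent
reading it.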

> One could even argue that the delimited case is also a touch more
> robust in the presence of data corruption.  If corruption in the
> stream affects the delimiter then the receiver (assuming no timeout)
> will wait indefinitely, but any eventual retransmission from the
> client will eventually trigger a new recognition of the retransmitted
> terminator.  And then probably throwing away the mess of two packets
> :-) Whereas, in the length case, corruption of the length field will
> not only affect the receiver's proper reception of the data but will
> also complicate its ability to resynchronize with the sender for any
> retransmissions, since it has to "find" the length field again.
One could, but one could probably more strongly argue the requirement for a
reliable transport layer for protocols of this nature. In a datagram
protocol you aren't usually dealing with information streams of indefinite
length, but with chunks that arrive all together and can easily be tested
against some maximum length. After which they either fit the protocol
pattern or they don't.
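
As a rough sketch of that test for the datagram case: each datagram arrives
whole, so the receiver can check its size and shape in one step. MAX_LEN and
the "VERB args CRLF" pattern below are assumptions invented for the example,
not part of any protocol discussed in this thread.

```python
import re

MAX_LEN = 512                                   # assumed protocol limit
PATTERN = re.compile(rb"[A-Z]+ [^\r\n]*\r\n")   # hypothetical message format

def accept_datagram(data):
    """Return True if one received datagram is a well-formed message."""
    if len(data) > MAX_LEN:
        return False                            # too long: reject outright
    return PATTERN.fullmatch(data) is not None  # fits the pattern, or doesn't
```

In use this would sit right after something like sock.recvfrom(65535), with
rejected datagrams simply dropped.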

> So in custom protocol formats it can also be helpful to have a magic
> marker for start of packets that, if present, is quoted in the stream,
> to aid in any resynchronization.  Adding a CRC can't hurt either.  Of
> course, this assumes you aren't opening and tearing down connections
> for each packet, which in most cases is a lot of overhead.
While the above steps are useful in datagram-based application protocols,
they should not be required over, say, TCP, where if communication takes
place at all it will be error-free. But stream-based datagram protocols
are tricky, and inevitably result in the re-implementation of some (or all!)
of TCP's complexity at the application level, as you describe above. That
was why the web accepted the overhead of a TCP connection per interaction
for HTTP 0.9, and then went on to design keep-alive workarounds in later
versions as an optimization.
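
For comparison, here is a minimal sketch of length-counted framing over TCP
with no marker and no CRC, relying on the transport for integrity. The 4-byte
big-endian header and the MAX_LEN bound are illustrative choices, and the
function names are made up for the example.

```python
import socket
import struct

MAX_LEN = 65536                 # assumed upper bound on a message body
HEADER = struct.Struct("!I")    # 4-byte big-endian length (illustrative)

def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_msg(sock, body):
    """Send one message as a length header followed by the body."""
    sock.sendall(HEADER.pack(len(body)) + body)

def recv_msg(sock):
    """Receive one message, refusing any declared length over MAX_LEN."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    if length > MAX_LEN:
        raise ValueError("declared length %d exceeds limit" % length)
    return recv_exact(sock, length)
```

Note that after rejecting an oversized length this sketch leaves the stream
out of step; the simplest recovery is to close the connection.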

