Network byte ordering question ...

Mon Jun 5 23:43:44 EDT 2000

istevens at calum.csclub.uwaterloo.ca (ian stevens) writes:

> I am writing a Python application which makes use of a protocol which
> requires information to be in network byte order.  Specifically, a
> message header takes on the following form:
> 
>     bytes   type
> 
>     0-15    Message ID ... all numbers
>     16      Numeric value
>     17      Numeric value
>     18      Numeric value
>     19-22   Numeric value
> 
> What is the best way to extricate the above information and convert it
> to the byte ordering of the host machine?  To convert a header in the
> byte ordering of the host machine to network byte ordering?  How do
> I determine the type of machine the program is running on and, hence,
> its byte ordering?

I know Python is supposed to have one obvious way to do things, but I
think there are several possibilities here, with varying degrees of
readability and no doubt performance.  So I thought I'd offer a few
suggestions.

As you indicated you had looked at, you could probably use the struct
module, and that may be the simplest approach.  One caution is that,
in its default native mode, the module doesn't really guarantee much
about the alignment of individual elements in a structure, so I would
suggest slicing off the fields you need to decode and then unpacking
them individually.

One other wrinkle to stay aware of: if your 4-byte value is an
"unsigned" value, then you need to work with Python long integers and
not regular integers, which are signed, to ensure that you can
represent all possible values.  Unpacking with the struct module
automatically returns such values as Python long integers, but any
manual conversion would need to take that into account.  If you know
the values are going to be less than 0x80000000, or that having the
numeric value come out negative won't hurt, you can ignore that part.
I've assumed long integer results in the code below to ensure we can
handle the entire range of 4-byte unsigned values.
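For anyone reading this with a modern Python 3, the separate long type is gone and integers are unbounded, so the overflow concern disappears; what still matters is asking struct for the right interpretation.  A small sketch (the sample bytes are made up) showing the same four bytes decoded with the unsigned "I" code and the signed "i" code:

```python
import struct

# Four bytes with the high bit set: 0xFFFFFFFE in network (big-endian) order.
raw = b"\xff\xff\xff\xfe"

unsigned_value = struct.unpack("!I", raw)[0]  # 'I' = unsigned 32-bit
signed_value = struct.unpack("!i", raw)[0]    # 'i' = signed 32-bit

print(unsigned_value)  # 4294967294
print(signed_value)    # -2
```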

In your case, the structure seems very straightforward, and there's
only one field for which the byte ordering matters (your final
4-byte/32-bit field).  I'm also assuming that by numeric value you
mean a true binary value (e.g., 0x00 for 0, and not 0x30, the ASCII
character '0').

I'm not sure how the data is already represented in memory, but
presuming that it's a string in "header", you could easily do
something like the following to pick up the first set of fields:

    message_id = header[:16]                    # Bytes 0-15
    values     = map(ord,header[16:19])         # Bytes 16-18 as ints

That would give you your numeric fields as values[0] through
values[2].  You could then just append (values.append) the 4-byte
field as values[3], or store it separately, however you like.
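Under Python 3 the header would normally arrive as a bytes object rather than a string, and slicing or iterating over bytes already yields integers, so the map(ord, ...) step drops out.  A minimal sketch, where sample_header is a hypothetical 23-byte header invented purely for illustration:

```python
# Hypothetical 23-byte header: 16-byte message ID, three 1-byte values,
# then a 4-byte value in network order (0x00010203 = 66051).
sample_header = b"0123456789ABCDEF" + bytes([7, 8, 9]) + b"\x00\x01\x02\x03"

message_id = sample_header[:16]        # Bytes 0-15 (still a bytes object)
values = list(sample_header[16:19])    # Bytes 16-18 as ints

print(message_id)  # b'0123456789ABCDEF'
print(values)      # [7, 8, 9]
```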

Decoding that final field using the struct module (as mentioned above)
could be done by slicing off the field and using that as a single-field
structure, to remove any padding dependency.  The following:

    struct.unpack("!I",header[19:23])

would result in a one-element tuple containing the value as a Python
long int.  The format string is an exclamation mark (network order)
followed by a capital I (unsigned int).  If you wanted to use that in
the above code for appending to the values list, you'd need to
subscript [0] to get the actual object to append to the list, as in:

    values.append(struct.unpack("!I",header[19:23])[0])
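One point worth adding on the padding concern: when a struct format string starts with a byte-order character such as "!", the module uses standard sizes and inserts no alignment padding, so the whole header can also be decoded, or built for sending, with one format string.  A Python 3 sketch; HEADER_FORMAT, decode_header and encode_header are names made up for this example:

```python
import struct

# '!' = network byte order, standard sizes, no padding.
# 16s = 16-byte message ID, B = unsigned byte (x3), I = unsigned 32-bit.
HEADER_FORMAT = "!16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23 bytes

def decode_header(header):
    """Split a network-order header into its five fields."""
    return struct.unpack(HEADER_FORMAT, header[:HEADER_SIZE])

def encode_header(message_id, a, b, c, value):
    """Build a network-order header from native Python values."""
    return struct.pack(HEADER_FORMAT, message_id, a, b, c, value)

packed = encode_header(b"0123456789ABCDEF", 7, 8, 9, 66051)
print(HEADER_SIZE)            # 23
print(decode_header(packed))  # (b'0123456789ABCDEF', 7, 8, 9, 66051)
```

The same format string handles both directions, which covers the "host to network" half of the original question without ever checking the host's byte order.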

Alternatively you could just write some direct code to handle the
conversion, by accessing one byte at a time from the field:

    values.append((long(ord(header[19])) << 24) +     # Network order 4-byte
                  (long(ord(header[20])) << 16) +     #   value at 19-22
                  (long(ord(header[21])) << 8)  +
                  long(ord(header[22])))

The conversion from a 4-byte field to a single integer value is a
simple way to convert network ordering to native ordering without
worrying about determining the native ordering.  You can do the
reverse on output: shift right and mask with 0xFF for each byte.
There need to be a few parentheses, since in Python addition binds
more tightly than shifting, but it's basically just four operations.
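Both directions of the manual approach, sketched for Python 3 (where indexing bytes gives ints, so neither ord() nor long() is needed).  The helper names from_network_u32 and to_network_u32 are hypothetical:

```python
def from_network_u32(field):
    """Combine 4 big-endian (network order) bytes into one integer."""
    return (field[0] << 24) | (field[1] << 16) | (field[2] << 8) | field[3]

def to_network_u32(value):
    """Split an unsigned 32-bit integer into 4 network-order bytes."""
    return bytes([(value >> 24) & 0xFF,   # shift right, then mask each byte
                  (value >> 16) & 0xFF,
                  (value >> 8) & 0xFF,
                  value & 0xFF])

field = b"\x12\x34\x56\x78"
n = from_network_u32(field)
print(hex(n))                      # 0x12345678
print(to_network_u32(n) == field)  # True
```

Python 3 also provides int.from_bytes(field, "big") and n.to_bytes(4, "big"), which do the same job in one call each.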

If you wanted a more functional approach to the byte-at-a-time
computation, using some of Python's support for that style, you can
view it as a running total that is shifted left 8 bits before each
new byte is added.  This can be accomplished with a reduce/map/lambda
approach with something like:

    values.append(reduce(lambda x,y: (x<<8)+y,
                         map(long,map(ord,header[19:23]))))

which turns bytes 19-22 of the header into a list of long integers,
which are then reduced by adding each to a running total which is
shifted left prior to each new addition.  Neat, eh?  :-)
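In Python 3, reduce has moved to the functools module, and slicing a bytes object already yields integers, so both map calls drop out.  A sketch of the same idea, using an illustrative header built inline:

```python
from functools import reduce

# Illustrative header; bytes 19-22 hold 0x00010203 in network order.
header = b"0123456789ABCDEF" + bytes([7, 8, 9]) + b"\x00\x01\x02\x03"

# Shift the running total left 8 bits, then add the next byte.
# The initial value 0 keeps reduce well-defined for an empty slice.
value = reduce(lambda total, byte: (total << 8) + byte, header[19:23], 0)
print(value)  # 66051
```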

Oh, and if you did want to determine the native ordering (to, for
example, short-circuit the conversion if it matched network order),
one approach is to compare the internal representation of a number
against a fixed constant (memory buffer) to decide which byte order
the machine uses.  This can be done with the struct module; you can
find an example in the wave.py file in the standard library, which
I'll quote below:

    # Determine endian-ness
    import struct
    if struct.pack("h", 1) == "\000\001":
        big_endian = 1
    else:
        big_endian = 0
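The same check, sketched for Python 3: struct.pack now returns bytes, so the comparison needs a bytes literal.  Modern Python also reports the host order directly as sys.byteorder, which makes the trick mostly unnecessary today:

```python
import struct
import sys

# wave.py-style check: "h" packs a short in native byte order.
big_endian = struct.pack("h", 1) == b"\x00\x01"

print(sys.byteorder)                           # 'little' or 'big'
print(big_endian == (sys.byteorder == "big"))  # True
```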

Hope this helps.

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/