[Python-ideas] Add encoding attribute to bytes

Nick Coghlan ncoghlan at gmail.com
Tue Nov 10 11:41:26 CET 2009

Terry Reedy wrote:
>> As for the usefulness, I often have to work with proprietary
>> communication
>> protocols between computer and devices, and there the bytes have no
>> encoding
>> whatsoever
> Random bits? It seems to me that protocol means some sort of encoding,
> formatting, or structuring, some sort of agreed on interpretation, even
> if private.

This is true, but the encoding scheme *isn't* a property of the binary
data in and of itself. It's metadata about it that guides the
application as to how the stream should be interpreted.

For a lot of the things I've done in the past, I haven't cared at all
about the encoding of binary data - I've just been schlepping bits from
point A to point B and back without caring what they actually *meant*.
Other times I didn't have to guess or pass any metadata around because
the comms port was hardwired to a particular device that only knew one
way of communicating - the definition of the protocol was implicit in
the implementation of the interface software.

In fact, one of the key features typically desired in a communications
protocol is for it to be content neutral: you push binary data in one
end and get the same binary data out of the other end. Peer applications
using the channel to communicate with each other don't need to care what
the channel is doing with the data, but equally importantly, the
software implementing the comms channel doesn't need to know how to
interpret the bits it is transporting*.

For other applications, the Unicode encoding might be important to know.
Some will care more about the MIME type, or use some other defined
binary encoding (what is the Unicode encoding of an sqlite or bsddb
database file?). Other applications may be interested in a proprietary
binary format that is formally defined solely by the code that knows how
to read and write it.

Can bytes be used to store encoded Unicode data? Sure they can. But they
can be used for a whole host of other things as well, so burdening them
with an attribute that is occasional helpful, but more often dead weight
or even outright misleading would be a mistake.


* Sometimes a bit more coupling makes sense when there are engineering
advantages to be had, but this is usually an application specific thing
(e.g. IP has a protocol field that identifies different application
layer protocols such as TCP, UDP and ESP which have different network
performance expectations, This allows IP network routers to apply
different rules without having to peek inside the payload of each IP packet)

Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

More information about the Python-ideas mailing list