[Python-ideas] Add encoding attribute to bytes

Terry Reedy tjreedy at udel.edu
Tue Nov 10 21:12:50 CET 2009

Nick Coghlan wrote:
> Terry Reedy wrote:
>>> As for the usefulness, I often have to work with proprietary
>>> communication
>>> protocols between computer and devices, and there the bytes have no
>>> encoding
>>> whatsoever
>> Random bits? It seems to me that protocol means some sort of encoding,
>> formatting, or structuring, some sort of agreed on interpretation, even
>> if private.
> This is true, but the encoding scheme *isn't* a property of the binary
> data in and of itself. It's metadata about it that guides the
> application as to how the stream should be interpreted.
> For a lot of the things I've done in the past, I haven't cared at all
> about the encoding of binary data - I've just been schlepping bits from
> point A to point B and back without caring what they actually *meant*.
> Other times I didn't have to guess or pass any metadata around because
> the comms port was hardwired to a particular device that only knew one
> way of communicating - the definition of the protocol was implicit in
> the implementation of the interface software.
> In fact, one of the key features typically desired in a communications
> protocol is for it to be content neutral: you push binary data in one
> end and get the same binary data out of the other end. Peer applications
> using the channel to communicate with each other don't need to care what
> the channel is doing with the data, but equally importantly, the
> software implementing the comms channel doesn't need to know how to
> interpret the bits it is transporting*.
> For other applications, the Unicode encoding might be important to know.
> Some will care more about the MIME type, or use some other defined
> binary encoding (what is the Unicode encoding of an sqlite or bsddb
> database file?). Other applications may be interested in a proprietary
> binary format that is formally defined solely by the code that knows how
> to read and write it.
> Can bytes be used to store encoded Unicode data? Sure they can. But they
> can be used for a whole host of other things as well, so burdening them
> with an attribute that is occasional helpful, but more often dead weight
> or even outright misleading would be a mistake.
> Cheers,
> Nick.
> * Sometimes a bit more coupling makes sense when there are engineering
> advantages to be had, but this is usually an application specific thing
> (e.g. IP has a protocol field that identifies different application
> layer protocols such as TCP, UDP and ESP which have different network
> performance expectations, This allows IP network routers to apply
> different rules without having to peek inside the payload of each IP packet)

Your experience has been different from mine. Thanks for the exposition.
I can see why you prefer metadata to either be in the stream itself or 
as part of a wrapper object.

Terry Jan Reedy

More information about the Python-ideas mailing list