[Python-ideas] Add encoding attribute to bytes

MRAB python at mrabarnett.plus.com
Fri Nov 6 03:19:35 CET 2009


Terry Reedy wrote:
> A Python interpreter has one encoding for floats, ints, and strings. 
> sys.float_info and sys.int_info give details about the first two,
> although they are mostly invisible to user code. (I presume they are
> attached to sys rather than float and int precisely because of this.) A
> couple of recent posts have discussed making the unicode encoding (UCS-2
> vs. UCS-4) both less visible and more discoverable to extensions.
> 
> Bytes are nearly always an encoding of *something*, but the particular 
> encoding used is instance-specific. As Guido has said, the programmer 
> must keep track. But how? In an OO language, one obvious way is as an 
> attribute of the instance. That would be carried with the instance and 
> make it self-identifying.
> 
> What I do not know is whether it is feasible to give an immutable instance
> of a builtin class a mutable attribute slot. If it were, I think this could
> make 3.x bytes easier and more transparent to use. When a string is 
> encoded to bytes, the attribute would be set. If it were then pickled, 
> the attribute would be stored with it and restored with it, and less 
> easily lost. If it were then decoded, the attribute would be used. If it 
> were sent to the net, the attribute would be used to set the appropriate 
> headers. The reverse process would apply from net to bytes to (unicode) 
> text.
> 
> Bytes representing other types of data, such as media, could also be
> tagged, not just those representing text.
> 
> This would be a proposal for 3.3 at the earliest. It would involve
> revising stdlib modules, as appropriate, to use the new info.
> 
You said "give an immutable instance of a builtin class a mutable
attribute slot". Why would the slot be mutable? Surely if the attribute
said that the bytes represented a certain type of data then you
shouldn't be able to change it. ("The attribute says that the bytes are
UTF-8, but I'm going to change it so that it says they are ISO-8859-1.")
I think that the attribute should be immutable.
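A minimal sketch of the immutable variant, assuming a plain bytes subclass is close enough for illustration. The class name TaggedBytes is hypothetical, not an existing or proposed API. The tag is fixed once in __new__, later assignment or deletion is refused, decode() defaults to the recorded encoding, and __getnewargs__ lets pickle carry the tag along, as Terry describes.

```python
import pickle


class TaggedBytes(bytes):
    """Hypothetical bytes subclass whose encoding tag is fixed at creation."""

    def __new__(cls, data, encoding):
        obj = super().__new__(cls, data)
        # Set the tag exactly once, bypassing the __setattr__ guard below.
        object.__setattr__(obj, "_encoding", encoding)
        return obj

    @property
    def encoding(self):
        return self._encoding

    def __setattr__(self, name, value):
        # Refuse any later change, so the tag cannot be made to lie.
        raise AttributeError(f"{type(self).__name__} attributes are read-only")

    def __delattr__(self, name):
        raise AttributeError(f"{type(self).__name__} attributes are read-only")

    def __getnewargs__(self):
        # Pickle rebuilds the object via __new__(cls, data, encoding).
        return (bytes(self), self._encoding)

    def decode(self, encoding=None, errors="strict"):
        # Use the recorded encoding unless the caller overrides it.
        return super().decode(self._encoding if encoding is None else encoding, errors)
```

Note that ordinary bytes operations such as slicing and concatenation return plain bytes, so the tag is silently dropped, and equality ignores the tag entirely. A builtin version would need those operations, and the stdlib modules Terry mentions, to propagate or check it.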


