On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:
I like the idea of having encoding information carried with the data. I don't think that an ebytes type that can *optionally* have an encoding attribute makes the situation less confusing, though.
Agreed. I think the attribute should always be there, but there probably needs to be a magic value (perhaps None) that indicates an unknown, manual, garbage, error, broken encoding. Examples: you read bytes off a socket and don't know what the encoding is; you concatenate two ebytes that have incompatible encodings.
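For illustration only, here is a minimal sketch of how an always-present attribute with a None marker might behave on concatenation. The class body, the ``__add__`` rule, and the choice to subclass bytes are all assumptions made for this example; nothing in this thread specifies them.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that always carries an encoding attribute."""

    def __new__(cls, data, encoding):
        # encoding is either a codec name or None, meaning "unknown/broken".
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __add__(self, other):
        # Plain bytes have no encoding attribute, so they count as unknown.
        other_encoding = getattr(other, "encoding", None)
        # Keep the encoding only when both sides agree; otherwise mark it unknown.
        encoding = self.encoding if self.encoding == other_encoding else None
        return ebytes(super().__add__(other), encoding)


utf8_a = ebytes(b"ab", "utf-8")
utf8_b = ebytes(b"cd", "utf-8")
eucjp = ebytes(b"ef", "euc-jp")
from_socket = ebytes(b"gh", None)  # read off a socket, encoding not known

same = utf8_a + utf8_b   # encodings agree, so the result stays 'utf-8'
mixed = utf8_a + eucjp   # incompatible encodings, so the result is None
```

Under these assumptions the bytes themselves are always concatenated; only the label degrades to None when the two operands disagree.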
To me the biggest problem with python-2.x's unicode/bytes handling was not that it threw exceptions but that it didn't always throw exceptions. You might test this in python2::

    t = u'cafe'
    function(t)

And say, ah my code works. Then a user gives it this::

    t = u'café'
    function(t)

And get a unicode error because the function only works with unicode in the ascii range.
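A rough Python 3 approximation of that pitfall, for readers who want to reproduce it: ``function`` here is a hypothetical stand-in that explicitly encodes to ASCII, whereas in python2 the equivalent conversion happened implicitly when unicode and byte strings were mixed.

```python
def function(text):
    # Hypothetical function that quietly assumes ASCII-only input.
    return text.encode("ascii")


# The test input happens to be pure ASCII, so the code appears to work.
ascii_result = function("cafe")

# A user's input with one non-ASCII character takes the failing path.
try:
    function("café")
except UnicodeEncodeError as exc:
    failure = exc
else:
    failure = None
```

The same function succeeds or fails depending only on the data, which is why tests that use ASCII-only samples give false confidence.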
That's an excellent point.
ebytes seems to have the same pitfall where the code path exercised by your tests could work with::

    eb = ebytes(b)
    eb.encoding = 'euc-jp'
    function(eb)

but the user exercises a code path that does this and fails::

    eb = ebytes(b)
    function(eb)
What do you think of making the encoding attribute a mandatory part of creating an ebytes object? (ex: ``eb = ebytes(b, 'euc-jp')``).
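As a sketch of what a mandatory encoding could look like (the constructor signature and the bytes subclass are assumptions for illustration, not a settled design), the omission would then fail at construction time instead of somewhere inside ``function``:

```python
class ebytes(bytes):
    """Hypothetical ebytes whose encoding must be supplied at creation."""

    def __new__(cls, data, encoding):
        # No default for encoding: callers have to state it explicitly.
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self


b = "日本語".encode("euc-jp")

# The tested path: the encoding is stated when the object is created.
eb = ebytes(b, "euc-jp")

# The untested path from above now fails immediately with a TypeError.
try:
    ebytes(b)
except TypeError:
    missing_encoding_rejected = True
else:
    missing_encoding_rejected = False
```

With this shape there is no window in which an ebytes object exists without its encoding attribute, so both code paths behave the same way.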
If ebytes is a separate type, then definitely +1. If 'ebytes is bytes' then I'd probably want to default the second argument to the magical "i-don't-know" marker.

-Barry