rrr at ronadam.com
Sat Feb 18 23:15:17 CET 2006
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>> It seems to me it's a far more useful idea than encoding and decoding by
>> overloading and could do both and more. It has a lot of potential to be
>> an intermediate step for encoding as well as being used for many other
>> translations to byte data.
>> I think I would prefer that encode and decode be just functions with
>> well defined names and arguments instead of being methods or arguments
>> to string and Unicode types.
>> I'm not sure on exactly how this would work. Maybe it would need two
>> sets of encodings, ie.. decoders, and encoders. An exception would be
>> given if it wasn't found for the direction one was going in.
> Here's an idea I don't think I've seen before:
> bytes.recode(b, src_encoding, dest_encoding)
> This requires the user to state up-front what the source encoding is.
> One of the big problems that I see with the whole encoding mess is that
> so much of it contains implicit assumptions about the source encoding;
> this gets away from that.
Yes, but it's not just the encodings that are implicit, it is also the
types:
    s.encode(codec)                  # explicit source type, ? dest type
    s.decode(codec)                  # explicit source type, ? dest type

    encodings.tostr(obj, codec)      # implicit *known* source type
                                     # explicit dest type
    encodings.tounicode(obj, codec)  # implicit *known* source type
                                     # explicit dest type
In this case the source is implicit, but there can be a well defined
check to validate the source type against the codec being used. It's my
feeling the user *knows* what he already has, and so it's more important
that the resulting object type is explicit.
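As a rough sketch of what such functions might look like, here is a
version written for present-day Python 3, where str holds Unicode text
and bytes holds raw data. The names to_text() and to_bytes() are
hypothetical stand-ins for encodings.tounicode() and encodings.tostr();
they are not an existing API. The point is only that the source type
can be validated up front while the result type is fixed by the
function name.

```python
def to_text(obj, codec):
    """Hypothetical: always returns str; the source must be bytes-like."""
    if not isinstance(obj, (bytes, bytearray)):
        # Validate the implicit source type before touching the codec.
        raise TypeError("to_text() expects bytes, got %s" % type(obj).__name__)
    return bytes(obj).decode(codec)


def to_bytes(obj, codec):
    """Hypothetical: always returns bytes; the source must be str."""
    if not isinstance(obj, str):
        raise TypeError("to_bytes() expects str, got %s" % type(obj).__name__)
    return obj.encode(codec)
```

Passing the wrong kind of object fails immediately with a TypeError
instead of producing a result of an unexpected type.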
In your suggestion...
    bytes.recode(b, src_encoding, dest_encoding)
Here the encodings are both explicit, but both the source and the
destination of the bytes are not. Since it is working on bytes, they
could have come from anywhere, and after the translation they would then
be cast to the type the user *thinks* it should result in. That is a
source of errors that would likely pass silently.
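To illustrate the kind of silent error meant here, the following is a
minimal sketch in present-day Python 3. The recode() function is a
hypothetical stand-in for the proposed bytes.recode(), assumed to be a
decode followed by an encode; the proposal itself does not specify an
implementation.

```python
def recode(data, src_encoding, dest_encoding):
    """Hypothetical stand-in for the proposed bytes.recode()."""
    return data.decode(src_encoding).encode(dest_encoding)


# The bytes really are UTF-8, but nothing in the data says so.
utf8_data = "\u00e9".encode("utf-8")

# Correct source encoding stated: the expected Latin-1 byte comes out.
right = recode(utf8_data, "utf-8", "latin-1")

# Wrong source encoding stated: no exception, just different bytes.
wrong = recode(utf8_data, "latin-1", "utf-8")
```

Both calls succeed. Only the caller's knowledge of where the bytes came
from distinguishes the correct result from the corrupted one.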
The way I see it is the bytes type should be a lower level object that
doesn't care what byte transformation it does. I.e., they are all one-way
byte-to-byte transformations determined by context. And it should have
the capability to read from and write to types without translating in
the same step. Keep it simple.
Then it could be used as a lower level byte translator to implement
encodings and other translations in encoding methods or functions
instead of trying to make it replace the higher level functionality.
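One possible shape for this layering, sketched in present-day Python 3:
a small table of one-way byte-to-byte transformations, with a
higher-level function built on top that deals with the types. The names
BYTE_TRANSFORMS, transform() and text_to_hex() are hypothetical and
chosen only for illustration; binascii and zlib are used because they
already provide bytes-in, bytes-out functions.

```python
import binascii
import zlib

# Hypothetical registry: each entry is a one-way, bytes-to-bytes function.
BYTE_TRANSFORMS = {
    "hex": binascii.hexlify,
    "unhex": binascii.unhexlify,
    "zlib": zlib.compress,
    "unzlib": zlib.decompress,
}


def transform(data, name):
    """Low level: apply a named transformation; bytes in, bytes out."""
    if not isinstance(data, bytes):
        raise TypeError("transform() expects bytes, got %s" % type(data).__name__)
    try:
        func = BYTE_TRANSFORMS[name]
    except KeyError:
        # No transformation registered for the requested direction.
        raise LookupError("no byte transformation named %r" % name)
    return func(data)


def text_to_hex(text, codec):
    """Higher level: handles the str types, delegates the byte work."""
    return transform(text.encode(codec), "hex").decode("ascii")
```

The low-level function never guesses at types or encodings; anything
that involves text is the responsibility of the function built on top
of it.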