rrr at ronadam.com
Thu Feb 23 08:18:42 CET 2006
Stephen J. Turnbull wrote:
>>>>>> "Ron" == Ron Adam <rrr at ronadam.com> writes:
> Ron> Terry Reedy wrote:
> >> I prefer the shorter names and using recode, for instance, for
> >> bytes to bytes.
> Ron> While I prefer constructors with an explicit encode argument,
> Ron> and use a recode() method for 'like to like' coding.
> 'Recode' is a great name for the conceptual process, but the methods
> are directional. Also, in internationalization work, "recode"
> strongly connotes "encodingA -> original -> encodingB", as in iconv.
We could call it transform or translate if needed. Words are reused
constantly in languages, so I don't think it's a sticking point. As
long as its meaning is documented well and doesn't change later, I think
it would be just fine. If the concept of not having encode and decode
as methods works (and has support from people other than me), the name
can be decided later.
> I do prefer constructors, as it's generally not a good idea to do
> encoding/decoding in-place for human-readable text, since the codecs
> are often lossy.
> Ron> Then the whole encode/decode confusion goes away.
> Unlikely. Errors like "A string".encode("base64").encode("base64")
> are all too easy to commit in practice.
Yes... and wouldn't the above just result in a copy, so it wouldn't be
an outright error? But I understand that you mean similar cases where
consecutive calls would change the bytes each time. In any case, I was
referring to the confusion over the method names and how they are used.
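For reference, a minimal sketch of Stephen's example in present-day terms. It assumes Python 3, where the Python 2 spelling "A string".encode("base64") becomes codecs.encode(b"A string", "base64"), since bytes-to-bytes codecs are reached through codecs.encode() and codecs.decode(). Each call produces new bytes, so the slip is silent rather than an error.

```python
import codecs

data = b"A string"
once = codecs.encode(data, "base64")   # base64 of the original bytes
twice = codecs.encode(once, "base64")  # accidentally base64 of the base64

print(once)  # b'QSBzdHJpbmc=\n'
# Undoing only one layer yields the intermediate base64 text, not the original.
print(codecs.decode(twice, "base64") == once)  # True
```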
This is how I was thinking of it.
* Assume the string type gains a __codec__ attribute to handle
automatic decoding when needed. (Is there a reason not to?)
str(object[,codec][,error]) -> string coded with codec
unicode(object[,error]) -> unicode
bytes(object) -> bytes
* A recode() method is used for transformations that *do not*
change the current codec.
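The proposal above might be modelled roughly as follows. This is a sketch under stated assumptions, not an existing API: CodedStr stands in for the proposed str(object[,codec][,error]), to_text() for unicode(object[,error]), and the reverse flag on recode() is a hypothetical addition so that a like-to-like transform such as base64 can be undone.

```python
import codecs


class CodedStr(bytes):
    """Hypothetical byte string that remembers the text codec it holds.

    CodedStr(obj, codec, errors) stands in for the proposed
    str(object[,codec][,error]) constructor; nothing here is a real Python API.
    """

    def __new__(cls, obj, codec="utf-8", errors="strict"):
        # Text is encoded with `codec`; bytes-like input is assumed to be
        # encoded with `codec` already and is stored unchanged.
        if isinstance(obj, str):
            data = obj.encode(codec, errors)
        else:
            data = bytes(obj)
        self = super().__new__(cls, data)
        self.__codec__ = codecs.lookup(codec).name  # normalized codec name
        return self

    def __decode__(self, errors="strict"):
        # Internal hook, not public interface: turn the bytes back into text.
        return bytes(self).decode(self.__codec__, errors)

    def recode(self, transform, reverse=False):
        # Like-to-like transform such as "base64" or "zlib": bytes in, bytes
        # out, and the text codec tag is carried over unchanged.
        # `reverse` (hypothetical) applies the transform's inverse instead.
        raw = bytes(self)
        if reverse:
            out = codecs.decode(raw, transform)
        else:
            out = codecs.encode(raw, transform)
        return CodedStr(out, self.__codec__)


def to_text(obj, errors="strict"):
    """Stand-in for the proposed unicode(object[,error]) constructor."""
    if isinstance(obj, CodedStr):
        return obj.__decode__(errors)  # decode using the string's own __codec__
    if isinstance(obj, (bytes, bytearray)):
        raise TypeError("plain bytes carry no codec; wrap them in CodedStr")
    return str(obj)


s = CodedStr("héllo", "latin-1")   # stored as b"h\xe9llo"
wrapped = s.recode("base64")       # same codec tag as s
print(to_text(wrapped.recode("base64", reverse=True)))  # héllo
```

Note that there is no public encode() or decode() method: the direction of a text conversion is implied by which constructor you call, and recode() only touches the byte layer.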
See any problems with it? (Other than gross misuse, of course, and
your dislike of 'recode' as the name.)
There may still be a __decode__() method on strings to do the actual
decoding, but it wouldn't be part of the public interface. Or it could
call a function from the codec to do it.
The only snag I see is whether having an additional attribute on
strings would increase the memory used by many small strings.
That may be why it wasn't done this way from the start. (?)
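The memory question can be gauged roughly in CPython by comparing a plain bytes object with a bytes subclass that carries the tag, as a sketch. CPython does not allow non-empty __slots__ on subclasses of variable-sized types such as bytes, so here the tag lands in a per-instance dict; a field built into the C string struct itself would plausibly cost as little as one pointer per string. Treat the printed numbers as an upper bound for this Python-level approach, not a measurement of the proposal; exact sizes vary by version and platform.

```python
import sys


class TaggedBytes(bytes):
    """Hypothetical bytes subclass that carries a per-string codec tag."""


plain = b"hi"
tagged = TaggedBytes(b"hi")
tagged.__codec__ = "utf-8"  # the extra per-string attribute under discussion

# sys.getsizeof() reports the object itself; the attribute is stored in a
# separate per-instance __dict__, so that is counted as well.
plain_size = sys.getsizeof(plain)
tagged_size = sys.getsizeof(tagged) + sys.getsizeof(tagged.__dict__)
print(plain_size, tagged_size, tagged_size - plain_size)
```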