[Python-3000] string API growth [was: Re: PEP 3138- String representation in Python 3000]
Stephen J. Turnbull
stephen at xemacs.org
Thu May 15 02:58:15 CEST 2008
Jim Jewett writes:
> Maybe I'm missing something, but it seems to me that there are only a
> few logical combinations;
There are lots of logical combinations, but most of them fall under
"general transform". Is that what you mean?
> if the below is wrong, maybe that is one
> reason unicode seems more complex than it should.
>
> Encoding: str -> ByteString
> (staticmethod) ByteString.encode(my_string, encoding=?)
> ==
> my_string.encode(encoding=?)
>
> Decoding: ByteString -> str
> my_bytes.decode(encoding=?)
> ==
> (staticmethod) str.decode(my_bytes, encoding=?)
+1
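For concreteness, here is a rough sketch of how that pair might look,
assuming the Python 3 type names str and bytes (ByteString above is
Jim's placeholder). The constructor calls are only the nearest existing
spelling of the "staticmethod on the target type" idea, not the
staticmethods themselves.

```python
my_string = "caf\u00e9"

# Encoding: str -> bytes
my_bytes = my_string.encode(encoding="utf-8")

# The same call written through the type instead of the instance.
assert str.encode(my_string, encoding="utf-8") == my_bytes

# Constructor spelling that starts from the target type (assumed to be
# the closest existing analogue of the proposed staticmethod).
assert bytes(my_string, encoding="utf-8") == my_bytes

# Decoding: bytes -> str
assert my_bytes.decode(encoding="utf-8") == my_string
assert bytes.decode(my_bytes, encoding="utf-8") == my_string
assert str(my_bytes, encoding="utf-8") == my_string
```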
> General Transforming:
> # Why insist on type-preservation?
> # Why even make these methods?
> my_string.transform(fn) == fn(my_string)
> my_bytes.transform(fn) == fn(my_bytes)
Make them methods if they are "like" codecs, by which I mean something
like (more or less) invertible stream-oriented transformations. E.g.,
    my_bytes.gzip()
That is a pretty weak argument for methods, though.
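To illustrate why the case for a method is weak, here is a minimal
sketch in which the general transform is nothing more than function
application. The transform() helper is hypothetical; gzip.compress and
gzip.decompress are the standard library functions, and they happen to
be an invertible bytes -> bytes pair of the kind described above.

```python
import gzip


def transform(data, fn):
    """Hypothetical helper: a 'general transform' is just fn(data)."""
    return fn(data)


my_bytes = b"spam " * 100

# bytes -> bytes, and (more or less) invertible, like a codec.
packed = transform(my_bytes, gzip.compress)
unpacked = transform(packed, gzip.decompress)
```

Nothing here needs a method on the bytes type; a plain function call
does the same job.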
> Transcoding: ByteString -> ByteString
> # If you care how it is represented, it is no longer unicode;
> # it is a specific (ByteString) representation
> mybytes.recode(old_encoding=?, new_encoding)
>
> # Can the old encoding often be inferred?
> # Or should it always be written because of EIBTI?
(1) I agree this is the obvious connotation of "transcode" in the
    codec context.
(2) This usage is too special to deserve treatment at this level,
    especially since for most purposes
        my_bytes.decode(old_encoding).encode(new_encoding)
    will be perfectly sufficient.
(3) old_encoding should not be inferred as part of .decode() or
    .recode(), as such inference is unreliable, and domain-specific
    heuristics often lead to great improvements. A separate
    method/function should be used.
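A minimal sketch of points (2) and (3), under stated assumptions:
recode() and guess_encoding() are hypothetical names, not existing or
proposed API. The guesser is deliberately naive; it only shows the
guessing step living in its own function, where an application could
substitute its own heuristics.

```python
def recode(data, old_encoding, new_encoding):
    """Hypothetical helper: transcode bytes by going through str.

    The old encoding is always passed explicitly, never inferred.
    """
    return data.decode(old_encoding).encode(new_encoding)


def guess_encoding(data, candidates=("utf-8", "latin-1")):
    """Hypothetical, naive guesser kept separate from recode().

    Returns the first candidate that decodes without error, else None.
    Domain-specific heuristics would replace this loop.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


latin1_bytes = b"caf\xe9"

# Explicit transcoding: the caller states both encodings.
utf8_bytes = recode(latin1_bytes, "latin-1", "utf-8")

# Inference is a separate, opt-in step.
guessed = guess_encoding(latin1_bytes)
```

Note that latin-1 decodes any byte sequence, so the result depends
entirely on the candidate order, which is one small example of how
unreliable generic inference is.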