[Python-3000] string API growth [was: Re: PEP 3138- String representation in Python 3000]

Jim Jewett jimjjewett at gmail.com
Wed May 14 19:45:10 CEST 2008


On 5/14/08, Georg Brandl <g.brandl at gmx.net> wrote:
> M.-A. Lemburg schrieb:
>>> On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 2008-05-08 22:55, Terry Reedy wrote:
>>>>> Functions that map unicode->unicode or bytes->bytes could be called
>>>>> transcoders.

bytes->bytes might be, but for many mappings (and all unicode->unicode
mappings) they are general transformers.

If you care about the concrete representation, then you aren't really
dealing with unicode anymore; you're dealing with the ByteString.

>>>> Are you suggesting to have two separate methods which then
>>>> allow same-type-conversions ?

>>>> ... have to map naturally to the codec method encode and
>>>> decode

For str->str or bytes->bytes, how do you decide which direction is
"en"coding vs "de"coding?

> > How about these:

> > str.str_encode() -> str
> > str.str_decode() -> str

> > bytes.bytes_encode() -> bytes
> > bytes.bytes_decode() -> bytes

>  What about transform/untransform?

Maybe I'm missing something, but it seems to me that there are only a
few logical combinations; if the list below is wrong, maybe that is one
reason unicode seems more complex than it should be.

Encoding:  str -> ByteString
    (staticmethod) ByteString.encode(my_string, encoding=?)
    ==
    my_string.encode(encoding=?)

Decoding:  ByteString -> str
    my_bytes.decode(encoding=?)
    ==
    (staticmethod) str.decode(my_bytes, encoding=?)
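
For comparison, here is a minimal sketch of the same two pairings using
spellings that exist in Python 3 today. It assumes the constructor forms
bytes(s, encoding=...) and str(b, encoding=...) are acceptable stand-ins
for the staticmethods proposed above; the staticmethods themselves are
not part of the language.

```python
# Sketch: the constructor calls stand in for the proposed staticmethods.
my_string = "naïve café"

# Encoding: str -> bytes, method form and constructor form agree.
encoded = my_string.encode(encoding="utf-8")
assert encoded == bytes(my_string, encoding="utf-8")

# Decoding: bytes -> str, method form and constructor form agree.
decoded = encoded.decode(encoding="utf-8")
assert decoded == str(encoded, encoding="utf-8")

# A round trip through the same encoding returns the original text.
assert decoded == my_string
```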

General Transforming:
    # Why insist on type-preservation?
    # Why even make these methods?
    my_string.transform(fn) == fn(my_string)
    my_bytes.transform(fn) == fn(my_bytes)
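
A rough sketch of why a transform method adds little over a plain call.
The names transform and rot13 are hypothetical helpers written for this
illustration, not an existing or proposed API. ROT13 is used because it
is its own inverse, so neither direction is obviously "en"coding or
"de"coding.

```python
import codecs


def transform(value, fn):
    """Hypothetical stand-in for value.transform(fn): just call fn."""
    return fn(value)


def rot13(text):
    """str -> str mapping via the standard rot_13 text codec."""
    return codecs.encode(text, "rot_13")


my_string = "Hello"
my_bytes = b"Hello"

# The "method" is nothing more than function application.
assert transform(my_string, rot13) == rot13(my_string)

# Nothing forces the result to keep the input type.
assert transform(my_string, len) == 5                   # str -> int
assert transform(my_bytes, bytes.hex) == "48656c6c6f"   # bytes -> str

# Applying ROT13 twice returns the input, so the direction is arbitrary.
assert rot13(rot13(my_string)) == my_string
```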

Transcoding:  ByteString -> ByteString
    # If you care how it is represented, it is no longer unicode;
    # it is a specific (ByteString) representation
    my_bytes.recode(old_encoding=?, new_encoding)

    # Can the old encoding often be inferred?
    # Or should it always be written because of EIBTI?
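
One possible reading of recode, sketched as a free function rather than
a method. The function name and its parameters are assumptions taken
from the line above; no such method exists on bytes. It decodes with the
old encoding and re-encodes with the new one.

```python
def recode(data, old_encoding, new_encoding):
    """Hypothetical helper: bytes -> bytes, going through str."""
    return data.decode(old_encoding).encode(new_encoding)


latin1_bytes = "café".encode("latin-1")
utf8_bytes = recode(latin1_bytes, old_encoding="latin-1", new_encoding="utf-8")

# Same text, two different byte representations.
assert latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes with the wrong old encoding raises no error;
# it silently produces different text.
misread = utf8_bytes.decode("latin-1")
assert misread != "café"
```

The last two lines bear on the inference question: every byte sequence
is valid latin-1, so a wrong guess at the old encoding cannot be
detected from the bytes alone.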

-jJ

