[Python-3000] PEP 3138- String representation in Python 3000
Nick Coghlan
ncoghlan at gmail.com
Thu May 15 12:34:32 CEST 2008
Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> This discussion isn't about whether it could be done or not, it's
>> about where people expect to find such functionality. Personally, if
>> I can find .encode('euc-jp') on a string object, I would expect to
>> find .encode('gzip') on a bytes object, too.
>
> What I'm not seeing is a clear rationale on where you
> draw the line. Out of all the possible transformations
> between a string and some other kind of data, which
> ones deserve to be available via this rather strange
> and special interface, and why?
>
A unified interface to binary and character transforms is most useful in
a stacking IO model like the one used in Py3k. For example, suppose you're
using a compressed XML stream to communicate over a network socket. This
approach lets you have generic 'transformation' layers in your IO stack,
so you can build the stack up as something like:
    XMLParserIO('myschema')
    BufferedTextIO('utf-8')
    BytesTransform('gzip')
    RawSocketIO
To change to a different compression mechanism (e.g. bz2), you just
change the codec used by the BytesTransform layer from 'gzip' to 'bz2'.
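
The layer names above are illustrative rather than real classes, so here is a
rough sketch of the same stacking idea using only standard library pieces.
It assumes gzip.open and bz2.open as stand-ins for the BytesTransform layer,
io.TextIOWrapper as the text layer, and an in-memory io.BytesIO in place of
the socket. The helper names build_stack and round_trip are made up for the
example.

```python
import bz2
import gzip
import io

# Stand-ins for the BytesTransform layer: codec name -> stdlib opener.
BYTE_LAYERS = {"gzip": gzip.open, "bz2": bz2.open}


def build_stack(raw, codec, mode):
    """Return a UTF-8 text layer over a compression layer over `raw`.

    `raw` is any binary file-like object; `mode` is "r" or "w".
    """
    compressed = BYTE_LAYERS[codec](raw, mode + "b")
    return io.TextIOWrapper(compressed, encoding="utf-8")


def round_trip(text, codec):
    """Write `text` down the stack, then read it back up again."""
    raw = io.BytesIO()  # plays the part of the raw socket
    with build_stack(raw, codec, "w") as writer:
        writer.write(text)
    raw.seek(0)
    with build_stack(raw, codec, "r") as reader:
        return reader.read()


# Switching compression is a one-word change at the call site.
assert round_trip("<doc>hello</doc>", "gzip") == "<doc>hello</doc>"
assert round_trip("<doc>hello</doc>", "bz2") == "<doc>hello</doc>"
```

Nothing above the byte layer needs to know which compression scheme is in
use, which is the point of keeping the transform behind a single name.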
As for how you choose what to provide as codecs... well, that's a major
reason why the codec registry is extensible. The answer is that any
binary or character transform which is useful to the application
programmer can be accessed via the codec API - the only question will be
whether the application programmer will have to write the codec
themselves, or will find it already provided in the standard library.
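
As a sketch of the "write the codec themselves" case, the following
registers a byte-to-byte codec with the codec registry via codecs.register.
The codec name 'xlzma' and the helper functions are hypothetical, chosen
only for illustration; the compression itself is delegated to the lzma
module.

```python
import codecs
import lzma


def _lzma_encode(data, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    data = bytes(data)
    return lzma.compress(data), len(data)


def _lzma_decode(data, errors="strict"):
    data = bytes(data)
    return lzma.decompress(data), len(data)


def _search(name):
    # The registry calls each search function with a normalised name.
    # 'xlzma' is a made-up name for this example.
    if name == "xlzma":
        return codecs.CodecInfo(
            encode=_lzma_encode, decode=_lzma_decode, name="xlzma"
        )
    return None


codecs.register(_search)

payload = b"application data " * 50
packed = codecs.encode(payload, "xlzma")
assert codecs.decode(packed, "xlzma") == payload
```

Once registered, the new codec is reached through the same lookup as the
ones shipped with the standard library.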
Cheers,
Nick.
P.S. My original tangential response that didn't actually answer your
question, but may still be useful to some folks:
An actual codec that encodes a character string to a byte sequence, and
decodes a byte sequence back to a character string would be invoked via
the str.encode() and bytes.decode() methods. For example,
mystr.encode('utf-8') to serialise a string using UTF-8,
mybytes.decode('utf-8') to read it back.
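
A minimal round trip, with a sample string chosen for the example:

```python
mystr = "\u65e5\u672c\u8a9e"  # three kanji, sample data for this sketch

# Character string -> byte sequence, and back again.
mybytes = mystr.encode("utf-8")
assert mybytes.decode("utf-8") == mystr

# The same characters serialised with a different character codec.
eucbytes = mystr.encode("euc-jp")
assert eucbytes.decode("euc-jp") == mystr
assert eucbytes != mybytes
```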
A text transform that converts a character string to a different
character string would be invoked via the str.transform() and
str.untransform() methods. For example,
mystr.transform('unicode-escape') to convert Unicode characters to their
\u or \U equivalents, mystr.untransform('unicode-escape') to convert
them back to the actual Unicode characters.
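
Since str.transform() is the proposed spelling rather than an existing
method, the sketch below approximates the same round trip with calls that
do exist: the 'unicode_escape' codec produces ASCII-only bytes, which are
then turned back into a str. The helper names escape and unescape are
assumptions made for the example.

```python
def escape(text):
    # unicode_escape encodes str -> ASCII-only bytes; decode them to get
    # a character string containing the escape sequences.
    return text.encode("unicode_escape").decode("ascii")


def unescape(text):
    # Reverse direction: the escaped text is pure ASCII, so encode it and
    # let the codec interpret the escape sequences.
    return text.encode("ascii").decode("unicode_escape")


original = "caf\u00e9 \u65e5"
escaped = escape(original)
assert escaped.isascii()
assert unescape(escaped) == original
```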
A binary transform that converts a byte sequence to a different byte
sequence would be invoked via the bytes.transform() and
bytes.untransform() methods. For example, mybytes.transform('gzip') to
compress a byte sequence, mybytes.untransform('gzip') to decompress it.
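
A sketch of the same idea using the codecs module functions instead of the
proposed bytes methods. It assumes the standard library's 'zlib_codec' and
'bz2_codec' byte-to-byte codecs as stand-ins for 'gzip'; the transform and
untransform helpers are hypothetical wrappers written for this example.

```python
import codecs


def transform(data, codec):
    # Byte sequence -> different byte sequence, selected by codec name.
    return codecs.encode(data, codec)


def untransform(data, codec):
    return codecs.decode(data, codec)


payload = b"spam and eggs " * 200

for codec in ("zlib_codec", "bz2_codec"):
    packed = transform(payload, codec)
    assert len(packed) < len(payload)
    assert untransform(packed, codec) == payload
```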
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org