[Python-3000] PEP 3138- String representation in Python 3000
Joel Bender
jjb5 at cornell.edu
Mon May 19 17:53:11 CEST 2008
Stephen J. Turnbull wrote:
> But why be verbose *and* ignore the vernacular?
>
> gzipped = plaintext.transform('gzip')
> plaintext = gzipped.transform('gunzip')
I'm generally resistant to a registry; none of my applications are so
general that they would take advantage of a
string-key-to-dictionary-to-function-pointer. If they did, they would
have to have some pretty severe constraints on what functions can be
selected, so I would end up building my own context-sensitive dictionary
of available functions. I'm in favor of:
    gzipped = plaintext.transform(zlib.compress)
    plaintext = gzipped.transform(zlib.decompress)
So, you may ask, why would that be any better than this...
    gzipped = zlib.compress(plaintext)
...and the answer is that it depends on what you consider the most
appropriate design pattern to follow.
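
A minimal sketch of the two spellings side by side. Real bytes objects
have no transform() method, so TransformableBytes is a hypothetical
subclass invented for illustration; only zlib.compress and
zlib.decompress are real library calls.

```python
import zlib


class TransformableBytes(bytes):
    """Hypothetical bytes subclass; real bytes objects have no transform()."""

    def transform(self, func):
        # Nothing magical: call the function on ourselves and re-wrap the result.
        return TransformableBytes(func(self))


plaintext = TransformableBytes(b"spam and eggs " * 10)

# Method spelling: the function itself is the argument.
gzipped = plaintext.transform(zlib.compress)
roundtrip = gzipped.transform(zlib.decompress)

# Plain function-call spelling; it yields the same compressed bytes.
direct = zlib.compress(plaintext)
```

Both spellings do the same work; the difference is only in which reads
better in a chain of operations.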
> I think the style should be EIBTI for "private" protocols, and TOOWDTI
> for transforms that wrap well-known libraries.
I've been around socket libraries and protocol encoding/decoding stacks
too long I guess, or I'm just jaded, but TOOWDTI is a pipe dream.
There's Only One Blessed Way To Do It I can understand and appreciate.
EIBTI trumps TOOWDTI when it has to go through a registry. I would be
-1 on this design:
In module codecs:
    from gzip import compress as _gzip_compress
    ...
    _registry['gzip'] = _gzip_compress
Where there is a great deal of code that enforces TOOWDTI, effectively
obfuscating the fact that all you're passing to transform() is nothing
more magical than a reference to a function.
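
To make the comparison concrete, here is a rough sketch of what the
registry buys. The _registry dict, transform_by_name() and transform()
are hypothetical names for illustration, and the zlib functions stand in
for the gzip ones mentioned above.

```python
import zlib

# Hypothetical registry: string keys mapped to plain function references.
_registry = {
    "gzip": zlib.compress,
    "gunzip": zlib.decompress,
}


def transform_by_name(data, name):
    # The string key only selects a function; an unknown name raises KeyError.
    func = _registry[name]
    return func(data)


def transform(data, func):
    # The explicit spelling: the caller hands over the function itself.
    return func(data)


payload = b"hello world" * 5
by_name = transform_by_name(payload, "gzip")
by_func = transform(payload, zlib.compress)
```

The results are identical; the registry version adds one dictionary
lookup and hides which function actually runs.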
> This is a non-starter, because you don't know what the representation
> of strings is.
If you're working on that kind of application. My applications have to
know what the items in the sequence are, or they have to figure it out,
but when it comes time to do the transformation, they know.
> We could be right-thinking and mandate that in the
> .transform() context the string representation is considered
> big-endian (and for little-endian platforms the bytes are swabbed
> before applying the transformation).
Yuck.
> But that would annoy all the Wintel users because string.transform('zip')
> would produce gobbledgook when unzipped from the command line. And
> of course assuming a little-endian representation is un-right-thinkable.
It would annoy me because mandating the format of the input is up to the
transformation function, not the transform().
    y = x.transform(f)
If there is some endian restriction on f, it should detect it and
enforce it, or if it can't, document it. If there is some platform
strangeness, it should take that into account.
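
As a sketch of what "detect it and enforce it" could look like, the
function below fixes its own byte order instead of relying on
transform() to do so. compress_text_be(), decompress_text_be() and the
free function transform() are hypothetical names, and the choice of
big-endian UTF-16 is only an assumption for the example.

```python
import zlib


def compress_text_be(text):
    """Hypothetical transformation that fixes its own byte order."""
    if not isinstance(text, str):
        # Enforce the input restriction here rather than in transform().
        raise TypeError("compress_text_be expects str")
    # Encode explicitly as big-endian UTF-16, whatever the platform uses.
    return zlib.compress(text.encode("utf-16-be"))


def decompress_text_be(data):
    """Inverse of compress_text_be."""
    return zlib.decompress(data).decode("utf-16-be")


def transform(x, f):
    # transform() knows nothing about representation or endianness.
    return f(x)


y = transform("byte order is f's problem", compress_text_be)
```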
> In this sense string-to-string and byte-to-byte *must* be kept
> separate from "true" codecs.
I don't know of any codecs that aren't true. Some may be more popular or
common than others, and the more popular ones may be blessed by being
presented as easily accessible, just like your gunzip === gzip_to_plaintext.
> I think it would be a very bad idea to allow names to be shared
> for, say, byte-to-byte and string-to-byte "gzip" for the reason
> given above.
I don't agree, only because I've written plenty of functions that can
take a variety of different kinds of inputs as a convenience. If
zlib.compress can take bytes or strings I would be fine with that, and
if I could be more explicit, e.g.,
    gzipped = plainbytes.transform(zlib.compress_bytes)
I would be even happier. What is not available in Python that is in
C++, and believe me, I don't miss it all THAT much, is a way to select
the appropriate function based on both the input and output.
Annotations would have been a way to do it, but there's far too many
people that don't like it for very good reasons.
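
Selecting on the input type alone can be sketched with
functools.singledispatch from the standard library, which was added in
Python 3.4, well after this thread. It does not select on the output
type. The names compress, _compress_bytes and _compress_str, and the
choice of UTF-8 for text, are assumptions made for the example.

```python
import zlib
from functools import singledispatch


@singledispatch
def compress(data):
    # Fallback for types with no registered implementation.
    raise TypeError(f"cannot compress {type(data).__name__}")


@compress.register(bytes)
def _compress_bytes(data):
    return zlib.compress(data)


@compress.register(str)
def _compress_str(data):
    # Assumption: text is encoded as UTF-8 before compression.
    return zlib.compress(data.encode("utf-8"))
```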
> Whether string-to-string and byte-to-byte need to share a namespace is
> another question, but since we already need three (string->byte,
> byte->string, byte->byte) that should be forced not to collide, I
> don't think that there's that big a loss in requiring that
> .transform('pig_latin') (string to string) be spelled differently from
> .transform('pig_latin1') (byte to byte assuming ISO 8859/1 data).
I agree, and I don't think there's an advantage to passing string names.
    import piglatin as pig
    piggy = mytext.transform(pig.latin1_encode)
I'm -1 on transform.register('pig_latin1', pig.latin1_encode).
> Do you have use cases where byte-to-byte and string-to-string
> transformations should share the same name?
Not in the same module.
Joel