[Python-3000] PEP 3138- String representation in Python 3000

Tue May 20 00:27:46 CEST 2008

Guido van Rossum writes:

 > Hm, Martin is pretty convincing here. Before we go ahead and accept
 > .transform() and friends (by whatever name) we should look for
 > convincing use cases where the transformation is typically given by
 > some other input, rather than hard-coded in the app. (And cases where
 > there are two or three possibilities from a fixed menu don't count --
 > so that would rule out Content-transfer-encoding.)

I don't understand the motivation for this restriction.  I think we do
not want to share names across categories, so the size of any given
category is not important; it's the whole registry that is useful.  If
people want to filter on category, the registry entries could be given
a 'category' attribute.
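As a rough illustration of "one namespace, filter by category", here is a minimal sketch. The `Transform`, `REGISTRY`, `register` and `names_in` names are hypothetical and are not an existing stdlib API. Only `gzip.compress`, `base64.encodebytes` and `quopri.encodestring` are real functions.

```python
import base64
import gzip
import quopri
from typing import Callable, Dict, List, NamedTuple


class Transform(NamedTuple):
    """Hypothetical registry entry: a category tag plus the function."""
    category: str                      # e.g. 'compression'
    func: Callable[[bytes], bytes]


# Hypothetical single registry: one namespace shared by every category.
REGISTRY: Dict[str, Transform] = {}


def register(name: str, category: str, func: Callable[[bytes], bytes]) -> None:
    # Names may not be reused across categories, so a bare string such as
    # 'gzip' is unambiguous wherever it appears in a configuration file.
    if name in REGISTRY:
        raise ValueError(f"{name!r} is already registered")
    REGISTRY[name] = Transform(category, func)


def names_in(category: str) -> List[str]:
    """Filter the whole registry on the 'category' attribute."""
    return sorted(n for n, t in REGISTRY.items() if t.category == category)


register('gzip', 'compression', gzip.compress)
register('base64', 'transfer-encoding', base64.encodebytes)
register('quoted-printable', 'transfer-encoding', quopri.encodestring)
```

With this registry, `names_in('transfer-encoding')` gives `['base64', 'quoted-printable']`. A second `register('gzip', ...)` under any category raises `ValueError`.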

Aside from that, the kind of application I have in mind is indeed
something like the email module and its clients (like Mailman).
Things like

language_charset_map = { 'japanese' : 'iso-2022-jp',
                         'english' : 'iso-8859-1',
                         'russian' : 'koi8-r',
                         ... }

charset_transfer_encoding_map = { 'iso-2022-jp' : 'base64',
                                  'iso-8859-1' : 'quoted-printable',
                                  'koi8-r' : 'base64',
                                  ... }

mime_type_compression_map = { 'text/plain' : None,
                              'image/bmp' : 'gzip',
                              ... }

with the almost obvious definition of transform_mime_body().
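For concreteness, here is one possible reading of that "almost obvious" definition. It is a sketch only, assuming text bodies and string-keyed lookups. The `COMPRESSORS` and `TRANSFER_ENCODERS` dictionaries are hypothetical stand-ins for a registry, and the tables contain only the entries listed above, with the MIME type spelled `image/bmp`.

```python
import base64
import gzip
import quopri

# The tables from the post, limited to the entries it actually lists.
language_charset_map = {'japanese': 'iso-2022-jp',
                        'english': 'iso-8859-1',
                        'russian': 'koi8-r'}
charset_transfer_encoding_map = {'iso-2022-jp': 'base64',
                                 'iso-8859-1': 'quoted-printable',
                                 'koi8-r': 'base64'}
mime_type_compression_map = {'text/plain': None,
                             'image/bmp': 'gzip'}

# Hypothetical string-keyed registries; the keys match the table values.
COMPRESSORS = {'gzip': gzip.compress}
TRANSFER_ENCODERS = {'base64': base64.encodebytes,
                     'quoted-printable': quopri.encodestring}


def transform_mime_body(text: str, language: str,
                        mime_type: str = 'text/plain') -> bytes:
    """Charset-encode, optionally compress, then transfer-encode a text part."""
    charset = language_charset_map[language]
    data = text.encode(charset)
    compressor = mime_type_compression_map.get(mime_type)  # None: no compression
    if compressor is not None:
        data = COMPRESSORS[compressor](data)
    encoder = TRANSFER_ENCODERS[charset_transfer_encoding_map[charset]]
    return encoder(data)
```

Every step is a table lookup on a string followed by a call. That is why the tables themselves can live outside the Python code.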

This kind of table is often given in a file accessed by non-Python
programmers.  For example, for encodings that are not mostly ASCII,
gzipped base64 may be a very economical way to transmit (and store) a
text part.  However, a non-English list that transmits a lot of code
might prefer quoted-printable to allow the code to be analyzed by some
kind of robot (obviously a legacy app!), and many lists will have
strong preferences between UTF-8 and a legacy encoding.  Japanese
companies often have corporate encodings containing characters not
available in JIS (and sometimes not in Unicode).  A list dedicated to
image processing may want to add image/* formats that haven't yet been
registered with the IANA, etc.

On the Mailman lists it is a FAQ that people don't understand the
difference between 'None' and None.  I don't think we can avoid None,
True, and False, but for many Mailman admins the difference between
'gzip' and Compressors.gzip.compress is non-obvious and annoying.
Giving string names to all these transforms would make the
administration interface perceptibly more regular.
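To make the admin-file point concrete, here is a sketch that assumes the table is kept in an INI-style file read with the stdlib `configparser`. The section name, the file contents, and the `COMPRESSORS` registry are hypothetical. The design choice it shows is that "no compression" is just another registered name (`none`), so an admin writes only strings and never has to tell `'None'` from `None`.

```python
import configparser
import gzip

# A hypothetical admin-edited file: every value is a plain string name.
CONFIG_TEXT = """
[mime_type_compression]
text/plain = none
image/bmp = gzip
"""

# Hypothetical registry: 'none' is an identity transform registered by name.
COMPRESSORS = {
    'none': lambda data: data,
    'gzip': gzip.compress,
}


def load_compression_map(text: str) -> dict:
    """Read MIME type -> compressor-name pairs, validating each name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    table = {}
    for mime_type, name in parser['mime_type_compression'].items():
        name = name.strip().lower()  # 'None', 'NONE' and 'none' all work
        if name not in COMPRESSORS:
            raise ValueError(f"unknown compressor {name!r} for {mime_type}")
        table[mime_type] = name
    return table
```

A typo in the file then produces a clear error that names the bad string, rather than an obscure `AttributeError` on a dotted Python path.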

On the other hand, suppose we have a web interface for configuration
so that the admins don't ever see the difference between a codec
registry key and a Python identifier.  Do we want to expose all the
possible compressors, codecs, transfer encodings, and what not in the
module that provides the configuration UI so that the list of names
can be provided?  How does the web interface avoid needing to know all
of those in advance?  How does the web interface know which functions
are which (e.g., compressor vs. decompressor)?

Of course the same questions apply to a registry, but as functionality
(answers to those questions) is added to the registry, the changes
needed to take advantage of it are much more localized and less
invasive than, say, requiring "compressors" to provide "compress" and
"uncompress" functions or methods, and a standard set of options.
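Here is a sketch of how a registry could answer the web-UI questions. The stdlib `codecs.lookup()` already pairs both directions in one `CodecInfo`, but as far as I know it has no public way to list registered names. This example therefore uses a hypothetical `REGISTRY` of `Entry` records. Each entry carries its forward and inverse functions together. The `label` field stands for an attribute added later; it has a default, so older entries and their callers keep working unchanged.

```python
import bz2
import gzip
from typing import Callable, Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical registry record: both directions travel together."""
    category: str
    forward: Callable[[bytes], bytes]   # e.g. compressor
    inverse: Callable[[bytes], bytes]   # e.g. decompressor
    label: str = ''                     # later addition; old entries still work


REGISTRY: Dict[str, Entry] = {
    'gzip': Entry('compression', gzip.compress, gzip.decompress,
                  'gzip (RFC 1952)'),
    'bzip2': Entry('compression', bz2.compress, bz2.decompress),
}


def ui_choices(category: str) -> List[Tuple[str, str]]:
    """(value, display text) pairs a configuration form could offer,
    computed from the registry instead of hard-coded in the UI module."""
    return [(name, e.label or name)
            for name, e in sorted(REGISTRY.items())
            if e.category == category]
```

A new compressor registered elsewhere then shows up in the form with no change to the UI module.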

The main thing that I sympathize with in Martin's post is the issue of
options to transforms, but it seems to me that keyword arguments deal
with that clearly and flexibly.
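A minimal sketch of the keyword-argument approach, assuming a hypothetical name-keyed `transform()` front end. The two compressors are real stdlib functions with different option names: `gzip.compress` takes `compresslevel`, while `zlib.compress` takes `level`. The front end forwards options unchanged, so each transform keeps its own options and no shared option set is needed.

```python
import gzip
import zlib
from typing import Any, Callable, Dict

# Hypothetical registry of compressors, keyed by the same string names an
# administrator would write in a configuration file.
COMPRESSORS: Dict[str, Callable[..., bytes]] = {
    'gzip': gzip.compress,   # option: compresslevel=0..9
    'zlib': zlib.compress,   # option: level=-1..9
}


def transform(data: bytes, name: str, **options: Any) -> bytes:
    """Look up a transform by name and pass per-transform options through
    as keyword arguments; a wrong option raises TypeError as usual."""
    return COMPRESSORS[name](data, **options)
```

For example, `transform(data, 'gzip', compresslevel=1)` trades ratio for speed. `transform(data, 'gzip', level=1)` fails loudly with a `TypeError` instead of being silently ignored.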