[Python-Dev] transform() and untransform() methods, and the codec registry

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Dec 7 06:57:43 CET 2010


On Tue, Dec 7, 2010 at 12:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Dec 7, 2010 at 2:46 PM, Alexander Belopolsky
> <alexander.belopolsky at gmail.com> wrote:
>> Having all encodings accessible in a str method only promotes a
>> programming style where bytes objects can contain differently encoded
>> strings in different parts of the program.  Instead, well-written
>> programs should decode bytes on input, do all processing with str type
>> and encode on output.  When strings need to be passed to char* C APIs,
>> they should be encoded in UTF-8.  Many C APIs originally designed for
>> ASCII actually produce meaningful results when given UTF-8 bytes.
>> (Supporting such usage was one of the design goals of UTF-8.)
>
> This world sounds nice, but it isn't the one that exists right now.
> Practicality beats purity and all that :)

.. and the default encoding being fixed as UTF-8 already goes 99% of the
way to that world.  As long as I can use encode/decode without an
argument, it does not bother me much that they can take one.  These
methods are also much easier to ignore than the transform/untransform
pair, simply because there is only one method per class.
transform/untransform have a much larger mental footprint, not only
because there are two of them in both str and bytes, but also because
both str and bytes have a synonymously named translate method.  With
43 non-special methods, the str interface is already huge.  The
transform() method with a suitable set of codecs could possibly
replace things like expandtabs() or swapcase(), but that would be like
writing x.transform('exp') and x.untransform('exp') instead of
math.exp(x) and math.log(x).
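A minimal sketch of the "decode on input, process as str, encode on
output" style discussed above.  It assumes Python 3.4 or later, where
str.encode() and bytes.decode() default to UTF-8 and the "base64" codec
alias is available.  The helper name `process` is hypothetical.  The last
lines are only an illustration that a bytes-to-bytes transform such as
base64 is already reachable through the codec registry via
codecs.encode()/codecs.decode(), without adding methods to str or bytes.

```python
import codecs

def process(raw: bytes) -> bytes:
    """Hypothetical helper: decode once, work on str, encode once."""
    # Decode on input; with no argument, bytes.decode() uses UTF-8.
    text = raw.decode()
    # Do all processing with the str type.
    text = text.upper()
    # Encode on output; with no argument, str.encode() uses UTF-8.
    return text.encode()

print(process("café".encode()))  # b'CAF\xc3\x89'

# Bytes-to-bytes codecs via the registry, no extra str/bytes methods needed.
encoded = codecs.encode(b"hello", "base64")
print(encoded)                             # b'aGVsbG8=\n'
print(codecs.decode(encoded, "base64"))    # b'hello'
```

Here encode/decode are used without an argument, and the less common
bytes-to-bytes codecs stay out of the str and bytes method namespaces.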

