[Python-Dev] transform() and untransform() methods, and the codec registry

Tue Dec 7 06:57:43 CET 2010

On Tue, Dec 7, 2010 at 12:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Dec 7, 2010 at 2:46 PM, Alexander Belopolsky
> <alexander.belopolsky at gmail.com> wrote:
>> Having all encodings accessible in a str method only promotes a
>> programming style where bytes objects can contain differently encoded
>> strings in different parts of the program.  Instead, well-written
>> programs should decode bytes on input, do all processing with str type
>> and encode on output.  When strings need to be passed to char* C APIs,
>> they should be encoded in UTF-8.  Many C APIs originally designed for
>> ASCII actually produce meaningful results when given UTF-8 bytes.
>> (Supporting such usage was one of the design goals of UTF-8.)
>
> This world sounds nice, but it isn't the one that exists right now.
> Practicality beats purity and all that :)

... and the default encoding being fixed as UTF-8 already goes 99% of the
way to that world.  As long as I can use encode/decode without an
argument, it does not bother me much that they can take one.  These
methods are also much easier to ignore than the transform/untransform
pair simply because it is only one method per class.
transform/untransform have a much larger mental footprint, not only
because there are two of them in both str and bytes, but also because
both str and bytes have a similarly named translate method.  With
43 non-special methods, the str interface is already huge.  The
transform() method with a suitable set of codecs could possibly
replace things like expandtabs() or swapcase(), but that would be like
writing x.transform('exp') and x.untransform('exp') instead of
math.exp(x) and math.log(x).
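
To make the comparison concrete, here is a rough sketch of the three
styles discussed above.  It assumes a Python 3 interpreter in which
str/bytes have no transform()/untransform() methods, so codecs.encode()
and codecs.decode() stand in for the proposed pair; the 'hex_codec'
name and the sample values are only illustrative choices.

```python
import codecs
import math

text = "naïve café"

# 1. encode()/decode() with no argument: the default is UTF-8,
#    so the encoding name can simply be ignored.
raw = text.encode()
assert raw == text.encode("utf-8")
assert raw.decode() == text

# 2. A bytes-to-bytes codec looked up by name in the codec registry.
#    codecs.encode()/codecs.decode() are used here as a stand-in for
#    the proposed transform()/untransform() methods.
hexed = codecs.encode(b"abc", "hex_codec")
restored = codecs.decode(hexed, "hex_codec")

# 3. Plain named functions and methods: the operation is spelled out
#    in the call itself rather than passed as a string.
x = 2.0
roundtrip = math.log(math.exp(x))
swapped = "Ab".swapcase()
```

In the second style the operation is chosen by a string and resolved at
run time, while in the third it is visible in the name of the function
or method being called.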