[Python-ideas] Deprecating bytes.swapcase and friends [was: Maintenance burden of str.swapcase]

Nick Coghlan ncoghlan at gmail.com
Wed Sep 7 07:26:03 CEST 2011


On Wed, Sep 7, 2011 at 2:36 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> I don't know if it's worth the effort to deprecate them, though.

I could live with a purely documentation based deprecation, although
I'd prefer to *actually* deprecate at least those four methods on
bytes and bytearray objects (since we switched mailing lists,
reproducing the list for reference: 'capitalize', 'istitle',
'swapcase', 'title').
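
For reference, a quick sketch of what those four methods do on bytes
and bytearray objects. The sample values are made up for illustration;
the point is that the methods only understand ASCII letters, so any
byte outside that range passes through untouched.

```python
# Illustrative values only: the four "text-like" methods on bytes objects.
data = b"hello world"

capitalized = data.capitalize()        # first ASCII letter uppercased
titled = data.title()                  # each ASCII word capitalized
is_title = titled.istitle()            # checks ASCII title casing
swapped = titled.swapcase()            # ASCII case inverted

# bytearray carries the same methods.
swapped_array = bytearray(b"abc").swapcase()

# Non-ASCII bytes are left alone, unlike the str equivalent.
latin1_e_acute = "\u00e9".encode("latin-1")    # single byte 0xE9
bytes_result = latin1_e_acute.swapcase()       # unchanged
str_result = "\u00e9".swapcase()               # real Unicode case mapping

print(capitalized, titled, is_title, swapped, swapped_array)
print(bytes_result, str_result)
```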

> There is a school of thought (represented on python-dev by Philip Eby
> and Antoine Pitrou, among others, I would say) that says that text
> with an implicit encoding is still text if you can figure out what the
> encoding is, and the syntactically important tokens are invariably
> ASCII, which often is enough information to do the work.  So if you
> can do some operation without first converting to str, let's save the
> cycles and the bytes (especially in bit-shoveling applications like
> WSGI)!  I disagree, but "consenting adults" and all that.

FWIW, I actually used to be in that school myself, *until* I took on
the task of making more of the urllib.parse APIs take a polymorphic
bytes-in-bytes-out, str-in-str-out approach for 3.2. The difference in
complexity between the "right" way (i.e. decoding with the ascii
codec, manipulating as Unicode, encoding back to bytes with the ascii
codec) and a hackier approach that tried to manipulate the bytes
directly was such that I didn't even end up benchmarking the two
approaches to decide between them - I ended up having zero interest in
attempting to maintain the latter version, so the implicit
decode/encode is the version that went into the release.
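
A minimal sketch of that decode/manipulate/encode pattern, assuming a
made-up helper called split_scheme. It is not a urllib.parse function
and not the code that shipped in 3.2; it only shows the shape of the
approach: decode with the ascii codec, do the work on str, and encode
the results back if bytes came in.

```python
def split_scheme(url):
    """Hypothetical helper: split 'scheme:rest', str-in-str-out or
    bytes-in-bytes-out."""
    is_bytes = isinstance(url, (bytes, bytearray))
    # Decode once up front; non-ASCII input fails loudly here.
    text = url.decode("ascii") if is_bytes else url

    # All of the actual manipulation happens on str.
    scheme, sep, rest = text.partition(":")
    if not sep:
        scheme, rest = "", text
    scheme = scheme.lower()

    # Encode back only if the caller handed us bytes.
    if is_bytes:
        return scheme.encode("ascii"), rest.encode("ascii")
    return scheme, rest


print(split_scheme("HTTP://example.com"))
print(split_scheme(b"HTTP://example.com"))
```

The only bytes-specific code is the decode at the top and the encode
at the bottom, which is why this version is so much easier to maintain
than one that manipulates the bytes directly.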

That experience pushed me solidly toward the view that arbitrary fast
ASCII text manipulation without encoding/decoding overhead in Python 3
is a task for a third party type - neither bytes nor str fits the
bill. To be really effective, such a type needs one of two things:
either algorithms dedicated to using it, so that all the associated
'literals' are predefined as objects of the relevant type and the code
doesn't need to worry about actual strings being passed in, or else
transparent interoperation with builtin str objects.
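
A very rough sketch of the second option, assuming a hypothetical
AsciiText class (no such type exists in the standard library, and a
real one would need far more than concatenation). It stores ASCII-only
data as bytes and accepts builtin str operands by encoding them with
the ascii codec.

```python
class AsciiText(bytes):
    """Hypothetical ASCII-only text type backed by bytes."""

    def __new__(cls, value=b""):
        if isinstance(value, str):
            raw = value.encode("ascii")    # rejects non-ASCII str
        elif isinstance(value, (bytes, bytearray)):
            raw = bytes(value)
            raw.decode("ascii")            # rejects non-ASCII bytes
        else:
            raise TypeError("AsciiText needs str, bytes or bytearray")
        return super().__new__(cls, raw)

    def __add__(self, other):
        # Interoperate with str by coercing the other operand.
        if not isinstance(other, (str, bytes, bytearray)):
            return NotImplemented
        return AsciiText(bytes.__add__(self, AsciiText(other)))

    def __radd__(self, other):
        if not isinstance(other, (str, bytes, bytearray)):
            return NotImplemented
        return AsciiText(bytes.__add__(AsciiText(other), self))


request_line = AsciiText(b"GET") + " /index.html"   # str on the right
prefixed = "HTTP " + AsciiText(b"200")              # str on the left
print(request_line, prefixed)
```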

The potential viability and utility of such a tagged string type,
however, isn't a particularly strong argument for anything relating to
the bytes API - it's pretty clear that Guido's plan to break the
8-bit-data-as-text paradigm in Python 3 has succeeded to that extent.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


