[Python-ideas] Deprecating bytes.swapcase and friends [was: Maintenance burden of str.swapcase]
Guido van Rossum
guido at python.org
Wed Sep 7 16:39:00 CEST 2011
TBH, your experience showed that trying to write "polymorphic" code
manipulating either-str-or-bytes-meaning-text is too ugly to bother with. I
don't know if the same is true if one were to just set out to
manipulate bytes-meaning-text. FWIW, I haven't changed my mind on
swapcase -- I regret it, but (despite acknowledging your experience)
value the consistency more than the cost of implementing it. I could
live with deprecating it across the board, if only to ease life for
PyPy and others.
On Tue, Sep 6, 2011 at 10:26 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Wed, Sep 7, 2011 at 2:36 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> I don't know if it's worth the effort to deprecate them, though.
> I could live with a purely documentation based deprecation, although
> I'd prefer to *actually* deprecate at least those four methods on
> bytes and bytearray objects (since we switched mailing lists,
> reproducing the list for reference: 'capitalize', 'istitle',
> 'swapcase', 'title').
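
For reference, a short sketch of how the four methods listed above behave on bytes. The sample values and variable names are illustrative only. The point is that on bytes these methods act only on ASCII letters, so the result for encoded non-ASCII text can differ from what the same operation gives on str.

```python
# The four methods named above, applied to a bytes object.
ascii_data = b"hello World"

capitalized = ascii_data.capitalize()   # first byte uppercased, rest lowercased
titled = ascii_data.title()             # each "word" starts with an uppercase letter
swapped = ascii_data.swapcase()         # ASCII case inverted byte by byte
is_title = ascii_data.istitle()         # False: "hello" starts with a lowercase letter

# bytearray exposes the same methods.
swapped_array = bytearray(ascii_data).swapcase()

# Non-ASCII bytes are left untouched, whatever encoding they came from.
latin1_e_acute = "é".encode("latin-1")          # b'\xe9'
bytes_result = latin1_e_acute.swapcase()        # unchanged
str_result = "é".swapcase()                     # 'É' on str

print(capitalized, titled, swapped, is_title)
print(bytes_result == latin1_e_acute, str_result)
```

The last two lines show the mismatch: the str method knows about case for the whole of Unicode, while the bytes method only knows about ASCII.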
>> There is a school of thought (represented on python-dev by Philip Eby
>> and Antoine Pitrou, among others, I would say) that says that text
>> with an implicit encoding is still text if you can figure out what the
>> encoding is, and the syntactically important tokens are invariably
>> ASCII, which often is enough information to do the work. So if you
>> can do some operation without first converting to str, let's save the
>> cycles and the bytes (especially in bit-shoveling applications like
>> WSGI)! I disagree, but "consenting adults" and all that.
> FWIW, I actually used to be in that school myself, *until* I took on
> the task of making more of the urllib.parse APIs take a polymorphic
> bytes-in-bytes-out, str-in-str-out approach for 3.2. The difference in
> complexity between the "right" way (i.e. decoding with the ascii
> codec, manipulating as Unicode, encoding back to bytes with the ascii
> codec) and a hackier approach that tried to manipulate the bytes
> directly was such that I didn't even end up benchmarking the two
> approaches to decide between them - I ended up having zero interest in
> attempting to maintain the latter version, so the implicit
> decode/encode is the version that went into the release.
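
A minimal sketch of the decode/manipulate/encode pattern described above. This is not the actual urllib.parse code: the helper names `_coerce` and `split_scheme` are hypothetical and chosen only for illustration. The idea is that all real work happens on str, and bytes input is decoded and re-encoded with the strict ascii codec at the boundary.

```python
def _coerce(arg):
    """Return (text, restore), where restore converts a str result
    back to the type of the original argument (hypothetical helper)."""
    if isinstance(arg, str):
        return arg, lambda s: s
    # Bytes input: decode strictly as ASCII so non-ASCII data fails loudly
    # instead of being silently mangled.
    return arg.decode("ascii"), lambda s: s.encode("ascii")


def split_scheme(url):
    """Split 'scheme:rest' into (lowercased scheme, rest).
    str in gives str out; bytes in gives bytes out."""
    text, restore = _coerce(url)
    scheme, sep, rest = text.partition(":")
    if not sep:
        # No scheme present: return an empty scheme and the input unchanged.
        return restore(""), restore(text)
    return restore(scheme.lower()), restore(rest)


str_result = split_scheme("HTTP://example.com/")
bytes_result = split_scheme(b"HTTP://example.com/")
print(str_result)
print(bytes_result)
```

Only the two boundary conversions in `_coerce` know about bytes; the manipulation itself is written once, against str, which is what keeps this version maintainable compared with handling bytes directly.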
> That experience pushed me solidly in the direction of arbitrary fast
> ASCII text manipulation without encoding/decoding overhead in Python 3
> being a task for a third party type - neither bytes nor str fit the
> bill. To be really effective, such a type needs either algorithms
> dedicated to using it, so that all the associated 'literals' are
> predefined as objects of the relevant type and the code doesn't need to
> worry about handling actual strings being passed in, or else the
> ability to transparently interoperate with builtin str objects.
> The potential viability and utility of such a tagged string type,
> however, isn't a particularly strong argument for anything relating to
> the bytes API - it's pretty clear that Guido's plan to break the
> 8-bit-data-as-text paradigm in Python 3 has succeeded to that extent.
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
--Guido van Rossum (python.org/~guido)