[Python-Dev] Maintenance burden of str.swapcase

Sun Sep 11 20:49:06 CEST 2011

On 08/09/2011 03:46, Stephen J. Turnbull wrote:
> Glyph Lefkowitz writes:
>   >  On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
>   >
>   >  >  How about "title"?
>   >
>   >  >>>  'content-length'.title()
>   >  'Content-Length'
>   >  

Does anyone *actually* use .title() for this? (And why not just use the
correct casing in the string literal?)
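
A quick illustration of the second question, assuming modern Python 3 and an arbitrary sample of header names. For several registered field names, .title() does not reproduce the conventional spelling, so writing the literal with the intended casing is the more predictable route:

```python
# Sample of HTTP field names in their conventional spelling (illustrative only).
conventional = ["Content-Length", "WWW-Authenticate", "Content-MD5", "ETag"]

for name in conventional:
    titled = name.lower().title()
    verdict = "matches" if titled == name else "differs from"
    print(f"{titled!r:22} {verdict} {name!r}")

# Names where .title() does not give back the conventional casing.
mismatches = [name for name in conventional if name.lower().title() != name]
print(mismatches)  # ['WWW-Authenticate', 'Content-MD5', 'ETag']

# bytes has an ASCII-only title() as well.
print(b"content-length".title())  # b'Content-Length'
```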

Michael

>   >  You might say that the protocol "has" to be case-insensitive so
>   >  this is a silly frill:
>
> Not me, sir.  My whole point about the "bytes should be more like str"
> controversy is the dual of that: you don't know what will be coming at
> you, so the regularities and (normally allowable) fuzziness of text
> processing are inadmissible.
>
>   >  there are definitely enough case-sensitive crappy bits of network
>   >  middleware out there that this function is critically important for
>   >  an HTTP server.
>
> "Critically important" is surely an overstatement.  You could always
> title-case the literal strings containing field names in the source.
>
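
One way to reconcile the two positions above (match field names case-insensitively, as the protocol requires, but always emit the spelling written in the source for the benefit of case-sensitive middleware) is a mapping that ignores case on lookup and remembers the caller's casing. A minimal sketch; `HeaderDict` and `render` are hypothetical names, not a stdlib API, and the ASCII encoding on output is an assumption:

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator, Tuple


class HeaderDict(MutableMapping):
    """Hypothetical mapping: lookups ignore case, output keeps the casing
    the caller used the last time the field was set."""

    def __init__(self) -> None:
        # lowercased name -> (name as written by the caller, value)
        self._data: Dict[str, Tuple[str, str]] = {}

    def __setitem__(self, name: str, value: str) -> None:
        self._data[name.lower()] = (name, value)

    def __getitem__(self, name: str) -> str:
        return self._data[name.lower()][1]

    def __delitem__(self, name: str) -> None:
        del self._data[name.lower()]

    def __iter__(self) -> Iterator[str]:
        # Yield the names exactly as they were written.
        return (original for original, _ in self._data.values())

    def __len__(self) -> int:
        return len(self._data)

    def render(self) -> bytes:
        """Serialize as HTTP/1.x header lines (assumes ASCII-only content)."""
        lines = [f"{name}: {value}\r\n" for name, value in self._data.values()]
        return "".join(lines).encode("ascii")


headers = HeaderDict()
headers["Content-Length"] = "42"    # casing chosen in the source literal
headers["content-type"] = "text/plain"
print(headers["CONTENT-LENGTH"])    # case-insensitive lookup: 42
print(headers.render())
```

The output casing is entirely in the hands of whoever writes the literal, which is the point about title-casing field names in the source.
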
> The problem with having lots of str-like features on bytes is that you
> lose TOOWDTI, or worse, to many performance-happy coders, use of bytes
> becomes TOOWDTI "because none of the characters[sic] I'm planning to
> process myself are non-ASCII".  This is the road to Babel; it's
> workable for one-off scripts but it's asking for long-term trouble in
> multi-module applications.  The choice of decoding to str and
> processing in that form should be made as attractive as possible.
>
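
As an illustration of the decode-to-str route, a minimal sketch: the helper name `normalize_header_line` and the choice of strict ASCII are assumptions made for the example. Non-ASCII input fails loudly at the decode step instead of being passed through as opaque bytes:

```python
def normalize_header_line(raw: bytes) -> bytes:
    """Hypothetical helper: decode a raw header line, tidy it up as text,
    and encode it back to bytes."""
    text = raw.decode("ascii")            # strict: non-ASCII bytes raise UnicodeDecodeError
    name, sep, value = text.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {text!r}")
    cleaned = f"{name.strip().title()}: {value.strip()}"
    return cleaned.encode("ascii")


print(normalize_header_line(b"content-length:   42 "))  # b'Content-Length: 42'
```
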
> On the other hand, it is undeniably useful for protocol tokens to have
> mnemonic representations even in binary protocols.  Textual
> manipulations on those tokens should be convenient.
>
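
For reference, Python 3's bytes type already carries ASCII-only versions of the str case methods, which is where the "regularities of text processing" start to diverge between the two types. A short comparison (the sample word is arbitrary):

```python
# bytes case methods only touch ASCII letters; other byte values pass through.
raw = "café".encode("latin-1")          # b'caf\xe9'
print(raw.upper())                      # b'CAF\xe9': the \xe9 byte is untouched

# The same operation on decoded text applies Unicode case mapping.
print(raw.decode("latin-1").upper())    # CAFÉ

# Tokens that really are ASCII behave identically either way.
token = b"content-length"
assert token.title() == token.decode("ascii").title().encode("ascii")
```
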
> It seems to me that what might be an improvement over the current
> situation (maybe for Py4k only, though) is for bytes and
> (PEP-393-style) str to share representation, and have a "cast" method
> which would convert from one to the other, validating that the range
> constraints on the representation are satisfied.  The problem I see is
> that this either sanctions the practice of using latin-1 as "ASCII
> plus anything", which is an unpleasant hack, or you'd need to check in
> text methods that nothing is done with non-ASCII values other than
> checks for set membership (including equality comparison, of course).
>
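
A rough sketch of the validation half of the proposed "cast". The functions `cast_to_bytes` and `cast_to_str` and the `allow_latin1` flag are hypothetical. Today's CPython copies the data on encode/decode, so this shows only the range check, not a shared representation. The flag makes the two policies in the paragraph above explicit: ASCII only, or latin-1 as "ASCII plus anything":

```python
def cast_to_bytes(text: str, *, allow_latin1: bool = False) -> bytes:
    """Hypothetical str -> bytes cast: every code point must fit in one octet.
    allow_latin1=False accepts only ASCII (< 128); True accepts 0-255."""
    limit = 256 if allow_latin1 else 128
    for i, ch in enumerate(text):
        if ord(ch) >= limit:
            raise ValueError(f"code point U+{ord(ch):04X} at index {i} does not fit")
    # latin-1 maps code points 0-255 one-to-one onto byte values.
    return text.encode("latin-1")


def cast_to_str(data: bytes, *, allow_latin1: bool = False) -> str:
    """Hypothetical bytes -> str cast with the same range policy."""
    if not allow_latin1 and not data.isascii():
        raise ValueError("non-ASCII byte in cast to str")
    return data.decode("latin-1")


print(cast_to_bytes("Content-Length"))                  # b'Content-Length'
print(cast_to_str(b"caf\xe9", allow_latin1=True))       # café
```

Note that `bytes.isascii()` needs Python 3.7 or later.
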
> OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character
> in a str that happens to contain only ASCII values would convert the
> representation to multioctets (true), and therefore this doesn't give
> the desired efficiency properties, is beside the point.  Just don't do
> that!  You *can't* do that in a bytes object, anyway; use of str in
> this way is a "consenting adults" issue.  You trade off the
> convenience of the full suite of text tools vs. the possibility that
> somebody might insert such a character -- but for the algorithms
> they're going to be using, they shouldn't be doing that anyway.
>
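
A quick way to see the PEP 393 effect Antoine describes, assuming CPython 3.3 or later. `sys.getsizeof` totals are implementation details, so the sketch compares two lengths of the same string to isolate the per-character cost. Since str is immutable, "inserting" a character means building a new string, and that new string is stored at the width of its widest character:

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Per-character storage of a string made only of `ch` (CPython-specific)."""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


for label, ch in [("ASCII", "a"), ("latin-1", "\xe9"),
                  ("BMP", "\u20ac"), ("astral", "\U0001F600")]:
    print(f"{label:8} {bytes_per_char(ch)}")  # 1, 1, 2, 4 on CPython

# One non-latin-1 character widens the whole new string.
narrow = "a" * 1000
wide = narrow + "\u20ac"
print(sys.getsizeof(narrow), sys.getsizeof(wide))
```
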

-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html