[Python-Dev] Maintenance burden of str.swapcase
Michael Foord
fuzzyman at voidspace.org.uk
Sun Sep 11 20:49:06 CEST 2011
On 08/09/2011 03:46, Stephen J. Turnbull wrote:
> Glyph Lefkowitz writes:
> > On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:
> >
> > > How about "title"?
> >
> > >>> 'content-length'.title()
> > 'Content-Length'
> >
Does anyone *actually* use .title() for this? (And why not just use the
correct casing in the string literal...)
Michael
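
For anyone following along, here is a minimal sketch of the trade-off in question. str.title() gives the conventional spelling for many header field names but not for all of them, which is one argument for writing the casing out in the literal. The helper name canonical_header and the sample field names are assumptions made up for illustration; nothing here comes from the stdlib HTTP code or from this thread.

```python
# Hypothetical helper: derive the on-the-wire casing with str.title().
def canonical_header(name):
    return name.title()

# Alternative: spell the casing out once, in the source literal.
CONTENT_LENGTH = "Content-Length"

# title() matches the conventional form here...
print(canonical_header("content-length") == CONTENT_LENGTH)  # True

# ...but it capitalizes only the first letter of each run of letters,
# so names with conventional internal capitals come out differently.
print(canonical_header("etag"))              # Etag (conventionally ETag)
print(canonical_header("www-authenticate"))  # Www-Authenticate
print(canonical_header("content-md5"))       # Content-Md5

# bytes has the same method, restricted to ASCII letters.
print(b"content-length".title())             # b'Content-Length'
```
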
> > You might say that the protocol "has" to be case-insensitive so
> > this is a silly frill:
>
> Not me, sir. My whole point about the "bytes should be more like str"
> controversy is the dual of that: you don't know what will be coming at
> you, so the regularities and (normally allowable) fuzziness of text
> processing are inadmissible.
>
> > there are definitely enough case-sensitive crappy bits of network
> > middleware out there that this function is critically important for
> > an HTTP server.
>
> "Critically important" is surely an overstatement. You could always
> title-case the literal strings containing field names in the source.
>
> The problem with having lots of str-like features on bytes is that you
> lose TOOWDTI, or worse, to many performance-happy coders, use of bytes
> becomes TOOWDTI "because none of the characters[sic] I'm planning to
> process myself are non-ASCII". This is the road to Babel; it's
> workable for one-off scripts but it's asking for long-term trouble in
> multi-module applications. The choice of decoding to str and
> processing in that form should be made as attractive as possible.
>
> On the other hand, it is undeniably useful for protocol tokens to have
> mnemonic representations even in binary protocols. Textual
> manipulations on those tokens should be convenient.
>
> It seems to me that what might be an improvement over the current
> situation (maybe for Py4k only, though) is for bytes and
> (PEP-393-style) str to share representation, and have a "cast" method
> which would convert from one to the other, validating that the range
> constraints on the representation are satisfied. The problem I see is
> that this either sanctions the practice of using latin-1 as "ASCII
> plus anything", which is an unpleasant hack, or you'd need to check in
> text methods that nothing is done with non-ASCII values other than
> checks for set membership (including equality comparison, of course).
>
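
A rough sketch of what the validating "cast" could look like at the Python level, assuming the strict variant means "ASCII only". The function names are hypothetical, and the codec calls copy the data instead of sharing a representation, so this shows only the validation behaviour and not the proposed implementation. The latin-1 variant is the "ASCII plus anything" hack: it accepts every byte value and round-trips, but validates nothing.

```python
def cast_to_str(data):
    """Hypothetical strict cast: accept only bytes in the ASCII range."""
    # Raises UnicodeDecodeError if any byte is >= 0x80.
    return data.decode("ascii")


def cast_to_bytes(text):
    """Hypothetical strict cast in the other direction."""
    # Raises UnicodeEncodeError if any code point is >= 0x80.
    return text.encode("ascii")


def cast_to_str_latin1(data):
    """The 'ASCII plus anything' variant: never fails, checks nothing."""
    return data.decode("latin-1")


token = cast_to_str(b"content-length")
print(cast_to_bytes(token.title()))  # b'Content-Length'

try:
    cast_to_str(b"caf\xe9")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# Every byte value survives a latin-1 round trip unchanged.
all_bytes = bytes(range(256))
print(cast_to_str_latin1(all_bytes).encode("latin-1") == all_bytes)  # True
```
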
> OTOH, AFAICS, Antoine's claim that inserting a non-latin-1 character
> in a str that happens to contain only ASCII values would convert the
> representation to multioctets (true), and therefore this doesn't give
> the desired efficiency properties, is beside the point. Just don't do
> that! You *can't* do that in a bytes object, anyway; use of str in
> this way is a "consenting adults" issue. You trade off the
> convenience of the full suite of text tools vs. the possibility that
> somebody might insert such a character -- but for the algorithms
> they're going to be using, they shouldn't be doing that anyway.
>
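
The widening mentioned above can be observed on a CPython that implements PEP 393 (3.3 and later). This is a sketch under that assumption: the sizes are implementation details, so other interpreters may report different numbers. It also shows that the equivalent insertion is simply not possible in a bytes object.

```python
import sys

ascii_only = "x" * 1000
widened = ascii_only + "\u20ac"  # EURO SIGN, outside the latin-1 range

narrow_size = sys.getsizeof(ascii_only)
wide_size = sys.getsizeof(widened)

# On a PEP 393 build the second string is stored with wider code units,
# so one extra character roughly doubles the storage.
print(narrow_size, wide_size)

# A bytes object cannot hold such a value at all.
try:
    bytes([0x20AC])
except ValueError as exc:
    print("bytes refuses it:", exc)
```
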
--
http://www.voidspace.org.uk/
May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html