[Python-ideas] string codes & substring equality

Andrew Barnert abarnert at yahoo.com
Thu Nov 28 06:55:47 CET 2013


From: Chris Angelico <rosuav at gmail.com>



> and compare it (which is done with slicing)? Sometimes you don't need
> a single named way to do exactly what you want[1], you should just
> build from primitives.

…


> [1] http://php.net/manual/en/function.gzgetss.php - why does this exist?


Because PHP. In Python there's one obvious way to do it. In Perl every possible way you could do it works. In PHP, there are three ways you can almost do something like it.

Getting back to the original topic, and more seriously, I don't see why this is a problem. Having boxed single-character objects wouldn't be significantly faster than using strings for single characters. Especially for Unicode, where a character isn't a byte, but an abstract code point that can be represented as at least three different sequences (UTF-8, UTF-16, UTF-32), taking up to 4 bytes. (Not to mention that with Unicode, half the time you want to do things like locale-based collation, or searches that treat NFC and NFD the same, or searches that treat Cyrillic small Es and Latin small C the same, etc., and if you think you don't, you've probably got a bug in your code. So character objects would be more of an attractive nuisance than a useful thing anyway.)
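
A minimal sketch of the Unicode point, assuming Python 3 and the standard unicodedata module. The helper name same_text and the sample characters are made up for illustration:

```python
import unicodedata

# One abstract code point (EURO SIGN, U+20AC), three encodings,
# three different byte lengths.
ch = "\u20ac"
encoded_lengths = {
    enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")
}

# "e with acute" as one precomposed code point (NFC) versus
# "e" followed by a combining acute accent (NFD).
nfc = unicodedata.normalize("NFC", "e\u0301")
nfd = unicodedata.normalize("NFD", "\u00e9")


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Cyrillic small Es (U+0441) looks like Latin small c (U+0063),
# but normalization alone does not make them compare equal.
cyrillic_es, latin_c = "\u0441", "c"

print(encoded_lengths)
print(nfc == nfd, same_text(nfc, nfd))
print(same_text(cyrillic_es, latin_c))
```

A plain == on the raw strings says the NFC and NFD forms differ, while the normalized comparison treats them as the same text. The lookalike case needs something beyond normalization, which is not shown here.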

If your code is really spending a significant amount of time building these single-character strings out of your slices, then any code that iterates over characters in a Python loop is almost certainly going to be way too slow no matter how you optimize it.
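
To make the "build from primitives" idea concrete, here is a rough sketch assuming Python 3. The function names are invented for illustration, and the two comparison helpers are only meant for offsets in the range 0 <= i <= len(s):

```python
def matches_at_slice(s, sub, i):
    """Substring equality built from slicing; creates a temporary string."""
    return s[i:i + len(sub)] == sub


def matches_at_startswith(s, sub, i):
    """The same check using str.startswith with a start offset (no slice)."""
    return s.startswith(sub, i)


def count_char_loop(s, ch):
    """Character-by-character Python loop; each c is a 1-character str."""
    n = 0
    for c in s:
        if c == ch:
            n += 1
    return n


s = "hello world"
print(matches_at_slice(s, "world", 6), matches_at_startswith(s, "world", 6))
print(count_char_loop(s, "o"), s.count("o"))
```

The loop and s.count give the same answer, but the built-in does its iteration in C. If the per-character loop is the bottleneck, pushing the work into a built-in like this is usually a bigger win than shaving the cost of the single-character strings.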


