[Python-ideas] string codes & substring equality

Thu Nov 28 23:05:51 CET 2013

On 28 November 2013 05:55, Andrew Barnert <abarnert at yahoo.com> wrote:
> Getting back to the original topic, and more seriously, I don't see why this is a problem. Having boxed single-character objects wouldn't be significantly faster than using strings for single characters. Especially for Unicode, where a character isn't a byte, but an abstract code point that can be represented as at least three different variable-length sequences, taking up to 6 bytes. (Not to mention that with Unicode, half the time you want to do things like locale-based collation or searches that treat NFC and NFD the same or searches that treat Cyrillic small Es and Latin small C the same, etc., and if you think you don't, you've probably got a bug in your code. So character objects would be more of an attractive nuisance than a useful thing anyway.)
>
> If your code is really spending a significant amount of time building these single-character strings out of your slices, then any code that iterates over characters in a Python loop is almost certainly going to be way too slow no matter how you optimize it.

Talking about substring comparisons, this isn't true. Building the
substring is O(n) in its length, while comparing it is typically O(1)
for unequal strings, since the comparison stops at the first mismatch.
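
To make the cost difference concrete, here is a minimal sketch of the
two ways to test for a substring at a given offset. The helper names
are made up for illustration. The second variant relies on the optional
start argument of str.startswith, which, as far as I know, CPython
compares in place without first building the slice.

```python
def matches_by_slice(text, sub, start):
    # Copies len(sub) characters into a new string, then compares it.
    return text[start:start + len(sub)] == sub


def matches_in_place(text, sub, start):
    # Compares against text at the given offset; no slice is built here.
    return text.startswith(sub, start)


text = "hello world"
print(matches_by_slice(text, "world", 6), matches_in_place(text, "world", 6))
print(matches_by_slice(text, "word", 6), matches_in_place(text, "word", 6))
```

Both give the same answer for non-empty substrings at offsets inside
the string; only the slice version pays for the copy up front.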