[Python-ideas] string codes & substring equality

Chris Angelico rosuav at gmail.com
Thu Nov 28 07:05:40 CET 2013


On Thu, Nov 28, 2013 at 4:55 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
> From: Chris Angelico <rosuav at gmail.com>
>> and compare it (which is done with slicing)? Sometimes you don't need
>> a single named way to do exactly what you want[1], you should just
>> build from primitives.
>> [1] http://php.net/manual/en/function.gzgetss.php - why does this exist?
>
> Because PHP. In Python there's one obvious way to do it. In Perl every possible way you could do it works. In PHP, there are three ways you can almost do something like it.

Sure, but the point still stands. I picked an extreme example by
pointing to PHP, but it's the same thing: the startswith method,
given start and end parameters, is effectively equivalent to slicing
and comparing. What is gained by having the method do both jobs in one
wrapper? In this case, the answer might be performance, or it might be
readability, and both can be argued. But it's certainly not a glaring
hole; if startswith could ONLY check the beginning of a string, the
push to _add_ this feature would be quite weak.
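
To make the comparison concrete, here is a minimal sketch of startswith
built from slicing and comparing. The helper name startswith_via_slice is
made up for illustration, and it only covers the ordinary single-prefix
case: it ignores the tuple-of-prefixes form and does not try to match
str.startswith on corner cases such as an empty prefix with an
out-of-range start.

```python
def startswith_via_slice(s, prefix, start=0, end=None):
    """Hypothetical helper: emulate s.startswith(prefix, start, end)
    using only slicing and equality (ordinary cases only)."""
    # Take the window the caller asked for; out-of-range bounds are
    # clamped by slicing, just as they are for startswith.
    window = s[start:end]
    # Compare the leading part of the window against the prefix.
    return window[:len(prefix)] == prefix


text = "hello world"

# Both approaches agree on these ordinary cases.
assert text.startswith("wor", 6) == startswith_via_slice(text, "wor", 6)
assert text.startswith("wor") == startswith_via_slice(text, "wor")
assert text.startswith("world", 6, 9) == startswith_via_slice(text, "world", 6, 9)
```

The slicing version creates a temporary string for the window, which the
built-in method can avoid; that is the performance argument mentioned
above, while the readability argument is a matter of taste.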

> Especially for Unicode, where a character isn't a byte, but an abstract code point that can be represented as at least three different variable-length sequences, taking up to 6 bytes.

No, a character is simply an integer. How it's represented is
immaterial. The easiest representation in Python is a straight int,
the easiest in C is probably also an int (32-bit; if it's 64-bit, you
waste 40-odd bits, but it's still easiest); the variable-length byte
representations are for transmission/storage, not for manipulation.
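
As a rough illustration of the distinction, the sketch below shows one
character as a plain integer in Python, next to three byte encodings of
that same integer. The function name code_point_views and the choice of
the euro sign as the sample character are assumptions made for the
example, not anything from the thread.

```python
def code_point_views(ch):
    """Hypothetical helper: return the integer code point of a
    one-character string plus its byte length in three encodings."""
    cp = ord(ch)  # the character as a plain int
    return {
        "code_point": cp,
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(ch.encode("utf-32-le")),
    }


euro = "\u20ac"  # EURO SIGN, U+20AC
views = code_point_views(euro)

# One integer, and chr() turns it back into the same character.
assert views["code_point"] == 0x20AC
assert chr(views["code_point"]) == euro

# The byte lengths differ per encoding, but the integer does not.
assert (views["utf-8"], views["utf-16"], views["utf-32"]) == (3, 2, 4)
```

String operations such as indexing and slicing work on the code points;
the byte forms only matter once the text is encoded for storage or
transmission.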

ChrisA

