Ah, that actually makes a lot of sense. So Latin-1 strings are memcmp-able, and the others are not. That's fine; I'll just add a check for that (I think there are already helper functions for this) and then have two special-case string functions. Thanks!
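
To make the memcmp point concrete, here is a rough sketch in pure Python rather than C. It assumes that comparing Latin-1 encoded bytes stands in for memcmp on the 1-byte representation, and that UTF-16-LE bytes stand in for how 2-byte code units sit in memory on a little-endian machine. The names (latin1_key, words, wide_a, wide_b) are made up for illustration.

```python
def latin1_key(s):
    # One byte per code point, so byte order mirrors code point order.
    return s.encode("latin-1")

# All code points below 256, so every string fits the 1-byte form.
words = ["zebra", "café", "cafe", "Ärger", "apple", "ÿ", ""]

by_code_point = sorted(words)            # ordinary str comparison
by_bytes = sorted(words, key=latin1_key)  # bytewise comparison
same_order = by_code_point == by_bytes

# Two single-character strings that need 2 bytes per code point.
wide_a, wide_b = "\u0201", "\u0102"
code_point_says_less = wide_a < wide_b
# Assumption: little-endian layout, modelled here with UTF-16-LE bytes.
le_bytes_say_less = wide_a.encode("utf-16-le") < wide_b.encode("utf-16-le")

print(same_order, code_point_says_less, le_bytes_say_less)
```

In the 1-byte case the two orderings agree, while in the 2-byte little-endian case the bytewise comparison disagrees with the code point comparison, which is why the wider strings would need their own path.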

On Wed, Oct 12, 2016 at 4:08 PM Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

On Wed, Oct 12, 2016 at 5:57 PM, Elliot Gorokhovsky <elliot.gorokhovsky@gmail.com> wrote:
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith <njs@pobox.com> wrote:
But this isn't relevant to Python's str, because Python's str never uses UTF-8.

Really? I thought in Python 3, strings are all Unicode... so what encoding do they use, then?

No encoding is used.  The actual code points are stored as integers, all of the same size within a given string.  If all code points are less than 256, they are stored as 8-bit integers (bytes).  If some code points are greater than or equal to 256 but all are less than 65536, they are stored as 16-bit integers, and so on.
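
One way to see this storage scheme from Python is sketched below. It assumes CPython 3.3 or later, where sys.getsizeof reflects the string's internal buffer; other implementations may report different numbers. The helper name bytes_per_char is made up for illustration. It takes the size difference between two strings of different lengths so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    # Compare a string of 2n characters with one of n characters;
    # the per-object overhead cancels and the difference is n * width.
    short = ch * n
    long_ = ch * (2 * n)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // n

samples = {
    "a": "ASCII, code point < 128",
    "\u00e9": "Latin-1, code point < 256",
    "\u0100": "code point >= 256 and < 65536",
    "\U0001F600": "code point >= 65536",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} ({label}): {bytes_per_char(ch)} byte(s) per character")
```

The width is chosen per string from its largest code point, so a single character at or above 256 is enough to move the whole string to the wider representation.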