[Python-3000] string C API

Thu Sep 14 22:34:38 CEST 2006

On 9/14/06, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Bob Ippolito" <bob at redivi.com> wrote:
> > The argument for UTF-8 is probably interop efficiency. Lots of C
> > libraries, file formats, and wire protocols use UTF-8 for interchange.
> > Verifying the validity of UTF-8 during string creation isn't that big
> > of a deal.
>
> Indeed, UTF-8 validation/creation isn't a big deal.  But that wasn't my
> concern.  My concern was Python-only operation efficiency, for which a
> fixed-length-per-character encoding generally wins (at least for
> operations involving two strings with the same internal encoding).

If you need the number of characters often, you can compute it while the
string's contents are being validated. Slicing by character index may
become slower, since finding a character's byte offset means scanning
the string, but compared with UCS-4 the savings in memory and memory
bandwidth might still be a net performance win for many applications.
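A minimal pure-Python sketch of that trade-off follows. It assumes a hypothetical string type that stores its contents as UTF-8 bytes. The character count falls out of the validation pass at no extra cost. Slicing by character index needs a linear scan to turn character indices into byte offsets, whereas a fixed-width UCS-4 buffer could compute the offset as 4 * index. The helper names `validate_and_count`, `byte_offset` and `utf8_slice` are illustrative only and are not part of any real C API.

```python
def validate_and_count(data: bytes) -> int:
    """Validate UTF-8 and count code points in a single pass (hypothetical helper)."""
    i, n, count = 0, len(data), 0
    while i < n:
        lead = data[i]
        # Sequence length from the lead byte; 0x80-0xC1 and 0xF5+ are never valid leads.
        if lead < 0x80:
            length = 1
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Every following byte must be a continuation byte (10xxxxxx).
        for j in range(i + 1, i + length):
            if data[j] & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {j}")
        # Reject overlong forms, UTF-16 surrogates and code points above U+10FFFF.
        if length > 1:
            second = data[i + 1]
            if (lead == 0xE0 and second < 0xA0) or (lead == 0xF0 and second < 0x90):
                raise ValueError(f"overlong sequence at offset {i}")
            if lead == 0xED and second > 0x9F:
                raise ValueError(f"surrogate at offset {i}")
            if lead == 0xF4 and second > 0x8F:
                raise ValueError(f"code point above U+10FFFF at offset {i}")
        count += 1  # the character count is a by-product of validation
        i += length
    return count


def byte_offset(data: bytes, index: int) -> int:
    """Byte offset of code point `index` in valid UTF-8: an O(n) scan, not O(1)."""
    count = 0
    for pos, b in enumerate(data):
        if b & 0xC0 != 0x80:  # not a continuation byte, so a code point starts here
            if count == index:
                return pos
            count += 1
    if count == index:  # one past the last character
        return len(data)
    raise IndexError(index)


def utf8_slice(data: bytes, start: int, stop: int) -> bytes:
    """Slice by character index; both ends need a scan to locate byte offsets."""
    return data[byte_offset(data, start):byte_offset(data, stop)]


s = "aé€b".encode("utf-8")
print(validate_and_count(s), len(s))  # 4 characters in 7 bytes; UCS-4 would need 16
print(utf8_slice(s, 1, 3))            # b'\xc3\xa9\xe2\x82\xac', i.e. "é€"
```

The scan in `byte_offset` is the slowdown mentioned above; the 7-versus-16-byte comparison is the memory saving that might offset it.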

-bob