[Python-Dev] len(chr(i)) = 2?

R. David Murray rdmurray at bitdance.com
Sun Nov 21 20:29:15 CET 2010


On Sun, 21 Nov 2010 10:17:57 -0800, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
> On Nov 21, 2010, at 9:38 AM, R. David Murray wrote:
> > I'm sorry, but I have to disagree.  As a relative unicode ignoramus,
> > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> > have heard about them on this list have only confused me.

[...]

> From a user's point of view, the actual encoding or encoding name
> doesn't matter much.  They just need to be able to predict the relevant
> behaviors (memory consumption and len/slicing behavior).
> 
> For the narrow build, that behavior is:
> - Characters in the BMP consume 2 bytes and count as one char
>   for purposes of len and slicing.
> - Characters above the BMP consume 4 bytes and count as
>   two distinct chars for purposes of len and slicing.
> 
> For wide builds, all characters are 4 bytes and count as a single
> char for len and slicing.
> 
> Hope this helps,

Thank you, that nicely summarizes and confirms what I thought I knew about
wide versus narrow build.  And as I said, using the names UCS-2/UCS-4
would only *confuse* that understanding, not clarify it.
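
For anyone who wants to check the summary quoted above, here is a rough
sketch. It assumes a Python 3.3 or later interpreter, where the
narrow/wide distinction no longer exists, so it emulates each build by
counting UTF-16 or UTF-32 code units instead of calling len() directly.
The helper names are made up for illustration, and the byte counts cover
only the character data, not any per-object overhead.

```python
# Emulate narrow-build and wide-build behavior on a modern Python 3.
# All helper names here are hypothetical, chosen for this illustration.

def narrow_len(s):
    """len() as a narrow build would report it: UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2

def wide_len(s):
    """len() as a wide build would report it: one unit per code point."""
    return len(s.encode("utf-32-le")) // 4

def narrow_bytes(s):
    """Bytes of character data on a narrow build (2 per code unit)."""
    return len(s.encode("utf-16-le"))

def wide_bytes(s):
    """Bytes of character data on a wide build (4 per code point)."""
    return len(s.encode("utf-32-le"))

bmp = "\u20ac"         # EURO SIGN, inside the BMP
astral = "\U0001d11e"  # MUSICAL SYMBOL G CLEF, above the BMP

for name, ch in (("BMP", bmp), ("above BMP", astral)):
    print(name,
          "narrow len:", narrow_len(ch), "bytes:", narrow_bytes(ch),
          "| wide len:", wide_len(ch), "bytes:", wide_bytes(ch))
```

The character above the BMP is the only case where the two builds
disagree on length, which is the len(chr(i)) == 2 case from the subject
line.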

--
R. David Murray                                      www.bitdance.com

