[Python-Dev] len(chr(i)) = 2?
R. David Murray
rdmurray at bitdance.com
Sun Nov 21 20:29:15 CET 2010
On Sun, 21 Nov 2010 10:17:57 -0800, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
> On Nov 21, 2010, at 9:38 AM, R. David Murray wrote:
> > I'm sorry, but I have to disagree. As a relative unicode ignoramus,
> > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> > have heard about them on this list have only confused me.
[...]
> From a user's point-of-view, the actual encoding or encoding name
> doesn't matter much. They just need to be able to predict the relevant
> behaviors (memory consumption and len/slicing behavior).
>
> For the narrow build, that behavior is:
> - Characters in the BMP consume 2 bytes and count as one char
> for purposes of len and slicing.
> - Characters above the BMP consume 4 bytes and count as
> two distinct chars for purposes of len and slicing.
>
> For wide builds, all characters are 4 bytes and count as a single
> char for len and slicing.
>
> Hope this helps,
Thank you, that nicely summarizes and confirms what I thought I knew about
wide versus narrow build. And as I said, using the names UCS-2/UCS-4
would only *confuse* that understanding, not clarify it.
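
For anyone who wants to check the len and memory behavior summarized above,
here is a rough sketch. It assumes Python 3.3 or later, where narrow and wide
builds no longer exist and len() always counts code points, so the narrow
build is only emulated by counting UTF-16 code units. The helper names
(narrow_len, wide_len, narrow_bytes, wide_bytes) are made up for this
illustration and are not part of any Python API.

```python
# Sketch only: emulates the narrow/wide build behavior described above.
# Assumes Python 3.3+, where len(str) counts code points.
# All function names here are hypothetical helpers for illustration.

def narrow_len(s):
    """Length as a narrow build would report it: UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2

def wide_len(s):
    """Length as a wide build would report it: one unit per code point."""
    return len(s)

def narrow_bytes(s):
    """Payload bytes on a narrow build: 2 per BMP char, 4 above the BMP."""
    return len(s.encode("utf-16-le"))

def wide_bytes(s):
    """Payload bytes on a wide build: 4 per character."""
    return len(s.encode("utf-32-le"))

bmp_char = "\u00e9"         # inside the BMP
astral_char = "\U00010000"  # first code point above the BMP

for label, ch in (("BMP", bmp_char), ("above BMP", astral_char)):
    print(label,
          "narrow len:", narrow_len(ch),
          "wide len:", wide_len(ch),
          "narrow bytes:", narrow_bytes(ch),
          "wide bytes:", wide_bytes(ch))
```

The byte counts cover only the character data, not the per-object overhead
of a real string object.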
--
R. David Murray www.bitdance.com