How do I display unicode value stored in a string variable using ord()
no.email at nospam.invalid
Sun Aug 19 19:48:06 CEST 2012
Terry Reedy <tjreedy at udel.edu> writes:
>> Meanwhile, an example of the 393 approach failing:
> I am completely baffled by this, as this example is one where the 393
> approach potentially wins.
What? The 393 approach is supposed to avoid memory bloat and that
does the opposite.
>> I was involved in a project that dealt with terabytes of OCR data of
>> mostly English text. So the chars were mostly ascii,
> 3.3 stores ascii pages 1 byte/char rather than 2 or 4.
But they are not ascii pages, they are (as stated) MOSTLY ascii.
E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses
a much more memory-expensive encoding than UTF-8.
> I doubt that there are really any non-bmp chars.
You may be right about this. I thought about it some more after
posting and I'm not certain that there were supplemental characters.
> As Steven said, reject such false identifications.
Reject them how?
>> That's a natural for UTF-8
> 3.3 would convert to utf-8 for storage on disk.
They are already in utf-8 on disk though that doesn't matter since
they are also compressed.
>> but the PEP-393 approach would bloat up the memory
>> requirements by a factor of 4.
> 3.2- wide builds would *always* use 4 bytes/char. Is not occasionally
> better than always?
The bloat is in comparison with utf-8, in that example.
> That looks like a 3.2- narrow build. Such which treat unicode strings
> as sequences of code units rather than sequences of codepoints. Not an
> implementation bug, but compromise design that goes back about a
> decade to when unicode was added to Python.
I thought the whole point of Python 3's disruptive incompatibility with
Python 2 was to clean up past mistakes and compromises, of which unicode
headaches was near the top of the list. So I'm surprised they seem to
repeated a mistake there.
> I would call it O(k), where k is a selectable constant. Slowing access
> by a factor of 100 is hardly acceptable to me.
If k is constant then O(k) is the same as O(1). That is how O notation
works. I wouldn't believe the 100x figure without seeing it measured in
More information about the Python-list