How do I display unicode value stored in a string variable using ord()
Paul Rubin
no.email at nospam.invalid
Mon Aug 20 02:24:55 EDT 2012
Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> Paul Rubin already told you about his experience using OCR to generate
> multiple terabytes of text, and how he would not be happy if that was
> stored in UCS-4.
That particular text was stored on disk as compressed XML that had UTF-8
in the data fields, but I think Roy is right that it would have
compressed to around the same size in UCS-4. Converting it to UCS-4 on
input would have bloated up the memory footprint and that was the issue
of concern to me.
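
A rough sketch of the distinction, in Python 3. The sample string is made up (it is not the OCR data), and because it is highly repetitive it compresses far better than real text would, so only the comparison between the two encodings is meaningful:

```python
import zlib

# Hypothetical sample: mostly-ASCII, XML-like text repeated to mimic a corpus.
sample = "<field>The quick brown fox jumps over the lazy dog</field>\n" * 10000

utf8 = sample.encode("utf-8")      # 1 byte per code point for ASCII text
ucs4 = sample.encode("utf-32-le")  # always 4 bytes per code point, no BOM

# In-memory (uncompressed) sizes: UCS-4 is 4x larger for ASCII-only text.
print("raw UTF-8 bytes:", len(utf8))
print("raw UCS-4 bytes:", len(ucs4))

# On-disk (compressed) sizes: the padding zero bytes in UCS-4 are very
# regular, so a general-purpose compressor removes most of the difference.
print("zlib UTF-8 bytes:", len(zlib.compress(utf8)))
print("zlib UCS-4 bytes:", len(zlib.compress(ucs4)))
```

The raw sizes are what matter once the text is decoded into memory; the compressed sizes are what matter on disk.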
> Pittance or not, I do not believe that people will widely abandon compact
> storage formats like UTF-8 and Latin-1 for UCS-4 any time soon.
Looking at http://www.icu-project.org/, the C++ classes seem to use
UTF-16, sort of like Python 3.2 :(. I'm not certain of this though.
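
For what a UTF-16 representation implies (this says nothing about ICU's actual internals, which I haven't checked), here is a small Python 3 sketch. A character outside the Basic Multilingual Plane needs two 16-bit code units, so code-unit counts and code-point counts can disagree:

```python
# U+1F600 lies outside the BMP, so UTF-16 encodes it as a surrogate pair.
ch = "\U0001F600"

utf16 = ch.encode("utf-16-le")   # little-endian, no BOM
code_units = len(utf16) // 2     # number of 16-bit units
utf32 = ch.encode("utf-32-le")   # one 32-bit unit per code point

print("code point:", hex(ord(ch)))
print("UTF-16 code units:", code_units)
print("UTF-32 code units:", len(utf32) // 4)
```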