How do I display unicode value stored in a string variable using ord()
Terry Reedy
tjreedy at udel.edu
Sun Aug 19 17:59:43 EDT 2012
On 8/19/2012 2:11 PM, wxjmfauth at gmail.com wrote:
> Well, it seems some software producers know what they
> are doing.
>
> >>> '€'.encode('cp1252')
> b'\x80'
> >>> '€'.encode('mac-roman')
> b'\xdb'
> >>> '€'.encode('iso-8859-1')
> Traceback (most recent call last):
> File "<eta last command>", line 1, in <module>
> UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac'
> in position 0: ordinal not in range(256)
Yes, Python lets you choose your byte encoding from those and a hundred
others. I believe all the codecs are now tested in both directions. It
was not an easy task.
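As a rough illustration of what "both directions" means, here is a
minimal sketch that round-trips the Euro sign through the three codecs
from the quoted session. The variable names are made up for this
example; only str.encode and bytes.decode are used.

```python
# Round-trip the euro sign through the codecs from the quoted session.
euro = '\u20ac'  # EURO SIGN

results = {}
for name in ('cp1252', 'mac-roman', 'iso-8859-1'):
    try:
        raw = euro.encode(name)
    except UnicodeEncodeError:
        # This codec has no byte assigned to the character.
        results[name] = None
        continue
    # Decoding the encoded bytes must give back the original string.
    assert raw.decode(name) == euro
    results[name] = raw

print(results)
```

The two private mappings hand back a single byte each, while
iso-8859-1 ends up with None because the encode step raises.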
As to the examples: Latin-1 dates to 1985 and before, and the 1988
version was published as a standard in 1992.
https://en.wikipedia.org/wiki/Latin-1
"The name euro was officially adopted on 16 December 1995."
https://en.wikipedia.org/wiki/Euro
No wonder Latin-1 does not contain the Euro sign. The standards of
international standards organizations are relatively fixed. (The Unicode
Consortium will not even correct misspelled character names.) Instead,
new standards with a new number are adopted.
For better or worse, private mappings are more flexible. In its Mac
mapping Apple "replaced the generic currency sign ¤ with the euro sign
€". (See Latin-1 reference.) Great if you use Euros, not so great if you
were using the previous sign for something else.
For Windows cp1252, Microsoft reassigned a code it did not otherwise
need (0x80, as the quoted session shows) to the Euro.
https://en.wikipedia.org/wiki/Windows-1252
"It is very common to mislabel Windows-1252 text with the charset label
ISO-8859-1. A common result was that all the quotes and apostrophes
(produced by "smart quotes" in Microsoft software) were replaced with
question marks or boxes on non-Windows operating systems, making text
difficult to read. Most modern web browsers and e-mail clients treat the
MIME charset ISO-8859-1 as Windows-1252 in order to accommodate such
mislabeling. This is now standard behavior in the draft HTML 5
specification, which requires that documents advertised as ISO-8859-1
actually be parsed with the Windows-1252 encoding.[1]"
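The mislabeling described in that quote is easy to reproduce. The
following is only a sketch with a made-up sample string; it shows what
a strict ISO-8859-1 decoder does with "smart quotes" that were written
out as Windows-1252 bytes.

```python
# Text with curly ("smart") quotes, as Microsoft software produces.
text = '\u201cquoted\u201d'

# Encoded as Windows-1252, the quotes become bytes 0x93 and 0x94.
raw = text.encode('cp1252')

# A strict ISO-8859-1 decoder never fails, because every byte maps to
# U+0000..U+00FF, but 0x93 and 0x94 turn into invisible C1 control
# characters instead of quotes.
wrong = raw.decode('iso-8859-1')

# Treating the same bytes as Windows-1252 recovers the original text.
right = raw.decode('cp1252')

print(ascii(raw), ascii(wrong), ascii(right))
```

No exception is raised in the wrong case, which is why the damage
tends to go unnoticed until someone looks at the rendered text.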
Lots of fun. Too bad Microsoft won't push UTF-8 so we can all
communicate text with much less chance of ambiguity.
--
Terry Jan Reedy