Unicode in Python
wxjmfauth at gmail.com
Tue Apr 22 11:28:32 EDT 2014
On Tuesday, 22 April 2014 14:21:40 UTC+2, Steven D'Aprano wrote:
> On Tue, 22 Apr 2014 02:07:58 -0700, wxjmfauth wrote:
>
> > On Tuesday, 22 April 2014 08:30:45 UTC+2, Rustom Mody wrote:
> >>
> > @ rusy
> >
> >> "I've reworded it to make it clear that I am referring to the
> >> character-sets and not encodings."
> >
> > Very good, excellent comment. A healthy coding scheme can only work
> > properly with a unique character set, and the coding is achieved with
> > the help of a unique operator. There is no other way to do it, and that's
> > the reason why we have to live today with all these coding schemes
> > (Unicode or not). Note: a coding scheme can be much more complex than
> > the coding of "raw" characters (e.g. CID fonts).
> >> "So instead of using λ (0x3bb) we should use 𝝀 (0x1d740) or
> >> something thereabouts like 𝜆"
>
> For those who cannot see them, they are:
>
> py> unicodedata.name('\U0001d740')
> 'MATHEMATICAL BOLD ITALIC SMALL LAMDA'
> py> unicodedata.name('\U0001d706')
> 'MATHEMATICAL ITALIC SMALL LAMDA'
>
> ("LAMDA" is the official Unicode name for Lambda.)
>
> > This is a very good understanding of Unicode. The letter lambda is not
> > the mathematical symbol lambda. Another example: the micro sign is not
> > the Greek letter mu, which is not the mathematical mu.
>
> Depends what you mean by "is not". The micro sign is a legacy
> compatibility character; we shouldn't use it except for compatibility
> with legacy (non-Unicode) character sets. Instead, we should use the NFKC
> or NFKD normalization forms to convert it to the recommended character.
>
> py> import unicodedata
> py> a = '\N{GREEK SMALL LETTER MU}' # Preferred
> py> b = '\N{MICRO SIGN}' # Legacy
> py> a == b
> False
> py> unicodedata.normalize('NFKD', b) == a
> True
> py> unicodedata.normalize('NFKC', b) == a
> True
>
> As for the mathematical mu, there is no separate Unicode "maths symbol
> mu" so far as I am aware. One would simply use '\N{MICRO SIGN}' or
> '\N{GREEK SMALL LETTER MU}' to get a μ.
>
> Likewise, the λ used in mathematics is the Greek letter λ, not a separate
> symbol, just like the Latin letter x and the x used in mathematics are
> the same.

Normalization works fine, but it proves nothing; it
has to rely on some convention.
There are several code point ranges (Latin + Greek) that can
be used for mathematical purposes (different mu's).
If you are interested, search for "unimath-symbols.pdf"
on CTAN (I have all this material on my hard drive).
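To make the point about separate mathematical mu code points concrete, here is a minimal sketch, assuming Python 3 and the standard-library `unicodedata` module (its answers depend on the Unicode version bundled with the interpreter). It lists a few code points that all display as some form of mu and shows what NFKC normalization turns each of them into.

```python
import unicodedata

# Code points that all display as some form of Greek "mu".
candidates = [
    "\N{GREEK SMALL LETTER MU}",  # U+03BC, the ordinary Greek letter
    "\N{MICRO SIGN}",             # U+00B5, legacy compatibility character
    "\U0001D707",                 # U+1D707, MATHEMATICAL ITALIC SMALL MU
    "\U0001D741",                 # U+1D741, MATHEMATICAL BOLD ITALIC SMALL MU
]

for ch in candidates:
    # NFKC applies compatibility mappings, so these variants fold to U+03BC.
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<38}  NFKC -> U+{ord(folded):04X}")
```

Every row ends in U+03BC: the mathematical mus are distinct code points in the character set, but normalization maps them all onto the one Greek letter, which is the "convention" being argued about.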
...
"Likewise, the λ used in mathematics is the Greek letter λ, not a separate
symbol, just like the Latin letter x and the x used in mathematics are
the same."
...
Oh! Definitely not. A tool with a Unicode engine able to
produce "math text" will certainly not use the same code point
for a "textual x" and a "mathematical x", even if one
enters/types/hits the same "x".
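A minimal sketch of the "textual x" versus "mathematical x" distinction, assuming Python 3 and the standard-library `unicodedata` module. U+1D465 is the italic x from the Mathematical Alphanumeric Symbols block; the comparison is done on code points, not on how a font draws them.

```python
import unicodedata

text_x = "x"           # U+0078 LATIN SMALL LETTER X
math_x = "\U0001D465"  # U+1D465 MATHEMATICAL ITALIC SMALL X

print(text_x == math_x)                                 # False: different code points
print(unicodedata.name(math_x))                         # MATHEMATICAL ITALIC SMALL X
print(unicodedata.normalize("NFKC", math_x) == text_x)  # True: NFKC folds it to "x"
```

One consequence worth knowing: Python itself normalizes identifiers to NFKC (PEP 3131), so a variable spelled with U+1D465 is the same name as a variable spelled with a plain `x`, even though the two strings compare unequal.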
To be exaggeratedly strict, the real question is whether
a given "lambda" or "x" belongs to a "math Unicode range"
or not. This is quite a different approach. (Please, no
confusion with a "text literal variable x".)
A text processing tool will notice the difference; it will
use different fonts.
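The "does this character belong to a math range?" question can be approximated by checking the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF). The sketch below assumes Python 3; `in_math_alphanumeric_block` is a hypothetical helper written for illustration, not a standard-library function. It is only an approximation: some mathematical letters live in other blocks (for example U+210E PLANCK CONSTANT serves as the italic h, which is a reserved hole in the math block).

```python
import unicodedata

# Bounds of the Mathematical Alphanumeric Symbols block.
MATH_ALNUM_START = 0x1D400
MATH_ALNUM_END = 0x1D7FF


def in_math_alphanumeric_block(ch: str) -> bool:
    """Hypothetical helper: True if the single character ch lies in U+1D400..U+1D7FF.

    Approximate test only; math letters outside this block (e.g. U+210E)
    are reported as False.
    """
    if len(ch) != 1:
        raise ValueError("expected a single code point")
    return MATH_ALNUM_START <= ord(ch) <= MATH_ALNUM_END


samples = ["\N{GREEK SMALL LETTER LAMDA}", "\U0001D706", "\U0001D740", "x", "\U0001D465"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {in_math_alphanumeric_block(ch)}")
```

A block check is cheap and needs no font information, which is why it is used here; a stricter test would consult the Unicode math property data instead.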
jmf