trying to strip out non ascii.. or rather convert non ascii
wxjmfauth at gmail.com
Mon Oct 28 10:01:16 EDT 2013
On Sunday, 27 October 2013 04:21:46 UTC+1, Nobody wrote:
> Simply ignoring diacritics won't get you very far.
Right. As an example, take these four French words, which are
distinct and differ only in their diacritics:
cote, côte, coté, côté.
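To make the point concrete, here is a minimal sketch, assuming the
common approach of decomposing to NFD and dropping combining marks.
The helper name strip_diacritics is only illustrative, not something
from the thread:

```python
import unicodedata

def strip_diacritics(text):
    # Decompose each character (e.g. "ô" becomes "o" + U+0302),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["cote", "côte", "coté", "côté"]
stripped = [strip_diacritics(w) for w in words]
distinct = set(stripped)

print(stripped)  # ['cote', 'cote', 'cote', 'cote']
print(distinct)  # {'cote'}
```

All four words collapse into the same string, so the distinction
between them is lost.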
> Most languages which use diacritics have standard conversions, e.g.
> ö -> oe, which are likely to be used by anyone familiar with the
> language e.g. when using software (or a keyboard) which can't handle
> diacritics.
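The ö -> oe conversion quoted above could be sketched as a small
lookup table. This assumes the usual German fallback spellings (ae,
oe, ue, ss); the names GERMAN_FALLBACK and transliterate_german are
hypothetical, and the table covers German only:

```python
import unicodedata

# Assumed mapping: the conventional German fallback spellings.
GERMAN_FALLBACK = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def transliterate_german(text):
    # Normalize to NFC so that "o" + combining diaeresis matches "ö".
    text = unicodedata.normalize("NFC", text)
    # Characters without an entry are passed through unchanged.
    return "".join(GERMAN_FALLBACK.get(ch, ch) for ch in text)

example = transliterate_german("Schön, Größe, Übung")
print(example)  # Schoen, Groesse, Uebung
```

Unlike blind stripping, this keeps the words recognisable to a German
reader, but the table would be wrong for other languages that use the
same letters.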
I'm quite comfortable with Unicode, esp. with the
Latin blocks.
Apart from this German case (I remember very old typewriters),
which other languages allow this kind of
substitution?
Just as a reminder: there are 1272 characters considered
to be Latin characters (how to count them is not a simple
task), and if my knowledge is correct, they cover
and/or are there to cover the 17 languages, to be exact,
the 17 European languages based on a Latin alphabet which
cannot be covered with iso-8859-1.
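Whether a given text fits in iso-8859-1 can be checked directly. A
minimal sketch follows; the helper name not_in_latin1 and the sample
words are only illustrative:

```python
import unicodedata

def not_in_latin1(text):
    """Return the characters of text that ISO-8859-1 cannot encode."""
    # Use precomposed forms so "é" is tested as one code point.
    text = unicodedata.normalize("NFC", text)
    missing = []
    for ch in text:
        try:
            ch.encode("iso-8859-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

french = not_in_latin1("côté")   # every character is in iso-8859-1
polish = not_in_latin1("łódź")   # "ł" and "ź" are not

print(french)  # []
print(polish)  # ['ł', 'ź']
```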
And of course, logically, they are very, very badly handled
by the Flexible String Representation.
jmf