[Python-3000] string module trimming
Jim Jewett
jimjjewett at gmail.com
Sat Apr 28 03:06:06 CEST 2007
On 4/27/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > Agreed. But there aren't 40K (alphabetic) letters in any particular
> > locale. Most individual languages will have less than 100.
> Here's a relevant bunch of data from the CLDR:
> http://www.unicode.org/cldr/data/charts/by_type/misc.exemplarCharacters.html
http://www.unicode.org/Public/UNIDATA/Scripts.txt is also relevant,
but I can't quite interpret it.
There are 5020 "Common" code points. These are mostly non-letters,
but I suppose they could appear in some languages.
Latin script has 1070 characters; most Latin-script languages use only
a small fraction of them. The standard ASCII alphabet is still only
26 lower + 26 capital, but there are plenty of characters that get
used in some language or other. (The largest single contiguous range is
208 letters, from LATIN CAPITAL LETTER DZ WITH CARON to LATIN SMALL
LETTER EZH WITH CURL.)
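
For anyone who wants to reproduce per-script totals like these, here is a
rough sketch of how the Scripts.txt layout could be tallied. It assumes each
data line has the form "<code point or range> ; <script name> # comment".
The function name count_by_script and the SAMPLE excerpt are made up for
illustration; SAMPLE is only a few lines in that layout, not the real file,
so its totals will not match the numbers above.

```python
from collections import Counter


def count_by_script(lines):
    """Count code points per script from lines in the Scripts.txt layout."""
    counts = Counter()
    for line in lines:
        # Drop the trailing comment, then skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        # Each data line is "<code point or range> ; <script name>".
        code_range, script = (field.strip() for field in data.split(";", 1))
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        counts[script] += end - start + 1
    return counts


# Hypothetical excerpt for illustration only (not the full Scripts.txt).
SAMPLE = """\
# Illustrative excerpt in the Scripts.txt layout
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0020          ; Common # Zs       SPACE
""".splitlines()

counts = count_by_script(SAMPLE)
print(counts["Latin"], counts["Common"])
```

Feeding the function the lines of a downloaded copy of Scripts.txt instead of
SAMPLE should give the totals for whichever Unicode version that copy
describes. Note that this counts every code point assigned to a script, not
only the letters.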
-jJ