languages with full unicode support
Dr.Ruud
rvtol+news at isolution.nl
Sat Jul 1 06:51:27 EDT 2006
Chris Uppal wrote:
> Since the interpretation of characters which are yet to be added to
> Unicode is undefined (will they be digits, "letters", operators,
> symbol, punctuation.... ?), there doesn't seem to be any sane way
> that a language could allow an unrestricted choice of Unicode in
> identifiers.
The Perl code below prints:
xdigit
22 /194522 = 0.011% (lower: 6, upper: 6)
ascii
128 /194522 = 0.066% (lower: 26, upper: 26)
\d
268 /194522 = 0.138%
digit
268 /194522 = 0.138%
IsNumber
612 /194522 = 0.315%
alpha
91183 /194522 = 46.875% (lower: 1380, upper: 1160)
alnum
91451 /194522 = 47.013% (lower: 1380, upper: 1160)
word
91801 /194522 = 47.193% (lower: 1380, upper: 1160)
graph
102330 /194522 = 52.606% (lower: 1380, upper: 1160)
print
102349 /194522 = 52.616% (lower: 1380, upper: 1160)
blank
18 /194522 = 0.009%
space
24 /194522 = 0.012%
punct
374 /194522 = 0.192%
cntrl
6473 /194522 = 3.328%
Especially look at 'word', the same as \w, which for ASCII is
[0-9A-Za-z_].
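
A minimal sketch of that point, separate from unicount.pl: the helper
is_word_char is a hypothetical name introduced only for this
illustration, and the sample code points are arbitrary picks above
U+00FF plus a few ASCII ones. It reports whether single code points
match \w, which shows that \w accepts far more than [0-9A-Za-z_] once
non-ASCII characters are involved.

```perl
#!/usr/bin/perl
use strict ;
use warnings ;

# Hypothetical helper (not part of unicount.pl):
# returns 1 if the character with this code point matches \w, else 0.
sub is_word_char
{
    my ($cp) = @_ ;
    return chr($cp) =~ /\w/ ? 1 : 0 ;
}

# 'A', '_', Greek alpha, a CJK ideograph, an Arabic-Indic digit,
# hyphen-minus, and the euro sign
for my $cp ( 0x41, 0x5F, 0x3B1, 0x4E2D, 0x663, 0x2D, 0x20AC )
{
    printf "U+%04X %s\n"
         , $cp
         , is_word_char($cp) ? 'matches \w' : 'does not match \w' ;
}
```

The letters, the ideograph and the non-ASCII digit should all match,
while the hyphen and the euro sign should not.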
==8<===================
#!/usr/bin/perl
# Program-Id: unicount.pl
# Subject: show Unicode statistics

use strict ;
use warnings ;

use Data::Alias ;   # CPAN module, provides alias()

binmode STDOUT, ':utf8' ;

# One row per character class: name, regexp, and three counters
# (C = matches, L = matches that are lowercase, U = matches that are uppercase).
my @table =
# +--Name------+---qRegexp--------+-C-+-L-+-U-+
(
  [ 'xdigit'   , qr/[[:xdigit:]]/ , 0 , 0 , 0 ] ,
  [ 'ascii'    , qr/[[:ascii:]]/  , 0 , 0 , 0 ] ,
  [ '\\d'      , qr/\d/           , 0 , 0 , 0 ] ,
  [ 'digit'    , qr/[[:digit:]]/  , 0 , 0 , 0 ] ,
  [ 'IsNumber' , qr/\p{IsNumber}/ , 0 , 0 , 0 ] ,
  [ 'alpha'    , qr/[[:alpha:]]/  , 0 , 0 , 0 ] ,
  [ 'alnum'    , qr/[[:alnum:]]/  , 0 , 0 , 0 ] ,
  [ 'word'     , qr/[[:word:]]/   , 0 , 0 , 0 ] ,
  [ 'graph'    , qr/[[:graph:]]/  , 0 , 0 , 0 ] ,
  [ 'print'    , qr/[[:print:]]/  , 0 , 0 , 0 ] ,
  [ 'blank'    , qr/[[:blank:]]/  , 0 , 0 , 0 ] ,
  [ 'space'    , qr/[[:space:]]/  , 0 , 0 , 0 ] ,
  [ 'punct'    , qr/[[:punct:]]/  , 0 , 0 , 0 ] ,
  [ 'cntrl'    , qr/[[:cntrl:]]/  , 0 , 0 , 0 ] ,
) ;

# Code points to test: planes 0, 1 and 2, skipping the surrogates
# (U+D800..U+DFFF) and the noncharacters (U+FDD0..U+FDEF, U+xFFFE, U+xFFFF).
my @codepoints =
(
      0x0000 ..  0xD7FF,
      0xE000 ..  0xFDCF,
      0xFDF0 ..  0xFFFD,
     0x10000 .. 0x1FFFD,
     0x20000 .. 0x2FFFD,
#    0x30000 .. 0x3FFFD, # etc.
) ;

for my $row ( @table )
{
  # alias the row's fields, so the counters are updated in the table itself
  alias my ($name, $qrx, $count, $lower, $upper) = @$row ;

  printf "\n%s\n", $name ;

  my $n = 0 ;   # number of code points tested

  for ( @codepoints )
  {
    local $_ = chr ; # int-2-char conversion
    $n++ ;
    if ( /$qrx/ )
    {
      $count++ ;
      $lower++ if / [[:lower:]] /x ;
      $upper++ if / [[:upper:]] /x ;
    }
  }

  # show the lower/upper counts only when at least one is non-zero
  my $show_lower_upper =
    ($lower || $upper)
    ? sprintf( " (lower:%6d, upper:%6d)"
             , $lower
             , $upper
             )
    : '' ;

  printf "%6d /%6d =%7.3f%%%s\n"
       , $count
       , $n
       , 100 * $count / $n
       , $show_lower_upper
}
__END__
--
Affijn, Ruud
"Gewoon is een tijger."