Default scope of variables

Dave Angel davea at davea.name
Thu Jul 4 22:03:52 EDT 2013


On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
> On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
> [...]
>> Anyway, none of the calculations that have been given takes into
>> account the fact that names can be /less/ than one million characters long.
>
>
> Not in *my* code they don't!!!
>
> *wink*
>
>
>> The
>> actual number of non-empty strings of length at most 1000000 characters,
>> that consist only of ascii letters, digits or underscores, and that
>> don't start with a digit, is
>>
>> sum(53*63**i for i in range(1000000)) == 53*(63**1000000 - 1)//62
>
>
> I take my hat off to you sir, or possibly madam. That is truly an inspired
> piece of pedantry.
>
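
For what it's worth, the closed form is easy to sanity-check for tiny
lengths. Here is a rough sketch; the helper names closed_form and
brute_force are mine, and the enumeration is only feasible for very
small n. Note that isidentifier() returns True for keywords too, which
matches the formula, since it doesn't exclude them either.

```python
from itertools import product
import string

# ASCII letters, digits and underscore: 63 characters in all.
ALPHABET = string.ascii_letters + string.digits + "_"

def closed_form(n):
    """Number of ASCII identifiers of length 1..n, per the quoted formula."""
    return 53 * (63**n - 1) // 62

def brute_force(n):
    """Count the same strings by enumeration; only feasible for tiny n."""
    return sum(
        "".join(chars).isidentifier()
        for length in range(1, n + 1)
        for chars in product(ALPHABET, repeat=length)
    )

for n in (1, 2):
    print(n, closed_form(n), brute_force(n))
```
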
>
>> It's perhaps worth mentioning that some non-ascii characters are allowed
>> in identifiers in Python 3, though I don't know which ones.
>
> PEP 3131 describes the rules:
>
> http://www.python.org/dev/peps/pep-3131/
>
> For example:
>
> py> import unicodedata as ud
> py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
> ...     print(c, ud.name(c), c.isidentifier(), ud.category(c))
> ...
> é LATIN SMALL LETTER E WITH ACUTE True Ll
> æ LATIN SMALL LETTER AE True Ll
> ¥ YEN SIGN False Sc
> µ MICRO SIGN True Ll
> ¿ INVERTED QUESTION MARK False Po
> μ GREEK SMALL LETTER MU True Ll
> Ж CYRILLIC CAPITAL LETTER ZHE True Lu
> ᚃ OGHAM LETTER FEARN True Lo
> ‰ PER MILLE SIGN False Po
> ⇄ RIGHTWARDS ARROW OVER LEFTWARDS ARROW False So
> ∞ INFINITY False Sm
>
>
>

Called on a single character, the isidentifier() method only tells you
whether that character can start an identifier.  But there are other
groups of characters that can appear only after the starting "letter".
So a more reasonable sample might be something like:

 > py> import unicodedata as ud
 > py> for c in 'éæ¥µ¿μЖᚃ‰⇄∞':
 > ...     xc = "X" + c
 > ...     print(c, ud.name(c), xc.isidentifier(), ud.category(c))
 > ...
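
To make the difference visible, here is a small sketch along the same
lines. The helper name probe and the choice of "X" as the leading
letter are just my own assumptions for illustration; any valid starting
letter would do.

```python
import unicodedata as ud

def probe(c):
    """Return (can_start, can_continue) for a single character c."""
    # "X" is an arbitrary valid starting letter, so the second test
    # isolates whether c is allowed in a non-initial position.
    return c.isidentifier(), ("X" + c).isidentifier()

for c in "é¥7_":
    print(c, ud.name(c), *probe(c), ud.category(c))
```

The ASCII digit is the simplest case where the two answers differ: it
cannot start an identifier, but it can follow the first character.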

In particular,
     http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers

has a definition for id_continue that includes several interesting
categories.  I expected the non-ASCII digits, but there are other
things in there, like "nonspacing marks", that I found surprising.
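
As a quick experiment, the sketch below tries one character from each
of the categories that id_continue adds on top of id_start (Mn, Mc, Nd
and Pc).  The sample characters are hand-picked by me, not taken from
the reference manual, so treat it as an illustration rather than a
complete test.

```python
import unicodedata as ud

# One hand-picked sample per continue-only category; the choice of
# characters is an assumption for illustration.
samples = {
    "Mn": "\u0301",  # COMBINING ACUTE ACCENT (nonspacing mark)
    "Mc": "\u0903",  # DEVANAGARI SIGN VISARGA (spacing combining mark)
    "Nd": "\u0663",  # ARABIC-INDIC DIGIT THREE (decimal number)
    "Pc": "\u203f",  # UNDERTIE (connector punctuation)
}

results = {}
for cat, c in samples.items():
    # Make sure each sample really belongs to the category it stands for.
    assert ud.category(c) == cat
    # (can start an identifier, can follow a starting letter)
    results[cat] = (c.isidentifier(), ("X" + c).isidentifier())
    print(cat, ud.name(c), *results[cat])
```
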

I'm pretty much speculating here, so please correct me if I'm way off.

-- 
DaveA



