Default scope of variables
Joshua Landau
joshua.landau.ws at gmail.com
Thu Jul 4 22:27:18 EDT 2013
On 5 July 2013 03:03, Dave Angel <davea at davea.name> wrote:
> On 07/04/2013 09:24 PM, Steven D'Aprano wrote:
>> On Thu, 04 Jul 2013 17:54:20 +0100, Rotwang wrote:
>>> It's perhaps worth mentioning that some non-ascii characters are allowed
>>> in identifiers in Python 3, though I don't know which ones.
>>
>> PEP 3131 describes the rules:
>>
>> http://www.python.org/dev/peps/pep-3131/
>
> The isidentifier() method will let you weed out the characters that cannot
> start an identifier. But there are other groups of characters that can
> appear after the starting "letter". So a more reasonable sample might be
> something like:
...
> In particular,
> http://docs.python.org/3.3/reference/lexical_analysis.html#identifiers
>
> has a definition for id_continue that includes several interesting
> categories. I expected the non-ASCII digits, but there's other stuff there,
> like "nonspacing marks" that are surprising.
>
> I'm pretty much speculating here, so please correct me if I'm way off.
For my calculation above, I used this code I quickly mocked up:

import unicodedata as unidata
from sys import maxunicode
from collections import defaultdict
from itertools import chain

def get():
    xid_starts = set()
    xid_continues = set()

    # General categories listed for id_start and id_continue in the
    # lexical analysis reference
    id_start_categories = "Lu, Ll, Lt, Lm, Lo, Nl".split(", ")
    id_continue_categories = "Mn, Mc, Nd, Pc".split(", ")

    # Every code point, surrogates included
    characters = (chr(n) for n in range(maxunicode + 1))

    print("Making normalized characters")

    # NFKC can expand one character into several, so flatten the
    # results into a set of individual characters
    normalized = (unidata.normalize("NFKC", character) for character in characters)
    normalized = set(chain.from_iterable(normalized))

    print("Assigning to categories")

    for character in normalized:
        category = unidata.category(character)

        if category in id_start_categories:
            xid_starts.add(character)
        elif category in id_continue_categories:
            xid_continues.add(character)

    return xid_starts, xid_continues

Please note that "xid_continues" actually represents "xid_continue - xid_start".
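
One way to sanity-check sets like these is to ask the interpreter directly through str.isidentifier(), as suggested earlier in the thread. The following is only a rough sketch: the helper names can_start, can_continue and count_identifier_characters are made up for illustration, and prefixing "a" is just one assumed way to test a character in a non-initial position.

```python
import unicodedata
from sys import maxunicode

def can_start(character):
    """True if the single character is accepted on its own as an identifier."""
    return character.isidentifier()

def can_continue(character):
    """True if the character is accepted after a known-valid start ("a")."""
    return ("a" + character).isidentifier()

def count_identifier_characters(limit=maxunicode + 1):
    """Count code points below `limit` that can start / continue an identifier."""
    starts = sum(can_start(chr(n)) for n in range(limit))
    continues = sum(can_continue(chr(n)) for n in range(limit))
    return starts, continues

# A few hand-picked samples: letter, underscore, digit,
# combining acute accent (a nonspacing mark), hyphen.
for sample in ["a", "_", "1", "\u0301", "-"]:
    print(repr(sample), unicodedata.category(sample),
          can_start(sample), can_continue(sample))
```

Calling count_identifier_characters() walks every code point, so it takes a moment. Unlike "xid_continues" above, its second count includes the start characters too.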