[Python-3000] PEP 3131 - the details

"Martin v. Löwis" martin at v.loewis.de
Thu May 17 11:10:58 CEST 2007

> One issue I see is that the PEP defines ID_Start and ID_Continue  
> itself. It should not do that, but instead reference as authoritative
> the unicode properties ID_Start and ID_Continue defined in the  
> unicode property database.

ID_Start and ID_Continue are derived non-mandatory properties, and I
believe UAX#31 is the one defining these properties. So I thought I
could just copy the definition.

Currently, the Python unicodedata module does not contain a
definition for ID_Start and ID_Continue, so I could not use
it in the PEP.

> ID_Start is officially: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start
> and ID_Continue is officially: ID_Start + Mn+Mc+Nd+Pc +  
> Other_ID_Continue

I now see what the 'stability extensions' mentioned in the PEP
(copied from UAX#31) are. Even though Python currently does not
include Other_ID_Start and Other_ID_Continue, the parser could be
made to take them into account.
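
As a rough illustration of the quoted formula (not the parser's actual
implementation), the check can be sketched in Python on top of
unicodedata.category(). The two Other_* sets below are an assumption:
a hand-copied subset of PropList.txt that may be incomplete for a given
Unicode version. The helper names are hypothetical, "_" is allowed as a
start character as discussed further down, and NFKC normalization is
left out.

```python
import unicodedata

# Assumption: hand-copied subsets of Other_ID_Start / Other_ID_Continue;
# the authoritative list lives in PropList.txt and may differ by version.
OTHER_ID_START = {0x2118, 0x212E, 0x309B, 0x309C}
OTHER_ID_CONTINUE = {0x00B7, 0x0387, 0x19DA} | set(range(0x1369, 0x1372))

# General categories named in the quoted definition.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """ID_Start per the quoted formula, plus '_' as the Python amendment."""
    return (
        ch == "_"
        or unicodedata.category(ch) in START_CATEGORIES
        or ord(ch) in OTHER_ID_START
    )


def is_id_continue(ch):
    """ID_Continue = ID_Start + Mn + Mc + Nd + Pc + Other_ID_Continue."""
    return (
        is_id_start(ch)
        or unicodedata.category(ch) in CONTINUE_CATEGORIES
        or ord(ch) in OTHER_ID_CONTINUE
    )


def is_identifier(s):
    """True if s is non-empty, starts with ID_Start, continues with ID_Continue."""
    return bool(s) and is_id_start(s[0]) and all(is_id_continue(c) for c in s[1:])
```

With this sketch, U+00B7 MIDDLE DOT (category Po) is accepted inside an
identifier only because it is listed in Other_ID_Continue, which is
exactly the kind of character the stability extensions exist for.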

It would have been nice if UAX#31 had mentioned that the "stability
extensions" are recorded in these properties.

> The only difference between PEP 3131's definition and the official
> one is the Other_* bits. Those are there to ensure the requirement
> that anything now in ID_Start/ID_Continue will always in the future  
> be in said categories. That is an important feature, and should not  
> be overlooked.

See the PEP: there was an XXX remark I still needed to resolve.

> This list is available as part of the PropList.txt file in the  
> unicode data, which ought to be included automatically in python's  
> unicode database so as to get future changes.

This I'm not so sure about. I changed the PEP to say that
Other_ID_{Start|Continue} should be included. Whether the other
properties should be added to the unicodedata module, I don't know -
I would like to see use cases first before including them.
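
For reference, pulling just these two properties out of PropList.txt
does not take much code. The following is a minimal sketch, assuming
the usual "code point or range ; property # comment" line layout of
that file. The function name parse_proplist and the SAMPLE excerpt are
hypothetical; the excerpt is hand-written in that layout and is not the
full file.

```python
def parse_proplist(text, wanted):
    """Collect code points for the named properties from PropList.txt-style text."""
    result = {name: set() for name in wanted}
    for line in text.splitlines():
        # Drop trailing comments and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code_range, prop = fields[0], fields[1]
        if prop not in result:
            continue
        # A field is either a single code point or "FIRST..LAST" (inclusive).
        first, _, last = code_range.partition("..")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        result[prop].update(range(lo, hi + 1))
    return result


# Assumption: a hand-written excerpt in the PropList.txt layout.
SAMPLE = """\
0020          ; White_Space # Zs       SPACE
2118          ; Other_ID_Start # Sm       SCRIPT CAPITAL P
309B..309C    ; Other_ID_Start # Sk   [2] KATAKANA-HIRAGANA VOICED SOUND MARK..KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
00B7          ; Other_ID_Continue # Po       MIDDLE DOT
1369..1371    ; Other_ID_Continue # No   [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
"""

props = parse_proplist(SAMPLE, ["Other_ID_Start", "Other_ID_Continue"])
```

Only the requested properties are collected, so the remaining
properties in the file can stay out of the database until there is a
use case for them.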

> I do not believe it is a good idea for python to define its own  
> identifier rules. The rules defined in UAX31 make sense and should be  
> used directly, with only the minor amendment of _ as an allowable  
> start character.

That was my plan indeed.

