[Python-3000] PEP 3131 - the details
"Martin v. Löwis"
martin at v.loewis.de
Thu May 17 11:10:58 CEST 2007
> One issue I see is that the PEP defines ID_Start and ID_Continue
> itself. It should not do that, but instead reference as authoritative
> the unicode properties ID_Start and ID_Continue defined in the
> unicode property database.
ID_Start and ID_Continue are derived non-mandatory properties, and I
believe UAX#31 is the one defining these properties. So I thought I
could just copy the definition.
Currently, the Python unicodedata module does not contain a
definition for ID_Start and ID_Continue, so I could not use
it in the PEP.
> ID_Start is officially: Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start
> and ID_Continue is officially: ID_Start + Mn+Mc+Nd+Pc +
> Other_ID_Continue
I now see what the 'stability extensions' mentioned in the PEP
(copied from UAX#31) are. Even though Python currently does
not include Other_ID_Start and Other_ID_Continue, the parser
could be made to support them.
It would have been nice if UAX#31 had mentioned that the "stability
extensions" are recorded in these properties.
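
As a rough sketch only, the formula quoted above could be written
against the existing unicodedata module roughly as follows. The two
Other_ID_* sets are hand-copied assumptions for illustration and may
be incomplete; the authoritative list is PropList.txt. The helper
names are hypothetical and are not part of the PEP or of any module.

```python
import unicodedata

# Assumption: illustrative, possibly incomplete excerpts of the
# Other_ID_Start / Other_ID_Continue entries from PropList.txt.
OTHER_ID_START = {0x2118, 0x212E, 0x309B, 0x309C}
OTHER_ID_CONTINUE = {0x00B7, 0x0387, 0x19DA} | set(range(0x1369, 0x1372))

# General categories named in the quoted formula.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """Lu+Ll+Lt+Lm+Lo+Nl+Other_ID_Start, as quoted above."""
    return (unicodedata.category(ch) in START_CATEGORIES
            or ord(ch) in OTHER_ID_START)


def is_id_continue(ch):
    """ID_Start + Mn+Mc+Nd+Pc + Other_ID_Continue, as quoted above."""
    return (is_id_start(ch)
            or unicodedata.category(ch) in CONTINUE_CATEGORIES
            or ord(ch) in OTHER_ID_CONTINUE)
```

Without the Other_ID_* sets, a character such as U+2118 (general
category Sm) would not count as a start character, which is the
stability issue discussed here.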
> The only difference between PEP 3131's definition and the official
> ones is the Other_* bits. Those are there to ensure the requirement
> that anything now in ID_Start/ID_Continue will always in the future
> be in said categories. That is an important feature, and should not
> be overlooked.
See the PEP: there was an XXX remark I still needed to resolve.
> This list is available as part of the PropList.txt file in the
> unicode data, which ought to be included automatically in python's
> unicode database so as to get future changes.
This I'm not so sure about. I changed the PEP to say that
Other_ID_{Start|Continue} should be included. Whether the other
properties should be added to the unicodedata module, I don't know;
I would like to see use cases first before including them.
> I do not believe it is a good idea for python to define its own
> identifier rules. The rules defined in UAX31 make sense and should be
> used directly, with only the minor amendment of _ as an allowable
> start character.
That was my plan indeed.
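
To illustrate the rule with the underscore amendment, here is a
minimal sketch of a whole-identifier check. The function name and the
other_start/other_continue parameters are hypothetical; they stand in
for the Other_ID_* characters, which the caller would have to supply.
This is not the parser's implementation.

```python
import unicodedata

START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier(name, other_start=frozenset(), other_continue=frozenset()):
    """Check name against the UAX#31-style rule, with '_' allowed at the start.

    other_start / other_continue are sets of characters standing in for
    Other_ID_Start / Other_ID_Continue (assumed to be passed by the caller).
    """
    if not name:
        return False

    def starts(ch):
        # '_' is the one amendment to the UAX#31 start rule.
        return (ch == "_"
                or unicodedata.category(ch) in START_CATEGORIES
                or ch in other_start)

    def continues(ch):
        return (starts(ch)
                or unicodedata.category(ch) in CONTINUE_CATEGORIES
                or ch in other_continue)

    return starts(name[0]) and all(continues(ch) for ch in name[1:])
```

For example, is_identifier("_x") and is_identifier("Löwis") hold,
while is_identifier("1abc") does not, since a digit (Nd) may continue
an identifier but not start one.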
Regards,
Martin