Syntax idea: escaping names to avoid keyword ambiguity
Following up on some of the discussions about the problems of adding keywords, and on Guido's proposal of making tokenization context-dependent, I wanted to propose an alternative way around the problem.

My proposal essentially boils down to:

1. The character "$" can be used as a prefix of identifiers. Formally:

       identifier ::= ["$"] xid_start xid_continue*

   xid_start: https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-xid_...
   xid_continue: https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-xid_...
2. The "$" character is not part of the name. So the program "foo=3;print($foo)" prints 3. So does the program "$foo=3; print(foo)". Both set the entry globals()["foo"] and leave globals()["$foo"] unset.
3. If "$" appears in a token, it's always an identifier. So "$with", "$if", "$return" are all identifiers.

If you overcome the "yikes, that looks like awk/bash/perl/php, and I don't like those" reaction, and consider it an escape for "unusual"/"deprecation" situations, I think it's not a bad choice, and it allows simple solutions to many problems that have been under discussion recently and not so recently. [examples below]

For me the benefits of this approach are:

- It's very simple to explain how to use it, and its semantics
- It (seems to me it) should be easy to explain to a Python apprentice what a "$" means in code they read in a book/blogpost/manual
- It's very easy to implement, with minimal changes in the tokenizer
- It's also easy to implement/integrate in other tools (editors with syntax highlighters, code formatters, etc)
- It is easy to see that it's 100% backwards compatible (I understand that "$" has never been used in Python before)
- It is relatively unsurprising in the sense that other languages are already using $ to label names (there may be some point of confusion for people coming from JavaScript, where "$" is a valid character in names and is not ignored).
- It gives Python devs and users a clear, easy and universal upgrade path when keywords are added (language designers: add a __future__ import to enable the keyword in Python N+1, add warnings to change kw --> $kw in Python N+2, and then turn it on by default in Python N+3... ; developers: add the import when they want to upgrade, and fix their code with a search&replace when adding the import or after getting a warning).
- It allows you to use new features even if some libraries were written for older Python versions, depending on the deprecation period (this could be improved with something I'll write in another email, but that's the topic for another proposal)
- When clashes occur, which they always do, there's one obvious way to disambiguate (see today the "class_" argument for gettext.translation, the "lambd" argument for random.expovariate, the "class_" filter in libraries like pyquery for CSS class, functions like pyquery, sqlalchemy.sql.operators.as_, etc. Not counting all the "cls" arguments to every classmethod ever)
- If we're worried about over-proliferation of "$" in code, I'm quite sure, given past experience, that just a notice in PEP 8 of "only use $ in names to prevent ambiguity" should be more than enough for the community

What are the drawbacks that you find in this?

Best,
Daniel

[The rest of this post is just examples]

Example 1: Python 3.92 has just added a future import that makes "given" a keyword. Then you can do:

    # This works because we have no future import
    from hypothesis import given, strategies as st

    @given(st.integers())
    def foo(i):
        x = f(i)**2 + f(i)**3
        ....

If you want to use the new feature (or upgraded to Python 3.93 and started receiving warnings) you can then change it to:

    from __future__ import given_expression
    from hypothesis import $given, strategies as st

    @$given(st.integers())  # If you forget the $ you get a SyntaxError
    def foo(i):
        x = z**2 + z**3 given z = f(i)
        ....
And also you could do:

    from __future__ import given_expression
    import hypothesis

    @hypothesis.$given(hypothesis.strategies.integers())
    def foo(i):
        x = z**2 + z**3 given z = f(i)
        ....

Or even, if you want to avoid the "$" all over your code:

    from __future__ import given_expression
    from hypothesis import $given as hgiven, strategies as st

    @hgiven(st.integers())
    def foo(i):
        x = z**2 + z**3 given z = f(i)
        ....

If you have some library which uses a new keyword as a method name (you can't rename those with import ... as ...), it still works perfectly:

    from mylib import SomeClass

    instance = SomeClass()
    instance.$given("foo")

This is also helpful as a universal way to solve name clashes between Python keywords and libraries that use some external concept that overlaps with Python (from https://pythonhosted.org/pyquery/attributes.html ):

    import pyquery as pq

    p = pq('<p id="hello" class="hello"></p>')('p')
    p.attr(id='hello', $class='hello2')
Or even name clashes within Python itself:

    @classmethod
    def new_with_color($class, color):  # Instead of the usual cls or class_
        result = $class()
        result.set_color(color)
        return result
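
As a rough illustration of rules 1 to 3 of the proposal, here is a minimal sketch of how a tokenizer might classify word-like tokens. The `classify` helper and the ASCII-only pattern are assumptions made for the example (the proposal itself uses xid_start/xid_continue), and this is not how CPython's tokenizer is organised.

```python
import keyword
import re

# Hypothetical token pattern: an optional "$" followed by an ASCII identifier.
# (The proposal uses xid_start/xid_continue; ASCII is a simplification.)
WORD = re.compile(r"\$?[A-Za-z_][A-Za-z0-9_]*")


def classify(token):
    """Return (kind, name) for one word-like token under the proposed rules."""
    if WORD.fullmatch(token) is None:
        raise ValueError(f"not a word token: {token!r}")
    if token.startswith("$"):
        # Rule 3: a token containing "$" is always an identifier.
        # Rule 2: the "$" is not part of the name.
        return ("NAME", token[1:])
    if keyword.iskeyword(token):
        return ("KEYWORD", token)
    return ("NAME", token)


for tok in ["foo", "$foo", "with", "$with"]:
    print(tok, "->", classify(tok))
```

Under this sketch "$foo" and "foo" yield the same name, while "$with" is a name and "with" stays a keyword, which is the whole of the proposed behaviour.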
--
https://www.machinalis.co.uk
Daniel Moisset
UK COUNTRY MANAGER
A: 1 Fore Street, EC2Y 9DT London https://goo.gl/maps/pH9BBLgE8dG2
P: +44 7398 827139 <+44+7398+827139>
M: dmoisset@machinalis.com
On 2018 May 14, at 6:47 a, Daniel Moisset wrote:
> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
My main objection to what follows is that it doesn't seem to offer any benefit over the current practice of appending an underscore (_) to a keyword to make it a valid identifier.
> My proposal essentially boils down to:
> • The character "$" can be used as a prefix of identifiers. Formally, identifier ::= ["$"] xid_start xid_continue*
> • The "$" character is not part of the name. So the program "foo=3;print($foo)" prints 3. So does the program "$foo=3; print(foo)". Both set the entry globals()["foo"] and leave globals()["$foo"] unset.
What is the benefit here? ``globals`` returns a mapping that represents the global scope, but that doesn't mean the mapping *is* the global scope. Aside from dubious uses of globals()['class'] to sidestep parsing issues, '$class' would still be, for all practical purposes, the "real" name of the identifier.
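
For readers unfamiliar with the workaround alluded to here, the following sketch shows what is already possible in today's Python: a keyword can be a key in a namespace mapping or an attribute name, it just cannot be written as a bare identifier. The `Box` class and the `namespace` dict are made up for the illustration.

```python
# A keyword can already be a key in a namespace mapping.
namespace = {}
exec("foo = 3", namespace)
namespace["class"] = "stored under a keyword"


class Box:
    """Hypothetical empty class used only to hold an attribute."""


# An attribute named after a keyword is reachable only via getattr/setattr;
# writing `box.class` would be a SyntaxError.
box = Box()
setattr(box, "class", "hello")
value = getattr(box, "class")

print(namespace["foo"], namespace["class"], value)
```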
> • If "$" appears in a token, it's always an identifier. So "$with", "$if", "$return" are all identifiers.
> If you overcome the "yikes, that looks like awk/bash/perl/php, and I don't like those", and consider it as an escape for "unusual"/"deprecation" situations, I think it's not a bad choice, and it allows simple solutions to many problems that have been in discussion recently and not so recently. [examples below]
The same argument applies to the underscore (or could, since it would be trivial to promise that no future keyword will end with, or even contain, an underscore). Defining new *keywords* to take the $-prefix could be done in a backwards-compatible way, although IMO the $ is too ugly to be a realistic choice. There might be a Unicode character that would make a good prefix, but given the reluctance to extend the core grammar beyond 7-bit ASCII, I haven't spent any time looking for good candidates.
> For me the benefits of this approach are:
> • It's very simple to explain how to use and its semantics
> • It (seems to me it) should be easy to explain to a python apprentice what a "$" means in code they read on a book/blogpost/manual
> • It's very easy to implement, minimal changes in the tokenizer
> • It's also easy to implement/integrate in other tools (editors with syntax highlighters, code formatters, etc)
> • It is easy to see that it's 100% backwards compatible (I understand that "$" has never been used in python before)
The above 5 points all apply to appending an underscore to a keyword to create a valid identifier.
> • It is relatively unsurprising in the sense that other languages are already using $ to label names (there may be some point of confusion to people coming from javascript where "$" is a valid character in names and is not ignored).
Given the variation in how other languages use $, I'm not sure this is a point in favor. There are plenty of questions on Stack Overflow about how and when to use $ in bash, and much of the confusion appears to stem from how $ is used in Perl. And that ignores the cases where $ is either optional (arithmetic expressions) or *should not* (the read built-in, the -v conditional operator, etc) be used. For that matter, why make $ special and restrict it to prefix position, instead of simply allowing $ as a valid identifier character and declaring that no keyword will ever use it?
> • It gives python devs and users a clear, easy and universal upgrade path when keywords are added (language designers: Add a __future__ import to enable keyword in python N+1, add warnings to change kw --> $kw in python N+2, and then turn it on by default in python N+3... ; developers: add the import when they want to upgrade, and fix their code with a search&replace when adding the import or after getting a warning).
Nothing about this process is specific to the $-prefix; it applies just as well to the practice of appending _ when it becomes necessary.
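
The staged rollout being discussed has a precedent that the standard library still records: `with` became a keyword by way of `from __future__ import with_statement`. The snippet below only reads that metadata; it is an illustration of the general mechanism, not of anything specific to "$" or "_".

```python
import __future__
import keyword

# Each feature object records when it became optional and when mandatory.
feature = __future__.with_statement
print("optional in: ", feature.getOptionalRelease())
print("mandatory in:", feature.getMandatoryRelease())
print("is 'with' a keyword now?", keyword.iskeyword("with"))
```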
> • It allows you to use new features even if some libraries were written for older python versions, depending on the deprecation period (this could be improved with something I'll write in another email, but that's the topic for another proposal)
I'll withhold judgement on this point, then, but it's not obvious how this allows an old library that uses a newly-minted keyword as an identifier name to continue working.
> • When clashes occur, which they always do, there's one obvious way to disambiguate (see today the "class_" argument for gettext.translation, the "lambd" argument for random.expovariate, the "class_" filter in libraries like pyquery for CSS class, functions like pyquery, sqlalchemy.sql.operators.as_, etc. Not counting all the "cls" argument to every classmethod ever)
Any one of these practices could be declared the One True Way without modifying the grammar. You're just adding another one. (https://xkcd.com/927/)
> • If we're worried about over proliferation of "$" in code, I'm quite sure given past experience that just a notice in PEP 8 of "only use $ in names to prevent ambiguity" should be more than enough for the community
And finally, this applies to the trailing underscore as well.
-- Clint
On 14 May 2018 at 15:02, Clint Hepner wrote:
>> On 2018 May 14, at 6:47 a, Daniel Moisset wrote:
>> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
> My main objection to what follows is that it doesn't seem to offer any benefit over the current practice of appending an underscore (_) to a keyword to make it a valid identifier.
There is a key difference: if an optional keyword is added (through a "from __future__ import some_keyword"), I can still use, in a simple way, names from modules that do not have the future import enabled.
Using just an underscore suffix or a similar name *change* is fine when the language is static. But when the language changes, and I cannot modify all the third-party libraries, being able to refer to *the original name* instead of a modified one is a significant need that cannot be covered by "just add a _ suffix".
Best,
D.
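
To make the "original name" point concrete with something that has already happened: `async` became a reserved keyword in Python 3.7, so a library that exposed that name can today only be reached through string-based lookup. The `oldlib` module below is hypothetical and built in memory just for the sketch.

```python
import keyword
import types

# Hypothetical third-party module written before "async" was reserved.
oldlib = types.ModuleType("oldlib")
setattr(oldlib, "async", lambda job: f"scheduled {job}")

print("'async' is a keyword:", keyword.iskeyword("async"))

# `oldlib.async(...)` and `from oldlib import async` no longer parse,
# so callers fall back to looking the original name up by string.
run_async = getattr(oldlib, "async")
print(run_async("backup"))
```

Under the proposal the same call would presumably be spelled `oldlib.$async("backup")`; a renamed `async_` would not help here unless the library itself is changed.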
On 14 May 2018 at 15:02, Clint Hepner wrote:
>> On 2018 May 14, at 6:47 a, Daniel Moisset wrote:
>> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
> My main objection to what follows is that it doesn't seem to offer any benefit over the current practice of appending an underscore (_) to a keyword to make it a valid identifier.
A secondary benefit I forgot to mention in my previous reply (and which is not covered by adding an underscore suffix) is that preserving the original name is better for interfacing with external systems that may use those names.
For example, this code:
https://github.com/dsc/pyquery/blob/fd725469701a8f47840f142df5c9b6d4479ea58e...
(and some code below that) is essentially a workaround that is required because they used a "_" suffix, because it's hard to pass "class" or "for" as keys in **kwargs.
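
The kind of workaround being described can be sketched as follows. The `attr` function here is a hypothetical stand-in written for this example, not pyquery's actual code: it strips a trailing underscore so that `class_` and `for_` map back to the external names.

```python
def attr(**kwargs):
    """Hypothetical helper mimicking the trailing-underscore workaround."""
    normalized = {}
    for key, value in kwargs.items():
        # "class_" and "for_" stand in for the HTML attributes "class"/"for".
        if key.endswith("_"):
            key = key[:-1]
        normalized[key] = value
    return normalized


print(attr(id="hello", class_="hello2"))
# The unmodified names can only be passed by unpacking a dict:
print(attr(**{"class": "hello2", "for": "email"}))
```

The translation step exists only because the caller cannot write `class=` directly; with an escape syntax the library would receive the real key and the stripping logic would be unnecessary.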
On 2018-05-14 06:47 AM, Daniel Moisset wrote:
> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
> My proposal essentially boils down to:
> 1. The character "$" can be used as a prefix of identifiers. Formally, identifier ::= ["$"] xid_start xid_continue*
>    xid_start: https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-xid_...
>    xid_continue: https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-xid_...
> 2. The "$" character is not part of the name. So the program "foo=3;print($foo)" prints 3. So does the program "$foo=3; print(foo)". Both set an entry to globals["foo"] and keep globals["$foo"] unset.
> 3. If "$" appears in a token, it's always an identifier. So "$with", "$if", "$return" are all identifiers.
> If you overcome the "yikes, that looks like awk/bash/perl/php, and I don't like those", and consider it as an escape for "unusual"/"deprecation" situations, I think it's not a bad chose, and allows to a simple solutions to many problems that have been in discussion recently and not so recently. [examples below]
> For me the benefits of this approach are:
> * It's very simple to explain how to use and its semantics
> * It (seems to me it) should be easy to explain to a python apprentice what a "$" means in code they read on a book/blogpost/manual
> * It's very easy to implement, minimal changes in the tokenizer
> * It's also easy to implement/integrate in other tools (editors with syntax highlighters, code formatters, etc)
> * It is easy to see that it's 100% backwards compatible (I understand that "$" has never been used in python before)
> * It is relatively unsurprising in the sense that other languages are already using $ to label names (there may be some point of confusion to people coming from javascript where "$" is a valid character in names and is not ignored).
> * It gives python devs and users a clear, easy and universal upgrade path when keywords are added (language designers: Add a __future__ import to enable keyword in python N+1, add warnings to change kw --> $kw in python N+2, and then turn it on by default in python N+3... ; developers: add the import when they want to upgrade, and fix their code with a search&replace when adding the import or after getting a warning).
> * It allows you to use new features even if some libraries were written for older python versions, depending the deprecation period (this could be improved with sth I'll write in another email, but that's the topic for another proposal)
> * When clashes occur, which they always do, there's one obvious way to disambiguate (see today the "class_" argument for gettext.translation, the "lambd" argument for random.expovariate, the "class_" filter in libraries like pyquery for CSS class, functions like pyquery, sqlalchemy.sql.operators.as_, etc. Not counting all the "cls" argument to every classmethod ever)
> * If we're worried about over proliferation of "$" in code, I'm quite sure given past experience that just a notice in PEP 8 of "only with $ in names to prevent ambiguity" should be more than enough for the community
> What are the drawbacks that you find in this?
> Best,
> Daniel
For the record, C# does something similar to help interface with other CLR languages with different keywords: any token starting with @ is an identifier even if the unprefixed token would be a reserved keyword.
More on that: https://ericlippert.com/2013/09/09/verbatim-identifiers/
Alex
On 5/14/2018 10:02 AM, Clint Hepner wrote:
>> On 2018 May 14, at 6:47 a, Daniel Moisset wrote:
>> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
> My main objection to what follows is that it doesn't seem to offer any benefit over the current practice of appending an underscore (_) to a keyword to make it a valid identifier.
Tkinter uses this convention for a few option names that clash.
-- Terry Jan Reedy
I can only think of three ways to reference a name defined in a different file: in an import statement, as properties of objects, and as keyword arguments.
Import statements are implicit assignments, so if Python allowed the following grammar, you could still import the odd thing that had a reserved name, without bringing that name into your local namespace.
from <keyword> import <keyword> as <name>
Property names always follow a dot, where only a name is valid, so Python could allow this too:
<expression>.<keyword>
Keyword arguments are also generally unambiguous, as they have to appear within the parens of an invocation, before the equals sign:
<expression>(<keyword>=<expression>)
If Python allowed those three examples (but still prevented users from *defining* names that are keywords), new keywords could be introduced without breaking old code, and the language would only require relatively minor tweaking.
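
For reference, all three contexts are rejected by the current grammar. The small check below uses the built-in `compile()` to show which spellings parse today; the `parses` helper and the sample names (`mod`, `obj`, `f`, `klass`) are made up for the illustration, and nothing is actually imported or executed.

```python
def parses(source):
    """Return True if `source` compiles as Python today."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


samples = {
    "import": "from mod import class as klass",
    "attribute": "obj.class",
    "keyword argument": "f(class=1)",
    "plain name (control)": "obj.klass",
}
for label, source in samples.items():
    print(f"{label:22} {source!r:36} parses: {parses(source)}")
```

Only the control line parses; relaxing the grammar as suggested would flip the first three to valid while still rejecting a bare `class = 1`.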
-- Carl Smith
carl.input@gmail.com
On 14 May 2018 at 19:11, Terry Reedy wrote:
> On 5/14/2018 10:02 AM, Clint Hepner wrote:
>>> On 2018 May 14, at 6:47 a, Daniel Moisset wrote:
>>> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
>> My main objection to what follows is that it doesn't seem to offer any benefit over the current practice of appending an underscore (_) to a keyword to make it a valid identifier.
> Tkinter uses this convention for a few option names that clash.
> -- Terry Jan Reedy
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
Just to be clear, if `foo` was introduced as a new infix operator, projects that used `foo` as a name would not be able to also use `foo` as an infix operator in the file that defines `foo` as a name, but could use the operator throughout the rest of their project.
-- Carl Smith
carl.input@gmail.com
On 14 May 2018 at 21:52, Carl Smith wrote:
> I can only think of three ways to reference a name defined in a different file: In an import statement, as properties of objects and as keyword arguments.
> Import statements are implicit assignments, so if Python allowed the following grammar, you could still import the odd thing that had a reserved name, without bringing that name into your local namespace.
> from <keyword> import <keyword> as <name>
> Property names always follow a dot, where only a name is valid, so Python could allow this too:
> <expression>.<keyword>
> Keyword arguments are also generally unambiguous, as they have to appear within the parens of an invocation, before the equals sign:
> <expression>(<keyword>=<expression>)
> If Python allowed those three examples (but still prevented users from *defining* names that are keywords) new keywords could be introduced without breaking old code, and the language would only require relatively minor tweaking.
> -- Carl Smith carl.input@gmail.com
Sorry to think out loud, but if the lexer marked `foo` as a generic `Word` token, that could be a keyword or a name, then the parser could look at the value of each `Word` token, and if the context is `import foo`, `class foo...`, `def foo...` or `foo = ...`, then `foo` is a name there and thereafter (and `x foo y` is a SyntaxError) else... you get the idea.
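
A very rough sketch of that idea, using the standard `tokenize` module purely as a lexer: scan the word tokens and decide from the neighbouring tokens whether `foo` is being bound as a name. The `word_is_name` function is hypothetical and deliberately crude (for instance it would also treat `f(foo=1)` as a binding), so it shows the shape of the approach rather than a workable parser rule.

```python
import io
import tokenize


def word_is_name(source, word):
    """Return True if `word` appears in a binding context in `source`."""
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type in (tokenize.NAME, tokenize.OP)
    ]
    # Walk each token together with its previous and next neighbour.
    for prev, cur, nxt in zip([None] + tokens, tokens, tokens[1:] + [None]):
        if cur.type != tokenize.NAME or cur.string != word:
            continue
        # `import foo`, `class foo...`, `def foo...`
        if prev is not None and prev.string in ("import", "class", "def"):
            return True
        # `foo = ...`
        if nxt is not None and nxt.string == "=":
            return True
    return False


print(word_is_name("foo = 3\nprint(foo)\n", "foo"))
print(word_is_name("x = a foo b\n", "foo"))
```

In a file where the function returns True, `foo` would be treated as an ordinary name; elsewhere the parser would be free to read `a foo b` as an operator expression.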
-- Carl Smith
carl.input@gmail.com
On 14 May 2018 at 22:06, Carl Smith wrote:
> Just to be clear, if `foo` was introduced as a new infix operator, projects that used `foo` as a name would not be able to also use `foo` as an infix operator in the file that defines `foo` as a name, but could use the operator throughout the rest of their project.
> -- Carl Smith carl.input@gmail.com
NumPy and scikit-learn also use the trailing underscore convention. Albeit, sklearn uses it for (almost) all the model attributes, not just those it thinks might clash.
On Mon, May 14, 2018, 2:12 PM Terry Reedy wrote:
> On 5/14/2018 10:02 AM, Clint Hepner wrote:
>>> On 2018 May 14, at 6:47 a, Daniel Moisset wrote:
>>> Following up some of the discussions about the problems of adding keywords and Guido's proposal of making tokenization context-dependent, I wanted to propose an alternate way to go around the problem.
>> My main objection to what follows is that it doesn't seem to offer any benefit over the current practice of appending an underscore (_) to a keyword to make it a valid identifier.
> Tkinter uses this convention for a few option names that clash.
> -- Terry Jan Reedy
participants (6)
- Alexandre Brault
- Carl Smith
- Clint Hepner
- Daniel Moisset
- David Mertz
- Terry Reedy