On Wed May 16 20:53:46 EDT 2018, Steven D'Aprano wrote:
> On Wed, May 16, 2018 at 07:24:19PM +0200, Adam Bartoš wrote:
>> Hello,
>>
>> I have yet another idea regarding the clashes between new keywords
>> and already used names. How about introducing two new keywords *wink*
>> that would serve as lexical keyword/nonkeyword declarations, similarly
>> to nonlocal and global declarations?
>>
>> def f():
>>     nonkeyword if
>>     if = 2  # we use 'if' as an identifier
>>     def g():
>>         keyword if
>>         if x > 0: pass  # now 'if' again introduces a conditional statement
>
> This is absolutely no help at all for the common case that we have an
> identifier that is a keyword and want to use it as a keyword in the same
> block. For example, we can currently write:
>
> try:
>     value = data.except_
> except:
>     value = data.missing()
>
> say. We're using "except_" because the data comes from some external
> interface where it uses "except", but we can't use that because it's a
> keyword.
If it were possible to consider any syntactical block for the declarations,
this use case could be handled like:

try:
    nonkeyword except
    value = data.except
except:
    value = data.missing()
Alternatively, data.except could be allowed even without a declaration, as
another idea suggests. There are syntactical positions where it is easy to
distinguish between a keyword and an identifier, and positions where it is
harder, or even impossible due to limitations on the parser complexity. So
both ideas may be viewed as complementary, and the base case of my proposal
would be just a per-module opt-out from a keyword, making it easier to fix
such a module when a new keyword is introduced in the language.
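In fairness, the attribute half of this already has a dynamic escape hatch
today, since keywords are only reserved at the syntax level; a small sketch
(Data here is a made-up stand-in for the external interface):

```python
class Data:
    pass

data = Data()
# Keywords are reserved only in source syntax; the attribute namespace
# itself accepts any string, so this works in current Python:
setattr(data, 'except', 42)
value = getattr(data, 'except')
```

It is of course clumsier than a plain data.except would be, which is the
point of the declarations above.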
> I also challenge to think about how you will document the complicated
> rules for when you can and can't use keywords as names, especially think
> about explaining them to beginners:
>
> def spam(None=42):
>     print(None)  # What will this print?
>     x = None     # Fine, but what does it do?
>     None = 999   # Is this an error or not?
None is a different kind of keyword. Ordinary keywords, which make up the
statements of the language, have no value, so 'x = if' makes no sense. On
the other hand, None, True, and False are singleton values (like Ellipsis
and NotImplemented) that are additionally protected from being redefined,
which makes them similar to keywords. I think the barrier for adding new
keyword constants is even higher than the barrier for adding new ordinary
keywords, so they may be omitted when trying to solve the
keyword/identifier problem.
nonkeyword None  # SyntaxError: a keyword constant 'None' cannot be an identifier
> Remember the KISS principle.
It seemed to me that the syntactical block based declarations are quite
simple in principle, but you are right, there are many issues.
Best regards,
Adam Bartoš
Hello,
I have yet another idea regarding the clashes between new keywords and
already used names. How about introducing two new keywords *wink* that
would serve as lexical keyword/nonkeyword declarations, similarly to
nonlocal and global declarations?
def f():
    nonkeyword if
    if = 2  # we use 'if' as an identifier
    def g():
        keyword if
        if x > 0: pass  # now 'if' again introduces a conditional statement
This allows a name to be used as both an identifier and a keyword in a
single module, and since it's lexical, it could in principle be
syntax-highlighted correctly.
When a new keyword is added to the list of standard keywords like 'given'
or 'where', a module that uses the name as identifier could be easily fixed
by a global declaration 'nonkeyword given'. Maybe even exception messages
pointing to this could be added. If 'nonkeyword keyword' is allowed, we can
also fix code using the name 'keyword' as an identifier, but doing so in
the global scope couldn't be undone.
On the other hand, new language features depending on new keywords could be
made provisional by not adding the keywords to the standard list, so people
who would like to use them would need to opt in by e.g. 'keyword given'.
Surely, this provisional mechanism isn't robust at all, since new features
may just extend the usage of current keywords.
Declaring a keyword that has no meaning in the language would result in an
exception:
keyword foo # SyntaxError: undefined keyword 'foo'
It should be possible to use a keyword as a parameter name without needing
to declare it in the surrounding scope; the local declaration would suffice:

# nonkeyword if  # not needed
def f(if=3):  # ok
    nonkeyword if

Another option is to interpret parameters always as nonkeywords, or to
raise a special syntax error when a keyword occurs in the place of a formal
parameter (similarly to 'def f(x): nonlocal x').
Clearly, even if this proposal diminished the cost of adding new keywords,
the cost would still be high.
Best regards,
Adam Bartoš
As anyone still following the inline assignment discussion knows, a problem
with designing new syntax is that it's hard to introduce new keywords into
the language, since all the nice words seem to be used as method names in
popular packages. (E.g. we can't use 'where' because there's numpy.where
<https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html>,
and we can't use 'given' because it's used in Hypothesis
<http://hypothesis.readthedocs.io/en/latest/quickstart.html>.)
The idea I had (not for the first time :-) is that in many syntactic
positions we could just treat keywords as names, and that would free up
these keywords.
For example, we could allow keywords after 'def' and after a period, and
then the following would become legal:
class C:
    def and(self, other):
        return ...

a = C()
b = C()
print(a.and(b))
This does not create syntactic ambiguities because after 'def' and after a
period the grammar *always* requires a NAME.
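Until such syntax exists, a keyword-named method can already be attached
and called dynamically, since only the parser cares about keywords, not the
attribute machinery; a sketch (the helper name _and is made up):

```python
class C:
    pass

# "def and(...)" is a SyntaxError today, but the attribute namespace
# has no keyword restriction, so the method can be attached under the
# reserved name via setattr:
def _and(self, other):
    return (self, other)

setattr(C, 'and', _and)

a = C()
b = C()
result = getattr(a, 'and')(b)  # what "a.and(b)" would spell
```

Which also shows why the restriction is purely syntactic: the object model
itself is already happy with these names.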
There are other positions where we could perhaps allow this, e.g. in a
decorator, immediately after '@' (the only keyword that's *syntactically*
legal here is 'not', though I'm not sure it would ever be useful).
Of course this would still not help for names of functions that might be
imported directly (do people write 'from numpy import where'?). And it
would probably cause certain typos to be harder to diagnose.
I should also mention that this was inspired from some messages where Tim
Peters berated the fashion of using "reserved words", waxing nostalgically
about the old days of Fortran (sorry, FORTRAN), which doesn't (didn't?)
have reserved words at all (nor significant whitespace, apart from the
"start in column 7" rule).
Anyway, just throwing this out. Please tear it apart!
--
--Guido van Rossum (python.org/~guido)
On 13/05/2018 19:19, Guido van Rossum wrote:
> As anyone still following the inline assignment discussion knows, a
> problem with designing new syntax is that it's hard to introduce new
> keywords into the language, since all the nice words seem to be used
> as method names in popular packages. (E.g. we can't use 'where'
> because there's numpy.where
> <https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html>,
> and we can't use 'given' because it's used in Hypothesis
> <http://hypothesis.readthedocs.io/en/latest/quickstart.html>.)
>
> The idea I had (not for the first time :-) is that in many syntactic
> positions we could just treat keywords as names, and that would free
> up these keywords.
>
> For example, we could allow keywords after 'def' and after a period,
> and then the following would become legal:
>
> class C:
>     def and(self, other):
>         return ...
>
> a = C()
> b = C()
> print(a.and(b))
>
> This does not create syntactic ambiguities because after 'def' and
> after a period the grammar *always* requires a NAME.
>
> There are other positions where we could perhaps allow this, e.g. in a
> decorator, immediately after '@' (the only keyword that's
> *syntactically* legal here is 'not', though I'm not sure it would ever
> be useful).
So you would be allowing "second class" identifiers - legal in some
positions where an identifier is allowed, not legal in others.
With respect, that seems like a terrible idea; neither those who want to
use such identifiers, nor those who don't, would be happy. Especially
if it encourages work-arounds such as
def and(x, y):
    return ...

# and(1, 2)  # Oops, SyntaxError. Oh, I know:
globals()['and'](1, 2)  # Works!
and a zillion others.
>
> [snip] And it would probably cause certain typos to be harder to diagnose.
No doubt.
>
> I should also mention that this was inspired from some messages where
> Tim Peters berated the fashion of using "reserved words", waxing
> nostalgically about the old days of Fortran (sorry, FORTRAN), which
> doesn't (didn't?) have reserved words at all (nor significant
> whitespace, apart from the "start in column 7" rule).
>
> Anyway, just throwing this out. Please tear it apart!
Thanks. :-)
>
> --
> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
>
Best wishes
Rob Cliffe
I'm hoping that the arguments for assignment expressions will be over by
Christmas *wink* so as a partial (and hopefully less controversial)
alternative, what do people think of the idea of flagging certain
expressions as "pure functions" so the compiler can automatically cache
results from it?
Let me explain: one of the use-cases for assignment expressions is to
reduce repetition of code which may be expensive. A toy example:
func(arg) + func(arg)*2 + func(arg)**2
If func() is a pure function with no side-effects, that is three times
as costly as it ought to be:
(f := func(arg)) + f*2 + f**2
Functional languages like Haskell can and do make this optimization all
the time (or so I am led to believe), because the compiler knows that
func must be a pure, side-effect-free function. But the Python
interpreter cannot do this optimization for us, because it has no way of
knowing whether func() is a pure function.
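For what it's worth, an opt-in version of this caching already exists at
the function level via functools.lru_cache; a small sketch (func here is a
made-up stand-in for the expensive pure function) showing that the body
runs only once for the repeated sub-expression:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def func(arg):
    # A made-up "expensive" pure function; count real evaluations
    # to show that the repeated mentions below hit the cache.
    global calls
    calls += 1
    return arg + 1

arg = 3
result = func(arg) + func(arg)*2 + func(arg)**2  # body runs once
```

The difference from the proposal is that lru_cache still pays a call and
dict-lookup per mention, caches across the whole program rather than one
expression, and requires the programmer to promise purity by decorating.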
Now for the wacky idea: suppose we could tell the interpreter to cache
the result of some sub-expression, and re-use it within the current
expression? That would satisfy one use-case for assignment operators,
and perhaps weaken the need for := operator.
Good idea? Dumb idea?
Good idea, but you want the assignment operator regardless?
I don't have a suggestion for syntax yet, so I'm going to make up syntax
which is *clearly and obviously rubbish*, a mere placeholder, so don't
bother telling me all the myriad ways it sucks. I know it sucks, that's
deliberate. Please focus on the *concept*, not the syntax.
We would need to flag which expression can be cached because it is PURE,
and mark how far the CACHE extends:
<BEGIN CACHE>
<PURE>
func(arg)
<END PURE>
+ func(arg)*2 + func(arg)**2
<END CACHE>
This would tell the compiler to only evaluate the sub-expression
"func(arg)" once, cache the result, and re-use it each other time it
sees that same sub-expression within the surrounding expression.
To be clear: it doesn't matter whether or not the sub-expression
actually is pure. And it doesn't have to be a function call: it could be
anything legal in an expression.
If we had this, with appropriately awesome syntax, would that negate the
usefulness of assignment expressions in your mind?
--
Steve
Following up some of the discussions about the problems of adding keywords
and Guido's proposal of making tokenization context-dependent, I wanted to
propose an alternate way to go around the problem.
My proposal essentially boils down to:
1. The character "$" can be used as a prefix of identifiers. Formally:

   identifier ::= ["$"] xid_start xid_continue*
2. The "$" character is not part of the name. So the program
"foo=3;print($foo)" prints 3. So does the program "$foo=3; print(foo)".
Both set an entry to globals["foo"] and keep globals["$foo"] unset.
3. If "$" appears in a token, it's always an identifier. So "$with",
"$if", "$return" are all identifiers.
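Rule 2 can be illustrated with a toy source transform (strip_dollar is a
made-up helper; a real implementation would live in the tokenizer, and this
naive regex version ignores strings/comments and cannot honor rule 3, since
stripping "$if" to "if" would collide with the keyword again):

```python
import re

def strip_dollar(source):
    # Toy sketch of rule 2: "$" marks, but is not part of, the name,
    # so "$foo" and "foo" refer to the same binding.
    return re.sub(r'\$([A-Za-z_][A-Za-z0-9_]*)', r'\1', source)

ns = {}
exec(strip_dollar("$foo = 3\nresult = foo"), ns)
```

After the transform, only globals["foo"] is set; globals["$foo"] never
exists, exactly as point 2 describes.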
If you can get past "yikes, that looks like awk/bash/perl/php, and I don't
like those", and consider it as an escape hatch for "unusual"/"deprecation"
situations, I think it's not a bad choice, and it allows simple solutions
to many problems that have been in discussion recently and not so recently.
[examples below]
For me the benefits of this approach are:
- It's very simple to explain how to use and its semantics
- It (seems to me it) should be easy to explain to a python apprentice
what a "$" means in code they read on a book/blogpost/manual
- It's very easy to implement, minimal changes in the tokenizer
- It's also easy to implement/integrate in other tools (editors with
syntax highlighters, code formatters, etc)
- It is easy to see that it's 100% backwards compatible (I understand
that "$" has never been used in python before)
- It is relatively unsurprising in the sense that other languages are
already using $ to label names (there may be some point of confusion to
people coming from javascript where "$" is a valid character in names and
is not ignored).
- It gives python devs and users a clear, easy and universal upgrade
path when keywords are added (language designers: Add a __future__ import
to enable keyword in python N+1, add warnings to change kw --> $kw in
python N+2, and then turn it on by default in python N+3... ; developers:
add the import when they want to upgrade , and fix their code with a
search&replace when adding the import or after getting a warning).
   - It allows you to use new features even if some libraries were written
   for older python versions, depending on the deprecation period (this
   could be improved with something I'll write in another email, but that's
   the topic for another proposal)
   - When clashes occur, which they always do, there's one obvious way to
   disambiguate (see today the "class_" argument for gettext.translation,
   the "lambd" argument for random.expovariate, the "class_" filter in
   libraries like pyquery for CSS classes, functions like
   sqlalchemy.sql.operators.as_, etc. Not counting all the "cls" arguments
   to every classmethod ever)
   - If we're worried about over-proliferation of "$" in code, I'm quite
   sure given past experience that just a notice in PEP 8 to "only use $ in
   names to prevent ambiguity" would be more than enough for the community
What are the drawbacks that you find in this?
Best,
Daniel
[The rest of this post is just examples]
Example 1:
Python 3.92 has just added a future import that makes "given" a keyword.
Then you can do:
# This works because we have no future import
from hypothesis import given, strategies as st

@given(st.integers())
def foo(i):
    x = f(i)**2 + f(i)**3
    ....
If you want to use the new feature (or upgraded to python 3.93 and started
receiving warnings) you can then change it to:

from __future__ import given_expression
from hypothesis import $given, strategies as st

@$given(st.integers())  # If you forget the $ you get a SyntaxError
def foo(i):
    x = z**2 + z**3 given z = f(i)
    ....
And also you could do:

from __future__ import given_expression
import hypothesis

@hypothesis.$given(hypothesis.strategies.integers())
def foo(i):
    x = z**2 + z**3 given z = f(i)
    ....
Or even, if you want to avoid the "$" all over your code:

from __future__ import given_expression
from hypothesis import $given as hgiven, strategies as st

@hgiven(st.integers())
def foo(i):
    x = z**2 + z**3 given z = f(i)
    ....
If you have some library which uses a new keyword as a method name (you
can't rename those with import ... as ...), it still works perfectly:
from mylib import SomeClass

instance = SomeClass()
instance.$given("foo")
This is also helpful as a universal way to solve name clashes between
python keywords and libraries that use some external concept that overlaps
with python
(from https://pythonhosted.org/pyquery/attributes.html ):
>>> import pyquery as pq
>>> p = pq('<p id="hello" class="hello"></p>')('p')
>>> p.attr(id='hello', $class='hello2')
[<p#hello.hello2>]
Or even nameclashes within python itself
@classmethod
def new_with_color($class, color):  # Instead of the usual cls or class_
    result = $class()
    result.set_color(color)
    return result
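For comparison, the escape hatch these examples replace is **-unpacking,
which works in current Python whenever a keyword must be a keyword-argument
name; a sketch (attr here is a made-up stand-in for an API like pyquery's
.attr(), which just echoes what it receives):

```python
def attr(**kwargs):
    # Made-up stand-in for an API taking attribute names as kwargs.
    return kwargs

# attr(class='hello2') is a SyntaxError; the existing workaround is
# to spell the reserved name inside a dict and **-unpack it:
result = attr(id='hello', **{'class': 'hello2'})
```

It works, but it is exactly the kind of indirection the "$" prefix is meant
to make unnecessary.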
--
Daniel Moisset
UK COUNTRY MANAGER
A: 1 Fore Street, EC2Y 9DT London <https://goo.gl/maps/pH9BBLgE8dG2>
P: +44 7398 827139 <+44+7398+827139>
M: dmoisset(a)machinalis.com <dmoisset(a)machinalis.com> | S: dmoisset
Machinalis Limited is a company registered in England and Wales. Registered
number: 10574987.
I know that the "self" parameter has been discussed a lot, but still I
didn't find this proposal. If it was proposed before, my sincere apologies;
please forget this mail.
The disturbing part of the "self parameter" is the asymmetry of the
definition and the call. So I was thinking: why not define methods like
"def self.whatevermethod(par1, par2, etc)" instead of "def
whatevermethod(self, par1, par2, etc)"?
This would allow the call and the definition to be written in exactly the
same way, while still keeping the "clarity" of the additional input for the
function. Moreover, this can be made backward compatible (even without
making self a reserved word, though that could be considered anyway).
I've been short by purpose but ready to elaborate if needed.
Thank you for the attention.
_Stefano
Sorry for chiming in so late; I was lurking using google groups and had to
subscribe to post - hence this new thread.
I gather that *where* has been discarded as a possible new keyword given
its use as a function in numpy (
https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.where.html)
... Still, I will include it below for completeness (and because I think it
reads better than the other choices).
Here are two sets of two examples illustrating a situation that I have not
seen before, and which read (to me) much better when using a keyword (given
or where) than a symbol (:=). Furthermore, the position of the temporary
assignment can, in my opinion, be done differently and help in making the
code clearer.
First example: single temporary assignment, done four different ways.
1) using :=
real_roots = [(-b/(2*a) + (D := sqrt((b/(2*a))**2 - c/a)), -b/(2*a) - D)
              for a in range(10)
              for b in range(10)
              for c in range(10)
              if D >= 0]
2) using *given* at the very end

real_roots = [(-b/(2*a) + D, -b/(2*a) - D)
              for a in range(10)
              for b in range(10)
              for c in range(10)
              if D >= 0
              given D = sqrt((b/(2*a))**2 - c/a)]

3) using *given* before the iterations

real_roots = [(-b/(2*a) + D, -b/(2*a) - D)
              given D = sqrt((b/(2*a))**2 - c/a)
              for a in range(10)
              for b in range(10)
              for c in range(10)
              if D >= 0]

4) using *where* before the iterations (which would be my preferred choice
if it were available)

real_roots = [(-b/(2*a) + D, -b/(2*a) - D)
              where D = sqrt((b/(2*a))**2 - c/a)
              for a in range(10)
              for b in range(10)
              for c in range(10)
              if D >= 0]
Second example: multiple assignments.
When we have multiple temporary assignments, the situation can be more
complicated. In the following series of examples, I will start in reverse
order compared to above.
5) using *where* before the iterations

real_roots2 = [(-b/(2*a) + D, -b/(2*a) - D)
               where D = sqrt((b/(2*a))**2 - c/a)
               where c = c_values/100
               for c_values in range(1000)
               if D >= 0]

6) using *given* before the iterations

real_roots2 = [(-b/(2*a) + D, -b/(2*a) - D)
               given D = sqrt((b/(2*a))**2 - c/a)
               given c = c_values/100
               for c_values in range(1000)
               if D >= 0]

7) using *given* at the very end

real_roots2 = [(-b/(2*a) + D, -b/(2*a) - D)
               for c_values in range(1000)
               if D >= 0
               given D = sqrt((b/(2*a))**2 - c/a)
               given c = c_values/100]
8) Using :=

real_roots2 = [(-b/(2*a) + (D := sqrt((b/(2*a))**2 - (c := c_values/100)/a)),
                -b/(2*a) - D)
               for c_values in range(1000)
               if D >= 0]
I find this last version extremely difficult to understand compared with
the others where a keyword is used. Perhaps it is because I do not fully
understand how := should be used...
Finally ... if "where" cannot be used, given the very special status of
such temporary assignments, could "where_" (with a trailing underscore) be
considered? I would argue that any word followed by an underscore would be
more readable than a compound symbol such as ":=".
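As an aside, the effect of "given D = ..." can already be approximated in
today's syntax with a one-element "for" clause that binds the name once per
iteration; a sketch (note I have adjusted the ranges so a is never zero,
and moved the sign test before sqrt() so negative discriminants never reach
it, which the original examples would need as well):

```python
from math import sqrt

# Each one-element "for ... in [expr]" clause plays the role of a
# "given name = expr" binding in the proposals above:
real_roots = [(-b/(2*a) + D, -b/(2*a) - D)
              for a in range(1, 10)              # a != 0: no zero division
              for b in range(10)
              for c in range(10)
              for disc in [(b/(2*a))**2 - c/a]   # "given disc = ..."
              if disc >= 0
              for D in [sqrt(disc)]]             # "given D = sqrt(disc)"
```

It works, but the intent is far less obvious than with a dedicated keyword,
which is presumably why the idiom never caught on.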
André
A new thread just to suggest taking the discussion about PEP 572 well
beyond python-ideas (PyCon is good for that).
The least anyone should want is a language change that immediately gets
tagged on the networks as "don't use", or "use only for...", etc.
To be honest, I'll likely be on the "don't use :=, unless" band of pundits
(already a filibuster). ":=" is like going back to "reduce()", which is
almost defunct thanks to... us!
Cheers!
--
Juancarlo *Añez*
Just for fun - no complaint, no suggestion, just sharing a bit of code
that tickled me.
The docs for `itertools.tee()` contain a Python work-alike, which is
easy to follow. It gives each derived generator its own deque, and
when a new value is obtained from the original iterator it pushes that
value onto each of those deques.
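For reference, the docs' work-alike is roughly the following (quoted from
memory, so treat it as a sketch rather than the exact text):

```python
import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for _ in range(n)]

    def gen(mydeque):
        while True:
            if not mydeque:             # this consumer's deque is empty:
                try:
                    newval = next(it)   # fetch a fresh value and
                except StopIteration:
                    return
                for d in deques:        # push it onto every deque
                    d.append(newval)
            yield mydeque.popleft()

    return tuple(gen(d) for d in deques)
```

Each derived generator owns a deque, and every new value is appended n
times, which is the duplication the shared-list version avoids.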
Of course it's possible for them to share a single deque, but the code
gets more complicated. Is it possible to make it simpler instead?
What it "really" needs is a shared singly-linked list of values,
pointing from oldest value to newest. Then each derived generator can
just follow the links, and yield its next result in time independent
of the number of derived generators. But how do you know when a new
value needs to be obtained from the original iterator, and how do you
know when space for an older value can be recycled (because all of the
derived generators have yielded it)?
I ended up with almost a full page of code to do that, storing with
each value and link a count of the number of derived generators that
had yet to yield the value, effectively coding my own reference-count
scheme by hand, along with "head" and "tail" pointers to the ends of
the linked list that proved remarkably tricky to keep correct in all
cases.
Then I thought "this is stupid! Python already does reference
counting." Voila! Vast swaths of tedious code vanished, giving this
remarkably simple implementation:
def mytee(xs, n):
    last = [None, None]

    def gen(it, mylast):
        nonlocal last
        while True:
            mylast = mylast[1]
            if not mylast:
                mylast = last[1] = last = [next(it), None]
            yield mylast[0]

    it = iter(xs)
    return tuple(gen(it, last) for _ in range(n))
There's no need to keep a pointer to the start of the shared list at
all - we only need a pointer to the end of the list ("last"), and each
derived generator only needs a pointer to its own current position in
the list ("mylast").
What I find kind of hilarious is that it's no help at all as a
prototype for a C implementation: Python recycles stale `[next(it),
None]` pairs all by itself, when their internal refcounts fall to 0.
That's the hardest part.
BTW, I certainly don't suggest adding this to the itertools docs
either. While it's short and elegant, it's too subtle to grasp easily
- if you think "it's obvious", you haven't yet thought hard enough
about the problem ;-)