[Python-Dev] Examples for PEP 572
Terry Reedy
tjreedy at udel.edu
Tue Jul 3 21:54:22 EDT 2018
On 7/3/2018 5:37 PM, Serhiy Storchaka wrote:
> I like programming languages in which all are expressions (including
> function declarations, branching and loops) and you can use an
> assignment at any point, but Python is built on other ways, and I like
> Python too. But PEP 572 looks like it violates several Python design
> principles. Python looks like a simple language, and this is its strong
> side. I believe most Python users are not professional programmers --
> they are sysadmins, scientists, hobbyists and kids -- but Python is
> suitable for them because of its clear syntax and its encouragement of
> good programming style. In particular, mutating and non-mutating
> operations are separated. The assignment expression breaks this. There
> should be very good reasons for doing this. But it looks to me that all
> examples for PEP 572 can be written better without using the walrus
> operator.
I appreciate you showing alternatives that can be used now. Even once
implemented, one cannot use assignment expressions until one no longer
cares about 3.7 compatibility. Even then, there will still be a choice.
>> results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
>
> results = [(x, y, x/y) for x in input_data for y in [f(x)] if y > 0]
Would (f(x),) be faster?
import timeit as ti
print(ti.timeit('for y in {x}: pass', 'x=1'))
print(ti.timeit('for y in [x]: pass', 'x=1'))
print(ti.timeit('for y in (x,): pass', 'x=1'))
# prints
0.13765254499999996 # seconds per 1_000_000 = microseconds each.
0.10321274000000003
0.09492473300000004
Yes, but not by enough to pay for adding the ',', and sometimes
forgetting it.
>> stuff = [[y := f(x), x/y] for x in range(5)]
> stuff = [[y, x/y] for x in range(5) for y in [f(x)]]
Creating a leaky name binding appears to be about 5 x faster than
iterating a temporary singleton.
print(ti.timeit('y=x', 'x=1'))
print(ti.timeit('y=x; del y', 'x=1'))
#
0.017357778999999907
0.021115051000000107
If one adds 'del y' to make the two equivalent, the number of characters
typed is about the same. To me, the choice amounts to subjective
preference. Even with y:=x available, I would write the expansion as
res = []
for x in range(5):
    y = f(x)
    res.append((y, x/y))
rather than use the assignment expression in the tuple. It creates a
'hitch' in thought.
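For concreteness, here is the expansion next to the walrus spelling
(runnable on 3.8+); f here is just a stand-in, since the example does
not define it:

```python
def f(x):          # stand-in for whatever function the example assumes
    return x + 1

# Explicit loop: the subject of each statement is clear.
res = []
for x in range(5):
    y = f(x)
    res.append([y, x/y])

# Walrus spelling: the binding happens inside the tuple being built.
stuff = [[y := f(x), x/y] for x in range(5)]

assert res == stuff
```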
> Does this idiom look unusual to you? But it is legal Python syntax, and
> it is not more unusual than the new walrus operator. This idiom is not
> commonly used because there is very little need for the above examples
> in real code. And I'm sure that the walrus operator in comprehensions
> will be very rare unless PEP 572 encourages writing complicated
> comprehensions. Most users prefer to write an explicit loop.
> I want to recall that PEP 572 started from a discussion on
> Python-ideas which proposed a syntax for writing the following code as
> a comprehension:
>
> smooth_signal = []
> average = initial_value
> for xt in signal:
>     average = (1-decay)*average + decay*xt
>     smooth_signal.append(average)
>
> Using the "for in []" idiom this can be written (if you prefer
> comprehensions) as:
>
> smooth_signal = [average
>                  for average in [initial_value]
>                  for x in signal
>                  for average in [(1-decay)*average + decay*x]]
>
> Try now to write this using PEP 572. The walrus operator turned out to
> be less suitable for solving the original problem because it doesn't
> help to initialize the initial value.
>
>
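Indeed, a walrus spelling (sketched here with made-up values, for 3.8+)
still needs an ordinary assignment statement before the comprehension to
bind the initial value, which is exactly the complaint:

```python
# Illustrative values; the original example leaves them unspecified.
signal = [1.0, 2.0, 3.0]
decay = 0.5
initial_value = 0.0

average = initial_value   # the walrus cannot do this part
smooth_signal = [average := (1 - decay) * average + decay * x
                 for x in signal]
```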
> Examples from PEP 572:
>
>> # Loop-and-a-half
>> while (command := input("> ")) != "quit":
>>     print("You entered:", command)
>
> The straightforward way:
>
> while True:
>     command = input("> ")
>     if command == "quit": break
>     print("You entered:", command)
>
> The clever way:
>
> for command in iter(lambda: input("> "), "quit"):
>     print("You entered:", command)
The 2-argument form of iter is under-remembered and under-used. The
length difference is 8.
while (command := input("> ")) != "quit":
for command in iter(lambda: input("> "), "quit"):
I like the iter version, but the for-loop machinery and the extra
function call make a minimal loop about half a millisecond slower.
import timeit as ti
def s():
    it = iter(10000*'0' + '1')
def w():
    it = iter(10000*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break
def f():
    it = iter(10000*'0' + '1')
    for command in iter(lambda: next(it), '1'): pass
print(ti.timeit('s()', 'from __main__ import s', number=1000))
print(ti.timeit('w()', 'from __main__ import w', number=1000))
print(ti.timeit('f()', 'from __main__ import f', number=1000))
#
0.0009702129999999975
0.9365254250000001
1.5913117949999998
Of course, with added processing of 'command' the time difference
disappears. Printing (in IDLE) is an extreme case.
def wp():
    it = iter(100*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break
        print('w', command)
def fp():
    it = iter(100*'0' + '1')
    for command in iter(lambda: next(it), '1'):
        print('f', command)
print(ti.timeit('wp()', 'from __main__ import wp', number=1))
print(ti.timeit('fp()', 'from __main__ import fp', number=1))
#
0.48
0.47
>> # Capturing regular expression match objects
>> # See, for instance, Lib/pydoc.py, which uses a multiline spelling
>> # of this effect
>> if match := re.search(pat, text):
>>     print("Found:", match.group(0))
>> # The same syntax chains nicely into 'elif' statements, unlike the
>> # equivalent using assignment statements.
>> elif match := re.search(otherpat, text):
>>     print("Alternate found:", match.group(0))
>> elif match := re.search(third, text):
>>     print("Fallback found:", match.group(0))
> It may be more efficient to use a single regular expression which
> consists of multiple or-ed patterns
My attempt resulted in a slowdown. Duplicating the dominance of pat
over otherpat over third requires, I believe, negative lookahead assertions.
---
import re
import timeit as ti
pat1 = re.compile('1')
pat2 = re.compile('2')
pat3 = re.compile('3')
pat123 = re.compile('1|2(?!.*1)|3(?!.*(1|2))')
# I think most people would prefer to use the 3 simple patterns.
def ifel(text):
    match = re.search(pat1, text)
    if match: return match.group()
    match = re.search(pat2, text)
    if match: return match.group()
    match = re.search(pat3, text)
    if match: return match.group()
def mach(text):
    match = re.search(pat123, text)
    return match.group()
print([ifel('321'), ifel('32x'), ifel('3xx')] == ['1', '2', '3'])
print([mach('321'), mach('32x'), mach('3xx')] == ['1', '2', '3'])
# True, True
text = '0'*10000 + '321'
print(ti.timeit('ifel(text)', "from __main__ import ifel, text",
                number=100000))
print(ti.timeit('mach(text)', "from __main__ import mach, text",
                number=100000))
# 0.77, 7.22
> marked as different groups.
When I put parens around 1, 2, and 3 in pat123, the 2nd timeit ran
until I restarted Shell, presumably due to catastrophic backtracking.
Maybe you can do better.
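One alternative sketch (mine, not from the thread) that avoids
lookaheads entirely: scan with the plain alternation and keep the
highest-priority token seen so far, stopping early when nothing can
beat it.

```python
import re

pat = re.compile('1|2|3')
priority = {'1': 0, '2': 1, '3': 2}   # lower number = higher priority

def first_by_priority(text):
    best = None
    for m in pat.finditer(text):
        tok = m.group()
        if best is None or priority[tok] < priority[best]:
            best = tok
            if priority[best] == 0:   # cannot be beaten; stop scanning
                break
    return best
```

This trades one clever pattern for a short loop, at the cost of
scanning past lower-priority matches when a higher-priority one might
still appear.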
> For example see the cute regex-based tokenizer in gettext.py:
>
>> _token_pattern = re.compile(r"""
>>     (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
>>     (?P<NUMBER>[0-9]+\b)                       | # decimal integer
>>     (?P<NAME>n\b)                              | # only n is allowed
>>     (?P<PARENTHESIS>[()])                      |
>>     (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
>>                                                  # <=, >=, ==, !=, &&, ||,
>>                                                  # ? :
>>                                                  # unary and bitwise ops
>>                                                  # not allowed
>>     (?P<INVALID>\w+|.)                           # invalid token
>> """, re.VERBOSE|re.DOTALL)
>>
>> def _tokenize(plural):
>>     for mo in re.finditer(_token_pattern, plural):
>>         kind = mo.lastgroup
>>         if kind == 'WHITESPACES':
>>             continue
>>         value = mo.group(kind)
>>         if kind == 'INVALID':
>>             raise ValueError('invalid token in plural form: %s' % value)
>>         yield value
>>     yield ''
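The trick the tokenizer relies on is that mo.lastgroup names the last
(here, the only) alternative that matched. A tiny illustration of the
same technique with a made-up pattern:

```python
import re

# Each alternative is a named group; lastgroup identifies which one matched.
pat = re.compile(r'(?P<NUMBER>[0-9]+)|(?P<OP>[-+*/])|(?P<WS>[ \t]+)')

tokens = [(m.lastgroup, m.group()) for m in pat.finditer('12 + 3')
          if m.lastgroup != 'WS']
# tokens == [('NUMBER', '12'), ('OP', '+'), ('NUMBER', '3')]
```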
> I have not found any code similar to the PEP 572 example in pydoc.py. It
> has different code:
>
>> pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
>>                      r'RFC[- ]?(\d+)|'
>>                      r'PEP[- ]?(\d+)|'
>>                      r'(self\.)?(\w+))')
> ...
>> start, end = match.span()
>> results.append(escape(text[here:start]))
>>
>> all, scheme, rfc, pep, selfdot, name = match.groups()
>> if scheme:
>>     url = escape(all).replace('"', '&quot;')
>>     results.append('<a href="%s">%s</a>' % (url, url))
>> elif rfc:
>>     url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
>>     results.append('<a href="%s">%s</a>' % (url, escape(all)))
>> elif pep:
> ...
>
> It doesn't look like a sequence of re.search() calls. It is clearer
> and more efficient, and using the assignment expression would not make
> it better.
>
>> # Reading socket data until an empty string is returned
>> while data := sock.recv():
>>     print("Received data:", data)
>
> for data in iter(sock.recv, b''):
>     print("Received data:", data)
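The same iter(callable, sentinel) shape can be tried without a socket;
here a bound __next__ on a list iterator stands in for sock.recv, with
b'' as the sentinel:

```python
# Simulated receive function: returns queued chunks, then the b'' sentinel.
chunks = iter([b'spam', b'eggs', b''])
recv = chunks.__next__            # plays the role of sock.recv

received = list(iter(recv, b''))  # stops when recv() returns b''
# received == [b'spam', b'eggs']
```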
>
>> if pid := os.fork():
>>     # Parent code
>> else:
>>     # Child code
>
> pid = os.fork()
> if pid:
>     # Parent code
> else:
>     # Child code
>
>
> It looks to me that there is no use case for PEP 572. It just makes
> Python worse.
>
--
Terry Jan Reedy