[Python-Dev] Examples for PEP 572

Terry Reedy tjreedy at udel.edu
Tue Jul 3 21:54:22 EDT 2018


On 7/3/2018 5:37 PM, Serhiy Storchaka wrote:
> I like programming languages in which everything is an expression 
> (including function declarations, branching and loops) and you can use 
> an assignment at any point, but Python is built in a different way, and 
> I like Python too. PEP 572 looks like it violates several Python design 
> principles. Python looks like a simple language, and this is its strong 
> side. I believe most Python users are not professional programmers -- 
> they are sysadmins, scientists, hobbyists and kids -- but Python is 
> suitable for them because of its clear syntax and encouragement of good 
> programming style. In particular, mutating and non-mutating operations 
> are separated. The assignment expression breaks this. There should be 
> very good reasons for doing this. But it looks to me that all examples 
> for PEP 572 can be written better without using the walrus operator.
I appreciate you showing alternatives I can use now.  Even once 
implemented, one cannot use assignment expressions until one no longer 
cares about 3.7 compatibility.  Then there will still be a choice.

>> results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
> 
>      results = [(x, y, x/y) for x in input_data for y in [f(x)] if y > 0]

Would (f(x),) be faster?

import timeit as ti

# Compare iterating a 1-element set, list, and tuple.
print(ti.timeit('for y in {x}: pass', 'x=1'))
print(ti.timeit('for y in [x]: pass', 'x=1'))
print(ti.timeit('for y in (x,): pass', 'x=1'))

# prints
0.13765254499999996  # seconds per 1_000_000 = microseconds each.
0.10321274000000003
0.09492473300000004

Yes, but not by enough to pay for typing the extra ',', and sometimes 
forgetting it.

>> stuff = [[y := f(x), x/y] for x in range(5)]
>  stuff = [[y, x/y] for x in range(5) for y in [f(x)]]

Creating a leaky name binding appears to be about 5x faster than 
iterating a temporary singleton.

# Compare a bare rebinding with rebinding plus cleanup.
print(ti.timeit('y=x', 'x=1'))
print(ti.timeit('y=x; del y', 'x=1'))
#
0.017357778999999907
0.021115051000000107

If one adds 'del y' to make the two equivalent, the number of characters 
typed is about the same.  To me, the choice amounts to subjective 
preference.  Even with y:=x available, I would write the expansion as

res = []
for x in range(5):
     y = f(x)
     res.append((y, x/y))

rather than use the assignment expression in the tuple.  It creates a 
'hitch' in thought.

> Does this idiom look unusual to you? But it is legal Python syntax, and 
> it is no more unusual than the new walrus operator. This idiom is not 
> commonly used because there is very little need for the above examples 
> in real code. And I'm sure that the walrus operator in comprehensions 
> will be very rare unless PEP 572 encourages writing complicated 
> comprehensions. Most users prefer to write an explicit loop.

> I want to remind you that PEP 572 started from a discussion on 
> Python-ideas which proposed a syntax for writing the following code as 
> a comprehension:
> 
>      smooth_signal = []
>      average = initial_value
>      for xt in signal:
>          average = (1-decay)*average + decay*xt
>          smooth_signal.append(average)
> 
> Using the "for in []" idiom this can be written (if you prefer 
> comprehensions) as:
> 
>      smooth_signal = [average
>                       for average in [initial_value]
>                       for x in signal
>                       for average in [(1-decay)*average + decay*x]]
> 
> Try now to write this using PEP 572. The walrus operator turns out to 
> be less suitable for solving the original problem because it doesn't 
> help to initialize the initial value.
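
For what it's worth, the closest walrus spelling I can come up with (a 
sketch, assuming the comprehension-scope semantics proposed in PEP 572) 
confirms this: the initial value must still be bound by a separate 
statement before the comprehension runs.

average = initial_value  # still needed outside the comprehension
smooth_signal = [average := (1-decay)*average + decay*x
                 for x in signal]
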
> 
> 
> Examples from PEP 572:
> 
>> # Loop-and-a-half
>> while (command := input("> ")) != "quit":
>>     print("You entered:", command)
> 
> The straightforward way:
> 
>      while True:
>          command = input("> ")
>          if command == "quit": break
>          print("You entered:", command)
> 
> The clever way:
> 
>      for command in iter(lambda: input("> "), "quit"):
>          print("You entered:", command)

The 2-argument form of iter is under-remembered and under-used.  The 
length difference between the two spellings is 8 characters.
     while (command := input("> ")) != "quit":
     for command in iter(lambda: input("> "), "quit"):
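
For anyone who has forgotten it: iter(callable, sentinel) calls callable 
repeatedly until it returns a value equal to sentinel.  A sketch of one 
common use (the file name and chunk size here are made up):

with open('data.bin', 'rb') as file:  # 'data.bin' is hypothetical
    # read() returns b'' at EOF, which ends the loop
    for chunk in iter(lambda: file.read(4096), b''):
        pass  # process chunk here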

I like the iter version, but the for-loop machinery and the extra 
function call make a minimal loop about half a millisecond slower.

import timeit as ti

def s():  # baseline: setup only, build the iterator but do not loop
    it = iter(10000*'0' + '1')

def w():  # while loop with an explicit break
    it = iter(10000*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break

def f():  # for loop over 2-argument iter with a lambda
    it = iter(10000*'0' + '1')
    for command in iter(lambda: next(it), '1'): pass

print(ti.timeit('s()', 'from __main__ import s', number=1000))
print(ti.timeit('w()', 'from __main__ import w', number=1000))
print(ti.timeit('f()', 'from __main__ import f', number=1000))
#
0.0009702129999999975
0.9365254250000001
1.5913117949999998

Of course, with added processing of 'command' the time difference 
disappears.  Printing (in IDLE) is an extreme case.

def wp():
    it = iter(100*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break
        print('w', command)

def fp():
    it = iter(100*'0' + '1')
    for command in iter(lambda: next(it), '1'):
        print('f', command)

print(ti.timeit('wp()', 'from __main__ import wp', number=1))
print(ti.timeit('fp()', 'from __main__ import fp', number=1))
#
0.48
0.47

>> # Capturing regular expression match objects
>> # See, for instance, Lib/pydoc.py, which uses a multiline spelling
>> # of this effect
>> if match := re.search(pat, text):
>>     print("Found:", match.group(0))
>> # The same syntax chains nicely into 'elif' statements, unlike the
>> # equivalent using assignment statements.
>> elif match := re.search(otherpat, text):
>>     print("Alternate found:", match.group(0))
>> elif match := re.search(third, text):
>>     print("Fallback found:", match.group(0))

> It may be more efficient to use a single regular expression which 
> consists of multiple or-ed patterns

My attempt resulted in a slowdown.  Duplicating the dominance of pat 
over otherpat over third requires, I believe, negative lookahead assertions.
---

import re
import timeit as ti

pat1 = re.compile('1')
pat2 = re.compile('2')
pat3 = re.compile('3')
# '2' counts only if no '1' occurs later; '3' only if neither '1' nor
# '2' occurs later.  This reproduces the if/elif dominance of
# pat1 over pat2 over pat3.
pat123 = re.compile('1|2(?!.*1)|3(?!.*(1|2))')
# I think most people would prefer to use the 3 simple patterns.

def ifel(text):
    match = re.search(pat1, text)
    if match: return match.group()
    match = re.search(pat2, text)
    if match: return match.group()
    match = re.search(pat3, text)
    if match: return match.group()

def mach(text):
    match = re.search(pat123, text)
    return match.group()

print([ifel('321'), ifel('32x'), ifel('3xx')] == ['1', '2', '3'])
print([mach('321'), mach('32x'), mach('3xx')] == ['1', '2', '3'])
# True, True

text = '0'*10000 + '321'
print(ti.timeit('ifel(text)', "from __main__ import ifel, text",
                number=100000))
print(ti.timeit('mach(text)', "from __main__ import mach, text",
                number=100000))
# 0.77, 7.22

> marked as different groups.

When I put parens around the 1, 2, and 3 in pat123, the 2nd timeit 
continued until I restarted Shell (presumably catastrophic 
backtracking).  Maybe you can do better.
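
Continuing the script above, a named-group variant of pat123 (a sketch 
only; I have not timed it, so it may hit the same pathology) would also 
report which alternative matched:

pat123g = re.compile('(?P<pat1>1)|(?P<pat2>2(?!.*1))|(?P<pat3>3(?!.*[12]))')

def mach_g(text):
    mo = re.search(pat123g, text)
    if mo: return mo.lastgroup, mo.group()  # lastgroup names the winner

print(mach_g('321'), mach_g('3xx'))
# ('pat1', '1') ('pat3', '3')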


> For example see the cute regex-based tokenizer in gettext.py:
> 
>> _token_pattern = re.compile(r"""
>>         (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
>>         (?P<NUMBER>[0-9]+\b)                       | # decimal integer
>>         (?P<NAME>n\b)                              | # only n is allowed
>>         (?P<PARENTHESIS>[()])                      |
>>         (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
>>                                                      # <=, >=, ==, !=, &&, ||,
>>                                                      # ? :
>>                                                      # unary and bitwise ops
>>                                                      # not allowed
>>         (?P<INVALID>\w+|.)                           # invalid token
>>     """, re.VERBOSE|re.DOTALL)
>>
>> def _tokenize(plural):
>>     for mo in re.finditer(_token_pattern, plural):
>>         kind = mo.lastgroup
>>         if kind == 'WHITESPACES':
>>             continue
>>         value = mo.group(kind)
>>         if kind == 'INVALID':
>>             raise ValueError('invalid token in plural form: %s' % value)
>>         yield value
>>     yield ''
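
For context, a quick check of what that tokenizer yields (this imports 
gettext's private helper, so it is for illustration only):

>>> from gettext import _tokenize
>>> list(_tokenize('n != 1'))
['n', '!=', '1', '']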

> I have not found any code similar to the PEP 572 example in pydoc.py. It 
> has different code:
> 
>> pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
>>                         r'RFC[- ]?(\d+)|'
>>                         r'PEP[- ]?(\d+)|'
>>                         r'(self\.)?(\w+))')
> ...
>> start, end = match.span()
>> results.append(escape(text[here:start]))
>>
>> all, scheme, rfc, pep, selfdot, name = match.groups()
>> if scheme:
>>     url = escape(all).replace('"', '&quot;')
>>     results.append('<a href="%s">%s</a>' % (url, url))
>> elif rfc:
>>     url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
>>     results.append('<a href="%s">%s</a>' % (url, escape(all)))
>> elif pep:
> ...
> 
> It doesn't look like a sequence of re.search() calls. It is clearer 
> and more efficient, and using the assignment expression would not make 
> it better.
> 
>> # Reading socket data until an empty string is returned
>> while data := sock.recv():
>>     print("Received data:", data)
> 
>      for data in iter(sock.recv, b''):
>          print("Received data:", data)
> 
>> if pid := os.fork():
>>     # Parent code
>> else:
>>     # Child code
> 
>      pid = os.fork()
>      if pid:
>          # Parent code
>      else:
>          # Child code
> 
> 
> It looks to me like there is no use case for PEP 572. It just makes 
> Python worse.
> 


-- 
Terry Jan Reedy