statements in control structures (Re: Conditional Expressions don't solve the problem)

Thu Oct 18 04:10:40 EDT 2001

Huaiyu Zhu:
>You raised several valid points, but your picking on the examples sometimes
>hinges on specific wart of the examples.

Yep.  My belief is that cases where your proposal shines don't
occur very often in real code, and that the additional flexibility
will instead lead to more obfuscated code.

>Do you recommend always using this
  ...

>This style change is orthogonal to the issue of statements before
>condition.
>The question is whether the else-clause in while-loop has any real use.

Consider the following code, which finds files that use the while/else
construct.

  ===
import sys, re

wh = re.compile("( *)while")
spaces = re.compile("^ *$")

for filename in sys.argv[1:]:
    in_while = 0
    indent = 0
    for line in open(filename).readlines():
        if spaces.match(line):
            continue
        if in_while and spaces.match(line[:indent+1]):
            continue
        if in_while and spaces.match(line[:indent]) and \
           line[indent:indent+5] == "else:":
            print filename
            in_while = 0
            continue
        in_while = 0
        m = wh.match(line)
        if m:
            in_while = 1
            indent = len(m.group(1))
  ===

Running this on *.py in the standard library I found
 1 in fpformat (written by Tim Peters)
 2 in threading (also written by Tim Peters)
 1 in test_grammar  (doesn't count - it's testing this construct)
 1 in distutils.command.build_py (not written by Tim Peters :)

In other words, it isn't used very often.

By comparison, there are nearly 200 uses of 'while 1:' and 500
uses of 'while'.  So 1% of whiles use else.
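
The same kind of count can be sketched with the ast module instead of
regular expressions.  This is only an illustration: it assumes a current
Python 3 interpreter (the script above is Python 2), and the names
count_while_else and sample are made up here.

```python
import ast

def count_while_else(source):
    """Return (number of while loops, number of those with an else clause)."""
    tree = ast.parse(source)
    total = with_else = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.While):
            total += 1
            # orelse is a non-empty list only when the loop has an else clause
            if node.orelse:
                with_else += 1
    return total, with_else

# Hypothetical input: one while/else and one plain 'while 1:'
sample = '''
while n > 0:
    n -= 1
else:
    done = True
while 1:
    break
'''
print(count_while_else(sample))
```

Parsing avoids false hits from "while" inside strings or comments, at the
cost of only working on files the running interpreter can parse.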

To answer your question.  No, I don't "always" decline to use
else in a while.  I have a sense of when to use it, and I recall
being happy when I had a case that was appropriate, but sadly I don't
know how to convert that sense into words.

>>I prefer the syntax mentioned some months ago when this discussion
>>last flared up.

>That is something I'd like too, except:
>- new keyword "loop",
>- do we want to make "while" obsolete?
>- need two kinds of "break" (one impact "else" the other does not).

That's a spelling issue and should not be the primary issue.
That is,

- "while 1:" is a possible spelling for "loop:" (with
      "while <expression>:" as the full generalization)
- that means 'while' won't be obsolete
- Really?  I thought 'break expression:' would be identical to
    if expression:
        break
and wouldn't hit the else.  Neither does the existing else.
So there are only two ways to write the same thing.
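
A minimal sketch of the existing behaviour, assuming Python 3 (the
function name search is hypothetical): a break inside an if skips the
loop's else clause, and the else runs only when the while condition
becomes false.

```python
def search(items, target):
    """Linear search written with while/else."""
    i = 0
    while i < len(items):
        if items[i] == target:
            result = "found at %d" % i
            break            # leaves the loop and skips the else clause
        i += 1
    else:
        # reached only when the while condition became false (no break)
        result = "not found"
    return result

print(search([1, 2, 3], 2))
print(search([1, 2, 3], 9))
```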

>if m = re.match(patt1, x); len(m.groups())>3:
>    a = process_header(m)
>elif m = re.match(patt2, x); m:
>    b = process_body(m, some_other_data)
>...

Ahh, this is more realistic, but still not realistic enough.
What do you do with a and b?  Are headers and bodies interleaved
or is the header optional, followed by a body, or ...?

My code that looks like this is more of the form

pat1 = re.compile("pattern1")  # but I use a more descriptive
pat2 = re.compile("pattern2")  # name, not 'pat1', 'pat2'

for x in reader:

  # Example of a string that should match
  m = pat1.match(x)
  if m and len(m.groups()) > 3:
    a = process_header(m)
    print a
    continue

  # This is an example of a body
  m = pat2.match(x)
  if m:
    b = process_body(m, some_other_data)
    print b
    continue

  raise AssertionError("Unknown line: %s" % x)

or I use while 1/breaks.

But in real life I wrote my own parser generator :)

I looked at xmllib in the standard library, as a module which
uses the re library.  Most of its uses of the match function
are in the context of

  if pattern.match(s):
     ....
    continue

 -or-

  m = pattern.match(s)
  if m:
     ....
    continue

 -or, raise an exception instead of using a continue --

    self.syntax_error(msg)

This style is more like what I'm used to - where there is no
series of if/elif/elif's because the end of each branch is
a continue, break or raise.  So there are no deeply nested
and ugly constructs like you think there are.

>Do you suggest that the normal way to use such control structures is to
>always enclose them in a function?

Nope.  Specifically I said

] Need a real life example here.  Is there something in the standard
] library which would be improved with this change?  The reason I
] ask is because this code can be refactored into a function.

Note the "can be" (not should or must) and "need a real life example"
because the best solution is dependent on context, and it's too
easy to come up with a contrived example which emphasizes a problem
that doesn't usually occur.
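
As one illustration of the "can be refactored into a function" option,
here is a rough sketch in Python 3.  The patterns HEADER and BODY and the
function classify are hypothetical stand-ins, not taken from any real
module; each branch returns early, so no elif chain is needed.

```python
import re

# Hypothetical patterns, for illustration only
HEADER = re.compile(r"(\w+): (\w+) (\w+) (\w+)$")
BODY = re.compile(r"\s+(.*)$")

def classify(line):
    """Return a (kind, data) pair for a line, or raise on unknown input."""
    m = HEADER.match(line)
    if m and len(m.groups()) > 3:
        return ("header", m.groups())
    m = BODY.match(line)
    if m:
        return ("body", m.group(1))
    raise ValueError("Unknown line: %r" % line)

print(classify("From: a b c"))
print(classify("  some text"))
```

Whether this is better than the inline form depends on how much state
the branches share, which is why a real example matters.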

>I'd say the above regular expression example is quite common, and your
>function would need quite some optional variables.

So what are some examples from the standard library which would be
improved?

>>but increasing the temptation for people to write
>>
>>  a = f(x)
>>  if a > 2:
>>    g()
>>
>>as
>>
>>  if a = f(x); a > 2:
>>    g()
>
>Good point.  Although I do not see the temptation, I don't see the use,
>either.  Maybe it should not be allowed after "if", only after "elif".

I don't use semicolons in my python code except for two cases:
  1) simple assignments, like
       a = 5; b = 12; c = 13
    or
       a = min(x, y); b = max(x, y)
  2) deliberately use few lines of code, like the RSA in
      four lines of Python.

I looked in the standard library and found semicolon uses in
these modules

UserList (simple assignments)
UserString (simple assignments)
asyncore (simple assignments)
base64 (in self-test code - not in user visible code)
bdb.py (in an exec'ed string - case 2)
cgitb (in a docstring - not in real code)
ftplib (one use is obsolete, other use okay - simple assignments
   followed by a del)
imaplib (simple assignment)
mailcap (simple assignment - an increment)
mhlib (in self-test code)
os (simple assignments)
quopri (simple increments and decrements, also does a break)
regsub (don't like the two instances - they are incrs)
sre_compile (I'm mixed about its uses)
types (tb = None; del tb)
warnings (in internal test code)

So it seems that ';' isn't used very often.  I don't want to
increase that.  At all.  Multiple statements on the same line
are a bad thing except in tiny amounts.
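
A survey like this can be approximated with the tokenize module, which
sees only real ';' statement separators and not the ones inside strings,
docstrings or comments.  This is a sketch assuming Python 3; the name
semicolon_lines and the sample text are made up for illustration.

```python
import io
import tokenize

def semicolon_lines(source):
    """Return the line number of every ';' used as a statement separator."""
    lines = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # ';' inside a string or comment is part of a STRING or COMMENT token
        if tok.type == tokenize.OP and tok.string == ";":
            lines.append(tok.start[0])
    return lines

sample = 'a = 5; b = 12; c = 13\ns = "x; y"  # comment; ignored\n'
print(semicolon_lines(sample))
```

It does not classify the uses (simple assignment, self-test code and so
on); that part still needs a person reading the code.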

>If we are not considering typos between ";" and ":", this can already
>happen
>
>x = 1;0; y = 2.0
>if x > 1,0:
>   ...

It could, except that ';' is not used very often and is not
promoted as good style.  So the first line easily stands out as unusual.
I have been hit by variants of the second.

BUT!  You had said

> This structure is safe against single typing errors

My point was to show a counter-example to this assertion.
It was not to show that the rest of Python is not prone to
single character errors.

>>if a; \
>>   b; \
>>   c; \
>>   d: \
>>   e; \
>>   f; \
>>   g
>
>I would rather consider this as contrived.  :-)

Yep.  But that confusion is not possible with current Python because
after an "if" you need only look for the first ";" or ":"-like object.
So I'm presenting a counter-example to your statement that

>        Since the change is only in the syntax of "if", "elif" and "while",
>        not in the fundamentals of expressions and statements, there is not
>        much more chance of obfuscation than existing syntax.

'Course, I'm also exaggerating what "much more" means.  :)

>Maybe our intuition is different, but it appears to me that
>
>      while char=file.readline()[3]; char != 'x':
>
>is exactly following things one step at a time, in the right order.

For me, I need to scan the line, find the semicolons, keep
track of where they are, break it down into the different parts,
then understand what each one does.

Breaking it down as
] for line in iter(file.readline, ""):
]  char = line[3]
]  if char == 'x':
]    break
]  process(char)

is easier for me because the layout of the code over the vertical
makes it easier for me to track where the components are, as
does naming those components.
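
The two-argument iter(callable, sentinel) form can be tried without a
real file.  The sketch below assumes Python 3 and uses io.StringIO as a
stand-in; chars_before_x is a hypothetical name.  It loops until the
fourth character of a line is 'x', matching the "char != 'x'" condition
in the while form quoted above.

```python
import io

def chars_before_x(f):
    """Collect the 4th character of each line until one of them is 'x'."""
    seen = []
    # iter() keeps calling f.readline until it returns "" (end of file)
    for line in iter(f.readline, ""):
        char = line[3]       # assumes every line has at least 4 characters
        if char == 'x':
            break
        seen.append(char)
    return seen

f = io.StringIO("abcd\nabce\nabcx\nabcz\n")
print(chars_before_x(f))
```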

>But more seriously, is it practical to change every while loop to for-loops
>with iterators?

No, and I never said it was.  I rarely assert such absolutist beliefs.
What I'm pointing out is that 1) many solutions exist and 2) why I
might prefer something different.

>  Consider my other example you ignored
>
>    while string = raw_input(prompt): not string.startswith("n"):
>        do things with string
>        break the loop according to condition
>        change prompt according to condition
>        do something more

I ignored it because I'm pushing for less contrived examples.
I want to see something that comes from real code.

Here's a contrived example.  Try doing this in Python

i = 5;
a:
printf("In a with %d\n", i);
if (i % 2 == 1) goto c;
i = i*3 + 1;

b:
printf("In b with %d\n", i);
i--;
if (i * i < 200) goto a;
if (i > 5) goto c;

c:
printf("In c with %d\n", i);
if (i < drand48() * 100) {
  i *= 2;
  goto c;
}
i -= 5;
if (i<0) i=-i;
goto a;

It's very hard to translate (keep adding gotos and statements if
it isn't hard enough).  Does that mean goto should be added to
Python?  No, it means that contrived examples aren't very useful.

(BTW, there is a program from Knuth in MIX which I found very
hard to translate to Python because of the lack of a goto.
I ended up writing it in C++ instead.)

Since you asked, here's a more realistic chunk of code
which meets your criterion, is much more extensible than
what you have, and doesn't need your proposal.

   ======
import re

def change_prompt(line, state):
  state.prompt = line[7:]

def quit(line, state):
  print "Later!"
  raise SystemExit(0)

def set(line, state):
  words = line.split()
  state.values[words[1]] = words[2]

def show(line, state):
  words = line.split()
  v = words[1]
  print state.values.get(v)

def has_balanced_parens(s):
  n = 0
  has_parens = 0
  for c in s:
    if c == "(":
      n = n + 1
      has_parens = 1
    elif c == ")":
      n = n - 1
      if n < 0:
        return 0
  return has_parens and n == 0

def yippee(line, state):
  print "Yippee!"

def dump(line, state):
  for k, v in state.values.items():
    print "%s -> %s" % (k, v)

def clear(line, state):
  state.values.clear()

actions = (
  (re.compile(r"prompt=").match, change_prompt),
  (re.compile(r"quit$").match, quit),
  (re.compile(r"n").match, quit),
  (re.compile(r"set [^ ]+ [^ ]+$").match, set),
  (re.compile(r"print [^ ]+$").match, show),
  (re.compile(r"dump$").match, dump),
  (re.compile(r"clear$").match, clear),
  (has_balanced_parens, yippee),
)

class State:
  def __init__(self):
    self.prompt = "> "
    self.values = {}

state = State()
while 1:
  try:
    line = raw_input(state.prompt)
  except EOFError:
    break
  for action in actions:
    if action[0](line):
      action[1](line, state)
      break
  else:
    print "Could not understand:", repr(line)

 =====

Here's an example interaction to show it really does work

% python spam.py
> set a Andrew
> print a
Andrew
> set x 9
> dump
x -> 9
a -> Andrew
> clear
> dump
> prompt=Whaddya want?
Whaddya want? q=9
Could not understand: 'q=9'
Whaddya want? ()())
Could not understand: '()())'
Whaddya want? ()()
Yippee!
Whaddya want? nothing
Later!
%

>In practice it is not always a good idea to put every code block in a
>separate function, although it might be an ideal goal in some style.

Again, despite my counterexamples I'm not saying this is the style that
must be used in all cases.  I'm pointing out that the examples
you give are too limiting to tell if your proposal really has
merit.  There may be other solutions to the underlying problem which
are more general, more maintainable, and which don't need this ';'
ugliness.

>This example does not fit the pattern.

You're right.  I couldn't come up with a good example.  My
change prompt/print/set code above is much better.

>This proposal is not meant to
>replace all the usage of "for" by "while".  On the other hand, your
>alternatives seem to recommend changing all the "while" to "for".

'Tis true.  I like iterators.  But perhaps you aren't fully
considering the alternatives when you come up with examples.

>>while expr:
>>  statements
>>break if expr:
>>  statements
>
>Now I do.  :-) (See near top.)  I'll add it to the PEP.  The main issue is
>that it does not distinguish two kinds of "break" (in terms of interaction
>with "else").

Could you clarify what the two types are?  Both breaks skip the
else clause.  (This is a repeat question from earlier in this post.)

>1. Is it practical to change all while loops to for-loops with iterators?

No, and I didn't propose that.  I only point out an alternative
solution exists, so the usefulness of your proposal is not clear cut.

>2. Is it practical to change all elif into nested scope with break /
>return?

No, and I didn't propose that.  But for the cases you gave, the standard
solution is to use continue/break/return/exceptions.  When those are
appropriate solutions, the benefit of your proposal decreases.  So
you need to show a realistic example where the solution using
existing Python constructs is ugly but which can be improved
(in succinctness and readability) with your proposal.

My beliefs are that:
  1) most of your examples are too contrived/limited to provide a
      useful understanding of the advantage of your proposal.

  2) semicolon-ed statements lead to less understandable code

>I'd guess no.  Of course for any simple example it would appear so, and any
>complicated example may appear contrived.

It's easy.  Scan the standard library and find a chunk of code which
you believe will be improved with your construct.

                    Andrew
                    dalke at dalkescientific.com