[Python-Dev] Re: anonymous blocks

Guido van Rossum gvanrossum at gmail.com
Tue Apr 26 13:37:47 CEST 2005


[Greg Ewing]
> I like the general shape of this, but I have one or two
> reservations about the details.

That summarizes the feedback so far pretty well. I think we're on to
something. And I'm not too proud to say that Ruby has led the way here
to some extent (even if Python's implementation would be fundamentally
different, since it's based on generators, which opens up some different
possibilities and precludes some Ruby patterns).

> 1) We're going to have to think carefully about the naming of
> functions designed for use with this statement. If 'with'
> is going to be in there as a keyword, then it really shouldn't
> be part of the function name as well.

Of course. I only used 'with_opened' because it's been the running
example in this thread.

> I would rather see something like
> 
>    with f = opened(pathname):
>      ...
> 
> This sort of convention (using a past participle as a function
> name) would work for some other cases as well:
> 
>    with some_data.locked():
>      ...
> 
>    with some_resource.allocated():
>      ...


Or how about

    with synchronized(some_resource):
        ...

> On the negative side, not having anything like 'with' in the
> function name means that the fact the function is designed for
> use in a with-statement could be somewhat non-obvious. Since
> there's not going to be much other use for such a function,
> this is a bad thing.

This seems a pretty mild problem; one could argue that every function
is only useful in a context where its return type makes sense, and we
seem to be getting along just fine with naming conventions (or just
plain clear naming).

> It could also lead people into subtle usage traps such as
> 
>    with f = open(pathname):
>      ...
> 
> which would fail in a somewhat obscure way.

Ouch. That one hurts. (I was going to say "but f doesn't have a next()
method" when I realized it *does*. :-) It is *almost* equivalent to

    for f in open(pathname):
        ...

except if the "..." block raises an exception.  Fortunately your
proposal to use 'as' makes this mistake less likely.
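
To make the trap concrete, here is a small sketch that uses io.StringIO
as a stand-in for open(pathname); the sample text is made up. A file
object is its own iterator, so the mistaken statement would bind f to
lines of the file rather than to the file itself.

```python
import io

# Stand-in for open(pathname); the contents are invented for the example.
f = io.StringIO("first line\nsecond line\n")

# A file object is its own iterator ...
is_own_iterator = iter(f) is f

# ... so it can be advanced line by line, exactly as a for-loop would do.
first = next(f)
rest = list(f)

print(is_own_iterator, repr(first), rest)
```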

> So maybe the 'with' keyword should be dropped (again!) in
> favour of
> 
>    with_opened(pathname) as f:
>      ...

But that doesn't look so great for the case where there's no variable
to be assigned to -- I wasn't totally clear about it, but I meant the
syntax to be

    with [VAR =] EXPR: BLOCK

where VAR would have the same syntax as the left hand side of an
assignment (or the variable in a for-statement).

> 2) I'm not sure about the '='. It makes it look rather deceptively
> like an ordinary assignment, and I'm sure many people are going
> to wonder what the difference is between
> 
>    with f = opened(pathname):
>      do_stuff_to(f)
> 
> and simply
> 
>    f = opened(pathname)
>    do_stuff_to(f)
> 
> or even just unconsciously read the first as the second without
> noticing that anything special is going on. Especially if they're
> coming from a language like Pascal which has a much less magical
> form of with-statement.

Right.

> So maybe it would be better to make it look more different:
> 
>    with opened(pathname) as f:
>      ...

Fredrik said this too, and as long as we're going to add 'with' as a
new keyword, we might as well promote 'as' to become a real
keyword. So then the syntax would become

    with EXPR [as VAR]: BLOCK

I don't see a particular need for assignment to multiple VARs (but VAR
can of course be a tuple of identifiers).

> * It seems to me that this same exception-handling mechanism
> would be just as useful in a regular for-loop, and that, once
> it becomes possible to put 'yield' in a try-statement, people
> are going to *expect* it to work in for-loops as well.

(You can already put a yield inside a try-except, just not inside a
try-finally.)
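
A minimal sketch of a yield inside try-except; numbers() and its input
are hypothetical names chosen for illustration.

```python
def numbers(tokens):
    """Yield each token converted to int, skipping tokens that don't parse."""
    for token in tokens:
        try:
            # The yield sits inside the try block of a try-except.
            yield int(token)
        except ValueError:
            # int() failed for this token; move on to the next one.
            continue


parsed = list(numbers(["1", "x", "3"]))
print(parsed)
```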

> Guido has expressed concern about imposing extra overhead on
> all for-loops. But would the extra overhead really be all that
> noticeable? For-loops already put a block on the block stack,
> so the necessary processing could be incorporated into the
> code for unwinding a for-block during an exception, and little
> if anything would need to change in the absence of an exception.

Probably.

> However, if for-loops also gain this functionality, we end up
> with the rather embarrassing situation that there is *no difference*
> in semantics between a for-loop and a with-statement!

There would still be the difference that a for-loop invokes iter() and
a with-block doesn't.

Also, for-loops that don't exhaust the iterator leave it available for
later use. I believe there are even examples of this pattern, where
one for-loop searches the iterable for some kind of marker value and
the next for-loop iterates over the remaining items. For example:

    f = open(messagefile)
    # Process message headers
    for line in f:
        if not line.strip():
            break
        if line[0].isspace():
            addcontinuation(line)
        else:
            addheader(line)
    # Process message body
    for line in f:
        addbody(line)
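
The same pattern can be tried on in-memory data. This is a sketch with
made-up message lines; note the explicit iter() call, since a list
(unlike a file) is not its own iterator, and a second for-loop over the
list itself would start over from the beginning.

```python
# Invented sample message: two header lines (one continued), a blank
# separator line, then the body.
raw = ["Subject: hi\n", " there\n", "To: you\n", "\n", "body 1\n", "body 2\n"]
lines = iter(raw)   # one shared iterator for both loops

headers, body = [], []

# First loop: consume header lines up to the blank separator.
for line in lines:
    if not line.strip():
        break
    if line[0].isspace():
        headers[-1] += line      # continuation of the previous header
    else:
        headers.append(line)

# Second loop: picks up where the first loop stopped.
for line in lines:
    body.append(line)

print(headers, body)
```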

> This could be "fixed" by making the with-statement not loop,
> as has been suggested. That was my initial thought as well,
> but having thought more deeply, I'm starting to think that
> Guido was right in the first place, and that a with-statement
> should be capable of looping. I'll elaborate in another post.

So perhaps the short description of a with-statement that we give to
newbies could be the following:

    """
    The statement:

        for VAR in EXPR:
            BLOCK

    does the same thing as:

        with iter(EXPR) as VAR:        # Note the iter() call
            BLOCK

    except that:

    - you can leave out the "as VAR" part from the with-statement;
    - they work differently when an exception happens inside BLOCK;
    - break and continue don't always work the same way.

    The only time you should write a with-statement is when the
    documentation for the function you are calling says you should.
    """

> > So a block could return a value to the generator using a return
> > statement; the generator can catch this by catching ReturnFlow.
> > (Syntactic sugar could be "VAR = yield ..." like in Ruby.)
> 
> This is a very elegant idea, but I'm seriously worried by the
> possibility that a return statement could do something other
> than return from the function it's written in, especially if
> for-loops also gain this functionality.

But they wouldn't!

> Intercepting break
> and continue isn't so bad, since they're already associated
> with the loop they're in, but return has always been an
> unconditional get-me-out-of-this-function. I'd feel uncomfortable
> if this were no longer true.

Me too.

Let me explain the use cases that led me to throwing that in (I was
running out of time and didn't properly explain it) and then let me
propose an alternative.  This is a bit long, but important!

*First*, in the non-looping use cases (like acquiring and releasing a
lock), a return-statement should definitely be allowed when the
with-statement is contained in a function.  There's lots of code like
this out there:

    def search(self, eligible, default=None):
        self.lock.acquire()
        try:
            for item in self.elements:
                if eligible(item):
                    return item
            # no eligible items
            return default
        finally:
            self.lock.release()

and this translates quite nicely to a with-statement:

    def search(self, eligible, default=None):
        with synchronized(self.lock):
            for item in self.elements:
                if eligible(item):
                    return item
            # no eligible items
            return default
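
For comparison, here is a runnable sketch of the same idea in later
Python versions, where contextlib.contextmanager turns a generator with
a yield inside try-finally into something usable in a with-statement.
This is not the iterator-based mechanism discussed in this message;
synchronized() as written here and the Container class are illustrative
assumptions.

```python
import threading
from contextlib import contextmanager


@contextmanager
def synchronized(lock):
    """Hold `lock` for the duration of the with-block (assumed helper)."""
    lock.acquire()
    try:
        yield
    finally:
        # Runs however the block is left, including via return.
        lock.release()


class Container:
    """Hypothetical owner of the lock and the elements being searched."""

    def __init__(self, elements):
        self.lock = threading.Lock()
        self.elements = elements

    def search(self, eligible, default=None):
        with synchronized(self.lock):
            for item in self.elements:
                if eligible(item):
                    return item
            # no eligible items
            return default


c = Container([1, 2, 3])
found = c.search(lambda x: x > 1)
missing = c.search(lambda x: x > 5, default="none")
print(found, missing, c.lock.locked())
```

Both return statements leave the block with the lock released, which is
the behavior the try-finally version above spells out by hand.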

*Second*, it might make sense if break and continue would be handled
the same way; here's an example:

    def alt_search(self):
        for item in self.elements:
            with synchronized(item):
                if item.abandoned():
                    continue
                if item.eligible():
                    break
        else:
            item = self.default_item
        return item.post_process()

(I realize the case for continue isn't as strong as that for break,
but I think we have to support both if we support one.)

*Third*, if there is a try-finally block around a yield in the
generator, the finally clause absolutely must be executed when control
leaves the body of the with-statement, whether it is through return,
break, or continue.  This pretty much means these have to be turned
into some kind of exception.  So the first example would first be
transformed into this:

    def search(self, eligible, default=None):
        try:
            with synchronized(self.lock):
                for item in self.elements:
                    if eligible(item):
                        raise ReturnFlow(item)  # was "return item"
                # no eligible items
                raise ReturnFlow(default)    # was "return default"
        except ReturnFlow, exc:
            return exc.value

before applying the transformation of the with-statement, which I
won't repeat here (look it up in my previous long post in this thread).
(BTW I do agree that it should use __next__(), not next_ex().)

I'm assuming the following definition of the ReturnFlow exception:

    class ReturnFlow(Exception):
        def __init__(self, value=None):
            self.value = value

The translation of break into raise BreakFlow() and continue into raise
ContinueFlow() is now obvious.  (BTW ReturnFlow etc. aren't great
names.  Suggestions?)
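
A sketch of that translation applied to a loop shaped like alt_search
above, written in current Python syntax. The with-statement itself is
left out (its body is marked by a comment), on the assumption that it
lets the Flow exceptions pass through unchanged; first_eligible() and
its parameters are hypothetical.

```python
class BreakFlow(Exception):
    """Stands for a 'break' executed inside the block."""


class ContinueFlow(Exception):
    """Stands for a 'continue' executed inside the block."""


def first_eligible(items, abandoned, eligible, default):
    for item in items:
        try:
            # --- body of the (omitted) with-block starts here ---
            if abandoned(item):
                raise ContinueFlow()   # was "continue"
            if eligible(item):
                raise BreakFlow()      # was "break"
            # --- body of the (omitted) with-block ends here ---
        except ContinueFlow:
            continue
        except BreakFlow:
            break
    else:
        # Loop ran to completion without a break.
        item = default
    return item


hit = first_eligible([1, 2, 3, 4], lambda x: x == 2, lambda x: x % 2 == 0, None)
miss = first_eligible([1, 3], lambda x: False, lambda x: x % 2 == 0, "default")
print(hit, miss)
```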

*Fourth*, and this is what makes Greg and me uncomfortable at the same
time as making Phillip and other event-handling folks drool: from the
previous three points it follows that an iterator may *intercept* any
or all of ReturnFlow, BreakFlow and ContinueFlow, and use them to
implement whatever cool or confusing magic they want.  For example, a
generator can decide that for the purposes of break and continue, the
with-statement that calls it is a loop, and give them the usual
semantics (or the opposite, if you're into that sort of thing :-).  Or
a generator can receive a value from the block via a return statement.
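
A rough sketch of a generator that intercepts BreakFlow and
ContinueFlow and gives them loop semantics. It relies on the throw()
method that generators have in later Python versions rather than on the
__next__() API discussed here, and run_block() is only a hypothetical
stand-in for what the with-statement would do.

```python
class BreakFlow(Exception):
    pass


class ContinueFlow(Exception):
    pass


def forever():
    """Looping generator: treats the calling block as a loop body."""
    while True:
        try:
            yield
        except ContinueFlow:
            continue        # start the next pass over the block
        except BreakFlow:
            return          # stop looping


def run_block(gen, block):
    """Hypothetical driver: run `block` once per value the generator yields."""
    try:
        next(gen)
        while True:
            try:
                block()
            except (BreakFlow, ContinueFlow) as flow:
                gen.throw(flow)   # let the generator decide what it means
            else:
                next(gen)
    except StopIteration:
        pass                      # the generator finished


seen = []
counter = iter(range(10))


def block():
    n = next(counter)
    if n % 2:
        raise ContinueFlow()      # was "continue"
    if n >= 6:
        raise BreakFlow()         # was "break"
    seen.append(n)


run_block(forever(), block)
print(seen)
```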

Notes:

- I think there's a better word than Flow, but I'll keep using it
  until we find something better.

- This is not limited to generators -- the with-statement uses an
  arbitrary "new-style" iterator (something with a __next__() method
  taking an optional exception argument).

- The new __next__() API can also (nay, *must*, to make all this work
  reliably) be used to define exception and cleanup semantics for
  generators, thereby rendering obsolete PEP 325 and the second half
  of PEP 288.  When a generator is GC'ed (whether by reference
  counting or by the cyclical garbage collector), its __next__()
  method is called with a BreakFlow exception instance as argument (or
  perhaps some other special exception created for the purpose).  If
  the generator catches the exception and yields another value, too
  bad -- I consider that broken behavior.  (The alternative would be
  to keep calling __next__(BreakFlow()) until it doesn't return a
  value, but that feels uncomfortable in a finalization context.)

- Inside a with-statement, user code raising a Flow exception acts the
  same as the corresponding statement.  This is slightly unfortunate,
  because it might lead one to assume that the same is true for
  example in a for-loop or while-loop, but I don't want to make that
  change.  I don't think it's a big problem.
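
To illustrate the second note, here is a sketch of a non-generator
"new-style" iterator whose __next__() takes an optional exception
argument. The Synchronized class and the run_with() driver are
assumptions made for this example; run_with() is a simplified stand-in,
not the actual translation of the with-statement, which isn't
reproduced in this message.

```python
import threading


class Synchronized:
    """Hypothetical non-looping iterator that guards a lock."""

    def __init__(self, lock):
        self.lock = lock
        self.state = "new"

    def __next__(self, exc=None):
        if self.state == "new":
            # First call: acquire and hand control to the block.
            self.lock.acquire()
            self.state = "active"
            return None
        if self.state == "active":
            # Second call: the block is done, normally or not.
            self.lock.release()
            self.state = "done"
        if exc is not None:
            raise exc             # pass the block's exception along
        raise StopIteration


def run_with(iterator, block):
    """Simplified driver standing in for the with-statement."""
    exc = None
    while True:
        try:
            value = iterator.__next__(exc)
        except StopIteration:
            break
        exc = None
        try:
            block(value)
        except Exception as err:
            exc = err             # reported to the iterator on the next call


lock = threading.Lock()
held = []
run_with(Synchronized(lock), lambda value: held.append(lock.locked()))


def failing_block(value):
    raise ValueError("block failed")


try:
    run_with(Synchronized(lock), failing_block)
    propagated = False
except ValueError:
    propagated = True

print(held, propagated, lock.locked())
```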

Given that 1, 2 and 3 combined make 4 inevitable, I think we might as
well give in, and *always* syntactically accept return, break and
continue in a with-statement, whether or not it is contained in a loop
or function.  When the iterator does not handle the Flow exceptions,
and there is no outer context in which the statement is valid, the
Flow exception is turned into an IllegalFlow exception, which is the
run-time equivalent of SyntaxError: 'return' outside function (or
'break' outside loop, etc.).

Now there's one more twist, which you may or may not like.  Presumably
(barring obfuscations or bugs) the handling of BreakFlow and
ContinueFlow by an iterator (or generator) is consistent for all uses
of that particular iterator.  For example synchronized(lock) and
transactional(db) do not behave as loops, and forever() does.  Ditto
for handling ReturnFlow.  This is why I've been thinking of leaving
out the 'with' keyword: in your mind, these calls would become new
statement types, even though the compiler sees them all the same:

    synchronized(lock):
        BLOCK

    transactional(db):
        BLOCK

    forever():
        BLOCK

    opening(filename) as f:
        BLOCK

It does require the authors of such iterators to pick good names, and
it doesn't look as good when the iterator is a method of some object:

    self.elements[0].locker.synchronized():
        BLOCK

You proposed this too (and I even commented on it, ages ago in this
same endless message :-) and while I'm still on the fence, at least I
now have a better motivational argument (i.e., that each iterator
becomes a new statement type in your mind).

One last thing: if we need a special name for iterators and generators
designed for use in a with-statement, how about calling them
with-iterators and with-generators?  The non-looping kind can be
called resource management iterators / generators.  I think whatever
term we come up with should not be a totally new term but a
combination of iterator or generator with some prefix, and it should
work both for iterators and for generators.

That's all I can muster right now (I should've been in bed hours ago)
but I'm feeling pretty good about this.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

