[Python-ideas] Protecting finally clauses of interruptions

Paul Colomiets paul at colomiets.name
Wed Apr 4 22:43:05 CEST 2012


Hi Yury,

On Wed, Apr 4, 2012 at 10:37 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> On 2012-04-04, at 2:44 PM, Paul Colomiets wrote:
>> I have a global timeout for processing single request. It's actually higher
>> in a chain of generator calls. So dispatcher looks like:
>>
>> def dispatcher(self, method, args):
>>    with timeout(10):
>>        yield getattr(self, method)(*args)
>
> How does it work?  To what object are you actually attaching timeout?
>

There is basically a "Coroutine" object. It is a list of paused
generators, the topmost of which is the one currently running
(or suspended while doing I/O). It represents the call stack, because
generators have no built-in stack of their own.
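
A minimal sketch of such a stack-of-generators object, for illustration only.
The `Coroutine` class, its `step()` method and the `parent`/`child` generators
are hypothetical names, not the code of any framework discussed here. It assumes
a generator delegates by yielding another generator, and that any other yielded
value is an I/O request for the scheduler.

```python
import types


class Coroutine:
    """Hypothetical sketch: a stack of paused generators."""

    def __init__(self, gen):
        self.stack = [gen]  # last item is the currently running generator

    def step(self):
        """Run until the top generator yields something for the scheduler.

        Returns the yielded value, or None once the whole stack has finished.
        """
        value, exc = None, None
        while self.stack:
            top = self.stack[-1]
            try:
                if exc is not None:
                    pending, exc = exc, None
                    result = top.throw(pending)
                else:
                    result = top.send(value)
            except StopIteration as stop:
                self.stack.pop()          # callee finished: resume its caller
                value = stop.value
                continue
            except Exception as err:
                self.stack.pop()          # callee failed: rethrow in its caller
                if not self.stack:
                    raise
                exc = err
                continue
            if isinstance(result, types.GeneratorType):
                self.stack.append(result)  # a "call": push the new generator
                value = None
                continue
            return result                  # an I/O request for the scheduler
        return None


log = []

def child():
    log.append("child start")
    yield "io-1"
    log.append("child done")

def parent():
    try:
        yield child()
    finally:
        log.append("parent cleanup")
    yield "io-2"

co = Coroutine(parent())
first = co.step()    # runs parent, then child, up to the first I/O request
depth = len(co.stack)
second = co.step()   # child finishes, parent resumes and asks for more I/O
last = co.step()     # parent finishes, the stack is empty
```

The list plays the role the interpreter's frame chain plays for ordinary calls,
which is why the interpreter itself sees no link between the two generators.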

> And what's the advantage of having some "global" timeout instead
> of a timeout specifically bound to some coroutine?
>

We get a guaranteed bound on request processing time. Or, to
be more precise, a guaranteed time by which a request stops
processing, so we don't have a lot of coroutines hanging
forever. It lets us avoid placing timeouts all over the code.

Maybe your use case is very different. E.g. this pattern
doesn't work well with batch processing of big data. We
process many tiny user requests per second.

> Do you have that code publicly released somewhere?  I just really want
> to understand how exactly your architecture works to come with a
> better proposal (if there is one possible ;).
>

This framework handles timeouts in the way described:

https://github.com/tailhook/zorro

Although it uses greenlets. The difference is that we
don't need to keep a stack in our own scheduler
when using greenlets, but everything else applies.

> As an off-topic: would be interesting to have various coroutines
> approaches and architectures listed somewhere, to understand how
> python programmers actually do it.
>

Sure :)

>> And all the local timeouts, like timeout for single request are
>> usually applied at a socket level, where specific protocol
>> is implemented:
>>
>> def redis_unlock(lock):
>>    yield redis_socket.wait_write(2)  # wait two seconds
>>   # TimeoutError may have been raised in wait_write()
>>    cmd = ('DEL user:'+lock+'\n').encode('ascii')
>>    redis_socket.write(cmd)  # should be loop here, actually
>>    yield redis_socket.wait_read(2)  # another two seconds
>>    result = redis_socket.read(1024)  # here loop too
>>    assert result == 'OK\n'
>
> So you have explicit timeouts in the 'redis_unlock', but you want
> them to be ignored if it was called from some 'finally' block?
>

No! I'd just omit them if I wanted that. What I don't want is for
`add_money`, which calls `redis_unlock` in its finally block, to be
interrupted there.

>> So they are not interruptions. Although, we don't use them
>> much with coroutines, global timeout for request is
>> usually enough.
>
> Don't really follow you here.
>

You may think of it as a socket with a timeout set:

socket.settimeout(2)
socket.recv(1024)

It will raise TimeoutError, which should propagate as
a normal exception, as opposed to an external
interruption, e.g. by SIGINT or SIGALRM.
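
A small runnable illustration of that distinction, as a sketch. The helper
`recv_or_timeout` is a hypothetical name; it uses a local `socket.socketpair()`
so that no data ever arrives. The timeout surfaces as an ordinary exception
raised by the `recv()` call itself, so the caller can handle it like any other
error at a well-defined point.

```python
import socket


def recv_or_timeout(seconds):
    """Try to read from a socket that never receives data."""
    a, b = socket.socketpair()
    try:
        a.settimeout(seconds)
        try:
            a.recv(1024)              # nothing is ever written to b
            return "data"
        except socket.timeout:        # alias of TimeoutError on Python 3.10+
            return "timed out"
    finally:
        a.close()
        b.close()


outcome = recv_or_timeout(0.05)
```

An external interruption, by contrast, can arrive between any two bytecode
instructions, including inside the `finally` block above.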

>> But anyway I don't see a reason to protect a single frame,
>> because even if you have a simple mutex without coroutines
>> you end up with:
>>
>> def something():
>>  lock.acquire()
>>  try:
>>    pass
>>  finally:
>>    lock.release()
>>
>> And if the lock's implementation is something along the lines of:
>>
>> def release(self):
>>    self._native_lock.release()
>>
>> How would you be sure that interruption is not executed
>> when interpreter resolved `self._native_lock.release` but
>> not yet called it?
>
> Is it in a context of coroutines or threads?

I don't see a difference, except for the code that maintains
the stack. I'd say both are a problem if you neither propagate
f_in_finally nor traverse the stack (although the way of
propagating it may differ).
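
To make the window in the quoted `release()` example concrete, here is a sketch
that disassembles such a method. The `Lock` class is a hypothetical stand-in
for the wrapper quoted above. Exact opcode names differ between CPython
versions, so the code only checks that the attribute lookup and the call are
separate instructions.

```python
import dis


class Lock:
    """Hypothetical wrapper around some native lock object."""

    def __init__(self, native_lock):
        self._native_lock = native_lock

    def release(self):
        self._native_lock.release()


ops = [ins.opname for ins in dis.get_instructions(Lock.release)]

# First attribute/method lookup and first call instruction in the method body.
lookup = next(i for i, op in enumerate(ops) if op in ("LOAD_ATTR", "LOAD_METHOD"))
call = next(i for i, op in enumerate(ops) if op.startswith("CALL"))
```

Since `lookup` comes before `call`, an asynchronous interruption delivered
between those instructions would leave the native lock held even though
`release()` had already been entered from the caller's finally block.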

> If former, the
> because you, perhaps, want to interrupt 'something()'?

I want to interrupt a thread. Or a "Coroutine" in the sense
described above (having a stack of frames), or in greenlet's
sense.

> And it is a
> separate frame from the frame where 'release()' is running?

Of course (how could it be inlined? :) )

>>> That's the second reason I don't like your proposal.
>>>
>>> def foo():
>>>   try:
>>>      ..
>>>   finally:
>>>      yield unlock()
>>>   # <--- the ideal point to interrupt foo
>>>
>>>   f = open('a', 'w')
>>>   # what if we interrupt it here?
>>>   try:
>>>      ..
>>>   finally:
>>>      f.close()
>>>
>>
>> And which one fixes this problem? There is no guarantee
>> that your timeout code hasn't interrupted
>> at " # what if we interrupt it here?". If it's a bit less likely,
>> it's not real solution. Please, don't present it as such.
>
> Sorry, I should have explained it in more detail.  Right now we
> interrupt code only where we have a 'yield', a greenlet.switch(),
> or at the end of finally block, not at some arbitrary opcode.
>

Sure, I do something similar. But it doesn't work with threads, as
they have no explicit yield or switch.


On Wed, Apr 4, 2012 at 11:07 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
> On 2012-04-04, at 2:44 PM, Paul Colomiets wrote:
>
>> But anyway I don't see a reason to protect a single frame,
>> because even if you have a simple mutex without coroutines
>> you end up with:
>
> BTW, for instance, in our framework each coroutine is a special
> object that wraps generator/plain function.  It controls everything
> that the underlying generator/function yields/returns.  But the actual
> execution, propagation of returned values and raised errors is the
> scheduler's job.  So when you are yielding a coroutine from another
> coroutine, frames are not even connected to each other, since the
> actual execution of the callee will be performed by the scheduler.
> It's not like a regular python call.
>

Same applies here. But you propagate the return value/error, right?
So you can't say "frames are not connected". They aren't from
the interpreter's point of view, but they are logically connected.

So for example:

def a():
  yield b()

def b():
  yield

If `a().with_timeout(0.1)` is interrupted while it's waiting for the value
of `b()`, will `b()` continue its execution?
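
One way to picture the logical connection, as a sketch under stated
assumptions: the scheduler keeps the generators in a list and delivers an
interruption to the innermost one, letting it unwind outward. `Timeout` and
`interrupt()` are hypothetical names, and this is not how either framework in
this thread is known to work.

```python
class Timeout(Exception):
    """Hypothetical interruption raised by the scheduler."""


log = []

def b():
    try:
        yield                      # paused here, waiting for I/O
        log.append("b resumed")
    finally:
        log.append("b cleanup")

def a():
    try:
        yield b()
    finally:
        log.append("a cleanup")

def interrupt(stack, exc):
    """Throw exc into the innermost generator and unwind toward the outermost.

    Returns True if some generator handled the exception, False otherwise.
    """
    while stack:
        gen = stack.pop()
        try:
            gen.throw(exc)
        except StopIteration:
            return True            # handled, and the generator finished
        except Timeout as raised:
            exc = raised           # not handled: continue in the caller
        else:
            stack.append(gen)      # handled, and the generator yielded again
            return True
    return False


ga = a()
gb = next(ga)      # a() yields the generator returned by b()
next(gb)           # b() runs up to its bare yield
stack = [ga, gb]
handled = interrupt(stack, Timeout())
```

Under this design `b()` does not continue: its finally block runs first, then
the one in `a()`, just as with ordinary nested calls.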

> For us, having 'f_in_finally' somehow propagated would be completely
> useless.
>

I hope I can convince you with this email :)

> I think even if it's decided to implement just your proposal, I feel
> that 'f_in_finally' should indicate the state of only its *own* frame.

That was the original intention. But it requires traversing the stack. Andrew
proposed propagating this flag, which is another point of view
on the same thing (not sure which one to pick, though).
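
The two options can be compared with a toy model. This is only a sketch:
`f_in_finally` is a proposed attribute that does not exist in CPython, and the
`Frame` class, its `in_finally` and `inherited` fields and both helper
functions are hypothetical stand-ins for it.

```python
class Frame:
    """Toy stand-in for an interpreter frame (hypothetical)."""

    def __init__(self, name, caller=None):
        self.name = name
        self.caller = caller
        self.in_finally = False  # state of this frame only
        # Propagated variant: copied down from the caller at call time.
        self.inherited = caller is not None and (
            caller.in_finally or caller.inherited
        )


def interruptible_by_traversal(frame):
    """Option 1: walk the stack and look at each frame's own flag."""
    while frame is not None:
        if frame.in_finally:
            return False
        frame = frame.caller
    return True


def interruptible_by_propagation(frame):
    """Option 2: check only the current frame, using the propagated flag."""
    return not (frame.in_finally or frame.inherited)


dispatcher = Frame("dispatcher")
add_money = Frame("add_money", dispatcher)
before = (interruptible_by_traversal(add_money),
          interruptible_by_propagation(add_money))

add_money.in_finally = True                    # add_money enters its finally
redis_unlock = Frame("redis_unlock", add_money)
wait_write = Frame("wait_write", redis_unlock)
during = (interruptible_by_traversal(wait_write),
          interruptible_by_propagation(wait_write))
```

Both give the same answer here. Traversal costs a walk per check, while
propagation costs a copy per call and has to be kept up to date if a caller's
state could change while the callee is running.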

-- 
Paul


