In asyncio, when a task awaits another task (or future), it can be
cancelled right after the awaited task has finished. Thus, if the awaited
task has consumed data, that data is lost.
For instance, with the following code:
    import asyncio

    available_data = []
    data_ready = asyncio.Future()

    def feed_data(data):
        global data_ready
        available_data.append(data)
        data_ready.set_result(None)
        data_ready = asyncio.Future()

    async def consume_data():
        while not available_data:
            await asyncio.shield(data_ready)
        return available_data.pop()

    async def wrapped_consumer():
        task = asyncio.ensure_future(consume_data())
        return await task
If I perform those exact steps:
    async def test():
        task = asyncio.ensure_future(wrapped_consumer())
        await asyncio.sleep(0)
        feed_data('data')
        await asyncio.sleep(0)
        task.cancel()
        await asyncio.sleep(0)
        print('task', task)
        print('available_data', available_data)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(test())
Then I can see that the task has been cancelled despite the data being
consumed. Since the result of `wrapped_consumer` cannot be retrieved, the
data is forever lost.
    task <Task cancelled coro=<wrapped_consumer() done, defined at <ipython-input-1-de4ad193b1d0>:17>>
    available_data []
This side effect does not happen when awaiting a coroutine, but coroutines
are not as flexible as tasks (unless manipulated as a generator). It
happens when awaiting a `Future`, a `Task`, or any function like
`asyncio.wait`, `asyncio.wait_for` or `asyncio.gather` (which all inherit
from or use `Future`). There is then no way to do anything equivalent to:
    stop_future = asyncio.Future()

    async def wrapped_consumer2():
        task = asyncio.ensure_future(consume_data())
        try:
            await asyncio.wait([task, stop_future])
        finally:
            task.cancel()
        if not task.cancelled():
            return task.result()
        else:
            raise RuntimeError('stopped')
This is due to the Future calling the callback asynchronously:
https://github.com/python/cpython/blob/3.6/Lib/asyncio/futures.py#L214
    for callback in callbacks:
        self._loop.call_soon(callback, self)
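To illustrate the scheduling described above, here is a minimal sketch. It is not part of the original report: it assumes Python 3.7+ for `asyncio.run`, and the names `main` and `calls` are made up for the illustration. It shows that a done callback is only queued when the result is set, and runs on a later pass of the event loop:

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    calls = []
    fut.add_done_callback(lambda f: calls.append(f.result()))

    # Setting the result only schedules the callback (via call_soon);
    # it has not run yet when set_result() returns.
    fut.set_result("data")
    immediately = list(calls)

    # Yield to the event loop once so the scheduled callback can run.
    await asyncio.sleep(0)
    later = list(calls)
    return immediately, later


immediately, later = asyncio.run(main())
print("right after set_result:", immediately)
print("after one loop pass:", later)
```

The window between `set_result()` and the callback actually running is where a `cancel()` call can still reach the waiting task.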
I propose to create synchronous versions of those, or a
`synchronous_callback` parameter, that makes the callbacks of `Future`
synchronous. I've experimented with a simple library, `syncio`, on CPython 3.6
to do this (it is harder to patch later versions due to the heavy use of
private methods).
Basically, it needs to:
1) replace the `Future._schedule_callbacks` method by a synchronous version
2) fix `Task._step` to not fail when cleaning `_current_tasks` (
https://github.com/python/cpython/blob/3.6/Lib/asyncio/tasks.py#L245)
3) rewrite all the functions to use synchronous futures instead of normal
ones
With that library, the previous functions are possible and intuitive:
    import syncio

    async def wrapped_consumer():
        task = syncio.ensure_sync_future(consume_data())
        return await task

    stop_future = asyncio.Future()

    async def wrapped_consumer2():
        task = syncio.ensure_sync_future(consume_data())
        try:
            await syncio.sync_wait([task, stop_future])
        finally:
            task.cancel()
        if not task.cancelled():
            return task.result()
        else:
            raise RuntimeError('stopped')
There is no need to use `syncio` anywhere else in the code, which makes it
totally transparent for the end user. `wrapped_consumer` and
`wrapped_consumer2` are now cancelled if and only if the data hasn't been
consumed, whatever the order of the steps (and the presence of
`asyncio.sleep`).
This "library" can be found here:
https://github.com/aure-olli/aiokafka/blob/3acb88d6ece4502a78e230b234f47b90…
It implements `SyncFuture`, `SyncTask`, `ensure_sync_future`, `sync_wait`,
`sync_wait_for`, `sync_gather` and `sync_shield`. It works with CPython 3.6.
To conclude:
- asynchronous callbacks are preferable in most cases, but do not provide a
coherent cancelled status in specific cases
- implementing a version with synchronous callbacks (or a
`synchronous_callback` parameter) is rather easy (however, step 2 needs to be
clarified; there is probably a cleaner way to fix this)
- it is totally transparent for the end user, as synchronous callbacks are
totally compatible with asynchronous ones
I think it looks very fine when you type {1, 2, 3} * {"a", "b", "c"} and
get set(itertools.product({1, 2, 3}, {"a", "b", "c"})). So I am proposing
to implement set multiplication as the Cartesian product.
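As a rough sketch of the proposed behaviour, the operator could be prototyped today with a subclass. The name `ProductSet` is hypothetical and only used for illustration; the built-in `set` does not support `*`.

```python
import itertools


class ProductSet(set):
    """Hypothetical set subclass whose `*` operator is the Cartesian product."""

    def __mul__(self, other):
        if not isinstance(other, (set, frozenset)):
            return NotImplemented
        # Each element of the result is a 2-tuple (item_of_self, item_of_other).
        return ProductSet(itertools.product(self, other))


pairs = ProductSet({1, 2, 3}) * {"a", "b", "c"}
expected = set(itertools.product({1, 2, 3}, {"a", "b", "c"}))
print(pairs == expected)
print(len(pairs))
```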
And another message that was rejected (I sent from an unregistered email
address)
On Sat, Jul 27, 2019 at 1:49 AM Serhiy Storchaka <storchaka(a)gmail.com>
wrote:
> 26.07.19 21:52, Bruce Leban wrote:
>
>
> To put this in a simpler way: the proposal is to add an except clause that
> applies ONLY to the direct operation of the with or for statement and not
> to the block. That's an interesting idea.
>
> The one thing I find confusing about your proposal is that the proposed
> syntax does not imply the behavior. In a try statement, the except appears
> at the end and after all possible statements that it could cover. The
> proposal mimics that syntax but with different semantics. Something like
> this would be much more clear what is going on:
>
> for VARIABLE in EXPRESSION:
>     except EXCEPTION:
>         BLOCK
>     BLOCK
>
> with EXPRESSION as VARIABLE:
>     except EXCEPTION:
>         BLOCK
>     BLOCK
>
> while EXPRESSION:
>     except EXCEPTION:
>         BLOCK
>     BLOCK
>
>
> Besides an unusual for Python layout (a clause has different indentation
> than the initial clause of the statement to which it belongs) there is
> other problem. The exception block is not the part of the "for" or "with"
> block. After handling an exception in the "for" clause you do not continue
> to execute the "for" block, but leave the loop. After handling an exception
> in the "with" clause you do not continue to execute the "with" block and do
> not call `__exit__` when leave it. To me, this syntax is much more
> confusing than my initial proposition.
>
And I find it less confusing. And neither of those is the standard to use.
The goal is for syntax to imply semantics (which my proposal does and I do
not think yours does, given several people commenting that they thought it
applied to the entire loop) and to choose syntax that is more clear to more
people (which requires more than two people's opinions).
Consider how you would write this if everything was an expression in Python
and we had braces:
    for VAR in ( EXPR except EXCEPTION: { BLOCK; break; } ):
        BLOCK
I do agree that it is not obvious that the exception block breaks out of
the loop. I think in actual code it will be fairly obvious what's happening,
as continuing into the loop when the loop expression threw an exception
doesn't make sense. I'm open to alternatives. On the other hand, an except
clause at the bottom of the loop that does not apply to the loop body is
going to catch me every time I see it.
--- Bruce
I sent this message earlier but it was rejected by the mailer.
On Fri, Jul 26, 2019 at 11:27 AM Serhiy Storchaka <storchaka(a)gmail.com>
wrote:
>
> So you will be able to add errors handling like in:
>
> with connect() as stream:
>     for data in stream:
>         try:
>             write(data)
>         except OSError:
>             handle_write_error()
>     except OSError:
>         handle_read_error()
> except OSError:
>     handle_connection_error()
>
To put this in a simpler way: the proposal is to add an except clause that
applies ONLY to the direct operation of the with or for statement and not
to the block. That's an interesting idea.
The one thing I find confusing about your proposal is that the proposed
syntax does not imply the behavior. In a try statement, the except appears
at the end and after all possible statements that it could cover. The
proposal mimics that syntax but with different semantics. Something like
this would be much more clear what is going on:
    for VARIABLE in EXPRESSION:
        except EXCEPTION:
            BLOCK
        BLOCK

    with EXPRESSION as VARIABLE:
        except EXCEPTION:
            BLOCK
        BLOCK

    while EXPRESSION:
        except EXCEPTION:
            BLOCK
        BLOCK
or rewriting your example:
    with connect() as stream:
        except OSError:
            handle_connection_error()
        for data in stream:
            except OSError:
                handle_read_error()
            try:
                write(data)
            except OSError:
                handle_write_error()
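For comparison, the semantics under discussion (handle exceptions raised by the iteration itself, but not by the loop body) can be approximated in current Python with a small generator. This is only a sketch; the helper name `iterate_guarded` and its parameters are invented here and are not part of either proposal:

```python
def iterate_guarded(iterable, exc_type, handler):
    """Yield items from `iterable`; if obtaining the iterator or fetching an
    item raises `exc_type`, call `handler(exc)` and stop. Exceptions raised
    by the caller's loop body are not intercepted, because they occur
    outside this generator."""
    try:
        iterator = iter(iterable)
    except exc_type as exc:
        handler(exc)
        return
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # Normal end of iteration; checked first so that a broad
            # exc_type such as Exception does not swallow it.
            return
        except exc_type as exc:
            handler(exc)
            return
        yield item


def flaky_stream():
    """Stand-in for a stream that fails while being read."""
    yield "a"
    yield "b"
    raise OSError("read failed")


read_errors = []
received = []
for data in iterate_guarded(flaky_stream(), OSError, read_errors.append):
    received.append(data)

print(received, read_errors)
```

As in the discussion above, the loop is left once the handler has run; the body is not re-entered.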
Hi,
I don't know if this was already debated, and I don't know how to search
in the whole archive of the list.
For now the adoption of the pyproject.toml file is more difficult because
toml is not in the standard library.
Each tool which wants to use pyproject.toml has to add a toml lib as a
conditional or hard dependency.
Since toml is now the standard configuration file format, it's strange
that Python does not support it in the stdlib, just as it would have been
strange not to have the configparser module.
I know it's complicated to add more and more things to the stdlib but I
really think it is necessary for Python packaging to be more consistent.
Maybe we could think about a read-only lib to limit the added code.
If it's conceivable, I'd be happy to help with it.
Nice Day guys and girls.
Jimmy
Forward to the list because Abusix had blocked google.com initially.
Nam
---------- Forwarded message ---------
From: Nam Nguyen <bitsink(a)gmail.com>
Date: Sun, Jul 28, 2019 at 10:18 AM
Subject: Re: [Python-ideas] Re: Universal parsing library in the stdlib to
alleviate security issues
To: Sebastian Kreft <skreft(a)gmail.com>
Cc: Paul Moore <p.f.moore(a)gmail.com>, python-ideas <python-ideas(a)python.org>
Let's circle back to the beginning one last time ;).
On Thu, Jul 25, 2019 at 8:15 AM Sebastian Kreft <skreft(a)gmail.com> wrote:
> Nam, I think it'd be better to frame the proposal as a security
> enhancement. Stating some of the common bugs/gotchas found when manually
> implementing parsers, and the impact this has had on python over the years.
> Seeing a full list of security issues (CVEs) by module would give us a
> sense of how widespread the problem is.
>
Since my final exam was done this weekend, I gathered some more info into
this spreadsheet.
https://docs.google.com/spreadsheets/d/1TlWSf8iM7eIzEPXanJAP8Ztyzt4ZD28xFvU…
I think a strict parser can help with the majority of those problems. They
are in HTTP headers, emails, cookies, URLs, and even low level socket code
(inet_atoi).
> Then survey the stdlib for what kind of grammars are currently being
> parsed, what ad-hoc parsing strategy are implemented and provide examples
> of whether having a general purpose parser would have prevented the
> security issues you have previously cited.
>
Most grammars I have seen here come straight from RFCs, which are in ABNF
and thus context-free. Current implementations are based on regexes or
string splitting. My previous example showed that at least 30500, 36216, and
36742 would have been non-issues had we started out with a strict parser.
>
> Right now, it is not clear what the impact of such refactor would be, nor
> the worth of such attempt.
>
Exactly the kind of response I'm looking for. It is okay to suggest that
the benefits aren't clear or that there are requirements X and Y that a
general parser won't be able to meet, but it's not convincing to brush
aside this because there is "existing, working code." Many of the bugs in
that sheet are still open. It's not comfortable to say the code is working
with a straight face as I have experienced with my own fix for 30500. I
just couldn't tell if it was doing the right thing.
>
> What others have said earlier is that you are the one that needs to
> provide some of the requirements for the proposed private parsing library.
> And from what I read from your emails you do have some ideas. For example,
> you want it to be easy to write and review (I guess here you would
> eventually like it to be a close translation from whatever is specified in
> the RFC or grammar specification).
>
Yes, that's the most important point because "readability counts." It's
hard to reason about correctness when there are many transformations
between the authoritative spec and the implementation. I definitely don't
want to touch the regexes, string splits, and custom logic that I don't
understand "why" they are that way in the beginning. How do I, for example,
know what this regex is about
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
(It's from RFC 3986.)
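For reference, RFC 3986 (Appendix B) gives this expression as a way to split a URI reference into its components, with groups 2, 4, 5, 7 and 9 holding the scheme, authority, path, query and fragment. A minimal sketch of applying it in Python follows; the function name `split_uri` is made up for this illustration, and the sample URI is the one used in the RFC:

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split `uri` into its five generic components.

    Every part of the pattern is optional, so it matches any input string:
    it separates components but does not validate them.
    """
    match = URI_REFERENCE.match(uri)
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)
    return {
        "scheme": scheme,
        "authority": authority,
        "path": path,
        "query": query,        # None when the URI has no "?" part
        "fragment": fragment,  # None when the URI has no "#" part
    }


parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(parts)
```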
> But you also need to take into consideration some of the list's concerns,
> the parser library has to be performant, as a performance regression is
> likely not to be tolerable.
>
Absolutely. That's where I need inputs from the list. I have provided my
own set of requirements for such a parser library. I'm sure most of us have
different needs too. So if a parser library can help you, let's hear what
you want from it. If you think it can't, please let me understand why.
Thanks,
Nam
>
>
> On Thu, Jul 25, 2019 at 10:54 AM Nam Nguyen <bitsink(a)gmail.com> wrote:
>
>> On Thu, Jul 25, 2019 at 2:32 AM Paul Moore <p.f.moore(a)gmail.com> wrote:
>>
>>> On Thu, 25 Jul 2019 at 02:16, Nam Nguyen <bitsink(a)gmail.com> wrote:
>>> > Back to my original requests to the list: 1) Whether we want to have a
>>> (possibly private) parsing library in the stdlib
>>>
>>> In the abstract, no. Propose a specific library, and that answer would
>>> change to "maybe".
>>>
>>
>> I have no specific library to propose. I'm looking for a list of features
>> such a library should have.
>>
>>
>>>
>>> > and 2) What features it should have.
>>>
>>> That question only makes sense if you get agreement to the abstract
>>> proposal that "we should add a parsing library". And as I said, I don't
>>> agree to that so I can't answer the second question.
>>>
>>
>> As Chris summarized it correctly, I am advocating for a general solution
>> to individual problems (which have the same nature). We can certainly solve
>> the problems when they are reported, or we can take a proactive approach to
>> make them less likely to occur. I am talking about a class of input
>> validation issues here and I thought parsing would be a very natural
>> solution to that. This is quite similar to a context-sensitive templating
>> library that prevents cross-site-scripting on the output side. So I don't
>> know why (or what it takes) to convince people that it's a good thing(tm).
>>
>>
>>>
>>> Generally, things go into the stdlib when they have been developed
>>> externally and proved their value. The bar for designing a whole
>>> library from scratch, "specifically" targeted at stdlib inclusion, is
>>> very high, and you're nowhere near reaching it IMO.
>>>
>>
>> This is a misunderstanding. I have not proposed any from-scratch, or
>> existing library to be used. And on this note, please allow me to make it
>> clear one more time that I am not asking for a publicly-facing library
>> either.
>>
>>
>>>
>>> > These are good points to set as targets! What does it take for me to
>>> get the list to agree on one such set of criteria?
>>>
>>> You need to start by getting agreement on the premise that adding a
>>> newly-written parser to the stdlib is a good idea. And so far your
>>> *only* argument seems to be that "it will avoid a class of security
>>> bugs" which I find extremely unconvincing (and I get the impression
>>> others do, too).
>>
>>
>> Why? What is unconvincing about a parsing library being able to... parse
>> (and therefore, validate) inputs?
>>
>>
>>> But even if "using a real parser" was useful in that
>>> context, there's *still* no argument for writing one from scratch,
>>> rather than using an existing, proven library.
>>
>>
>> Never a goal.
>>
>>
>>> At the most basic
>>> level, what if there's a bug in your new parsing library? If we're
>>> using it in security-critical code, such a bug would be a
>>> vulnerability just like the ones you're suggesting your parser would
>>> avoid. Are you asking us to believe that your code will be robust
>>> enough to trust over code that's been used in production systems for
>>> years?
>>>
>>> I think you need to stop getting distracted by details, and focus on
>>> your stated initial request "Whether we want to have a (possibly
>>> private) parsing library in the stdlib". You don't seem to me to have
>>> persuaded anyone of this basic suggestion yet,
>>
>>
>> Good observation. How do I convince you that complex input validation
>> tasks should be left to a parser?
>>
>> Thanks!
>> Nam
>>
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas(a)python.org
>> To unsubscribe send an email to python-ideas-leave(a)python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/FCPU4…
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
> --
> Sebastian Kreft
>
Minimal strawman proposal. New keyword debug.
    debug EXPRESSION
Executes EXPRESSION when in debug mode.
    debug context
Prints all the variables of the enclosing closure and all the variable names accessed within that block. For example, if in foo you access the global variable spam, spam would be printed. The format would be:
    variableName: value
    variableTwo: value
where "value" is the repr() of the variable, with entries separated by new lines. The exact output format would not be part of the spec.
    ?identifier
would print "identifier: value", with the repr as before. Using this in non-debug mode emits a warning.
    ?identifier.property.property
is also valid.
A new property descriptor on the global variable, “debugger.” This is an alias for importing PDB and causing the debugger to pause there.
The behavior of this descriptor in non-debug mode is TBD.
Debug mode may be specified per-module at interpreter launch.
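A rough approximation of the `?identifier` output with today's tools, as a sketch only: the helper name `format_debug` is hypothetical, and values are passed as keyword arguments because a plain function cannot see the caller's variable names the way the proposed syntax would.

```python
def format_debug(**named_values):
    """Return one 'name: repr(value)' line per keyword argument."""
    return "\n".join(
        f"{name}: {value!r}" for name, value in named_values.items()
    )


spam = ["eggs", 42]
count = 3
report = format_debug(spam=spam, count=count)

# __debug__ is True unless Python is started with -O, which makes it a
# rough stand-in for the "debug mode" of the strawman.
if __debug__:
    print(report)
```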
Sent from my iPhone
Begin forwarded message:
> From: James Lu <jamtlu(a)gmail.com>
> Date: July 28, 2019 at 6:22:11 PM EDT
> To: Andrew Barnert <abarnert(a)yahoo.com>
> Subject: Re: [Python-ideas] Utilities for easier debugging
>
>
>> On Jul 28, 2019, at 4:26 PM, Andrew Barnert <abarnert(a)yahoo.com> wrote:
>>
>> This would break iPython’s improved interactive console, which already uses this syntax to provide a similar feature.
> If it’s so similar, I doubt it would break anything. This is intended to make it easier to log variables for a complex application (multiple files).
Sent from my iPhone
Begin forwarded message:
> From: James Lu <jamtlu(a)gmail.com>
> Date: July 28, 2019 at 6:21:04 PM EDT
> To: Andrew Barnert <abarnert(a)yahoo.com>
> Subject: Re: [Python-ideas] Utilities for easier debugging
>
>
>> On Jul 28, 2019, at 4:26 PM, Andrew Barnert <abarnert(a)yahoo.com> wrote:
>>
>> Why not just allow anything that’s valid as a target, like identifier[expr]? Or even just any expression at all? Is there an advantage to defining and using a similar but more limited syntax here?
> Hmm, it should support any expression. I limited it at first because it was a minimal strawman.