"read-write" synchronization
I'm relatively new to async programming in Python and am thinking through possibilities for doing "read-write" synchronization.

I'm using asyncio, and the synchronization primitives that asyncio exposes are relatively simple [1]. Have options for async read-write synchronization already been discussed in any detail?

I'm interested in designs where "readers" don't need to acquire a lock -- only writers. It seems like one way to deal with the main race condition I see that comes up would be to use loop.time(). Does that ring a bell, or might there be a much simpler way?

Thanks,
--Chris

[1] https://docs.python.org/3/library/asyncio-sync.html
There is https://github.com/aio-libs/aiorwlock

On Mon, Jun 26, 2017 at 12:13 AM Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
I'm relatively new to async programming in Python and am thinking through possibilities for doing "read-write" synchronization. [...]
_______________________________________________
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/
--
Thanks,
Andrew Svetlov
Thank you. I had seen that, but it seems heavier weight than needed. And it also requires locking on reading.

--Chris

On Sun, Jun 25, 2017 at 2:16 PM, Andrew Svetlov <andrew.svetlov@gmail.com> wrote:
There is https://github.com/aio-libs/aiorwlock
The secret is that as long as you don't yield no other task will run so you don't need locks at all.

On Jun 25, 2017 2:24 PM, "Chris Jerdonek" <chris.jerdonek@gmail.com> wrote:
Thank you. I had seen that, but it seems heavier weight than needed. And it also requires locking on reading.
--Chris
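Guido's point is worth pinning down with a sketch: in asyncio's single-threaded event loop, a task can only be suspended at an await (or yield), so a read-modify-write sequence with no await in it needs no lock. A minimal illustration (the counter example is mine, not from the thread; asyncio.run requires Python 3.7+):

```python
import asyncio

counter = 0

async def add_many(n):
    global counter
    for _ in range(n):
        # No await between the read and the write, so no other task
        # can interleave here: the increment is effectively atomic.
        counter += 1
    # Control can only pass to another task at an explicit await.
    await asyncio.sleep(0)

async def main():
    await asyncio.gather(add_many(100_000), add_many(100_000))

asyncio.run(main())
print(counter)  # always 200000; with preemptive threads it could be less
```
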
The read-write operations I'm protecting will have coroutines inside that need to be awaited on, so I don't think I'll be able to take advantage to that extreme.

But I think I might be able to use your point to simplify the logic a little. (To rephrase, you're reminding me that context switches can't happen at arbitrary lines of code. I only need to be prepared for the cases where there's an await / yield from.)

--Chris

On Sun, Jun 25, 2017 at 2:30 PM, Guido van Rossum <gvanrossum@gmail.com> wrote:
The secret is that as long as you don't yield no other task will run so you don't need locks at all.
On Sun, Jun 25, 2017 at 4:54 PM Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
The read-write operations I'm protecting will have coroutines inside that need to be awaited on, so I don't think I'll be able to take advantage to that extreme.
But I think I might be able to use your point to simplify the logic a little. (To rephrase, you're reminding me that context switches can't happen at arbitrary lines of code. I only need to be prepared for the cases where there's an await / yield from.)
The "secret" Guido refers to we should pull out front and center, explicitly at all times: asynchronous programming is nothing more than cooperative multitasking. Patterns suited for preemptive multitasking (executive-based, interrupt-based, etc.) are suspect, potentially misplaced, when they show up in a cooperative multitasking context.

To be a well-behaved task (capable of effective cooperation) in such a system, you should guard against getting embroiled in potentially blocking I/O whose latency you are not able to control (within the facilities available in a cooperative multitasking context). This raises a couple of questions: to be well-behaved, simple control flow is desirable (i.e. not nested layers of yields, except perhaps for a pipeline case); and "read/write" control of memory within the process (since external I/O is generally not for async) begs the question: what for? Eliminate globals, and encapsulate and limit access as needed through the usual programming methods.

I'm sure someone will find an edge case to challenge my rule of thumb above, but as you're new to this, I think this is a pretty good place to start: ask yourself if what you're trying to do with async is suited for async.

Cheers,
Yarko
On Sun, Jun 25, 2017 at 3:38 PM, Yarko Tymciurak <yarkot1@gmail.com> wrote:
To be a well-behaved (capable of effective cooperation) task in such a system, you should guard against getting embroiled in potentially blocking I/O tasks whose latency you are not able to control (within facilities available in a cooperative multitasking context). This raises a couple of questions: to be well-behaved, simple control flow is desirable (i.e. not nested layers of yields, except perhaps for a pipeline case); and "read/write" control of memory within the process (since external I/O is generally not for async) begs the question: what for? Eliminate globals, encapsulate and limit access as needed through usual programming methods.
Before anyone takes this paragraph too seriously, there seem to be a bunch of misunderstandings underlying it.

- *All* blocking I/O is wrong in an async task, regardless of whether you can control its latency. (The only safe way to do I/O is using a primitive that works with `await`.)
- There's nothing wrong with `yield` itself. (You shouldn't do I/O in a generator used in an async task -- but that's just due to the general ban on I/O.)
- Using async tasks doesn't make globals more risky than regular code (in fact they are safer here than in traditional multi-threaded code).
- What on earth is "read/write" control from memory space w/in the process?

--
--Guido van Rossum (python.org/~guido)
On Sun, Jun 25, 2017 at 10:46 PM, Yarko Tymciurak <yarkot1@gmail.com> wrote:
yes - thanks for the clarifications... I'm speaking from the perspective of an ECE, and thinking in the small scale (embedded) of things like when, in general, cooperative multitasking (very lightweight) is more performant than preemptive... so from that space:
- *All* blocking I/O is wrong in an async task, regardless of whether you can control its latency. (The only safe way to do I/O is using a primitive that works with `await`.)
Yes, and from an ECE perspective the only I/O is a "local" device (e.g. RAM, which itself has rather deterministic setup and write times...), etc.

My more general point (sorry - should have made it explicit) is that if you call a library routine, you may not expect that it's doing external I/O, so that requires either care or defensively guarding against it, e.g. with timers (another story). This in particular is an error I saw in the OpenStack Swift project: they depended on fast local storage device I/O, until devices started failing. They then mistakenly assumed this was Python's fault, missing the programming error of doing async I/O (gevent, but same issue) that might be okay within limits but was not guarded against, i.e. was done in an unreliable way.

So whether you intentionally do such "risky" but seemingly reliable and "ok" I/O and fail to put guards in place, as you must in cooperative multitasking, or you just get surprised that some library you thought was innocuous is somewhere doing some surprise I/O (logging? anything...): in cooperative multitasking you can get away with some things, but it is _all_ your responsibility to guard against problems. That was my point here.
- There's nothing wrong with `yield` itself. (You shouldn't do I/O in a generator used in an async task -- but that's just due to the general ban on I/O.)
Yes; as above. But I'm calling local variables (strictly speaking) I/O too. And you might consider Redis as "it goes to RAM, so how different is that?" -- well, it's through another process, and subject to a preemptive scheduler, and all sorts of things. So, sure, you _can_ do it, if you put in guards. But don't. Or at least have very specific good reasons, and understand the coding cost of trying to do so. In other words: don't.
- Using async tasks don't make globals more risky than regular code (in fact they are safer here than in traditional multi-threaded code).
- What on earth is "read/write" control from memory space w/in the process?
Sorry - these last two were a bit of a joke on my part. The silly version: the only valid I/O is to variables. But you don't need that, because you have normal variable scoping/encapsulation rules. So (my joke continued), the only reason left to have "read/write controls" is against (!) global variables. Answer: don't use them, and then you don't need R/W controls, because you have the language's normal encapsulation of variables. So, my argument goes, in cooperative multitasking there can be (!) no reasonable motivation for R/W controls.

-- Yarko
On Sun, Jun 25, 2017 at 2:13 PM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
I'm relatively new to async programming in Python and am thinking through possibilities for doing "read-write" synchronization.
I'm using asyncio, and the synchronization primitives that asyncio exposes are relatively simple [1]. Have options for async read-write synchronization already been discussed in any detail?
As a general comment: I used to think rwlocks were a simple extension to regular locks, but it turns out there's actually this huge increase in design complexity. Do you want your lock to be read-biased, write-biased, task-fair, phase-fair? Can you acquire a write lock if you already hold one (i.e., are write locks reentrant)? What about acquiring a read lock if you already hold the write lock? Can you atomically upgrade/downgrade a lock? This makes it much harder to come up with a one-size-fits-all design suitable for adding to something like the python stdlib.

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Sun, Jun 25, 2017 at 3:09 PM, Nathaniel Smith <njs@pobox.com> wrote:
I agree. And my point about asyncio's primitives wasn't a criticism or request that more be added. I was asking more if there has been any discussion of general approaches and patterns that take advantage of the event loop's single thread, etc.

Maybe what I'll do is briefly write up the approach I have in mind, and people can let me know if I'm on the right track. :)

--Chris
So here's one approach I'm thinking about for implementing readers-writer synchronization. Does this seem reasonable as a starting point, or am I missing something much simpler? I know there are various things you can prioritize for (readers vs. writers, etc), but I'm less concerned about those for now.

The global state is--

* reader_count: an integer count of the active (reading) readers
* writer_lock: an asyncio Lock object
* no_readers_event: an asyncio Event object signaling no active readers
* no_writer_event: an asyncio Event object signaling no active writer

Untested pseudo-code for a writer--

    async with writer_lock:
        no_writer_event.clear()
        # Wait for the readers to finish.
        await no_readers_event.wait()
        # Do the write.
        await write()
        # Awaken waiting readers.
        no_writer_event.set()

Untested pseudo-code for a reader--

    while True:
        await no_writer_event.wait()
        # Check the writer_lock again in case a new writer has
        # started writing.
        if not writer_lock.locked():
            # Then we can do the read.
            break
    reader_count += 1
    if reader_count == 1:
        no_readers_event.clear()
    # Do the read.
    await read()
    reader_count -= 1
    if reader_count == 0:
        # Awaken any waiting writer.
        no_readers_event.set()

One thing I'm not clear about is when the writer_lock is released and the no_writer_event set, are there any guarantees about what coroutine will be awakened first -- a writer waiting on the lock or the readers waiting on the no_writer_event? Similarly, is there a way to avoid having to have readers check the writer_lock again when a reader waiting on no_writer_event is awakened?

--Chris

On Sun, Jun 25, 2017 at 3:27 PM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
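For readers following along, Chris's pseudo-code can be fleshed out into a runnable sketch roughly like the following. The class name, the fact that both events start out set, and the do_read/do_write callables are my assumptions, not part of the original, and the fairness questions raised in the thread remain open (asyncio.run requires Python 3.7+):

```python
import asyncio

class ReadersWriterSync:
    # Sketch only: the event-based reader/writer scheme from the thread.
    # Both events start "set" because initially there is no writer and
    # there are no active readers.
    def __init__(self):
        self.reader_count = 0
        self.writer_lock = asyncio.Lock()
        self.no_readers_event = asyncio.Event()
        self.no_writer_event = asyncio.Event()
        self.no_readers_event.set()
        self.no_writer_event.set()

    async def write(self, do_write):
        async with self.writer_lock:
            self.no_writer_event.clear()
            # Wait for the active readers to finish.
            await self.no_readers_event.wait()
            await do_write()
            # Awaken waiting readers.
            self.no_writer_event.set()

    async def read(self, do_read):
        while True:
            await self.no_writer_event.wait()
            # Re-check in case a new writer started in the meantime.
            if not self.writer_lock.locked():
                break
        self.reader_count += 1
        if self.reader_count == 1:
            self.no_readers_event.clear()
        try:
            return await do_read()
        finally:
            self.reader_count -= 1
            if self.reader_count == 0:
                # Awaken any waiting writer.
                self.no_readers_event.set()

async def demo():
    sync = ReadersWriterSync()
    data = []

    async def do_write():
        await asyncio.sleep(0)  # stand-in for real async work
        data.append(len(data))

    async def do_read():
        await asyncio.sleep(0)
        return list(data)

    snapshot, _, _ = await asyncio.gather(
        sync.read(do_read), sync.write(do_write), sync.write(do_write))
    return data, snapshot

data, snapshot = asyncio.run(demo())
print(data)  # [0, 1]: each write appended under exclusive access
```

The demo's result is order-independent: each write appends len(data), so exclusive writes always yield [0, 1], while the reader's snapshot depends on scheduling.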
Chris, this led to an interesting discussion, which then went pretty far from the original concern.

Perhaps you can share your use-case, both as pseudo-code and a link to real code. I'm specifically interested to see why/where you'd like to use a read-write async lock, to evaluate if this is something common or specific, and if, perhaps, some other paradigm (like queue, worker pool, ...) may be more useful in the general case.

I'm also curious if a full set of async sync primitives may one day lead to async monitors. Granted, simple use of an async monitor is really a future/promise, but perhaps there are complex use cases in the UI/react domain with its promise/stream dichotomy.

Cheers,
d.

On 25 June 2017 at 23:13, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Mon, Jun 26, 2017 at 1:43 AM, Dima Tisnek <dimaqq@gmail.com> wrote:
Perhaps you can share your use-case, both as pseudo-code and a link to real code.
I'm specifically interested to see why/where you'd like to use a read-write async lock, to evaluate if this is something common or specific, and if, perhaps, some other paradigm (like queue, worker pool, ...) may be more useful in general case.
I'm also curious if a full set of async sync primitives may one day lead to async monitors. Granted, simple use of async monitor is really a future/promise, but perhaps there are complex use cases in the UI/react domain with its promise/stream dichotomy.
Thank you, Dima. In my last email I shared pseudo-code for an approach to read-write synchronization that is independent of use case. [1]

For the use case, my original purpose in mind was to synchronize many small file operations on disk like creating and removing directories that possibly share intermediate segments. The real code isn't public. But these would be operations like os.makedirs() and os.removedirs() that would be wrapped by loop.run_in_executor() to be non-blocking. The directory removal using os.removedirs() is the operation I thought should require exclusive access, so as not to interfere with directory creations in progress.

Perhaps a simpler, dirtier approach would be not to synchronize at all and simply retry directory creations that fail until they succeed. That could be enough to handle rare cases where simultaneous creation and removal causes an error. You could view this as an EAFP approach.

Either way, I think the process of thinking through patterns for read-write synchronization is helpful for getting a better general feel and understanding of async.

--Chris
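Chris's retry idea might look something like the helper below. This is an illustrative sketch, not his actual code: the function name, the choice to catch OSError, and the backoff numbers are all invented here.

```python
import asyncio
import functools
import os
import tempfile

async def makedirs_with_retry(path, attempts=3):
    # EAFP: if a concurrent os.removedirs() deletes a shared
    # intermediate segment mid-creation, os.makedirs() can fail;
    # just try again instead of holding any lock.
    loop = asyncio.get_event_loop()
    for attempt in range(attempts):
        try:
            # Run the blocking call in a thread so the loop stays free.
            await loop.run_in_executor(
                None, functools.partial(os.makedirs, path, exist_ok=True))
            return
        except OSError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(0.01 * (attempt + 1))  # small backoff

# Tiny demo against a throwaway temp directory.
target = os.path.join(tempfile.mkdtemp(), "a", "b", "c")
loop = asyncio.new_event_loop()
loop.run_until_complete(makedirs_with_retry(target))
loop.close()
created = os.path.isdir(target)
print(created)  # True
```

A caller would simply `await makedirs_with_retry(some_path)`; no lock is taken anywhere, at the cost of an occasional retry.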
A little epiphany on my part: in the threaded world, a lock (etc.) can be used for 2 distinct purposes:

*1 synchronise [access to a resource in the] library implementation, and
*2 synchronise users of a library

It's easy since a taken lock has an owner (thread). Both library and user stack frames belong to either this thread or some other.

In the async world, users are opaque to the library implementation (they technically own async threads). Therefore only use case #1 is valid. Moreover, it occurs to me that a lock/unlock pair must be confined to the same async function. Going beyond that restriction is bug-prone like crazy (even for me).

Chris, coming back to your use-case: do you want to synchronise side-effect creation/deletion for the sanity of the side-effects only? Or do you imply that callers' actions are synchronised too? In other words, do your callers use those directories out of band?

P.S./O.T. when it comes to directories, you probably want hierarchical locks rather than RW.

On 26 June 2017 at 11:28, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
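Dima's hierarchical-locks aside could be sketched roughly as below. This is an illustrative toy, not a vetted design, and the PathLocks name and API are invented here: one asyncio.Lock per path prefix, always acquired root-to-leaf, so tasks touching overlapping paths serialize at their shared ancestor instead of deadlocking.

```python
import asyncio
from pathlib import PurePosixPath

class PathLocks:
    def __init__(self):
        self._locks = {}  # str prefix -> asyncio.Lock

    def _lock_for(self, prefix):
        return self._locks.setdefault(prefix, asyncio.Lock())

    async def acquire(self, path):
        # Lock every prefix of the path, root first. The fixed global
        # acquisition order is what rules out deadlock.
        parts = PurePosixPath(path).parts
        held = []
        try:
            for i in range(len(parts)):
                lock = self._lock_for(str(PurePosixPath(*parts[:i + 1])))
                await lock.acquire()
                held.append(lock)
        except BaseException:
            for lock in reversed(held):
                lock.release()
            raise
        return held

    def release(self, held):
        # Release leaf-to-root, mirroring acquisition.
        for lock in reversed(held):
            lock.release()

async def demo():
    locks = PathLocks()
    held = await locks.acquire("a/b/c")   # locks "a", "a/b", "a/b/c"
    all_held = all(lock.locked() for lock in held)
    locks.release(held)
    all_released = not any(lock.locked() for lock in held)
    return all_held, all_released

held_ok, released_ok = asyncio.run(demo())
print(held_ok, released_ok)  # True True
```

Note this is coarse: siblings serialize at their shared ancestor. A finer-grained design would use intention locks in the style of multiple-granularity locking in databases.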
On Mon, Jun 26, 2017 at 10:02 AM, Dima Tisnek <dimaqq@gmail.com> wrote:
Chris, coming back to your use-case. Do you want to synchronise side-effect creation/deletion for the sanity of side-effects only? Or do you imply that callers' actions are synchronised too? In other words, do your callers use those directories out of band?
If I understand your question, the former. The callers aren't / need not be synchronized, and they aren't aware of the underlying synchronization happening inside the higher-level create() and delete() functions they would be using. (These are the two higher-level functions described in my pseudocode.)

The synchronization is needed inside these create() and delete() functions since the low-level directory operations occur in different threads (because they are wrapped by run_in_executor()).

--Chris
P.S./O.T. when it comes to directories, you probably want hierarchical locks rather than RW.
On 26 June 2017 at 11:28, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Mon, Jun 26, 2017 at 1:43 AM, Dima Tisnek <dimaqq@gmail.com> wrote:
Perhaps you can share your use-case, both as pseudo-code and a link to real code.
I'm specifically interested to see why/where you'd like to use a read-write async lock, to evaluate if this is something common or specific, and if, perhaps, some other paradigm (like queue, worker pool, ...) may be more useful in general case.
I'm also curious if a full set of async sync primitives may one day lead to async monitors. Granted, simple use of async monitor is really a future/promise, but perhaps there are complex use cases in the UI/react domain with its promise/stream dichotomy.
Thank you, Dima. In my last email I shared pseudo-code for an approach to read-write synchronization that is independent of use case. [1]
For the use case, my original purpose in mind was to synchronize many small file operations on disk like creating and removing directories that possibly share intermediate segments. The real code isn't public. But these would be operations like os.makedirs() and os.removedirs() that would be wrapped by loop.run_in_executor() to be non-blocking. The directory removal using os.removedirs() is the operation I thought should require exclusive access, so as not to interfere with directory creations in progress.
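The wrappers being described might look roughly like this. This is only a sketch under stated assumptions: the `create()`/`delete()` names are the higher-level functions from the pseudocode (not the real, non-public code), and the writer-side synchronization discussed in this thread is omitted here.

```python
import asyncio
import os
import tempfile


async def create(path):
    # Sketch: run the blocking os.makedirs() call in the default
    # executor's thread pool so it doesn't block the event loop.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, os.makedirs, path)


async def delete(path):
    # Likewise for os.removedirs(); the real code would guard this
    # with the exclusive-access synchronization discussed here.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, os.removedirs, path)


async def demo():
    # Create and then remove a small directory chain under a temp dir.
    base = tempfile.mkdtemp()
    target = os.path.join(base, 'a', 'b')
    await create(target)
    existed = os.path.isdir(target)
    await delete(target)
    return existed and not os.path.exists(target)
```

Note that `os.removedirs()` also prunes now-empty parent directories, which is exactly why concurrent creations sharing intermediate segments can be interfered with.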
Perhaps a simpler, dirtier approach would be not to synchronize at all and simply retry directory creations that fail until they succeed. That could be enough to handle rare cases where simultaneous creation and removal causes an error. You could view this as an EAFP approach.
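That retry idea could be sketched as below. The `create_dir_with_retry` name is invented for the sketch, and catching `OSError` broadly is an assumption about which errors a concurrent removal would cause:

```python
import asyncio
import os
import tempfile


async def create_dir_with_retry(path, attempts=3):
    # EAFP sketch: retry os.makedirs() if a concurrent removal of a
    # shared intermediate segment makes the creation fail.
    loop = asyncio.get_running_loop()
    for attempt in range(attempts):
        try:
            await loop.run_in_executor(None, os.makedirs, path)
            return
        except OSError:
            if attempt == attempts - 1:
                raise
            # Assume a concurrent os.removedirs() interfered; back off
            # briefly and try again.
            await asyncio.sleep(0.01 * (attempt + 1))


async def demo():
    base = tempfile.mkdtemp()
    target = os.path.join(base, 'x', 'y')
    await create_dir_with_retry(target)
    return os.path.isdir(target)
```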
Either way, I think the process of thinking through patterns for read-write synchronization is helpful for getting a better general feel and understanding of async.
--Chris
Cheers, d.
Chris, here's a simple RWLock implementation and analysis:

```
import asyncio

class RWLock:
    def __init__(self):
        self.cond = asyncio.Condition()
        self.readers = 0
        self.writer = False

    async def lock(self, write=False):
        async with self.cond:
            # write requested: there cannot be readers or writers
            # read requested: there can be other readers but not writers
            while self.readers and write or self.writer:
                self.cond.wait()
            if write:
                self.writer = True
            else:
                self.readers += 1
            # self.cond.notifyAll() would be good taste
            # however no waiters can be unblocked by this state change

    async def unlock(self, write=False):
        async with self.cond:
            if write:
                self.writer = False
            else:
                self.readers -= 1
            self.cond.notifyAll()  # notify (one) could be used `if not write:`
```

Note that `.unlock` cannot validate that it's called by the same coroutine as `.lock` was. That's because there's no concept of a "current thread" for coroutines -- there can be many waiting on each other in the stack.

Obviously, this code could be nicer:
* separate context managers for read and write cases
* .unlock can be automatic (if self.writer: unlock_for_write()) at the cost of opening the door wide open to bugs
* a policy can be introduced if `.lock` identified itself (by an object(), since there's no thread id) in shared state
* notifyAll() makes real-life use O(N^2) for N the number of simultaneous write lock requests

Feel free to use it :)

On 26 June 2017 at 20:21, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
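The "separate context managers" refinement from the list above could be sketched like this. It folds in the `await self.cond.wait()` correction from the follow-up message and uses `notify_all()`, the actual method name on `asyncio.Condition`; the `read_locked`/`write_locked` names are invented for the sketch:

```python
import asyncio
from contextlib import asynccontextmanager


class RWLock:
    def __init__(self):
        self.cond = asyncio.Condition()
        self.readers = 0
        self.writer = False

    async def lock(self, write=False):
        async with self.cond:
            # Writers wait for all readers and any writer; readers
            # wait only for a writer.
            while (self.readers and write) or self.writer:
                await self.cond.wait()
            if write:
                self.writer = True
            else:
                self.readers += 1

    async def unlock(self, write=False):
        async with self.cond:
            if write:
                self.writer = False
            else:
                self.readers -= 1
            self.cond.notify_all()

    @asynccontextmanager
    async def read_locked(self):
        await self.lock()
        try:
            yield
        finally:
            await self.unlock()

    @asynccontextmanager
    async def write_locked(self):
        await self.lock(write=True)
        try:
            yield
        finally:
            await self.unlock(write=True)


async def demo():
    lock = RWLock()
    log = []

    async def reader():
        async with lock.read_locked():
            log.append(('read', lock.readers, lock.writer))
            await asyncio.sleep(0.01)

    async def writer():
        async with lock.write_locked():
            log.append(('write', lock.readers, lock.writer))
            await asyncio.sleep(0.01)

    await asyncio.gather(reader(), reader(), writer())
    return log
```

The context managers guarantee the matching `unlock()` runs even if the body raises, which sidesteps the most common misuse of a bare lock/unlock pair.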
```
- self.cond.wait()
+ await self.cond.wait()
```

I've no tests for this :P

On 26 June 2017 at 21:37, Dima Tisnek <dimaqq@gmail.com> wrote:
On Mon, Jun 26, 2017 at 12:37 PM, Dima Tisnek <dimaqq@gmail.com> wrote:
Note that `.unlock` cannot validate that it's called by the same coroutine as `.lock` was. That's because there's no concept of a "current thread" for coroutines -- there can be many waiting on each other in the stack.
This is also a surprisingly complex design question. Your async RWLock actually matches how Python's threading.Lock works: you're explicitly allowed to acquire it in one thread and then release it from another. People sometimes find this surprising, and it prevents some kinds of error-checking. For example, this code *probably* deadlocks:

```
lock = threading.Lock()
lock.acquire()
lock.acquire()  # probably deadlocks
```

but the interpreter can't detect this and raise an error, because in theory some other thread might come along and call lock.release(). On the other hand, it is sometimes useful to be able to acquire a lock in one thread and then "hand it off" to e.g. a child thread. (Reentrant locks, OTOH, do have an implicit concept of ownership -- they kind of have to, if you think about it -- so even if you don't need reentrancy they can be useful because they'll raise a noisy error if you accidentally try to release a lock from the wrong thread.)

In trio we do have a current_task() concept, and the basic trio.Lock [1] does track ownership, and I even have a Semaphore-equivalent that tracks ownership as well [2]. The motivation here is that I want to provide nice debugging tools to detect things like deadlocks, which is only possible when your primitives have some kind of ownership tracking. So far this just means that we detect and error on these kinds of simple cases:

```
lock = trio.Lock()
await lock.acquire()
await lock.acquire()  # raises an error
```

But I have ambitions to do more [3] :-). However, this raises some tricky design questions around how and whether to support the "passing ownership" cases. Of course you can always fall back on something like a raw Semaphore, but it turns out that trio.run_in_worker_thread (our equivalent of asyncio's run_in_executor) actually wants to do something like pass ownership from the calling task into the spawned thread.

So far I've handled this by adding acquire_on_behalf_of/release_on_behalf_of methods to the primitive that run_in_worker_thread uses, but this isn't really fully baked yet.

-n

[1] https://trio.readthedocs.io/en/latest/reference-core.html#trio.Lock
[2] https://trio.readthedocs.io/en/latest/reference-core.html#trio.CapacityLimit...
[3] https://github.com/python-trio/trio/issues/182

-- Nathaniel J. Smith -- https://vorpus.org
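The ownership tracking described here can be illustrated with a minimal asyncio analogue. This is a sketch only -- `OwnedLock` is an invented name, and trio's real implementation is more involved:

```python
import asyncio


class OwnedLock:
    # Sketch: an asyncio.Lock wrapper that records the owning task, so
    # a re-acquire by the same task raises instead of deadlocking, and
    # a release by a non-owner raises instead of silently succeeding.
    def __init__(self):
        self._lock = asyncio.Lock()
        self._owner = None

    async def acquire(self):
        if self._owner is asyncio.current_task():
            raise RuntimeError('this task already holds the lock')
        await self._lock.acquire()
        self._owner = asyncio.current_task()

    def release(self):
        if self._owner is not asyncio.current_task():
            raise RuntimeError('lock can only be released by its owner')
        self._owner = None
        self._lock.release()


async def demo():
    lock = OwnedLock()
    await lock.acquire()
    try:
        # Same task re-acquiring: detected instead of deadlocking.
        await lock.acquire()
    except RuntimeError:
        detected = True
    else:
        detected = False
    lock.release()
    return detected
```

Supporting the "hand-off" cases Nathaniel mentions would require extra API surface (something like the `acquire_on_behalf_of` methods he describes), which this sketch deliberately omits.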
On Mon, Jun 26, 2017 at 12:37 PM, Dima Tisnek <dimaqq@gmail.com> wrote:
Chris, here's a simple RWLock implementation and analysis: ...

Obviously, this code could be nicer:
* separate context managers for read and write cases
* .unlock can be automatic (if self.writer: unlock_for_write()) at the cost of opening the door wide open to bugs
* a policy can be introduced if `.lock` identified itself (by an object(), since there's no thread id) in shared state
* notifyAll() makes real-life use O(N^2) for N the number of simultaneous write lock requests
Feel free to use it :)
Thanks, Dima. However, as I said in my earlier posts, I'm actually more interested in exploring approaches to synchronizing readers and writers in async code that don't require locking on reads. (This is also why I've always been saying RW "synchronization" instead of RW "locking.")

I'm interested in this because I think the single-threadedness of the event loop might be what makes this simplification possible over the traditional multi-threaded approach (along the lines Guido was mentioning). It also makes the "fast path" faster. Lastly, the API for the callers is just to call read() or write(), so there is no need for a general RWLock construct or to work through RWLock semantics of the sort Nathaniel mentioned.

I coded up a working version of the pseudo-code I included in an earlier email so people can see how it works. I included it at the bottom of this email and also in this gist:
https://gist.github.com/cjerdonek/858e1467f768ee045849ea81ddb47901

--Chris

```
import asyncio
import random

NO_READERS_EVENT = asyncio.Event()
NO_WRITERS_EVENT = asyncio.Event()
WRITE_LOCK = asyncio.Lock()


class State:
    reader_count = 0
    mock_file_data = 'initial'


async def read_file():
    data = State.mock_file_data
    print(f'read: {data}')


async def write_file(data):
    print(f'writing: {data}')
    State.mock_file_data = data
    await asyncio.sleep(0.5)


async def write(data):
    async with WRITE_LOCK:
        NO_WRITERS_EVENT.clear()
        # Wait for the readers to finish.
        await NO_READERS_EVENT.wait()
        # Do the file write.
        await write_file(data)
        # Awaken waiting readers.
        NO_WRITERS_EVENT.set()


async def read():
    while True:
        await NO_WRITERS_EVENT.wait()
        # Check the writer_lock again in case a new writer has
        # started writing.
        if WRITE_LOCK.locked():
            print(f'cannot read: still writing: {State.mock_file_data!r}')
        else:
            # Otherwise, we can do the read.
            break
    State.reader_count += 1
    if State.reader_count == 1:
        NO_READERS_EVENT.clear()
    # Do the file read.
    await read_file()
    State.reader_count -= 1
    if State.reader_count == 0:
        # Awaken any waiting writer.
        NO_READERS_EVENT.set()


async def delayed(coro):
    await asyncio.sleep(random.random())
    await coro


async def test_synchronization():
    NO_READERS_EVENT.set()
    NO_WRITERS_EVENT.set()
    coros = [
        read(),
        read(),
        read(),
        read(),
        read(),
        read(),
        write('apple'),
        write('banana'),
    ]
    # Add a delay before each coroutine for variety.
    coros = [delayed(coro) for coro in coros]
    await asyncio.gather(*coros)


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(test_synchronization())

# Sample output:
#
# read: initial
# read: initial
# read: initial
# read: initial
# writing: banana
# writing: apple
# cannot read: still writing: 'apple'
# cannot read: still writing: 'apple'
# read: apple
# read: apple
```
FWIW, to me this just looks like an implementation of an async RWLock? It's common for async synchronization primitives to be simpler internally than threading primitives because the async ones don't need to worry about being pre-empted at arbitrary points, but from the caller's point of view you still have basically a blocking acquire() method, and then you do your stuff (potentially blocking while you're at it), and then you call a non-blocking release(), just like every other async lock. -n -- Nathaniel J. Smith -- https://vorpus.org
On Tue, Jun 27, 2017 at 3:52 PM, Nathaniel Smith <njs@pobox.com> wrote:
FWIW, to me this just looks like an implementation of an async RWLock? It's common for async synchronization primitives to be simpler internally than threading primitives because the async ones don't need to worry about being pre-empted at arbitrary points, but from the caller's point of view you still have basically a blocking acquire() method, and then you do your stuff (potentially blocking while you're at it), and then you call a non-blocking release(), just like every other async lock.
Yes and no, I think. Internally, the implementation does just amount to applying an async RWLock. But the difference I was getting at is that the use case doesn't require exposing the RWLock in the API (e.g. underlying acquire() and release() methods). This means you can avoid having to think about some of the tricky design questions you started discussing in an earlier email of yours:
This is also a surprisingly complex design question. Your async RWLock actually matches how Python's threading.Lock works: you're explicitly allowed to acquire it in one thread and then release it from another. People sometimes find this surprising, and it prevents some kinds of error-checking. For example, this code *probably* deadlocks: ...
So my point was just that if the API is narrowed to exposing only "read" and "write" operations (to support the easier task of synchronizing reads and writes) and the RWLock kept private, you can avoid having to think through and support full-blown RWLock design and use cases, like with issues around passing ownership, etc. The API restricts how the RWLock is ever used, so it needn't be a complete RWLock implementation. --Chris
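The narrow API being described might look like the following sketch. The `SyncedData` name is invented, and the private bookkeeping simply adapts the events-plus-write-lock scheme from the gist; callers only ever see `read()` and `write()`:

```python
import asyncio


class SyncedData:
    # Sketch: expose only read() and write(); the events, write lock,
    # and reader count stay private, so no RWLock semantics (ownership,
    # hand-off, etc.) ever leak into the caller-facing API.
    def __init__(self, data='initial'):
        self._no_readers = asyncio.Event()
        self._no_writers = asyncio.Event()
        self._no_readers.set()
        self._no_writers.set()
        self._write_lock = asyncio.Lock()
        self._reader_count = 0
        self._data = data

    async def read(self):
        while True:
            await self._no_writers.wait()
            # Re-check in case a new writer started since the wait.
            if not self._write_lock.locked():
                break
        self._reader_count += 1
        if self._reader_count == 1:
            self._no_readers.clear()
        try:
            return self._data
        finally:
            self._reader_count -= 1
            if self._reader_count == 0:
                self._no_readers.set()

    async def write(self, data):
        async with self._write_lock:
            self._no_writers.clear()
            # Wait for in-flight readers to drain, then write.
            await self._no_readers.wait()
            self._data = data
            self._no_writers.set()


async def demo():
    d = SyncedData()
    before = await d.read()
    await d.write('apple')
    after = await d.read()
    return before, after
```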
participants (6)
- Andrew Svetlov
- Chris Jerdonek
- Dima Tisnek
- Guido van Rossum
- Nathaniel Smith
- Yarko Tymciurak