Here is the draft PEP for the ideas posted here.

Regards,

Thomas

--------

PEP: xxx
Title: The Safe Buffer Interface
Version: $Revision: $
Last-Modified: $Date: 2002/07/26 14:19:38 $
Author: theller@python.net (Thomas Heller)
Status: Draft
Type: Standards Track
Created: 26-Jul-2002
Python-Version: 2.3
Post-History: 26-Jul-2002

Abstract

    This PEP proposes an extension to the buffer interface called the
    'safe buffer interface'.

    The safe buffer interface fixes the flaws of the 'old' buffer
    interface as defined in Python versions up to and including 2.2:

        The lifetime of the retrieved pointer is clearly defined.

        The buffer size is returned as a 'size_t' data type, which
        allows access to 'large' buffers on platforms where
        sizeof(int) != sizeof(void *).

Specification

    The 'safe' buffer interface exposes new functions which return the
    size and the pointer to the internal memory block of any python
    object which chooses to implement this interface.

    The size and pointer returned must be valid as long as the object
    is alive (has a positive reference count).  So, only objects which
    never reallocate or resize the memory block are allowed to
    implement this interface.

    The safe buffer interface omits the memory segment model which is
    present in the old buffer interface - only a single memory block
    can be exposed.

Implementation

    Define a new flag in Include/object.h:

        /* PyBufferProcs contains bf_getsafereadbuffer
           and bf_getsafewritebuffer */
        #define Py_TPFLAGS_HAVE_GETSAFEBUFFER (1L<<15)

    This flag would be included in Py_TPFLAGS_DEFAULT:

        #define Py_TPFLAGS_DEFAULT  ( \
                            ....
                            Py_TPFLAGS_HAVE_GETCHARBUFFER | \
                            ....
                            0)

    Extend the PyBufferProcs structure by new fields in
    Include/object.h:

        typedef size_t (*getlargereadbufferproc)(PyObject *, void **);
        typedef size_t (*getlargewritebufferproc)(PyObject *, void **);

        typedef struct {
                getreadbufferproc bf_getreadbuffer;
                getwritebufferproc bf_getwritebuffer;
                getsegcountproc bf_getsegcount;
                getcharbufferproc bf_getcharbuffer;
                /* safe buffer interface functions */
                getsafereadbufferproc bf_getsafereadbufferproc;
                getsafewritebufferproc bf_getsafewritebufferproc;
        } PyBufferProcs;

    The new fields are present if the Py_TPFLAGS_HAVE_GETLARGEBUFFER
    flag is set in the object's type.

    XXX Py_TPFLAGS_HAVE_GETLARGEBUFFER implies the
    Py_TPFLAGS_HAVE_GETCHARBUFFER flag.

    The getsafereadbufferproc and getsafewritebufferproc functions
    return the size in bytes of the memory block on success, and fill
    in the passed void * pointer on success.  If these functions fail
    - either because an error occurs or no memory block is exposed -
    they must set the void * pointer to NULL and raise an exception.
    The return value is undefined in these cases and should not be
    used.

Backward Compatibility

    There are no backward compatibility problems.

Reference Implementation

    Will be uploaded to the sourceforge patch manager by the author.

Additional Notes/Comments

    It may be a good idea to expose the following convenience
    functions:

        int PyObject_AsSafeReadBuffer(PyObject *obj,
                                      void **buffer,
                                      size_t *buffer_len);

        int PyObject_AsSafeWriteBuffer(PyObject *obj,
                                       void **buffer,
                                       size_t *buffer_len);

    These functions return 0 on success, set buffer to the memory
    location and buffer_len to the length of the memory block in
    bytes.  On failure, they return -1 and set an exception.

    Python strings, unicode strings, mmap objects, and maybe other
    types would expose the safe buffer interface, but the array type
    would *not*, because its memory block may be reallocated during
    its lifetime.

References

    [1] The buffer interface
        http://mail.python.org/pipermail/python-dev/2000-October/009974.html

    [2] The Buffer Problem
        http://www.python.org/peps/pep-0296.html

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
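To make the contract in the draft concrete, here is a minimal self-contained sketch in C. It deliberately does not include Python.h: FakeObject, fake_getsafereadbuffer and fake_as_safe_read_buffer are hypothetical stand-ins invented for illustration, and the error_set flag merely stands in for "an exception has been raised". It only shows the rule that a caller tests the returned pointer, not the returned size, to detect failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object owning one fixed memory block.
   Not part of the draft PEP and not a real Python type. */
typedef struct {
    void  *block;      /* fixed for the whole lifetime of the object */
    size_t size;       /* size of the block in bytes */
    int    error_set;  /* stand-in for "an exception has been raised" */
} FakeObject;

/* Same shape as the procs in the draft: return the size, fill in *ptr. */
typedef size_t (*safereadbufferproc)(FakeObject *, void **);

/* On failure the pointer is set to NULL and the error flag is raised;
   the return value is then meaningless and must not be used. */
size_t fake_getsafereadbuffer(FakeObject *obj, void **ptr)
{
    assert(obj != NULL && ptr != NULL);
    if (obj->block == NULL) {
        *ptr = NULL;
        obj->error_set = 1;
        return 0;
    }
    *ptr = obj->block;
    return obj->size;
}

/* Wrapper in the style of the proposed PyObject_AsSafeReadBuffer:
   returns 0 on success, -1 on failure. */
int fake_as_safe_read_buffer(FakeObject *obj, safereadbufferproc proc,
                             void **buffer, size_t *buffer_len)
{
    size_t n;

    if (proc == NULL) {        /* the type does not expose the interface */
        *buffer = NULL;
        obj->error_set = 1;
        return -1;
    }
    n = proc(obj, buffer);
    if (*buffer == NULL)       /* test the pointer, never the size */
        return -1;
    *buffer_len = n;
    return 0;
}
```

A caller would pass the object and its proc to the wrapper and use the buffer only when the wrapper returns 0.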
--- Thomas Heller <thomas.heller@ion-tof.com> wrote:
Here is the draft PEP for the ideas posted here.
[...] I like it. :-)
    typedef size_t (*getlargereadbufferproc)(PyObject *, void **);
    typedef size_t (*getlargewritebufferproc)(PyObject *, void **);
I'm sure this is a cut-and-pasto for

    typedef size_t (*getsafereadbufferproc)(PyObject *, void **);
    typedef size_t (*getsafewritebufferproc)(PyObject *, void **);
From: "Scott Gilbert" <xscottg@yahoo.com>
Here is the draft PEP for the ideas posted here.
[...]
I like it. :-) :-)
    typedef size_t (*getlargereadbufferproc)(PyObject *, void **);
    typedef size_t (*getlargewritebufferproc)(PyObject *, void **);
I'm sure this is a cut-and-pasto for
    typedef size_t (*getsafereadbufferproc)(PyObject *, void **);
    typedef size_t (*getsafewritebufferproc)(PyObject *, void **);
Exactly. Everything is named safebuffer instead of largebuffer. Thanks, Thomas
Thomas Heller:
The size and pointer returned must be valid as long as the object is alive (has a positive reference count). So, only objects which never reallocate or resize the memory block are allowed to implement this interface.
I'd prefer an interface that allows for reallocation but has an explicit locked state during which the buffer must stay still. My motivation comes from the data structures implemented in Scintilla (an editor component), which could be exposed through this buffer interface to other code. The most important type in Scintilla (as in many editors) is a split (or gapped) buffer. Upon receiving a lock call, it could collapse the gap and return a stable pointer to its contents and then revert to its normal behaviour on receiving an unlock. Neil
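The lock/unlock behaviour described above for a gapped buffer can be sketched roughly as follows. This is not Scintilla's code: GapBuffer, gap_lock, gap_unlock and the fixed GAP_CAPACITY are hypothetical names chosen for illustration, and "collapsing the gap" is modelled here simply by moving the gap to the end so that the text becomes one contiguous run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { GAP_CAPACITY = 64 };

/* Hypothetical gapped buffer: the text lives in data[0..gap_start) and
   data[gap_end..GAP_CAPACITY); the gap in between absorbs inserts. */
typedef struct {
    char   data[GAP_CAPACITY];
    size_t gap_start;
    size_t gap_end;
    int    locked;
} GapBuffer;

void gap_init(GapBuffer *g)
{
    g->gap_start = 0;
    g->gap_end = GAP_CAPACITY;
    g->locked = 0;
}

size_t gap_length(const GapBuffer *g)
{
    return g->gap_start + (GAP_CAPACITY - g->gap_end);
}

/* Move the gap so that it starts at text position pos. */
void gap_move(GapBuffer *g, size_t pos)
{
    assert(pos <= gap_length(g));
    if (pos < g->gap_start) {
        size_t n = g->gap_start - pos;
        memmove(g->data + g->gap_end - n, g->data + pos, n);
        g->gap_start -= n;
        g->gap_end -= n;
    } else if (pos > g->gap_start) {
        size_t n = pos - g->gap_start;
        memmove(g->data + g->gap_start, g->data + g->gap_end, n);
        g->gap_start += n;
        g->gap_end += n;
    }
}

/* Insert one character at pos; fails with -1 while locked or when full. */
int gap_insert(GapBuffer *g, size_t pos, char c)
{
    if (g->locked || g->gap_start == g->gap_end || pos > gap_length(g))
        return -1;
    gap_move(g, pos);
    g->data[g->gap_start++] = c;
    return 0;
}

/* Lock: make the text contiguous and hand out a pointer that stays
   valid until gap_unlock() is called. */
const char *gap_lock(GapBuffer *g, size_t *len)
{
    gap_move(g, gap_length(g));
    g->locked = 1;
    *len = g->gap_start;
    return g->data;
}

/* Unlock: the buffer reverts to normal gapped editing. */
void gap_unlock(GapBuffer *g)
{
    g->locked = 0;
}
```

Between gap_lock and gap_unlock the returned pointer and length describe the whole text as one block, which is what a buffer consumer would need.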
--- Neil Hodgson <nhodgson@bigpond.net.au> wrote:
Thomas Heller:
The size and pointer returned must be valid as long as the object is alive (has a positive reference count). So, only objects which never reallocate or resize the memory block are allowed to implement this interface.
I'd prefer an interface that allows for reallocation but has an explicit locked state during which the buffer must stay still. My motivation comes from the data structures implemented in Scintilla (an editor component), which could be exposed through this buffer interface to other code. The most important type in Scintilla (as in many editors) is a split (or gapped) buffer. Upon receiving a lock call, it could collapse the gap and return a stable pointer to its contents and then revert to its normal behaviour on receiving an unlock.
A couple of questions come to mind:

First, could this be implemented by a gapped_buffer object that implements the locking functionality you want, but that returns simple buffers to work with when the object is locked. In other words, do we need to add this extra functionality up in the core protocol when it can be implemented specifically the way Scintilla (cool editor by the way) wants it to be in the Scintilla specific extension.

Second, if you are using mutexes to do this stuff, you'll have to be very careful about deadlock. I imagine:

    thread 1:
        grab the object lock
        grab the object pointer
        release the GIL
        do some work
        acquire the GIL                # deadlock

    thread 2:
        acquire the GIL
        try to resize the object       # requires no outstanding locks

Thread 2 needs to make sure no objects are holding the object lock when it does the resize, but thread 1 can't acquire the GIL until thread 2 gives it up. Both are stuck.

If you choose not to implement the locks with true mutexes, then you're probably going to end up polling and that's bad too.

Is there a way out of this? This is part of the reason I didn't want to put a lock state into the bytes object.
Scott Gilbert:
First, could this be implemented by a gapped_buffer object that implements the locking functionality you want, but that returns simple buffers to work with when the object is locked. In other words, do we need to add this extra functionality up in the core protocol when it can be implemented specifically the way Scintilla (cool editor by the way) wants it to be in the Scintilla specific extension.
(Thanks)
Would this mean that the explicit locking completely defines the validity of the address or is the address valid until the 'view' buffer object is garbage collected? I would like the gapped_buffer to be put back into gapped mode as soon as possible and depending on the lifetime of a view buffer object is not that robust in the face of alternate Python implementations that use non-reference-counted GC implementations (Jython / Python .Net).
Second, if you are using mutexes to do this stuff, you'll have to be very careful about deadlock.
By locking, I want to change state on the buffer from having a gap and allowing resizes to having a static size and address which will remain valid until an unlock. The lock and unlock are not treating the buffer as a mutex (I'd call the operations 'acquire' and 'release' then) although mutexes may be needed for safety in the lock and unlock implementations. It is likely that the lock and unlock would be counted (it can be locked twice and then won't be expandable until it is unlocked twice) and that exceptions would be thrown for length changing operations while locked. If you think my particular use is out of the scope of what you are trying to achieve then that is fine. Neil
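A rough sketch of the counted lock described here, where length-changing operations fail while the count is positive, might look like this. CountedBuffer and the cb_* functions are hypothetical names for illustration, and a -1 return value stands in for throwing an exception. No mutex or blocking is involved, only a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical resizable buffer with a lock count. */
typedef struct {
    char  *data;
    size_t size;
    int    lock_count;   /* number of outstanding locks */
} CountedBuffer;

int cb_init(CountedBuffer *b, size_t size)
{
    b->data = malloc(size ? size : 1);
    b->size = size;
    b->lock_count = 0;
    return b->data ? 0 : -1;
}

void cb_free(CountedBuffer *b)
{
    free(b->data);
    b->data = NULL;
    b->size = 0;
}

/* Lock: size and address now stay fixed until the matching unlock. */
char *cb_lock(CountedBuffer *b, size_t *size)
{
    b->lock_count++;
    *size = b->size;
    return b->data;
}

/* Unlock: returns -1 if the buffer was not locked. */
int cb_unlock(CountedBuffer *b)
{
    if (b->lock_count == 0)
        return -1;
    b->lock_count--;
    return 0;
}

/* Length-changing operation: refused while any lock is outstanding.
   The -1 return stands in for raising an exception. */
int cb_resize(CountedBuffer *b, size_t new_size)
{
    char *p;

    if (b->lock_count > 0)
        return -1;
    p = realloc(b->data, new_size ? new_size : 1);
    if (p == NULL)
        return -1;
    b->data = p;
    b->size = new_size;
    return 0;
}
```

Locking twice requires unlocking twice before cb_resize succeeds again, which matches the "counted" behaviour in the message.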
--- Neil Hodgson <nhodgson@bigpond.net.au> wrote:
Would this mean that the explicit locking completely defines the validity of the address or is the address valid until the 'view' buffer object is garbage collected? I would like the gapped_buffer to be put back into gapped mode as soon as possible and depending on the lifetime of a view buffer object is not that robust in the face of alternate Python implementations that use non-reference-counted GC implementations (Jython / Python .Net).
If you're worried about exactly when the object is released, you could add a specific release() method to your object indicating that you don't intend to use it anymore. My point was that, with Thomas Heller's safe buffer protocol (or my bytes object), you would have a pointer that could be manipulated independently of the GIL, but that putting locking semantics into your gapped_buffer is something you could add on top without complicating the core. In other words, his PEP (or mine) allows you to do something you couldn't necessarily do previously, and it doesn't sound like there is anything you want to do that you won't be able to.
By locking, I want to change state on the buffer from having a gap and allowing resizes to having a static size and address which will remain valid until an unlock. The lock and unlock are not treating the buffer as a mutex (I'd call the operations 'acquire' and 'release' then) although mutexes may be needed for safety in the lock and unlock implementations. It is likely that the lock and unlock would be counted (it can be locked twice and then won't be expandable until it is unlocked twice) and that exceptions would be thrown for length changing operations while locked.
You could easily implement a counting (recursive) mutex as described above, and it might be the case that throwing an exception on the length changing operations keeps the deadlock from occurring. I'm still a bit confused though. When thread A locks (acquires) the buffer, and thread B tries to do a resize and it generates an exception, what is thread B supposed to do next? I assume that the resize was due to something like the user typing somewhere in the buffer. From a user interface point of view, you can't just ignore their request to insert text. Would you just try the same operation again after catching the exception? How long would you wait?
If you think my particular use is out of the scope of what you are trying to achieve then that is fine.
It is definitely up to Thomas Heller to decide what he wants his scope to be, and I don't want to step on his toes at all. Especially since the reason for his PEP getting written is that I didn't want to add this stuff to mine. :-)

I'm just trying to point out two things:

1) With his PEP, there is a way to get the behavior you desire without adding the complexity to the core of Python. And with recursive/counting mutexes, the behavior you want is getting more complicated. The "safe buffer protocol" is likely to cater to a wide class of users. I could be wrong, but the "lockable gapped buffer protocol" probably appeals to a much smaller set.

2) Any time you go from one lock (mutex, GIL, semaphore) to multiple locks, you can introduce deadlock states. Without my understanding your design fully, your use case sounds to me like it either has the potential for deadlock, or the potential for polling. There are ways to avoid this of course, but then everyone has to follow a more complicated set of rules (for instance build a hierarchy describing the order of locks to acquire). Since Thomas's PEP doesn't introduce any new types of locks, it sidesteps these problems.

Cheers,
-Scott
Scott Gilbert:
You could easily implement a counting (recursive) mutex as described above, and it might be the case that throwing an exception on the length changing operations keeps the deadlock from occurring. I'm still a bit confused though.
Not as confused as I am. I don't think deadlocks or threads are that relevant to me. The most likely situations in which I would use the buffer interface are to perform large I/O operations without copying or when performing asynchronous I/O to load or save documents while continuing to run styling or linting tasks. I think it's likely that the pieces of code accessing the buffer will not be real threads, but instead be cooperating contexts within a single-threaded UI framework so using semaphores will not be possible.
1) With his PEP, there is a way to get the behavior you desire with out adding the complexity to the core of Python. And with recursive/counting mutexes, the behavior you want is getting more complicated.
I don't want counting mutexes. I'm not defining behaviour that needs them.
The "safe buffer protocol" is likely to cater to a wide class of users. I could be wrong, but the "lockable gapped buffer protocol" probably appeals to a much smaller set.
It's not that a "lockable gapped buffer protocol" is needed. It is that the problem with the old buffer was that the lifetime of the pointer is not well defined. The proposal changes that by making the lifetime of the pointer be the same as the underlying object. This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers.
2) Any time you go from one lock (mutex, GIL, semaphore) to multiple locks, you can introduce deadlock states.
My defined behaviour was "Upon receiving a lock call, it could collapse the gap and return a stable pointer to its contents and then revert to its normal behaviour on receiving an unlock". Where is a semaphore involved? Without a semaphore (or equivalent) there can be no deadlock. Neil
[Scott]
The "safe buffer protocol" is likely to cater to a wide class of users. I could be wrong, but the "lockable gapped buffer protocol" probably appeals to a much smaller set.
[Neil]
It's not that a "lockable gapped buffer protocol" is needed. It is that the problem with the old buffer was that the lifetime of the pointer is not well defined. The proposal changes that by making the lifetime of the pointer be the same as the underlying object.
That's exactly what *I* need, ...
This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers.
..., but I understand Neil's requirements. Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'? Thomas
--- Thomas Heller <thomas.heller@ion-tof.com> wrote:
This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers.
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
I assume this means any call to getsafereadpointer()/getsafewritepointer() will increment the lock count. So the UnlockObject() calls will be mandatory. Either that, or you'll have an explicit LockObject() call as well. What behavior should happen when a resize is attempted while the lock count is positive?
From: "Scott Gilbert" <xscottg@yahoo.com>
--- Thomas Heller <thomas.heller@ion-tof.com> wrote:
This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers.
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
I assume this means any call to getsafereadpointer()/getsafewritepointer() will increment the lock count. So the UnlockObject() calls will be mandatory. Either that, or you'll have an explicit LockObject() call as well. What behavior should happen when a resize is attempted while the lock count is positive?
This question is not difficult to answer ;-) The resize should fail. That's the only possibility. Whether the object can handle this robustly enough is another question. Probably this is all too complicated to be solved by the safe buffer interface, and it should be left out? Thomas
--- Thomas Heller <thomas.heller@ion-tof.com> wrote:
This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers.
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
I assume this means any call to getsafereadpointer()/getsafewritepointer() will increment the lock count. So the UnlockObject() calls will be mandatory. Either that, or you'll have an explicit LockObject() call as well. What behavior should happen when a resize is attempted while the lock count is positive?
I don't like where this is going. Let's not add locking to the buffer protocol. If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer. (Exactly which other API calls are safe while using the pointer is not clear; probably nothing that could possibly invoke the Python interpreter recursively, since that might release the GIL. This would generally mean that calls to Py_DECREF() are unsafe while holding on to a buffer pointer!) --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer.
'Persistent' buffer interface? Too long? Thomas
If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer.
'Persistent' buffer interface? Too long?
No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work. --Guido van Rossum (home page: http://www.python.org/~guido/)
If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer.
'Persistent' buffer interface? Too long?
No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work.
Ahem, right. Maybe Barry can change it before committing this? Thomas
--- Thomas Heller and Guido wrote:
If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer.
'Persistent' buffer interface? Too long?
No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work.
I'll just chime in with the name "Fixed" Buffer Interface. They aren't really static either, and fixed applies in at least two senses. :-)
'Persistent' buffer interface? Too long?
No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work.
I'll just chime in with the name "Fixed" Buffer Interface. They aren't really static either, and fixed applies in at least two senses. :-)
Nice! --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Scott Gilbert" <xscottg@yahoo.com>
--- Thomas Heller and Guido wrote:
If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer.
'Persistent' buffer interface? Too long?
No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work.
I'll just chime in with the name "Fixed" Buffer Interface. They aren't really static either, and fixed applies in at least two senses. :-)
Yup. I'll change it. Thanks, Thomas
----- Original Message -----
From: "Guido van Rossum" <guido@python.org>
To: "Thomas Heller" <thomas.heller@ion-tof.com>
Cc: "Scott Gilbert" <xscottg@yahoo.com>; "Neil Hodgson" <nhodgson@bigpond.net.au>; <python-dev@python.org>
Sent: Monday, July 29, 2002 1:10 PM
Subject: Re: [Python-Dev] pre-PEP: The Safe Buffer Interface
If an object's buffer isn't allocated for the object's life when the object is created, it should not support the "safe" version of the protocol (maybe a different name would be better), and users should not release the GIL while holding on to the pointer.
'Persistent' buffer interface? Too long?
No, persistent typically refers to things that survive longer than a process. Maybe 'static' buffer interface would work.
"cautious"? regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ -----------------------------------------------------------------------
Guido:
I don't like where this is going. Let's not add locking to the buffer protocol.
Do you still object to it even in the form I proposed in my last message? (I.e. no separate "lock" call, locking is implicit in the getxxxbuffer calls.) It does make the protocol slightly more complicated to use (must remember to make a release call when you're finished with the pointer) but it seems like a good tradeoff to me for the flexibility gained. Note that there can't be any problems with deadlock, since no blocking is involved. Maybe "locking" is even the wrong term -- it's more a form of reference counting.
probably nothing that could possibly invoke the Python interpreter recursively, since that might release the GIL. This would generally mean that calls to Py_DECREF() are unsafe while holding on to a buffer pointer!
That could be fixed by incrementing the Python refcount as long as a pointer is held. That could be done even without the rest of my locking proposal. Of course, if you do that you need a matching release call, so you might as well implement the locking while you're at it.

Mind you, if a release call is necessary, whoever holds the pointer must also hold a reference to the Python object, so that they can make the release call. So incrementing the Python refcount might not be necessary after all!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
--- Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:
Guido:
I don't like where this is going. Let's not add locking to the buffer protocol.
Do you still object to it even in the form I proposed in my last message? (I.e. no separate "lock" call, locking is implicit in the getxxxbuffer calls.)
It does make the protocol slightly more complicated to use (must remember to make a release call when you're finished with the pointer) but it seems like a good tradeoff to me for the flexibility gained.
I realize this wasn't addressed to me, and that I said I would butt out when you were in favor of canning the proposal altogether, but I won't let that get in the way. :-)

We haven't seen a semi-thorough use case where the locking behavior is beneficial yet. While I appreciate and agree with the intent of trying to get a more flexible object, I think there is at least one of several problems buried down a little further than you and Neil are looking.

I'm concerned that this is very much like the segment count features of the current PyBufferProcs. It was apparently designed for more generality, and while no one uses it, everyone has to check that the segment count is one or raise an exception. If there is no realizable benefit to the acquire/release semantics of the new interface, then this is just extra burden too. Let's find a realizable benefit before we muck up Thomas's good simple proposal with this stuff.

In the current Python core, I can think of the following objects that would need a retrofit to this new interface (there may be more):

    string
    unicode
    mmap
    array

The string, unicode, and mmap objects do not resize or reallocate by design. So for them the extra acquire/release requirements are burden with no benefit.

The array object does resize (via the extend method among others). So let's say that an array object gets passed to an extension that locks the buffer and grabs the pointer. The extension releases the GIL so that another thread can work on the array object. Another thread comes in and wants to do a resize (via the extend method). (We don't need to introduce threads for this since the asynchronous I/O case is just the same.)

If extend() is called while thread 1 has the array locked, it can:

    A) raise an exception or return an error
    B) block until the lock count returns to zero
    C) ???
    .)
    .)

Case A is troublesome because depending on thread scheduling/disk performance, you will or won't get the exception. So you've got a weird race condition where an operation might have been valid if it had only executed a split second later, but due to misfortune it raised an exception. I think this non-determinism is ugly at the very least. However since it's recoverable, you could try again (polling), or ignore the request completely (odd behavior). I think this is what both you and Neil are proposing, and I don't see how this is terribly useful.

While I don't think B is the strategy anyone is proposing, it means you have two blocking objects in effect (the GIL and whatever the array uses to implement blocking). If we're not extremely careful, we can get deadlock here.

I'm still looking for any good examples that fall into cases C and beyond. Neil offered a third example that might fit. He says that he could buffer the user event that led to the resize operation. If that is his strategy, I'd like to see it explained further. It sounds like taking the event and not processing it until the asynchronous I/O operation has completed. At which point I wonder what using asynchronous I/O achieved since the resize operation had to wait synchronously for the I/O to complete. This also sounds suspiciously like blocking the resize thread, but I won't argue that point.
Scott Gilbert <xscottg@yahoo.com>:
We haven't seen a semi-thorough use case where the locking behavior is beneficial yet. ... If there is no realizable benefit to the acquire/release semantics of the new interface, then this is just extra burden too.
The proposer of the original safe-buffer interface claimed to have a use case where the existing buffer interface is not safe enough, involving asynchronous I/O. I've been basing my comments on the assumption that he does actually have a need for it. The original proposal was restricted to non-resizable objects. I suggested a small extension which would remove this restriction, at what seems to me quite a small cost. It may turn out that the restriction is easily lived with. On the other hand, we might decide later that it's a nuisance. What worries me is if we design a restricted safe-buffer interface now, and start using it, and later decide that we want an unrestricted safe-buffer interface, we'll then have two different safe-buffer interfaces around, with lots of code that will only accept non-resizable objects for no reason other than that it's using the old interface. So I think it's worth putting in some thought and getting it as right as we can from the beginning.
I'm concerned that this is very much like the segment count features of the current PyBufferProcs. It was apparently designed for more generality, and while no one uses it, everyone has to check that the segment count is one or raise an exception.
It's not as bad as that! My version of the proposal would impose *no* burden on implementations that did not require locking, for the following reasons:
1) Locking is an optional task performed by the getxxxbuffer routines. Objects which do not require locking just don't do it.
2) For objects not requiring locking, the releasebuffer operation is a no-op. Such an object can simply not implement this routine, and the type machinery can fill it in with a stub.
It does place one extra burden on users of the interface, namely calling the release routine. But I believe that this could even be beneficial, in a way. The user is going to have to think about the lifetime of the pointer, and be sure to keep a reference to the underlying Python object as long as the pointer is needed. Having to keep it around so that you can call the release routine on it would help to bring this into sharp focus.
The extension releases the GIL so that another thread can work on the array object.
Hey, whoa right there! If you have two threads accessing this array object simultaneously, you should be using a mutex or semaphore or something to coordinate them. As I pointed out before, thread synchronisation is outside the scope of my proposal. The only purpose of the locking, in my proposal, is to ensure that an exception occurs instead of a crash if the programmer screws up and tries to resize an object whose internals are being messed with. It's up to the programmer to do whatever is necessary to ensure that he doesn't do that.
If extend() is called while thread 1 has the array locked, it can:
A) raise an exception or return an error
Yes. (Raise an exception.)
Case A is troublesome because depending on thread scheduling/disk performance, you will or won't get the exception.
As I said before, you should be synchronising your threads somehow *before* they operate on the object! If you don't, you deserve whatever you get.
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
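The behaviour Greg describes (locking implicit in the get call, a release call that undoes it, and an exception rather than a crash on resize) can be modelled in a few lines of pure Python. This is only a sketch under stated assumptions: `ResizableBuffer`, `get_buffer`, `release_buffer` and `BufferLockedError` are hypothetical names chosen for illustration, not part of any proposal in this thread.

```python
class BufferLockedError(Exception):
    """Hypothetical exception; the thread does not settle on a name."""


class ResizableBuffer:
    """Toy model of an object whose memory block may move on resize."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._lock_count = 0  # number of outstanding get_buffer() calls

    def get_buffer(self):
        # Stands in for getxxxbuffer: locking is implicit in the call.
        self._lock_count += 1
        return self._data

    def release_buffer(self):
        # Stands in for releasebuffer: the caller is done with the pointer.
        if self._lock_count == 0:
            raise RuntimeError("release_buffer() without get_buffer()")
        self._lock_count -= 1

    def extend(self, more):
        # A resize could reallocate the block, so refuse while locked.
        if self._lock_count > 0:
            raise BufferLockedError("cannot resize while a buffer pointer is held")
        self._data.extend(more)

    def __len__(self):
        return len(self._data)
```

Note that the model performs no thread synchronisation at all, which matches Greg's point that coordination between threads is outside the scope of the locking.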
--- Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:
Scott Gilbert <xscottg@yahoo.com>:
We haven't seen a semi-thorough use case where the locking behavior is beneficial yet. ... If there is no realizable benefit to the acquire/release semantics of the new interface, then this is just extra burden too.
The proposer of the original safe-buffer interface claimed to have a use case where the existing buffer interface is not safe enough, involving asynchronous I/O. I've been basing my comments on the assumption that he does actually have a need for it.
I believe Thomas Heller's needs were met without making locking part of the interface, but that he was willing to bend to please you and Neil. His original proposal did not include any notion of locking. Nor does his current one, now that Guido has taken a stand on this issue.
So I think it's worth putting in some thought and getting it as right as we can from the beginning.
Absolutely. I just wanted to make sure that there is at least one sensible use case before adding the complexity. Moreover, if the sensible use cases for locking are few and far between, then I'm still inclined to leave it out since you can add the locking semantics at a different level. It looks like Neil has sufficiently defined an example where it's useful. His use case is a bit complicated though, and I think he could get every bit of that functionality by putting the locking in a smarter object tailored for his application, and working with temporary "snapshot" objects with an explicit release() method. What if Neil decides he needs Reader/Writer locks? This is completely justifiable too, since multiple threads can read an object without interfering, but only one should be writing it. We shouldn't arbitrarily add complexity for the exceptional cases.
I'm concerned that this is very much like the segment count features of the current PyBufferProcs. It was apparently designed for more generality, and while no one uses it, everyone has to check that the segment count is one or raise an exception.
It's not as bad as that! My version of the proposal would impose *no* burden on implementations that did not require locking, for the following reasons:
Your use of the word *no* is different than mine. :-) I could similarly claim that the segment count puts no burden on implementations that don't need it.
1) Locking is an optional task performed by the getxxxbuffer routines. Objects which do not require locking just don't do it.
2) For objects not requiring locking, the releasebuffer operation is a no-op. Such an object can simply not implement this routine, and the type machinery can fill it in with a stub.
I believe it will be a no-op in enough places that extension writers will do it wrong without even knowing.
The extension releases the GIL so that another thread can work on the array object.
Hey, whoa right there! If you have two threads accessing this array object simultaneously, you should be using a mutex or semaphore or something to coordinate them. As I pointed out before, thread synchronisation is outside the scope of my proposal.
This is exactly Neil's use case. He's got two threads reading it simultaneously. One thread (not really a thread, but the asynchronous I/O operation) is writing to disk, and the other thread is keeping the user interface updated. There is no problem until the user tries to enter text (which forces a resize) before the asynchronous I/O is complete. Neil has a solution for this, but I think it's less than typical.
The only purpose of the locking, in my proposal, is to ensure that an exception occurs instead of a crash if the programmer screws up and tries to resize an object whose internals are being messed with. It's up to the programmer to do whatever is necessary to ensure that he doesn't do that.
If extend() is called while thread 1 has the array locked, it can:
A) raise an exception or return an error
Yes. (Raise an exception.)
Which exception? Would you introduce a standard exception that should be raised when the user tries to do an operation that currently isn't allowed because the buffer is locked? Truthfully, now that Neil has given his explanation, I'm beginning to bend on this a bit. You're right in that it's not that much burden (however, it's more than *no* burden :-), and someone might find it useful. I still think it's going to be pretty uncommon, and I still believe the locking can be added on top of the simpler interface as needed. Cheers, -Scott
Moreover, if the sensible use cases for locking are few and far between, then I'm still inclined to leave it out since you can add the locking semantics at a different level.
Are you sure about that? Without the locking, only non-resizable objects would be able to implement the protocol. So any higher level locking would have to be implemented on top of the old, non-safe version. Then you'd have to make sure that all parts of your application accessed the object through the extra layer. The "safe" part would be lost.
Your use of the word *no* is different than mine. :-) I could similarly claim that the segment count puts no burden on implementations that don't need it.
I think I may have been replying to something other than what was said. But what I said is still true -- it imposes no extra burden on *implementers* of the interface which don't use the extra feature. I acknowledge that it complicates things slightly for *users* of the interface, but not as much as the seg count stuff does (there's no need for any testing or exception raising).
I believe it will be a no-op in enough places that extension writers will do it wrong without even knowing.
Well, there's not much that can be done about extension writers who fail to read the documentation, or wilfully ignore it.
Which exception? Would you introduce a standard exception that should be raised when the user tries to do an operation that currently isn't allowed because the buffer is locked?
Maybe. It doesn't matter. The important thing is that the interpreter does not crash.
I still believe the locking can be added on top of the simpler interface as needed.
But it can't, since as I pointed out above, resizable objects won't be able to provide the simpler interface!
Greg Ewing
I don't like where this is going. Let's not add locking to the buffer protocol.
Do you still object to it even in the form I proposed in my last message? (I.e. no separate "lock" call, locking is implicit in the getxxxbuffer calls.)
Yes, I still object. Having to make a call to release a resource with a function call is extremely error-prone, as we've seen with reference counting. There are too many cases where some early exit from a piece of code doesn't make the release call.
It does make the protocol slightly more complicated to use (must remember to make a release call when you're finished with the pointer) but it seems like a good tradeoff to me for the flexibility gained.
I'm not sure I see the use case. The main data types for which I expect this will be used would be strings and the new 'bytes' type, and both have fixed buffers that never move.
probably nothing that could possibly invoke the Python interpreter recursively, since that might release the GIL. This would generally mean that calls to Py_DECREF() are unsafe while holding on to a buffer pointer!
That could be fixed by incrementing the Python refcount as long as a pointer is held. That could be done even without the rest of my locking proposal. Of course, if you do that you need a matching release call, so you might as well implement the locking while you're at it.
I think you misunderstand what I wrote. A Py_DECREF() for an *unrelated* object can invoke Python code (if it ends up deleting a class instance with a __del__ method). --Guido van Rossum (home page: http://www.python.org/~guido/)
I think you misunderstand what I wrote. A Py_DECREF() for an *unrelated* object can invoke Python code (if it ends up deleting a class instance with a __del__ method).
I don't see why that's a problem. If the unrelated object's __del__ ends up messing with the object in question, that's an issue for the programmer to sort out.
Greg Ewing
Scott Gilbert:
I assume this means any call to getsafereadpointer()/getsafewritepointer() will increment the lock count. So the UnlockObject() calls will be mandatory.
The UnlockObject call will be needed if you do want to permit resizing (again). It will not be needed for statically sized objects, including all the types that are included in the PEP currently, or where you have an object that will no longer need to be resizable. For example: you construct a sound buffer, fill it with noise, then lock it so that a pointer to its data can be given to the asynch sound playing function. If you don't need to write to the sound buffer again, it doesn't need to be unlocked.
Either that, or you'll have an explicit LockObject() call as well. What behavior should happen when a resize is attempted while the lock count is positive?
The most common response will be some form of failure, probably throwing an exception. Other responses, such as buffering the resize, may be sensible in particular circumstances. Neil
Thomas Heller:
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
Yes, that is exactly what I want. Neil
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
Yes, that is exactly what I want.
I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)? --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
Yes, that is exactly what I want.
I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)?
Processing in Python :-(. Thomas
From: "Guido van Rossum" <guido@python.org>
..., but I understand Neil's requirements.
Can they be fulfilled by adding some kind of UnlockObject() call to the 'safe buffer interface', which should mean 'I won't use the pointer received by getsaferead/writebufferproc any more'?
Yes, that is exactly what I want.
I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)?
Processing in Python :-(.
Can you work out an example? I don't understand what you can do in Python, apart from passing it to something else that takes the buffer API or converting the data to a string or a bytes buffer. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)?
Processing in Python :-(.
Can you work out an example?
Not sure, maybe Neil could do it better.
However, you yourself pointed out to Greg that it may be unsafe to even call Py_DECREF() on an unrelated object.
I don't understand what you can do in Python, apart from passing it to something else that takes the buffer API or converting the data to a string or a bytes buffer.
Or pack it into a buffer *object* and hand it to arbitrary Python code. That's what we have now. What does 'hold the GIL' mean in this context? No other thread can execute: we have complete control over what we do. But what are we *allowed* to do? Thomas
I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)?
Processing in Python :-(.
Can you work out an example?
Not sure, maybe Neil could do it better.
However, you yourself pointed out to Greg that it may be unsafe to even call Py_DECREF() on an unrelated object.
The safe rule is that you should grab the pointer and then do some I/O on it and nothing else.
I don't understand what you can do in Python, apart from passing it to something else that takes the buffer API or converting the data to a string or a bytes buffer.
Or pack it into a buffer *object* and hand it to arbitrary Python code. That's what we have now.
Since the object you're packing already supports the buffer API, I don't see the point of packing it in a buffer object.
What does 'hold the GIL' mean in this context? No other thread can execute: we have complete control over what we do. But what are we *allowed* to do?
When accessing a movable buffer, the safest rule is no Python API calls. There's a less restrictive safe rule, but it's messy because the end goal is "don't do anything that could conceivably end up in the Python interpreter main loop (ceval.c)" and there's no easy rule for that -- anything that uses Py_DECREF can end up doing that. --Guido van Rossum (home page: http://www.python.org/~guido/)
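Guido's point about Py_DECREF on an unrelated object can be seen from the Python side. The sketch below is illustrative only and relies on CPython's reference counting, where dropping the last reference runs `__del__` immediately; the class name `Unrelated` is hypothetical. In C, the equivalent of `del obj` would be a Py_DECREF that happens while a raw pointer into `shared` is still held.

```python
class Unrelated:
    """An unrelated object whose finalizer happens to touch the shared buffer."""

    def __init__(self, target):
        self._target = target

    def __del__(self):
        # Arbitrary Python code runs here; this resize could move the memory.
        self._target.extend(b"\x00" * 1024)


shared = bytearray(b"payload")
size_before = len(shared)

obj = Unrelated(shared)
del obj  # in CPython the refcount reaches zero here and __del__ runs at once

size_after = len(shared)
```

After the `del`, the buffer has grown even though no code touched `shared` directly, which is why "no Python API calls" is the safest rule for a movable buffer.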
Thomas Heller (Guido, Thomas, Guido):
[Guido]
I guess I still don't understand Neil's requirements. What can't be done with the existing buffer interface (which requires you to hold the GIL while using the pointer)?
Processing in Python :-(.
Can you work out an example?
Not sure, maybe Neil could do it better.
I see this interface as a bridge between objects offering generic buffer oriented facilities (asynch or low level I/O for example) and objects that want to make it possible to use these facilities on their data (text buffers, multimedia buffers, numeric arrays) by yielding a pointer to their otherwise internal data. The bridging code between the two objects is unrestricted Python code that may cause memory to be moved around. Neil
I see this interface as a bridge between objects offering generic buffer oriented facilities (asynch or low level I/O for example) and objects that want to make it possible to use these facilities on their data (text buffers, multimedia buffers, numeric arrays) by yielding a pointer to their otherwise internal data.
The bridging code between the two objects is unrestricted Python code that may cause memory to be moved around.
If the buffer is relatively small, copying the data an extra time shouldn't be a problem, and you can use the old API. If the buffer is huge, you probably shouldn't want to move the buffer around in memory anyway. So I don't think your case for needing a lockable interface is very strong. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum:
If the buffer is relatively small, copying the data an extra time shouldn't be a problem, and you can use the old API.
If the buffer is huge, you probably shouldn't want to move the buffer around in memory anyway,
Even large (or huge) buffers may need extension (inserting text in Scintilla, adding a frame to a movie), leading to a reallocation and thus a move. Neil
--- Neil Hodgson <nhodgson@bigpond.net.au> wrote:
Scott Gilbert:
You could easily implement a counting (recursive) mutex as described above, and it might be the case that throwing an exception on the length-changing operations keeps the deadlock from occurring. I'm still a bit confused though.
Not as confused as I am. I don't think deadlocks or threads are that relevant to me. The most likely situations in which I would use the buffer interface are performing large I/O operations without copying, or performing asynchronous I/O to load or save documents while continuing to run styling or linting tasks. I think it's likely that the pieces of code accessing the buffer will not be real threads, but will instead be cooperating contexts within a single-threaded UI framework, so using semaphores will not be possible.
What happens when you've locked the buffer and passed a pointer to the I/O system for an asynchronous operation, but before that operation has completed, your main program wants to resize the buffer due to a user generated event? I had written responses/questions to other parts of your message, but I found that I was just asking the same question above over and over, so I've chopped them out. If you can explain this to me, and there aren't any problems with deadlock or polling, then I'll quit interfering and let you and Thomas decide if you really think the locking semantics are useful to a wide enough audience that it should be included in the core.
I don't want counting mutexes. I'm not defining behavior that needs them.
You said you wanted the locks to keep a count, so that you could call acquire() multiple times and have the buffer not truly become unlocked until release() was called the same number of times. I'm willing to adopt any terminology you want for the purpose of this discussion. I think I understand the semantics of the counting operation, but I want to understand more what actually happens when the buffer is locked.
Scott Gilbert:
What happens when you've locked the buffer and passed a pointer to the I/O system for an asynchronous operation, but before that operation has completed, your main program wants to resize the buffer due to a user generated event?
That is up to the application or class designer. There are three reasonable responses I see: throw an exception, buffer the user event, or ignore the user event. The only thing guaranteed by providing the safe buffer interface is that the pointer will remain valid.
I don't want counting mutexes. I'm not defining behavior that needs them.
You said you wanted the locks to keep a count, so that you could call acquire() multiple times and have the buffer not truly become unlocked until release() was called the same number of times. I'm willing to adopt any terminology you want for the purpose of this discussion. I think I understand the semantics of the counting operation, but I want to understand more what actually happens when the buffer is locked.
When the buffer is locked, it returns a pointer and promises that the pointer will remain valid until the buffer is unlocked. The buffer interface could be defined either to allow multiple (counted) locks or to fail further lock attempts. Counted locks would be applicable in more circumstances but require more implementation. I would prefer counted but it is not that important as a counting layer can be implemented over a single lock interface if needed. Neil
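Neil's remark that a counting layer can sit on top of a single-lock interface might look roughly like this. It is a sketch under assumptions: `SingleLockBuffer` and `CountingLock` are hypothetical names, and the single lock here fails further lock attempts, which is the second of the two definitions Neil mentions.

```python
class SingleLockBuffer:
    """Hypothetical object offering one non-counted lock."""

    def __init__(self):
        self.locked = False

    def lock(self):
        if self.locked:
            raise RuntimeError("already locked")
        self.locked = True

    def unlock(self):
        if not self.locked:
            raise RuntimeError("not locked")
        self.locked = False


class CountingLock:
    """Counting layer: the underlying lock is held while count > 0."""

    def __init__(self, target):
        self._target = target
        self._count = 0

    def acquire(self):
        # Only the first acquire touches the underlying lock.
        if self._count == 0:
            self._target.lock()
        self._count += 1

    def release(self):
        if self._count == 0:
            raise RuntimeError("release() without acquire()")
        self._count -= 1
        # Only the last release unlocks the underlying object.
        if self._count == 0:
            self._target.unlock()
```

The layer only works if every user of the buffer goes through the same `CountingLock` instance; code that calls `lock()` directly bypasses the count.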
--- Neil Hodgson <nhodgson@bigpond.net.au> wrote:
Scott Gilbert:
What happens when you've locked the buffer and passed a pointer to the I/O system for an asynchronous operation, but before that operation has completed, your main program wants to resize the buffer due to a user generated event?
That is up to the application or class designer. There are three reasonable responses I see: throw an exception, buffer the user event, or ignore the user event. The only thing guaranteed by providing the safe buffer interface is that the pointer will remain valid.
The guarantee about the pointer remaining valid while the acquire_count is positive is clear. I'm concerned about what the other thread (the one that wants to resize it) is going to do while the lock count is positive. You've listed three possibilities, but let's narrow it down to the strategy that you intend to use in Scintilla (a real use case). I believe all three strategies lead to something undesirable (be it polling, deadlock, a confused user, or ???), but I don't want to exhaustively scrutinize all possibilities until we come up with one good example that you intend to use (it would bore you to read them, and me to type them). So what exactly would you do in Scintilla? (Or pick another good use case if you prefer.)
The buffer interface could be defined either to allow multiple (counted) locks or to fail further lock attempts. Counted locks would be applicable in more circumstances but require more implementation. I would prefer counted but it is not that important as a counting layer can be implemented over a single lock interface if needed.
A single lock interface can be implemented over an object without any locking. Have the lockable object return simple "fixed buffer objects" with a limited lifespan.
Scott Gilbert:
You've listed three possibilities, but let's narrow it down to the strategy that you intend to use in Scintilla (a real use case). I believe all three strategies lead to something undesirable (be it polling, deadlock, a confused user, or ???), but I don't want to exhaustively scrutinize all possibilities until we come up with one good example that you intend to use (it would bore you to read them, and me to type them).
So what exactly would you do in Scintilla? (Or pick another good use case if you prefer.)
I'd prefer to ignore the input. Unfortunately users prefer a higher degree of friendliness :-( Since Scintilla is a component within a user interface, it shares this responsibility with the container application with the application being the main determinant. If I was writing a Windows-specific application that used Scintilla, and I wanted to use Asynchronous I/O then my preferred technique would be to change the message processing loop to leave the UI input messages in the queue until the I/O had completed. Once the I/O had completed then the message loop would change back to processing all messages which would allow the banked up input to come through. If I was feeling ambitious I might try to process some UI messages, possibly detecting a press of Escape to abort a file load if it turned out the read was taking too long.
A single lock interface can be implemented over an object without any locking. Have the lockable object return simple "fixed buffer objects" with a limited lifespan.
This returns to the possibility of indeterminate lifespan as mentioned earlier in the thread.
At which point I wonder what using asynchronous I/O achieved since the resize operation had to wait synchronously for the I/O to complete. This also sounds suspiciously like blocking the resize thread, but I won't argue that point.
There may be other tasks that the application can perform while waiting for the I/O to complete, such as displaying, styling or line-wrapping whatever text has already arrived (assuming that there are some facilities for discovering this) or performing similar tasks for other windows. Neil
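The message-loop strategy Neil describes (leave input messages queued while the I/O is outstanding, keep handling everything else, then let the banked input through) can be sketched as a toy dispatcher. `MessageLoop`, the `"input"` and `"paint"` message kinds, and the `handled` list are assumptions made for illustration; a real Windows message loop is structured differently.

```python
from collections import deque


class MessageLoop:
    """Toy message loop that banks input events while I/O is pending."""

    def __init__(self):
        self.io_pending = False
        self._deferred = deque()
        self.handled = []  # record of processed messages, for illustration

    def dispatch(self, kind, payload):
        # Input messages could resize the buffer, so hold them during I/O.
        if self.io_pending and kind == "input":
            self._deferred.append((kind, payload))
            return
        self.handled.append((kind, payload))

    def io_completed(self):
        # I/O is done: the pointer is no longer in use, so replay banked input.
        self.io_pending = False
        while self._deferred:
            self.handled.append(self._deferred.popleft())
```

Non-input work such as painting or styling still proceeds during the I/O, which is the benefit Neil points to, while the resize never reaches the buffer until the pointer is no longer in use.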
[Scott]
A single lock interface can be implemented over an object without any locking. Have the lockable object return simple "fixed buffer objects" with a limited lifespan.
[Neil]
This returns to the possibility of indeterminate lifespan as mentioned earlier in the thread.
Can't you do something like this (maybe this is what Scott has in mind):
    /* Destructor called when the view is destroyed; desc is the owning object. */
    static void _unlock(void *ptr, void *desc)
    {
        MyObject *self = (MyObject *)desc;
        /* do whatever is needed to unlock the object */
        self->locked--;
        Py_DECREF(self);
    }

    static PyObject *MyObject_GetBuffer(MyObject *self)
    {
        /* do whatever is needed to lock the object */
        self->locked++;
        Py_INCREF(self);
        return PyCObject_FromVoidPtrAndDesc(self->ptr, self, _unlock);
    }
In plain text: Provide a method which returns a 'view' into your object's buffer after locking the object. The view holds a reference to the object; the object is unlocked and decref'd when the view is destroyed. In practice something better than a PyCObject will be used, and this one can even implement the 'fixed buffer' interface. Thomas
Thomas Heller:
In plain text: Provide a method which returns a 'view' into your object's buffer after locking the object. The view holds a reference to the object; the object is unlocked and decref'd when the view is destroyed.
Yes, this handles the situation. However I see some problems here:
1 Explicit resource release, such as closing files, is easier to understand and debug than implicit ref-count exhaustion.
2 On platforms such as .NET and the JVM, the view object will live for an indeterminate time, prohibiting resizes until the VM decides to garbage collect. While the JVM can not return pointers, and so may seem to not be a candidate for this interface, it can return array references.
3 More complex implementation requiring a secondary view object.
Neil
--- Neil Hodgson <nhodgson@bigpond.net.au> wrote:
Thomas Heller:
In plain text: Provide a method which returns a 'view' into your object's buffer after locking the object. The view holds a reference to the object; the object is unlocked and decref'd when the view is destroyed.
Yes, this handles the situation. However I see some problems here: 1 Explicit resource release, such as closing files, is easier to understand and debug than implicit ref-count exhaustion.
So add an explicit release() method to your object. Just because it supports the "Fixed Buffer API" doesn't mean you can't add other methods to it.
2 On platforms such as .NET and the JVM, the view object will live for an indeterminate time, prohibiting resizes until the VM decides to garbage collect. While the JVM can not return pointers, and so may seem to not be a candidate for this interface, it can return array references.
This is solved with the explicit release() method above. Just like files solve this problem with an explicit close() method.
3 More complex implementation requiring a secondary view object.
It's also a more complex problem that you're trying to solve. Putting the complexity on the common, simple, cases may not be appropriate when the complex cases are few and far between. Cheers, -Scott
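Scott's suggestion, a lockable object that hands out fixed views carrying an explicit release() method, can be modelled in pure Python as follows. This is a sketch only: `Document`, `SnapshotView`, `snapshot` and `tobytes` are hypothetical names, and the view copies data on read instead of exposing a raw pointer.

```python
class Document:
    """Hypothetical resizable object that hands out fixed snapshot views."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._views = 0  # outstanding, unreleased views

    def snapshot(self):
        self._views += 1
        return SnapshotView(self)

    def insert(self, index, more):
        # Refuse to resize while any view is still outstanding.
        if self._views:
            raise RuntimeError("document has unreleased views")
        self._data[index:index] = more


class SnapshotView:
    """Fixed view; release() plays the role close() plays for files."""

    def __init__(self, owner):
        self._owner = owner  # keeps the owner alive while the view exists

    def tobytes(self):
        if self._owner is None:
            raise ValueError("view has been released")
        return bytes(self._owner._data)

    def release(self):
        # Idempotent, like file.close().
        if self._owner is not None:
            self._owner._views -= 1
            self._owner = None
```

Because release() is explicit, the owner becomes resizable again at a well-defined point instead of whenever the view happens to be collected, which addresses Neil's concern about indeterminate lifespan on garbage-collected platforms.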
--- Thomas Heller <thomas.heller@ion-tof.com> wrote:
In plain text: Provide a method which returns a 'view' into your object's buffer after locking the object. The view holds a reference to the object; the object is unlocked and decref'd when the view is destroyed.
Exactly. This is just like putting an explicit close() on the file object. Cheers, -Scott
--- Neil Hodgson <nhodgson@bigpond.net.au> wrote:
Since Scintilla is a component within a user interface, it shares this responsibility with the container application with the application being the main determinant. If I was writing a Windows-specific application that used Scintilla, and I wanted to use Asynchronous I/O then my preferred technique would be to change the message processing loop to leave the UI input messages in the queue until the I/O had completed. Once the I/O had completed then the message loop would change back to processing all messages which would allow the banked up input to come through.
Cool. This is what I was looking for. It's a tad complicated, but it makes a bit of sense. Is there anything in here that can't be done if you only had the simple (no locking) version of the fixed buffer interface?
A single lock interface can be implemented over an object without any locking. Have the lockable object return simple "fixed buffer objects" with a limited lifespan.
This returns to the possibility of indeterminate lifespan as mentioned earlier in the thread.
Not if you add an explicit release() method. Just like the file object has an explicit close() method. Your object with the locking smarts could just return "snapshot" views with an explicit release() method on them.
At which point I wonder what using asynchronous I/O achieved since the resize operation had to wait synchronously for the I/O to complete. This also sounds suspiciously like blocking the resize thread, but I won't argue that point.
There may be other tasks that the application can perform while waiting for the I/O to complete, such as displaying, styling or line-wrapping whatever text has already arrived (assuming that there are some facilities for discovering this) or performing similar tasks for other windows.
All good points. Thank you for indulging me. Sorry to be such a PITA. Cheers, -Scott
Based on the example of mmap (which can be closed at any time) I agree that the fixed buffer interface needs to have "get" and "release" methods (please pick better names). Maybe Thomas can update PEP 298. --Guido van Rossum (home page: http://www.python.org/~guido/)
--- Guido van Rossum <guido@python.org> wrote:
Based on the example of mmap (which can be closed at any time) I agree that the fixed buffer interface needs to have "get" and "release" methods (please pick better names). Maybe Thomas can update PEP 298.
Wow, the tides have turned. Fair enough.

I think Neil put forth the names "acquire" and "release". So how about

    typedef struct {
        getreadbufferproc bf_getreadbuffer;
        getwritebufferproc bf_getwritebuffer;
        getsegcountproc bf_getsegcount;
        getcharbufferproc bf_getcharbuffer;
        /* fixed buffer interface functions */
        acquirereadbufferproc bf_acquirereadbuffer;
        acquirewritebufferproc bf_acquirewritebuffer;
        releasebufferproc bf_releasebuffer;
    } PyBufferProcs;

Whatever the actual names, should there be a bf_releasereadbuffer and bf_releasewritebuffer? Or just the one bf_releasebuffer? Could also just have one acquire function that indicates whether it is read-write or read-only via a return parameter. Is write-only ever useful?

Cheers, -Scott
I think Neil put forth the names "acquire" and "release". So how about
    typedef struct {
        getreadbufferproc bf_getreadbuffer;
        getwritebufferproc bf_getwritebuffer;
        getsegcountproc bf_getsegcount;
        getcharbufferproc bf_getcharbuffer;
        /* fixed buffer interface functions */
        acquirereadbufferproc bf_acquirereadbuffer;
        acquirewritebufferproc bf_acquirewritebuffer;
        releasebufferproc bf_releasebuffer;
    } PyBufferProcs;
Whatever the actual names, should there be a bf_releasereadbuffer and bf_releasewritebuffer? Or just the one bf_releasebuffer?
Just the one.
Could also just have one acquire function that indicates whether it is read-write or read-only via a return parameter.
That loses the (weak) symmetry with the existing API.
Is write-only ever useful?
No, write implies read. --Guido van Rossum (home page: http://www.python.org/~guido/)
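The answers above (one release slot, separate read and write acquire slots, write implying read) can be sketched as a small self-contained model. The slot names come from the struct proposed in this thread; the function signatures, the `Obj` type, the `FixedBufferProcs` table and the `fill_and_sum` helper are assumptions made for illustration and are not part of any agreed API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Obj Obj;

/* Assumed signatures, modeled on the size_t-returning procs in the draft PEP. */
typedef size_t (*acquirereadbufferproc)(Obj *, void **);
typedef size_t (*acquirewritebufferproc)(Obj *, void **);
typedef void   (*releasebufferproc)(Obj *);

typedef struct {
    acquirereadbufferproc  bf_acquirereadbuffer;
    acquirewritebufferproc bf_acquirewritebuffer; /* NULL for read-only objects */
    releasebufferproc      bf_releasebuffer;      /* a single release for both */
} FixedBufferProcs;

struct Obj {
    const FixedBufferProcs *procs;
    unsigned char data[16];
    int lock_count;
};

/* Lock the object and expose its single memory block. */
static size_t obj_acquire(Obj *o, void **ptr)
{
    o->lock_count++;
    *ptr = o->data;
    return sizeof o->data;
}

static void obj_release(Obj *o)
{
    assert(o->lock_count > 0);
    o->lock_count--;
}

static const FixedBufferProcs writable_procs = { obj_acquire, obj_acquire, obj_release };
static const FixedBufferProcs readonly_procs = { obj_acquire, NULL, obj_release };

/* Fill the block through the write pointer, then read it back through the
   same pointer. Returns the byte sum, or -1 if the object is not writable. */
static long fill_and_sum(Obj *o, unsigned char value)
{
    void *ptr = NULL;
    unsigned char *bytes;
    size_t i, n;
    long sum = 0;

    if (o->procs->bf_acquirewritebuffer == NULL)
        return -1;
    n = o->procs->bf_acquirewritebuffer(o, &ptr);
    bytes = ptr;
    for (i = 0; i < n; i++)
        bytes[i] = value;
    for (i = 0; i < n; i++)
        sum += bytes[i];
    o->procs->bf_releasebuffer(o);
    return sum;
}
```

Because the release slot only decrements a counter, it does not need to know whether the matching acquire was for reading or writing, which is one way to read the "just the one" answer.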
Could also just have one acquire function that indicates whether it is read-write or read-only via a return parameter.
That loses the (weak) symmetry with the existing API.
There's nothing a client expecting a read/write pointer could do with a read only pointer IMO.
Is write-only ever useful?
No, write implies read.
Should it be named getfixedreadwritebuffer then? Thomas
Could also just have one acquire function that indicates whether it is read-write or read-only via a return parameter.
That loses the (weak) symmetry with the existing API.
There's nothing a client expecting a read/write pointer could do with a read only pointer IMO.
So we agree that it's a bad idea to have one function. :-)
Is write-only ever useful?
No, write implies read.
Should it be named getfixedreadwritebuffer then?
No, the existing API also uses getwritebuffer implying read/write. --Guido van Rossum (home page: http://www.python.org/~guido/)
Scott Gilbert <xscottg@yahoo.com>:
getreadbufferproc bf_getreadbuffer; getwritebufferproc bf_getwritebuffer;
acquirereadbufferproc bf_acquirereadbuffer; acquirewritebufferproc bf_acquirewritebuffer;
Is there really a need for both "get" and "acquire" methods? Surely if an object requires locking, it always requires locking, so why can't the "get" functions simply include the locking operation if they need it?

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
--- Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:
Scott Gilbert <xscottg@yahoo.com>:
getreadbufferproc bf_getreadbuffer; getwritebufferproc bf_getwritebuffer;
acquirereadbufferproc bf_acquirereadbuffer; acquirewritebufferproc bf_acquirewritebuffer;
Is there really a need for both "get" and "acquire" methods? Surely if an object requires locking, it always requires locking, so why can't the "get" functions simply include the locking operation if they need it?
That is the proposal. The get methods are the legacy (non-fixed) interface. Cheers, -Scott
From: "Greg Ewing" <greg@cosc.canterbury.ac.nz>
Scott Gilbert <xscottg@yahoo.com>:
getreadbufferproc bf_getreadbuffer; getwritebufferproc bf_getwritebuffer;
acquirereadbufferproc bf_acquirereadbuffer; acquirewritebufferproc bf_acquirewritebuffer;
Is there really a need for both "get" and "acquire" methods? Surely if an object requires locking, it always requires locking, so why can't the "get" functions simply include the locking operation if they need it?
Backward compatibility. If we change the array object to enter a locked state when getreadbuffer() is called, it would be surprising. Thomas
Backward compatibility. If we change the array object to enter a locked state when getreadbuffer() is called, it would be surprising.
Yes, I understand now. I hadn't realised that the list included both the old and new routines. Greg Ewing
From: "Guido van Rossum" <guido@python.org>
Based on the example of mmap (which can be closed at any time) I agree that the fixed buffer interface needs to have "get" and "release" methods (please pick better names). Maybe Thomas can update PEP 298.
The consequence: mmap objects need a 'buffer lock counter', and cannot be closed while the count is >0. Which exception is raised then? Or do you have something different in mind? The lock counter would not be needed for strings and unicode... Thomas
The consequence: mmap objects need a 'buffer lock counter', and cannot be closed while the count is >0. Which exception is raised then?
Pick one -- mmap.error (== EnvironmentError) seems fine to me. Alternately, close() could set a "please close me" flag which causes the mmap file to be closed when the last release is called. Of course, the acquire method should raise an exception when it's already closed.
Or do you have something different in mind? The lock counter would not be needed for strings and unicode...
And the array module could have one. --Guido van Rossum (home page: http://www.python.org/~guido/)
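The "please close me" alternative can be sketched as follows. This is a hedged illustration rather than the mmap module's code: `Mapping` and the `mapping_*` functions are hypothetical, a -1 return stands in for raising an exception, and refusing new acquires while a close is pending is an extra assumption that the message above does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mmap object. */
typedef struct {
    char  *base;          /* NULL once the mapping is really closed */
    size_t size;
    int    lock_count;    /* outstanding acquires */
    int    close_pending; /* the "please close me" flag */
} Mapping;

/* Real code would unmap the memory here; the sketch only forgets it. */
static void mapping_really_close(Mapping *m)
{
    m->base = NULL;
    m->size = 0;
    m->close_pending = 0;
}

/* Returns 0 and fills ptr/len, or -1 if the mapping is closed or closing. */
static int mapping_acquire(Mapping *m, void **ptr, size_t *len)
{
    if (m->base == NULL || m->close_pending) {
        *ptr = NULL;
        return -1;
    }
    m->lock_count++;
    *ptr = m->base;
    *len = m->size;
    return 0;
}

/* close(): defer the real close while any buffer is still acquired. */
static void mapping_close(Mapping *m)
{
    if (m->lock_count > 0)
        m->close_pending = 1;
    else
        mapping_really_close(m);
}

/* The last release performs a deferred close, if one was requested. */
static void mapping_release(Mapping *m)
{
    assert(m->lock_count > 0);
    m->lock_count--;
    if (m->lock_count == 0 && m->close_pending)
        mapping_really_close(m);
}
```

The other option mentioned above, raising an error from close() while the count is positive, would replace the `close_pending` branch with an error return.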
Thomas Heller <thomas.heller@ion-tof.com>:
The consequence: mmap objects need a 'buffer lock counter', and cannot be closed while the count is >0. Which exception is raised then?
Maybe instead of raising an exception at all, the closing could simply be deferred until the lock count reaches 0? Greg Ewing
This restricts the set of objects that can be buffers to statically sized objects. I'd prefer that dynamically resizable objects be able to be buffers.
That's what bothers me about the proposal -- I suspect that this restriction will turn out to be too restrictive to make it useful. But maybe locking could be built into the safe-buffer protocol? Resizable objects wanting to support the safe buffer protocol would be required to maintain a lock count which is incremented on each getsafebufferptr call. There would also have to be a releasesafebufferptr call to decrement the lock count. As long as the lock count is nonzero, attempting to resize the object would raise an exception. That way, resizable objects could be used as asynchronous I/O buffers as long as you didn't try to resize them while actually doing I/O. Greg Ewing
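The lock-count rule proposed here can be sketched in a few lines of C. The names `ResizableBuf`, `buf_acquire`, `buf_release` and `buf_resize` are hypothetical, and a -1 return stands in for the exception a real implementation would raise.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resizable object that supports the locked protocol. */
typedef struct {
    char  *data;
    size_t size;
    int    lock_count;  /* incremented per acquire, decremented per release */
} ResizableBuf;

/* Pin the buffer: return its size and fill in the pointer. */
static size_t buf_acquire(ResizableBuf *b, void **ptr)
{
    b->lock_count++;
    *ptr = b->data;
    return b->size;
}

static void buf_release(ResizableBuf *b)
{
    assert(b->lock_count > 0);
    b->lock_count--;
}

/* Returns 0 on success, -1 if the buffer is locked or memory runs out. */
static int buf_resize(ResizableBuf *b, size_t new_size)
{
    char *p;

    if (b->lock_count > 0)
        return -1;  /* a pointer is still outstanding: refuse to move the block */
    p = realloc(b->data, new_size ? new_size : 1);
    if (p == NULL)
        return -1;
    b->data = p;
    b->size = new_size;
    return 0;
}
```

While the count is nonzero the block never moves, so a pointer handed to an asynchronous I/O call stays valid until the matching release.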
Thomas Heller <thomas.heller@ion-tof.com>:
This PEP proposes an extension to the buffer interface called the 'safe buffer interface'.
I don't understand the need for this. The C-level buffer interface is already safe as long as you use it properly -- which means using it to fetch the pointer each time it's needed. Greg Ewing
--- Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:
Thomas Heller <thomas.heller@ion-tof.com>:
This PEP proposes an extension to the buffer interface called the 'safe buffer interface'.
I don't understand the need for this. The C-level buffer interface is already safe as long as you use it properly -- which means using it to fetch the pointer each time it's needed.
This is not my PEP, but let me defend it anyway. The need for this derives from wanting to do more than one thing at a time in Python (multiple processors with multiple threads, asynchronous I/O, DMA transfers, ???). One thread grabs the pointer from the "safe buffer interface" and then releases the GIL while it works on that pointer. Now another thread is free to acquire the GIL and run concurrently with the first. (The asynchronous I/O case applies even on single processor machines...) I believe you were the one to explain to me why an extension can't release the GIL while it works with the PyBufferProcs acquired pointer. This PEP tries to allow the extension to do just that.
Scott Gilbert <xscottg@yahoo.com>:
The need for this derives from wanting to do more than one thing at a time in Python (multiple processors with multiple threads, asynchronous I/O, DMA transfers, ???).
In any situation like that, you should be using some form of locking on the object concerned. The Python buffer interface is not the right place to deal with these issues. Greg Ewing
--- Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:
The need for this derives from wanting to do more than one thing at a time in Python (multiple processors with multiple threads, asynchronous I/O, DMA transfers, ???).
In any situation like that, you should be using some form of locking on the object concerned. The Python buffer interface is not the right place to deal with these issues.
I humbly disagree with you, and I like his proposal. His PEP is simple and the locking business could lead to a mess if everyone involved is not very careful. However, I'll let him champion his PEP. I've got my own stuff to worry about, and this is part of why I didn't want to add a new protocol to the PEP I've been working on. Cheers, -Scott
Thomas, I like your PEP. Could you clean it up (changing 'large' into 'safe' etc.) and send it to Barry? Some comments:
Backward Compatibility
There are no backward compatibility problems.
That's a simplification of the truth -- you're adding two new fields to an existing struct. But the flag bit you add means that old and new versions of the struct can be distinguished.
It may be a good idea to expose the following convenience functions:
int PyObject_AsSafeReadBuffer(PyObject *obj, void **buffer, size_t *buffer_len);
int PyObject_AsSafeWriteBuffer(PyObject *obj, void **buffer, size_t *buffer_len);
These functions return 0 on success, set buffer to the memory location and buffer_len to the length of the memory block in bytes. On failure, they return -1 and set an exception.
Please make these a mandatory part of the proposal. Please also try to summarize the discussion so far here. My personal opinion: locking seems the wrong approach, given the danger of deadlock; Scintilla can use the existing buffer protocol, assuming its buffer doesn't move as long as you don't release the GIL and don't make calls into Scintilla. --Guido van Rossum (home page: http://www.python.org/~guido/)
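The calling convention of the convenience functions (0 on success with both out-parameters filled, -1 with an error set on failure) can be sketched without the Python headers. Everything below is a model under stated assumptions: `SafeObj`, `last_error`, `str_getsafereadbuffer` and `as_safe_read_buffer` are hypothetical, and the global error string merely stands in for the interpreter's exception state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SafeObj SafeObj;

/* Assumed to match the draft: returns the size and fills in the pointer;
   on failure the pointer is set to NULL and the return value is undefined. */
typedef size_t (*getsafereadbufferproc)(SafeObj *, void **);

struct SafeObj {
    getsafereadbufferproc bf_getsafereadbuffer; /* NULL if not supported */
    const char *bytes;
    size_t      nbytes;
};

/* Stand-in for the exception state. */
static const char *last_error = NULL;

/* Example proc for a string-like object whose block never moves. */
static size_t str_getsafereadbuffer(SafeObj *o, void **ptr)
{
    *ptr = (void *)o->bytes;
    return o->nbytes;
}

/* Model of the proposed wrapper: 0 on success, -1 with an error set. */
static int as_safe_read_buffer(SafeObj *obj, void **buffer, size_t *buffer_len)
{
    size_t n;

    assert(buffer != NULL && buffer_len != NULL);
    if (obj == NULL || obj->bf_getsafereadbuffer == NULL) {
        last_error = "object does not expose the safe buffer interface";
        *buffer = NULL;
        return -1;
    }
    n = obj->bf_getsafereadbuffer(obj, buffer);
    if (*buffer == NULL) {
        /* The proc failed; n must not be used in this case. */
        if (last_error == NULL)
            last_error = "safe buffer proc failed";
        return -1;
    }
    *buffer_len = n;
    return 0;
}
```

The wrapper checks the pointer rather than the returned size, since the draft leaves the return value undefined on failure.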
participants (6)
- Greg Ewing
- Guido van Rossum
- Neil Hodgson
- Scott Gilbert
- Steve Holden
- Thomas Heller