GIL required for _all_ Python calls?
Hi, I've been wondering whether it's possible to release the GIL in the regex engine during matching. I know that it needs to have the GIL during memory-management calls, but does it for calls like Py_UNICODE_TOLOWER or PyErr_SetString? Is there an easy way to find out? Or is it just a case of checking the source files for mentions of the GIL? The header file for PyList_New, for example, doesn't mention it! Thanks
MRAB wrote:
Hi,
I've been wondering whether it's possible to release the GIL in the regex engine during matching.
I know that it needs to have the GIL during memory-management calls, but does it for calls like Py_UNICODE_TOLOWER or PyErr_SetString? Is there an easy way to find out? Or is it just a case of checking the source files for mentions of the GIL? The header file for PyList_New, for example, doesn't mention it!
Thanks
Anything that calls Py_INCREF or Py_DECREF should hold the GIL, or you may get concurrent updates of the reference count, and then the final value is wrong (two threads both compute 5+1 and store 6 rather than 7, and when they decref, you end up at 4 rather than back at 5). AFAIK, the only things that don't require the GIL are macros like PyString_AS_STRING or PyTuple_SET_ITEM. PyErr_SetString, for example, will be incref-ing objects and setting the exception state, so it certainly needs the GIL to be held. John =:->
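To make the lost update concrete: a non-atomic increment is a load, an add, and a store. The sketch below replays the interleaving John describes in plain C, without real threads, so the outcome is deterministic. The function name `replay_lost_update` and the variable `shared_refcnt` are hypothetical; the two locals stand in for each thread's register.

```c
/* Replays, step by step, two threads racing on a shared reference
   count without the GIL. No real threads are used, so the result
   is deterministic. */
static long shared_refcnt;

long replay_lost_update(void)
{
    shared_refcnt = 5;

    /* Both threads run an incref concurrently: each loads the old value. */
    long seen_by_a = shared_refcnt;   /* thread A loads 5 */
    long seen_by_b = shared_refcnt;   /* thread B loads 5 before A stores */
    shared_refcnt = seen_by_a + 1;    /* A stores 6 */
    shared_refcnt = seen_by_b + 1;    /* B stores 6: A's increment is lost */

    /* Later both threads decref, this time one after the other. */
    shared_refcnt -= 1;               /* 5 */
    shared_refcnt -= 1;               /* 4: one reference too few */

    return shared_refcnt;             /* 4, not the original 5 */
}
```

Calling `replay_lost_update()` returns 4. In a real interpreter the count would then reach zero one decref too early, freeing the object while some code still uses it.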
2010/1/6 John Arbash Meinel
Anything that calls Py_INCREF or Py_DECREF should hold the GIL, or you may get concurrent updates of the reference count, and then the final value is wrong (two threads both compute 5+1 and store 6 rather than 7, and when they decref, you end up at 4 rather than back at 5).
Correct.
AFAIK, the only things that don't require the GIL are macros like PyString_AS_STRING or PyTuple_SET_ITEM. PyErr_SetString, for example, will be incref-ing objects and setting the exception state, so it certainly needs the GIL to be held.
As a general rule, I would say no Py* macros are safe without the GIL either (the exception being Py_END_ALLOW_THREADS), since they mutate Python objects, which must be protected. -- Regards, Benjamin
On Wed, Jan 6, 2010 at 7:32 PM, Benjamin Peterson
2010/1/6 John Arbash Meinel
AFAIK, the only things that don't require the GIL are macros like PyString_AS_STRING or PyTuple_SET_ITEM. PyErr_SetString, for example, will be incref-ing objects and setting the exception state, so it certainly needs the GIL to be held.
As a general rule, I would say no Py* macros are safe without the GIL either (the exception being Py_END_ALLOW_THREADS), since they mutate Python objects, which must be protected.
That's keeping it on the safe side, since there are some macros like PyString_AS_STRING() that are also safe, *if* you own at least one reference to the string object. At the same time, "no Py* macros" is not quite strong enough, since if you called PyString_AS_STRING() before releasing the GIL but don't own a reference to the string object, the string might be deallocated behind your back by another thread. A better rule would be "you may access the memory buffer in a PyString or PyUnicode object with the GIL released as long as you own a reference to the string object." Everything else is out of bounds (or not worth the bother). -- Guido van Rossum (python.org/~guido)
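Guido's rule suggests a shape for GIL-free code: while holding the GIL, own a reference to the string and fetch its buffer pointer and length; then release the GIL around a routine that sees only that pointer and length. The sketch below is such a routine. `count_byte` is a hypothetical name, and the surrounding extension code appears only as a comment, since it needs `Python.h`.

```c
#include <stddef.h>

/* Counts occurrences of byte `b` in buf[0..len). It touches no Python
   objects and calls no Python API, so it can run with the GIL released,
   provided the caller owns a reference that keeps `buf` alive.

   Assumed calling pattern in an extension (Python 2 names, as in the
   thread; not compiled here):

       Py_INCREF(str);                     (or otherwise own a reference)
       const char *buf = PyString_AS_STRING(str);
       Py_ssize_t n = PyString_GET_SIZE(str);
       size_t count;
       Py_BEGIN_ALLOW_THREADS
       count = count_byte(buf, (size_t)n, 'a');
       Py_END_ALLOW_THREADS
       Py_DECREF(str);
*/
size_t count_byte(const char *buf, size_t len, char b)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == b)
            count++;
    return count;
}
```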
Guido van Rossum, 07.01.2010 05:29:
A better rule would be "you may access the memory buffer in a PyString or PyUnicode object with the GIL released as long as you own a reference to the string object." Everything else is out of bounds (or not worth the bother).
Is that a "yes" regarding the OP's original question about releasing the GIL during regexp searches? Stefan
MRAB, 07.01.2010 04:07:
I've been wondering whether it's possible to release the GIL in the regex engine during matching.
I know that it needs to have the GIL during memory-management calls, but does it for calls like Py_UNICODE_TOLOWER
Py_UNICODE_TOLOWER looks safe to me at first glance.
or PyErr_SetString?
Certainly not safe.
Is there an easy way to find out?
Release it and fix any crashes? Note that this isn't a safe solution, though, as some GIL-requiring code may be platform-specific. So a better approach might be to extract any obviously problematic stuff from the existing code (such as any exception handling, explicit ref-counting or object creation), and *then* try to release the GIL. Stefan
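One way to follow Stefan's suggestion is to have the GIL-free core report failures as a plain C status code, and to turn that code into a Python exception (for example with `PyErr_SetString`) only after the GIL has been reacquired. The names below (`match_status`, `match_core`, `status_message`) are hypothetical, and the "matcher" is a trivial prefix check standing in for the real engine.

```c
#include <stddef.h>
#include <string.h>

/* Outcomes the GIL-free core can report without touching Python state. */
typedef enum {
    MATCH_FOUND,
    MATCH_NOT_FOUND,
    MATCH_ERR_BAD_ARGS     /* becomes an exception later, with the GIL held */
} match_status;

/* Stand-in for the engine's inner loop: does `subject` start with
   `prefix`? It never calls PyErr_SetString; it only returns a code. */
match_status match_core(const char *subject, size_t n,
                        const char *prefix, size_t m)
{
    if (subject == NULL || prefix == NULL)
        return MATCH_ERR_BAD_ARGS;
    if (m > n)
        return MATCH_NOT_FOUND;
    return memcmp(subject, prefix, m) == 0 ? MATCH_FOUND : MATCH_NOT_FOUND;
}

/* Message to pass to PyErr_SetString once the GIL is held again;
   NULL means "no exception to raise". */
const char *status_message(match_status s)
{
    switch (s) {
    case MATCH_ERR_BAD_ARGS:
        return "invalid arguments to matcher";
    default:
        return NULL;
    }
}
```

In the extension, the caller would run `match_core` between `Py_BEGIN_ALLOW_THREADS` and `Py_END_ALLOW_THREADS`, then, with the GIL held again, call something like `PyErr_SetString(PyExc_RuntimeError, msg)` if `status_message` returned a non-NULL message.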
MRAB
I know that it needs to have the GIL during memory-management calls, but does it for calls like Py_UNICODE_TOLOWER or PyErr_SetString? Is there an easy way to find out?
There is no "easy way" to do so. The only safe way is to examine all the functions or macros you want to call with the GIL released, and assess whether it is safe to call them. As already pointed out, no reference count should be changed, and generally no mutable container should be accessed, unless that container is known not to be referenced anywhere else (that would be the case for e.g. a list that your function has created and is busy populating). I agree that releasing the GIL when doing non-trivial regex searches is worth researching, so please don't give up immediately :-) Regards Antoine Pitrou.
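Antoine's conditions suggest collecting results in plain C memory while the GIL is released and building Python objects only afterwards. The sketch below gathers the start offsets of non-overlapping occurrences of a byte string into a caller-supplied array; `find_all_offsets` is a hypothetical name, and the literal search stands in for real regex matching.

```c
#include <stddef.h>
#include <string.h>

/* Writes the start offsets of non-overlapping occurrences of
   needle[0..m) in hay[0..n) into out[0..cap). Returns the total number
   found, which may exceed cap (only the first cap are stored).
   Uses only C memory, so no reference counts change. */
size_t find_all_offsets(const char *hay, size_t n,
                        const char *needle, size_t m,
                        size_t *out, size_t cap)
{
    size_t found = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t i = 0; i + m <= n; ) {
        if (memcmp(hay + i, needle, m) == 0) {
            if (found < cap)
                out[found] = i;
            found++;
            i += m;          /* skip past the match: non-overlapping */
        } else {
            i++;
        }
    }
    return found;
}
```

With the GIL reacquired, the caller could create the result with `PyList_New` and fill it with `PyList_SET_ITEM`; at that point the list is new and not referenced anywhere else, which is the case Antoine describes.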
A better rule would be "you may access the memory buffer in a PyString or PyUnicode object with the GIL released as long as you own a reference to the string object." Everything else is out of bounds (or not worth the bother).
Is that a "yes" regarding the OP's original question about releasing the GIL during regexp searches?
No, because the regex engine may also operate on buffers that start moving around when you release the GIL. Regards, Martin
I've been wondering whether it's possible to release the GIL in the regex engine during matching.
I don't think that's possible. The regex engine can also operate on objects whose representation may move in memory when you don't hold the GIL (e.g. buffers that get mutated). Even if they stay in place - if their contents change, regex results may be confusing. Regards, Martin
On Jan 7, 2010, at 3:27 PM, Martin v. Löwis wrote:
I've been wondering whether it's possible to release the GIL in the regex engine during matching.
I don't think that's possible. The regex engine can also operate on objects whose representation may move in memory when you don't hold the GIL (e.g. buffers that get mutated). Even if they stay in place - if their contents change, regex results may be confusing.
It seems worthwhile to optimize for the common case of using the regexp engine on an immutable object of type "str" or "bytes", and allow releasing the GIL in *that* case, even if you have to keep it for the general case. James
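James's suggestion amounts to a dispatch decision made while the GIL is still held. The sketch below models it with a hypothetical `subject_kind` enum and `may_release_gil` function; in a real Python 3 extension the classification would presumably come from checks such as `PyBytes_CheckExact` and `PyUnicode_CheckExact`. Everything else, including `bytearray`, `mmap`, and (conservatively) subclasses, keeps the GIL.

```c
/* Hypothetical classification of the object being searched, computed
   while the GIL is held. */
typedef enum {
    SUBJECT_EXACT_BYTES,   /* immutable: contents cannot change */
    SUBJECT_EXACT_STR,     /* immutable: contents cannot change */
    SUBJECT_OTHER_BUFFER   /* bytearray, mmap, array, subclasses, ... */
} subject_kind;

/* Returns nonzero if the matcher may run with the GIL released.
   Only the immutable, exact types qualify; everything else takes the
   existing, GIL-holding path. */
int may_release_gil(subject_kind kind)
{
    switch (kind) {
    case SUBJECT_EXACT_BYTES:
    case SUBJECT_EXACT_STR:
        return 1;
    default:
        return 0;
    }
}
```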
Martin v. Löwis
I don't think that's possible. The regex engine can also operate on objects whose representation may move in memory when you don't hold the GIL (e.g. buffers that get mutated).
Why is it a problem? If we get a buffer through the new buffer API, the object should ensure that the representation isn't moved away until the buffer is released. Regards Antoine.
I've been wondering whether it's possible to release the GIL in the regex engine during matching.
I don't think that's possible. The regex engine can also operate on objects whose representation may move in memory when you don't hold the GIL (e.g. buffers that get mutated). Even if they stay in place - if their contents change, regex results may be confusing.
It seems worthwhile to optimize for the common case of using the regexp engine on an immutable object of type "str" or "bytes", and allow releasing the GIL in *that* case, even if you have to keep it for the general case.
Right. This problem was the one that I thought of first. Thinking about these things is fairly difficult (to me, at least), so I could only tell whether I would consider thread-safe a patch that released the GIL around matching under selected circumstances if I had the patch available. I don't see any obvious reason why it couldn't work (assuming Guido's list of conditions holds, i.e. you are holding references to everything you access). Regards, Martin
I don't think that's possible. The regex engine can also operate on objects whose representation may move in memory when you don't hold the GIL (e.g. buffers that get mutated).
Why is it a problem? If we get a buffer through the new buffer API, the object should ensure that the representation isn't moved away until the buffer is released.
In 2.7, we currently get the buffer with bf_getreadbuffer. In 3.x, we have

    /* Release the buffer immediately --- possibly dangerous but doing something else would require some re-factoring */
    PyBuffer_Release(&view);

Even if we do use the new API, and correctly, it still might be confusing if the contents of the buffer change underneath. Regards, Martin
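Antoine's point, and why the quoted early `PyBuffer_Release(&view)` matters, can be illustrated with `bytearray`, which refuses to resize while a buffer view on it is outstanding (it raises `BufferError`). The toy model below imitates that with an export counter; `toy_buffer`, `toy_get_view`, `toy_release_view`, and `toy_resize` are hypothetical names, not CPython API. Holding the view until matching ends keeps the memory in place; as Martin notes, it does not stop the contents from being overwritten in place.

```c
#include <stdlib.h>

/* Toy resizable buffer with an export count, loosely modelled on how
   bytearray refuses to resize while buffer views exist. */
typedef struct {
    char  *data;
    size_t len;
    int    exports;   /* number of outstanding views */
} toy_buffer;

/* Hand out a pointer to the data and pin the memory. */
const char *toy_get_view(toy_buffer *b)
{
    b->exports++;
    return b->data;
}

/* Unpin; after this the memory may move on the next resize. */
void toy_release_view(toy_buffer *b)
{
    b->exports--;
}

/* Returns 0 on success, -1 if pinned by a view (bytearray raises
   BufferError in that case), if new_len is 0, or if allocation fails. */
int toy_resize(toy_buffer *b, size_t new_len)
{
    if (b->exports > 0 || new_len == 0)
        return -1;
    char *p = realloc(b->data, new_len);
    if (p == NULL)
        return -1;    /* allocation failed; old data still valid */
    b->data = p;
    b->len = new_len;
    return 0;
}
```

In a Python 3 extension, the analogue would be to keep the `Py_buffer` obtained with `PyObject_GetBuffer` until after `Py_END_ALLOW_THREADS`, and only then call `PyBuffer_Release`.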
On Thu, 07 Jan 2010 22:11:36 +0100, Martin v. Löwis wrote:
Even if we do use the new API, and correctly, it still might be confusing if the contents of the buffer changes underneath.
Well, no more confusing than when you compute a SHA1 hash or zlib-compress the buffer, is it? Regards Antoine
On Fri, Jan 8, 2010 at 6:27 AM, Antoine Pitrou
On Thu, 07 Jan 2010 22:11:36 +0100, Martin v. Löwis wrote:
Even if we do use the new API, and correctly, it still might be confusing if the contents of the buffer changes underneath.
Well, no more confusing than when you compute a SHA1 hash or zlib-compress the buffer, is it?
That depends. Algorithms that make exactly one pass over the buffer will run fine (maybe producing a meaningless result). But the regex matcher may scan the buffer repeatedly (for backtracking purposes), and it would take considerable analysis to prove that this cannot mess up its internal data structures if the data underneath changes. (I give it a decent chance that it's fine, but since it was written without ever considering this possibility I'm not 100% sure.) -- Guido van Rossum (python.org/~guido)
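Guido's distinction can be seen by counting reads. A one-pass routine such as a checksum reads each byte once; a backtracking matcher re-reads the same positions as it retries alternatives, so a byte changed by another thread between two reads could be seen with two different values during one search. The sketch below is a tiny backtracking matcher in the style of Rob Pike's well-known example (literals, `.` and `x*` only, NUL-terminated text), not the `_sre` engine; `subject_reads` and `peek` are hypothetical instrumentation.

```c
#include <stddef.h>

/* Instrumentation: how many times the matcher has read a subject byte. */
static size_t subject_reads;

static char peek(const char *p)
{
    subject_reads++;
    return *p;
}

static int match_here(const char *re, const char *text);

/* Matches c* followed by re, trying the shortest run of c first and
   extending it one byte at a time when the rest fails (backtracking). */
static int match_star(char c, const char *re, const char *text)
{
    for (;;) {
        if (match_here(re, text))
            return 1;
        char t = peek(text);
        if (t == '\0' || (t != c && c != '.'))
            return 0;
        text++;
    }
}

/* Matches re at the start of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    char t = peek(text);
    if (t != '\0' && (re[0] == '.' || re[0] == t))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Searches for re anywhere in text, retrying at each start position. */
int backtracking_search(const char *re, const char *text)
{
    subject_reads = 0;
    for (;;) {
        if (match_here(re, text))
            return 1;
        if (peek(text) == '\0')
            return 0;
        text++;
    }
}
```

For `"a*b"` against `"aaab"`, the search makes 7 reads of a 4-byte subject, visiting each of the first three positions twice; a failing search such as `"a*c"` re-reads far more.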
Participants (8):
- Martin v. Löwis
- Antoine Pitrou
- Benjamin Peterson
- Guido van Rossum
- James Y Knight
- John Arbash Meinel
- MRAB
- Stefan Behnel