[Patches] [ python-Patches-1366311 ] SRE engine do not release the GIL

SourceForge.net noreply at sourceforge.net
Mon Nov 27 14:03:32 CET 2006


Patches item #1366311, was opened at 2005-11-25 13:57
Message generated for change (Comment added) made by eric_noyau
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1366311&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: Rejected
Priority: 5
Private: No
Submitted By: Eric Noyau (eric_noyau)
Assigned to: Fredrik Lundh (effbot)
Summary: SRE engine do not release the GIL

Initial Comment:
In a multi-threaded program that does lots of regular
expression searching, some of it on very long strings
with complex regexes, we've noticed that everything stops
while a regular expression is searching.

One of the issues is that the re engine does not release
the interpreter lock while it is running. All the
other threads are therefore blocked for the entire time
it takes to do the regular expression search.

See the thread in python-dev about it:

http://mail.python.org/pipermail/python-dev/2005-November/058316.html



----------------------------------------------------------------------

>Comment By: Eric Noyau (eric_noyau)
Date: 2006-11-27 13:03

Message:
Logged In: YES 
user_id=1388768
Originator: YES

Although I still think releasing the GIL during regex matching would be
beneficial, I agree with Martin that the patch is not good enough for that
purpose. I was not aware of the requirement to hold the GIL in order to do
memory allocation.

Anyway, since implementing this patch, we have reviewed our usage of regex
and suppressed the really greedy ones. As such, this patch is no longer
needed by us either. It would probably make our application a tiny
fraction faster, but not the order of magnitude faster that we
experienced before removing the big regexes.

In conclusion, I thank Martin for the review, as I've learned something new.
Instead of trying to do a more fine-grained fix, I'm closing this bug, as
the current behaviour is good enough if you avoid using stupid regexes...
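
As an illustration of the kind of pattern meant here, the following is a
minimal sketch. The thread does not say which patterns were removed, so
both patterns and the helper name same_result are assumptions chosen only
for the example. A nested quantifier can force exponential backtracking
when the match fails, while an equivalent flat pattern fails much faster.

```python
import re

# Hypothetical example patterns; not taken from the application discussed.
NESTED = re.compile(r"(?:a+)+b")  # nested quantifiers: heavy backtracking on failure
FLAT = re.compile(r"a+b")         # accepts the same strings with a single quantifier


def same_result(text):
    """Return True if both patterns find the same span (or both find nothing)."""
    m1 = NESTED.search(text)
    m2 = FLAT.search(text)
    return (m1 and m1.span()) == (m2 and m2.span())


if __name__ == "__main__":
    print(same_result("xaaab"))    # both patterns match here
    print(same_result("a" * 16))   # neither matches; NESTED does far more work
```

Increasing the length of the all-"a" input by one roughly doubles the work
for the nested pattern, which is why such searches can appear to hang.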





----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-11-25 15:13

Message:
Logged In: YES 
user_id=21627
Originator: NO

I believe the patch is incorrect. While matching, sre may allocate memory
through the Python API, and it may raise exceptions through the Python API.
Neither is allowed when the GIL is released.

Tentatively rejecting the patch.

Eric, if you think the patch is correct or can be corrected, please update
it to the current subversion trunk.

----------------------------------------------------------------------

Comment By: Georg Brandl (birkenfeld)
Date: 2006-02-18 23:41

Message:
Logged In: YES 
user_id=1188172

Fredrik, do you have time to review this?

----------------------------------------------------------------------

Comment By: Eric Noyau (eric_noyau)
Date: 2005-11-28 14:11

Message:
Logged In: YES 
user_id=1388768

Thanks for your comments. I've updated the patch to fix your
issues, but without introducing a per-state object lock.

What I did instead is to mark a state as not supporting
concurrency when a scanner object creates it. So the GIL
will not be released for scanner objects at all.

For consistency, match now also releases the GIL, if possible.


----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2005-11-25 21:38

Message:
Logged In: YES 
user_id=4771

The patch looks good, but I wonder if it is safe.  The SRE_STATE structure
that SRE_SEARCH_INNER uses is potentially visible to the application-level
Python code, via the (undocumented) scanner objects:

>>> r = re.compile(r"hello")
>>> s = r.scanner("big string in which to search")
>>> s.search()
<_sre.SRE_Match object at 0x12345678>

Each call to s.search() continues the previous search with the same
SRE_STATE.  The problem with releasing the GIL as you do is that several
threads could call s.search() concurrently, which would most probably
crash CPython.

This probably means that you need to add a lock in SRE_STATE and acquire
it while searching, to serialize its usage.  Of course, we should then be
careful about what overhead this gives to applications that use regexps on
a lot of small strings...

Another note: for consistency, match() should also release the GIL if
search() does.
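
The serialization idea can be sketched at the Python level. This is only an
application-level analogue, assuming a hypothetical wrapper class named
LockedScanner; the lock suggested above would live inside SRE_STATE in the
C code, which this sketch does not attempt to show.

```python
import re
import threading


class LockedScanner:
    """Hypothetical wrapper: one lock per scanner, held during each search."""

    def __init__(self, pattern, text):
        self._scanner = pattern.scanner(text)  # undocumented scanner object
        self._lock = threading.Lock()

    def search(self):
        # Only one thread at a time may advance the shared scanner state.
        with self._lock:
            return self._scanner.search()


def collect_spans(scanner, n_threads=4):
    """Drain the scanner from several threads and return the sorted spans."""
    spans = []

    def worker():
        while True:
            m = scanner.search()
            if m is None:
                return
            spans.append(m.span())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(spans)


if __name__ == "__main__":
    print(collect_spans(LockedScanner(re.compile(r"ab"), "ab ab ab")))
```

Each search pays for one lock acquisition, which is the overhead concern
raised above for applications that run regexps on many small strings.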

----------------------------------------------------------------------

Comment By: Eric Noyau (eric_noyau)
Date: 2005-11-25 14:02

Message:
Logged In: YES 
user_id=1388768

I'm attaching a diff to this bug that removes this limitation
if it is sane to do so. If a search is done on a string or a
unicode object (which by definition are immutable), the GIL
is released and reacquired every time a search is done.

I've tested this patch both with a simple test (start a
thread with a greedy regex on a monstrous string and verify
that the other Python threads are still active) and by
running our internal application and verifying that nothing is
blocking anymore.
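
A test of the kind described above might look like the following. This is
a minimal sketch, not the test that was actually run: the function name
measure_heartbeat_gaps, the pattern and the input size are all assumptions.
One thread runs a slow search while the calling thread ticks at a short
interval and records the longest pause between ticks.

```python
import re
import threading
import time


def measure_heartbeat_gaps(pattern, text, interval=0.001):
    """Search `text` in a worker thread while the calling thread ticks.

    Returns (largest gap between ticks in seconds, number of ticks).
    """
    done = threading.Event()

    def worker():
        try:
            pattern.search(text)
        finally:
            done.set()  # always signal, even if the search raises

    gaps = []
    thread = threading.Thread(target=worker)
    last = time.perf_counter()
    thread.start()
    while True:
        time.sleep(interval)  # resuming after the sleep requires the GIL
        now = time.perf_counter()
        gaps.append(now - last)
        last = now
        if done.is_set():
            break
    thread.join()
    return max(gaps), len(gaps)


if __name__ == "__main__":
    # Nested quantifiers and no "b" in the input force heavy backtracking.
    greedy = re.compile(r"(a+)+b")
    max_gap, ticks = measure_heartbeat_gaps(greedy, "a" * 20)
    print(f"largest gap: {max_gap:.3f}s over {ticks} ticks")
```

If the engine holds the GIL for the whole search, the largest gap should be
close to the search time itself; if the GIL is released during the search,
the gap should stay near the tick interval.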

----------------------------------------------------------------------
