Time out a regular expression in Python 2.6.4?

MRAB python at mrabarnett.plus.com
Mon Feb 15 20:37:33 EST 2010


Steve Holden wrote:
> Jonathan Gardner wrote:
>> On Feb 15, 7:59 am, Steve Holden <st... at holdenweb.com> wrote:
>>> pyt... at bdurham.com wrote:
>>>> Is there any way to time out a regular expression in Python
>>>> 2.6.4? Motivation: Our application allows users to enter
>>>> regular expressions as validation criteria. If a user enters a
>>>> pathological regular expression, we would like to time out the
>>>> evaluation of this expression after a short period of time.
>>> Python itself does not contain any mechanism to terminate an
>>> operation if it takes too much time.
>>> 
>>> One approach would be to run the regex in a subprocess, and apply
>>> process limits to terminate that subprocess if it ran too long.
>>> 
>>> This group being what it is you are likely to receive other,
>>> better suggestions too.
>>> 
>> I'm not sure exactly how the re module is implemented, but since I
>> assume a great chunk is in C code, you may get away with a single
>> process and multiple threads. One thread will watch the process, or
>> have a timer event set to go off at a certain point. The other
>> will actually run the regex and get killed by the timer thread if
>> it doesn't finish in time.
> 
> That would be a great idea if it were possible to kill a thread from
> outside. Unfortunately it's not, so the best you can do is set a flag
> and have it queried periodically. This is not practical during re
> matching.
> 
The code for matching in the re module is written in C, and it doesn't
release the GIL because it calls the Python API, and you need to have
the GIL when doing that (unless you can guarantee that the specific call
is safe, that is!).

This means that other threads can't run during matching.
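
Since a watchdog thread can't help, the subprocess approach suggested
earlier in the thread is the practical workaround. Here is a minimal
sketch of it, not a tested recipe. It assumes Python 3.7 or later
(subprocess.run with a timeout is not available in 2.6), and the names
search_with_timeout and _CHILD are made up for illustration.

```python
import subprocess
import sys

# Child program (hypothetical helper): read the pattern and the text from
# argv and write "1" if the pattern matches anywhere in the text, else "0".
_CHILD = (
    "import re, sys\n"
    "sys.stdout.write('1' if re.search(sys.argv[1], sys.argv[2]) else '0')\n"
)


def search_with_timeout(pattern, text, timeout=2.0):
    """Return True or False for match / no match, or None on timeout."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when this expires
            check=True,       # an invalid pattern raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout == "1"
```

Starting an interpreter for every check is slow, so a real application
would probably keep a worker process around and restart it only after a
timeout.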

In order to be able to cancel the matching, the re module would have to
release the GIL when possible and have some kind of cancel() method
(belonging to which class?).

A simpler option would be to add a timeout argument to the matching
functions. The matching loop already checks periodically for ctrl-C, so
perhaps the time check could be done at the same point.
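
Until something like that exists, the same periodic check can be used
from the outside: a signal handler that raises an exception gets to run
when the matcher checks for pending signals. A rough sketch, assuming
CPython on Unix (SIGALRM is not available on Windows) and a call from
the main thread. RegexTimeout and search_with_alarm are names invented
for this example and are not part of the re module.

```python
import re
import signal


class RegexTimeout(Exception):
    """Raised when the match does not finish in time (hypothetical name)."""


def _on_alarm(signum, frame):
    # Runs in the main thread the next time the interpreter, or the
    # matcher's own periodic check, looks for pending signals.
    raise RegexTimeout()


def search_with_alarm(pattern, text, seconds=1.0):
    """Like re.search(), but raise RegexTimeout after `seconds` seconds."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return re.search(pattern, text)
    finally:
        # Cancel the timer and put the previous handler back.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
```

This works in a single process only, and it takes over SIGALRM for the
duration of the call, so it won't suit a threaded server.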
