[Python-ideas] Adding function checks to regex

Peter Otten __peter__ at web.de
Sat Mar 19 12:33:58 CET 2011


MRAB wrote:

> Some of those who are relatively new to regexes sometimes ask how to write
> a regex which checks that a number is in a range or is a valid date.
> Although this may be possible, it certainly isn't easy.
> 
> From what I've read, Perl has a way of including code in a regex, but I
> don't think that's a good idea.
> 
> However, it occurs to me that there may be a case for being able to call
> a supplied function to perform such checking.
> 
> Borrowing some syntax from Perl, it could look like this:
> 
>      def range_check(m):
>          return 1 <= int(m.group()) <= 10
> 
>      numbers = regex.findall(r"\b\d+\b(*CALL)", text, call=range_check)
> 
> The regex module would match as normal until the "(*CALL)", at which
> point it would call the function. If the function returns True, the
> matching continues (and succeeds); if the function returns False, the
> matching backtracks (and fails).

I would approach that with

numbers = (int(m.group()) for m in re.finditer(r"\b\d+\b", text))
numbers = [n for n in numbers if 1 <= n <= 10]

here. This is of similar complexity, but has the advantage that you can reuse
the building blocks throughout your Python scripts. Could you give an
example where the benefits of the proposed syntax stand out more?
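
For concreteness, here is a self-contained sketch of the two-step approach
above. The helper name numbers_in_range and the sample text are made up for
illustration; only re.finditer comes from the standard library.

```python
import re

def numbers_in_range(text, low=1, high=10):
    """Return the integers found in text that lie in [low, high]."""
    # Step 1: a plain regex match, with no range logic inside the pattern.
    candidates = (int(m.group()) for m in re.finditer(r"\b\d+\b", text))
    # Step 2: ordinary Python performs the range check.
    return [n for n in candidates if low <= n <= high]

sample = "rooms 3, 10, 11 and 42; floor 7"  # made-up sample text
print(numbers_in_range(sample))  # prints [3, 10, 7]
```

Each step stays usable on its own: the generator can feed a different
filter, and the range check does not care where the numbers came from.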

> The function would be passed a match object.
> 
> An extension, again borrowing the syntax from Perl, could include a tag
> like this:
> 
>      numbers = regex.findall(r"\b\d+\b(*CALL:RANGE)", text,
>                              call=range_check)
> 
> The tag would be passed to the function so that it could support
> multiple checks.

[brainstorm mode]
Could the same be achieved without new regex syntax? I'm thinking of reusing 
named groups:

re.findall(r"\b(?P<number>\d+)\b", text, 
           number=lambda s: 1 <= int(s) <= 10)
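
Something close to this can already be emulated on top of the current re
module. The sketch below is only an illustration: findall_checked is a
hypothetical helper, not an existing or proposed API, and re.findall itself
does not accept such keyword arguments today.

```python
import re

def findall_checked(pattern, text, **checks):
    """Hypothetical helper, not part of the re module.

    Return the text of every match for which each check passes. Each keyword
    names a group in the pattern and maps it to a predicate that receives the
    string matched by that group.
    """
    results = []
    for m in re.finditer(pattern, text):
        groups = m.groupdict()
        # A group that did not take part in the match is None; its check is
        # skipped. A keyword that names no group in the pattern raises KeyError.
        if all(groups[name] is None or check(groups[name])
               for name, check in checks.items()):
            results.append(m.group())
    return results

found = findall_checked(r"\b(?P<number>\d+)\b", "3 10 11 42 7",
                        number=lambda s: 1 <= int(s) <= 10)
print(found)  # prints ['3', '10', '7']
```

Note that this filters after the match has completed, so a failed check
simply drops the match; it cannot make the regex engine backtrack and try
another alternative the way the (*CALL) proposal would.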

Peter



