[Python-ideas] Adding function checks to regex
python at mrabarnett.plus.com
Sat Mar 19 17:19:30 CET 2011
On 19/03/2011 11:33, Peter Otten wrote:
> MRAB wrote:
>> Some of those who are relatively new to regexes sometimes ask how to write
>> a regex which checks that a number is in a range or is a valid date.
>> Although this may be possible, it certainly isn't easy.
>> From what I've read, Perl has a way of including code in a regex, but I
>> don't think that's a good idea.
>> However, it occurs to me that there may be a case for being able to call
>> a supplied function to perform such checking.
>> Borrowing some syntax from Perl, it could look like this:
>> def range_check(m):
>>     return 1 <= int(m.group()) <= 10
>> numbers = regex.findall(r"\b\d+\b(*CALL)", text, call=range_check)
>> The regex module would match as normal until the "(*CALL)", at which
>> point it would call the function. If the function returns True, the
>> matching continues (and succeeds); if the function returns False, the
>> matching backtracks (and fails).
> I would approach that with
> numbers = (int(m.group()) for m in re.finditer(r"\b\d+\b", text))
> numbers = [n for n in numbers if 1 <= n <= 10]
> here. This is of similar complexity, but has the advantage that you can use
> the building blocks throughout your python scripts. Could you give an
> example where the benefits of the proposed syntax stand out more?
There may be a use case in config files where you define rules (for
example, Apache <FilesMatch>) or web forms where you have validation,
but a regex is too limited. This would enable you to add 'richer'
checking. There could be a predefined set of checks, such as whether a
date is valid.
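
As a rough illustration of the kind of "richer" check meant here, the following sketch validates dates using only the standard re module, applying the check after each match rather than during matching. The names DATE_RE, is_valid_date and find_valid_dates, and the YYYY-MM-DD format, are assumptions made for the example and are not part of the proposal.

```python
import re
from datetime import date

# Assumed format for the example: YYYY-MM-DD.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def is_valid_date(m):
    """Return True if the match object describes a real calendar date."""
    try:
        # datetime.date raises ValueError for impossible dates such as 30 February.
        date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:
        return False
    return True

def find_valid_dates(text):
    """Return the matched strings that also pass the calendar check."""
    return [m.group() for m in DATE_RE.finditer(text) if is_valid_date(m)]
```

Unlike the proposed "(*CALL)", a check applied after the match cannot make the regex backtrack and try an alternative; for a pattern like this one, which has only one way to match at a given position, the result should be the same.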
>> The function would be passed a match object.
>> An extension, again borrowing the syntax from Perl, could include a tag
>> like this:
>> numbers = regex.findall(r"\b\d+\b(*CALL:RANGE)", text,
>> The tag would be passed to the function so that it could support
>> multiple checks.
> [brainstorm mode]
> Could the same be achieved without new regex syntax? I'm thinking of reusing
> named groups:
> re.findall(r"\b(?P<number>\d+)\b", text,
>            number=lambda s: 1 <= int(s) <= 10)
I'm not sure about that.
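
For comparison, the named-group idea can be approximated today with a small wrapper around re.finditer. This is only a sketch: findall_checked is a hypothetical helper, not an existing function in re or regex, and it filters after matching instead of influencing backtracking.

```python
import re

def findall_checked(pattern, text, **checks):
    """Hypothetical helper: return the full text of each match whose named
    groups all pass the check function supplied under the same name."""
    results = []
    for m in re.finditer(pattern, text):
        # Each keyword names a group; its function receives the group's string.
        # Assumes every checked group takes part in the match (is not None).
        if all(check(m.group(name)) for name, check in checks.items()):
            results.append(m.group())
    return results

numbers = findall_checked(r"\b(?P<number>\d+)\b", "3 42 10 0 7",
                          number=lambda s: 1 <= int(s) <= 10)
print(numbers)  # ['3', '10', '7']
```

Note that the helper returns the whole match, whereas re.findall with a group in the pattern would return the group's contents.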