
People who are relatively new to regexes sometimes ask how to write a regex which checks that a number is in a range or is a valid date. Although this may be possible, it certainly isn't easy.

From what I've read, Perl has a way of including code in a regex, but I don't think that's a good idea.

However, it occurs to me that there may be a case for being able to call a supplied function to perform such checking. Borrowing some syntax from Perl, it could look like this:

    def range_check(m):
        return 1 <= int(m.group()) <= 10

    numbers = regex.findall(r"\b\d+\b(*CALL)", text, call=range_check)

The regex module would match as normal until the "(*CALL)", at which point it would call the function. If the function returns True, the matching continues (and succeeds); if the function returns False, the matching backtracks (and fails). The function would be passed a match object.

An extension, again borrowing the syntax from Perl, could include a tag like this:

    numbers = regex.findall(r"\b\d+\b(*CALL:RANGE)", text, call=range_check)

The tag would be passed to the function so that it could support multiple checks. Alternatively, a tag could always be passed; if no tag is provided then None would be passed instead.

There's also the additional possibility of providing a dict of functions instead and using the tag to select the function which should be called.

I'd be interested in your opinions.
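For comparison, the effect of the range check can be approximated today with the standard re module by filtering matches after the fact. This is only a rough sketch: findall_checked is a hypothetical helper name, not part of re or regex, and it does not reproduce the proposed backtracking behaviour, since a rejected match is simply dropped.

```python
import re

def range_check(m):
    # Same check as in the proposal: accept only numbers from 1 to 10.
    return 1 <= int(m.group()) <= 10

def findall_checked(pattern, text, call):
    # Hypothetical helper: keep the matches for which call(match) is true.
    # Unlike the proposed (*CALL), a rejected match is just discarded;
    # the regex engine is never asked to backtrack and try an alternative.
    return [m.group() for m in re.finditer(pattern, text) if call(m)]

numbers = findall_checked(r"\b\d+\b", "3 apples, 12 pears, 10 plums", range_check)
print(numbers)
```

For a pattern like this one, where the word boundaries leave no shorter alternative to backtrack into, filtering afterwards and checking during matching should select the same numbers.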

MRAB wrote:
I would approach that with

    numbers = (int(m.group()) for m in re.finditer(r"\b\d+\b", text))
    numbers = [n for n in numbers if 1 <= n <= 10]

here. This is of similar complexity, but has the advantage that you can use the building blocks throughout your Python scripts. Could you give an example where the benefits of the proposed syntax stand out more?

[brainstorm mode] Could the same be achieved without new regex syntax? I'm thinking of reusing named groups:

    re.findall(r"\b(?P<number>\d+)\b", text, number=lambda s: 1 <= int(s) <= 10)

Peter
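The named-group idea can be tried out without changing re at all. The following is a minimal sketch under the assumption that each keyword argument names a group and supplies a predicate for that group's text; findall_filtered is a made-up name, and re.findall itself accepts no such keyword arguments.

```python
import re

def findall_filtered(pattern, text, **filters):
    # Hypothetical helper: each keyword names a group in the pattern and
    # maps it to a predicate that receives the group's matched string.
    results = []
    for m in re.finditer(pattern, text):
        if all(check(m.group(name)) for name, check in filters.items()):
            results.append(m.group())
    return results

numbers = findall_filtered(
    r"\b(?P<number>\d+)\b",
    "1 20 7 300",
    number=lambda s: 1 <= int(s) <= 10,
)
print(numbers)
```

The sketch returns the whole match rather than mimicking every return shape of re.findall, and it assumes every filtered group takes part in each match; a group that did not participate would hand None to its predicate.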

On 19/03/2011 11:33, Peter Otten wrote:
There may be a use case in config files where you define rules (for example, Apache <FilesMatch>) or web forms where you have validation, but a regex is too limited. This would enable you to add 'richer' checking. There could be a predefined set of checks, such as whether a date is valid.
I'm not sure about that.

I am -1 on the whole idea.

However, for the sake of argument, I'll say that if it were done I would not bind the callbacks at match time. Instead, they would be part of the compiled regex objects.

    r = re.compile(r"foo:(?C<check_bounds>\d+)", check_bounds=lambda d: 1 <= int(d) <= 100)

and then r could be used like any other regex, and you don't need to know about the callbacks when actually using it, just to build it.

On Sat, Mar 19, 2011 at 12:19 PM, MRAB <python@mrabarnett.plus.com> wrote:
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy
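Binding the callbacks when the pattern is built, rather than at match time, can be approximated with a small wrapper around a compiled pattern. This is a sketch only: CheckedPattern is a hypothetical class, it uses ordinary (?P<name>...) groups because the (?C<...>) syntax above does not exist, and it filters finished matches instead of hooking into the engine.

```python
import re

class CheckedPattern:
    """Hypothetical wrapper: a compiled regex with per-group checks bound at build time."""

    def __init__(self, pattern, **checks):
        self._regex = re.compile(pattern)
        self._checks = checks  # maps group name -> predicate on the group's text

    def _accept(self, m):
        return all(check(m.group(name)) for name, check in self._checks.items())

    def search(self, text, pos=0):
        # Return the first match that passes every check, or None.
        for m in self._regex.finditer(text, pos):
            if self._accept(m):
                return m
        return None

    def findall(self, text):
        # Return the full text of every match that passes every check.
        return [m.group() for m in self._regex.finditer(text) if self._accept(m)]

r = CheckedPattern(r"foo:(?P<check_bounds>\d+)",
                   check_bounds=lambda d: 1 <= int(d) <= 100)
print(r.findall("foo:5 foo:500 foo:42"))
```

Code that receives r only calls search or findall and needs no knowledge of the callbacks, which is the property argued for above.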

On Sat, Mar 19, 2011 at 4:33 AM, Peter Otten <__peter__@web.de> wrote:
I like this alternative. (1) the function can simply operate on a string rather than a regex object. (2) it makes the function optional, enabling verification and testing of the regex to be separated from testing the function. (3) it would make it easier to port code that uses this to other languages and perhaps make it more likely to be adopted by other languages.

On Sat, Mar 19, 2011 at 9:35 AM, Calvin Spealman <ironfroggy@gmail.com> wrote:
I'd want to understand likely use cases before deciding on early/late binding of the callbacks. And I'm not sure the expressive power of this is worth the effort.

--- Bruce
New Puzzazz newsletter: http://j.mp/puzzazz-news-2011-03
Make your web app more secure: http://j.mp/gruyere-security

To follow up on this: he has pointed out an existing way of doing something that fully covers the goal of your addition. The current way is straightforward, elegant, and self-describing, I believe. If we already have an obvious way to do it, we usually want to stay consistent with our normal aim of having one obvious way to do it. If his way wasn't obvious, you may not be Dutch.

On 28/03/2011 18:46, Daniel da Silva wrote:
I was thinking about 2 possible uses:

1. Where you would have a regex in a configuration or setup file, or validation for a field, but with extra checks which are tricky or impossible in a regex, e.g. date ranges.

2. Where you want to perform a check during the matching, much in the way that you would use a lookahead or lookbehind.

So far no-one has been able to come up with a convincing real-world use case.

Still, it's better to make a bad suggestion than not to make a good one. :-)
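As an illustration of the first use, a date check is awkward to express in a regex alone but short when a function does the validation. The sketch below assumes ISO-style YYYY-MM-DD dates and made-up sample text; is_valid_date and DATE_RE are illustrative names, and the check runs after matching, not during it.

```python
import re
from datetime import datetime

# The regex only checks the shape of the date: 4 digits, 2 digits, 2 digits.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def is_valid_date(s):
    # Let datetime decide whether the month and day actually exist.
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False

text = "2011-02-28 2011-02-30 2011-13-01"
dates = [m.group() for m in DATE_RE.finditer(text) if is_valid_date(m.group())]
print(dates)
```

The second use, a check that takes part in matching the way a lookahead does, is not covered by this kind of post-filtering.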

On Mon, Mar 28, 2011 at 9:34 PM, MRAB <python@mrabarnett.plus.com> wrote:
A regex-with-filter can be useful, but I don't think any changes to the stdlib are necessary; a simple 3rd party module (or even just a cookbook recipe) would suffice.

I've used something very similar (I rolled my own, a regexp wrapper with a filter function). I was scraping blogs for links to user profiles on various social sites. I wanted to hand-code how user-profile URLs looked for some major sites, and ended up using regexps. I needed the filtering to deal with various edge-cases. (There are better solutions but I needed something quick!) Writing the wrapper class and mimicking the re object API was easy.

I like the idea of allowing separate filters for different named groups. (This would work especially well with the new regex module, which allows more than 99 groups and has better support for named groups.)

If there's interest I could clean up my code and publish it somewhere.

- Tal Einat

On Sat, 19 Mar 2011 03:25:57 +0000 MRAB <python@mrabarnett.plus.com> wrote:
However, it occurs to me that there may be a case for being able to call a supplied function to perform such checking.
What would be such a case? Adding more complications to the regex syntax and semantics is something most of us would frown upon, IMHO. *Especially* if it involves mixing in arbitrary Python callbacks referenced by name in the regex... Regards Antoine.

participants (9)
- Antoine Pitrou
- Bruce Leban
- Calvin Spealman
- Daniel da Silva
- Masklinn
- MRAB
- Peter Otten
- Raymond Hettinger
- Tal Einat