[Python-ideas] Adding function checks to regex

Tal Einat taleinat at gmail.com
Tue Mar 29 00:58:40 CEST 2011


On Mon, Mar 28, 2011 at 9:34 PM, MRAB <python at mrabarnett.plus.com> wrote:

> On 28/03/2011 18:46, Daniel da Silva wrote:
>
>>
>>    I would approach that with
>>
>>    numbers = (int(m.group()) for m in re.finditer(r"\b\d+\b", text))
>>    numbers = [n for n in numbers if 1 <= n <= 10]
>>
>>
>> To follow up on this: he has pointed out an existing way of doing
>> something that fully covers the goal of your addition. I believe the
>> current way is straightforward, elegant, and self-describing. If we
>> already have an obvious way to do it, we usually want to stay consistent
>> with our usual aim of having one obvious way to do it.
>>
>> If his way wasn't obvious, you may not be Dutch.
>>
> I was thinking about 2 possible uses:
>
> 1. Where you would have a regex in a configuration or setup file, or
> validation for a field, but with extra checks which are tricky or
> impossible in a regex, eg date ranges.
>
> 2. Where you want to perform a check during the matching, much in the
> way that you would use a lookahead or lookbehind.
>
> So far no-one has been able to come up with a convincing real world use
> case. Still, it's better to make a bad suggestion than not to make a
> good one. :-)


A regex-with-filter can be useful, but I don't think any changes to the
stdlib are necessary; a simple third-party module (or even just a cookbook
recipe) would suffice.
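
For illustration, a minimal sketch of what such a cookbook recipe could
look like. The name finditer_filtered and its predicate argument are
made up for this example; nothing like it exists in the stdlib re module.

```python
import re

def finditer_filtered(pattern, string, predicate, flags=0):
    """Yield only the matches of `pattern` in `string` that `predicate` accepts.

    Hypothetical helper: `predicate` receives each match object and
    returns True to keep the match or False to drop it.
    """
    for m in re.finditer(pattern, string, flags):
        if predicate(m):
            yield m

# The "integers between 1 and 10" example from earlier in the thread.
text = "3 apples, 12 pears, 7 plums, 100 grapes"
in_range = lambda m: 1 <= int(m.group()) <= 10
numbers = [int(m.group()) for m in finditer_filtered(r"\b\d+\b", text, in_range)]
print(numbers)  # [3, 7]
```

The check runs only after the regex has produced a match, so this covers
use 1 above (extra validation) but not use 2 (a check that takes part in
the matching itself).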

I've used something very similar: I rolled my own regexp wrapper with a
filter function. I was scraping blogs for links to user profiles on various
social sites. I wanted to hand-code what user-profile URLs looked like for
some major sites, and ended up using regexps. I needed the filtering to deal
with various edge cases. (There are better solutions, but I needed something
quick!)

Writing the wrapper class and mimicking the re object API was easy. I like
the idea of allowing separate filters for different named groups. (This
would work especially well with the new regex module, which allows more than
99 groups and has better support for named groups.)
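
A rough sketch of how a wrapper with per-group filters might look, using
only the stdlib re module. This is not the code mentioned in this message;
the class name FilteredPattern and the group_filters argument are
assumptions made for the example, and only two methods of the pattern API
are mimicked.

```python
import re

class FilteredPattern:
    """Hypothetical wrapper: a compiled regex plus per-group check functions."""

    def __init__(self, pattern, group_filters=None, flags=0):
        self._regex = re.compile(pattern, flags)
        # Maps a group name to a callable that takes the matched text
        # for that group and returns True (accept) or False (reject).
        self._group_filters = dict(group_filters or {})

    def _accepts(self, m):
        for name, check in self._group_filters.items():
            value = m.group(name)
            # A group that did not participate in the match is None;
            # its check is skipped.
            if value is not None and not check(value):
                return False
        return True

    def finditer(self, string):
        """Like a compiled pattern's finditer(), minus the rejected matches."""
        return (m for m in self._regex.finditer(string) if self._accepts(m))

    def search(self, string):
        """Return the first accepted match, or None if there is none."""
        return next(self.finditer(string), None)

# Date-like strings whose month and day fields are in a plausible range.
date = FilteredPattern(
    r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b",
    group_filters={
        "month": lambda s: 1 <= int(s) <= 12,
        "day": lambda s: 1 <= int(s) <= 31,
    },
)
found = [m.group() for m in date.finditer("2011-03-29, 2011-13-01, 2011-02-45")]
print(found)  # ['2011-03-29']
```

Because the filters are applied to finished matches, a rejected match is
simply dropped; the regex engine does not backtrack to look for an
alternative match at the same position.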

If there's interest I could clean up my code and publish it somewhere.

- Tal Einat