matching multiple regexs to a single line...

Alex Martelli aleax at aleax.it
Thu Nov 21 06:20:37 EST 2002


maney at pobox.com wrote:
   ...
>>> This is one point on which I agree with Alexander: it seems to me to be
>>> the usual case that the regexp both identifies and parses the target.
>> 
>> It's one case, but it's very far from being the only one.
> 
> Right, and I didn't say it was.  I notice you aren't disagreeing that
> it's a very common style of use, 

No, I don't disagree it's common -- that's why I mentioned it myself!

> and this both-identifying-and-parsing
> use has been at least implied all along.  It was explicitly described,
> though not explicitly shown in the pseudo-code snippets, back in the
> first post in this thread.

Guess I must have missed that, because I saw no "at least implied".
For such a crucial issue, making such an important difference, I find
myself hard put to believe that people would "just imply" it without
even showing it.  So maybe there was a lot more talking at cross-purposes
throughout the thread -- you yourself remarked that you were commenting
on a post you "quite possibly didn't read all the way", and that kind
of thing seems to ensure there will be lots of misunderstanding.


>>> If it isn't being used in that dual mode, then the whole issue
>>> addressed here (and in at least two other threads during the past week)
>>> doesn't exist.
>> 
>> Why doesn't it?  Alexander's post to which I was replying had no
>> groups at all in the regex patterns
> 
> Not in the sample code, no.  There were groups in a lot of non-code
> exposition, and as I say, they've been implicitly and explicitly a
> major part of the motivation for this, at least IMO.  After all, if you
> don't have any groups, what do you need the match object for outside of
> the conditional test?  

The match object gives you more than just groups you put in the
pattern!  It's quite typical, for example, that all you need to
know is the exact substring that was matched:

>>> import re
>>> are=re.compile('ab+c')
>>> mo = are.match('abbbcccdd')
>>> mo.group(0)
'abbbc'
>>>

See?  No explicit groups in the re's pattern, yet pretty obvious
potential usefulness of the resulting match-object anyway.
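
Beyond group(0), a match object from a group-free pattern also records
where the match sits in the subject string. A minimal sketch, reusing the
same pattern and subject as the session above (the variable names matched,
start, end and rest are only illustrative):

```python
import re

are = re.compile('ab+c')            # no explicit groups in the pattern
mo = are.match('abbbcccdd')

matched = mo.group(0)               # the exact substring that was matched
start, end = mo.span()              # its position in the subject string
rest = mo.string[end:]              # whatever follows the match

print(matched, (start, end), rest)
```

Knowing where the match ended is often all a caller needs, for example
to carry on scanning from that point.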

So, if your question was meant to be rhetorical, it seems particularly
inappropriate to me.  If instead it was asked in earnest, to learn
crucial facts you didn't know about how re's and mo's work or about
how they're often used, then I think you might usefully have refrained
from criticizing what you suspected you didn't fully understand.  But
I do gather that trying to understand a subject before criticising
is quite an obsolete approach these days.

> And then the motivating problem, or at least
> what I have seen as the motivating problem, is the dual use of the
> result of the regexp's application.

You can have some level of dual use without _necessarily_ having any
group in the re patterns.  Of course you may often be interested
in having groups, but your repeated attempts to imply that it would
be _necessarily_ so seem quite misplaced to me.

> The elephant seems very like a pillar on this side...  :-)

That can be a particularly dangerous partial-perception, should
the pachyderm suddenly decide to shuffle its feet.


I have already outlined quite briefly what I think could be one
interesting approach should one need to pass on a match object
that 'hides' the join-all-with-| approach to matching multiple re
patterns in one gulp -- synthesize a suitable object polymorphic
to a real matchobject.  A completely different tack, simpler though
needing some time measurement to check its worth, is to do TWO
matches (still better than doing N one after the other...) --
one vs the patterns joined into one by |, just to identify which
of them is the first (if any) to match; then a second versus the
specific "first matching pattern" only, just to build the match
object one needs.  At this point I'm not highly motivated to spend
more time and energy trying to help out with this, so I think I'll
leave it at this "suggestive and somewhat helpful handwaving" level
unless some other reader, perceptive enough to SEE how vastly
superior the join-all-patterns approach is, but needing to get
the specific match-object too, should express interest.  As far
as I'm concerned, people who still can't see it probably don't want
to, and so they're welcome to wear out their CPUs looping uselessly
to their hearts' contents.
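
The two-match idea described above might look roughly like the sketch
below. It is only one possible reading of that handwaving, not a measured
or definitive implementation. The function name first_match and the group
names p0, p1, ... are hypothetical, and the sketch assumes the individual
patterns use no numbered backreferences, no inline flags, and no named
groups that would clash once everything is joined into one pattern.

```python
import re

def first_match(patterns, text):
    """Return (index, match object) for the first pattern in `patterns`
    that matches at the start of `text`, or (None, None) if none does."""
    # Pass 1: join all patterns with |, wrapping each in a named group
    # (p0, p1, ...) so the combined match can tell us which one matched.
    combined = re.compile(
        '|'.join('(?P<p%d>%s)' % (i, p) for i, p in enumerate(patterns)))
    mo = combined.match(text)
    if mo is None:
        return None, None
    # lastgroup names the wrapper group of the alternative that matched.
    index = int(mo.lastgroup[1:])
    # Pass 2: match again with that one pattern alone, so the caller
    # gets an ordinary match object with the pattern's own groups.
    return index, re.compile(patterns[index]).match(text)

patterns = [r'ab+c', r'(\d+)-(\d+)', r'\w+']
index, mo = first_match(patterns, '12-34 rest')
print(index, mo.groups())
```

In real use one would build and compile the combined pattern once, outside
the function, rather than on every call; whether the second match costs
less than looping over N patterns is exactly what the time measurement
mentioned above would have to show.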


Alex



