newbie re question

Gonçalo Rodrigues op73418 at mail.telepac.pt
Wed Nov 6 16:52:45 EST 2002


On Wed, 06 Nov 2002 21:17:12 GMT, "Fredrik Lundh"
<fredrik at pythonware.com> wrote:

>Gonçalo Rodrigues wrote:
>
>> I've been trying to grok re's and settled myself a little exercise:
>> concoct a re for a Python identifier.
>>
>> Now what I got is
>>
>> >>> pattern = re.compile(r'(\s|^)([\w_][\w\._]*)(\s|$)')
>> >>> pattern.findall('aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk')
>> [('', 'aadf', ' '), (' ', 'b', ' '), (' ', 'aasa', ' '), (' ', 'aa', ' '), (' ', 'aa_aa', ' ')]
>>
>> But as you can see from the results, not all valid identifiers get
>> caught. For example, why isn't 'cdase' caught?
>
>findall returns non-overlapping matches.  there's only a single space
>between "aadf" and "cdase", and that was used by the first match.

Typical newbie error - I forgot that the match consumed the next
character. And I do want all the non-overlapping matches.
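
A small sketch of the consumption problem, using a shortened version of
the test string. The variable names are only for illustration. With a
single space between words, the trailing (\s|$) group eats the separator
that the next match would need; doubling the spaces makes the skipped
word reappear.

```python
import re

# Original pattern from the post: the trailing (\s|$) group consumes
# the separator, so it is no longer available to the next match.
consuming = re.compile(r'(\s|^)([\w_][\w\._]*)(\s|$)')

# Keep only the middle group (the candidate identifier) of each match.
single = [m[1] for m in consuming.findall('aadf cdase b')]
double = [m[1] for m in consuming.findall('aadf  cdase  b')]

print(single)  # ['aadf', 'b']
print(double)  # ['aadf', 'cdase', 'b']
```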

>
>here's a better pattern:
>
>    pattern = re.compile(r'\b([a-zA-Z_]\w*)\b')
>
></F>
>

OK, I revamped your pattern a little, but now I get too many matches,
e.g.

>>> pattern = re.compile(r'\b([a-zA-Z_][a-zA-Z_\.]*)\b')
>>> pattern.findall('aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk')
['aadf', 'cdase', 'b', 'ad', 'aa', 'aasa', 'a.aa', 'a', 'aa', '_aa',
'_aafr', 'aa_aa', 'aa__a', 'jk']

I want the re to reject 'ad:aa', 'a@', etc., which are not valid
identifiers. For 'ad:aa' it returned 'ad' and 'aa' because \b matched
on either side of the ':', and I do *not* want that.
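
To isolate the effect, here is the same pattern run on just the two
problem tokens (a minimal sketch; the names are illustrative). \b only
asserts a boundary between a word character and a non-word character,
so punctuation such as ':' or '@' ends a match just as a space would.

```python
import re

word_bounded = re.compile(r'\b([a-zA-Z_][a-zA-Z_\.]*)\b')

# \b succeeds between 'd' and ':' and again between ':' and 'a'.
colon_case = word_bounded.findall('ad:aa')
# \b succeeds between 'a' and '@'.
at_case = word_bounded.findall('a@')

print(colon_case)  # ['ad', 'aa']
print(at_case)     # ['a']
```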

I also tried using a negative lookahead, as in

>>> pattern = re.compile(r'([a-zA-Z_][a-zA-Z_\.]*)(?![^a-zA-Z_\.\s])')
>>> pattern.findall('aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk')
['aadf', 'cdase', 'b', 'a', 'aa', 'aasa', 'a.aa', 'aa', '_aa', '_aaf',
'aa_aa', 'aa__', 'jk']

But as you can see, it does not work either. For 'ad:aa' it returns the
matches 'a' and 'aa' - and I understand why: when the lookahead fails
after 'ad', the engine just backtracks to a shorter match until it
succeeds.
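
The backtracking can be made visible by printing match positions for
the single token 'ad:aa' (again only a sketch with illustrative names).
The first match is the shortened 'a' at offset 0; 'd' on its own is
rejected because it is followed by ':', and the scan then picks up 'aa'
after the colon.

```python
import re

lookahead = re.compile(r'([a-zA-Z_][a-zA-Z_\.]*)(?![^a-zA-Z_\.\s])')

# Each tuple is (matched text, start offset, end offset) within 'ad:aa'.
spans = [(m.group(1), m.start(), m.end())
         for m in lookahead.finditer('ad:aa')]

print(spans)  # [('a', 0, 1), ('aa', 3, 5)]
```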

I'm beaten. Can any1 help me out here?

TIA,
Gonçalo Rodrigues



