newbie re question
Fredrik Lundh
fredrik at pythonware.com
Wed Nov 6 16:17:12 EST 2002
Gonçalo Rodrigues wrote:
> I've been trying to grok re's and settled myself a little exercise:
> concoct a re for a Python identifier.
>
> Now what I got is
>
> >>> pattern = re.compile(r'(\s|^)([\w_][\w\._]*)(\s|$)')
> >>> pattern.findall('aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk')
> [('', 'aadf', ' '), (' ', 'b', ' '), (' ', 'aasa', ' '), (' ', 'aa', ' '), (' ', 'aa_aa', ' ')]
>
> But as you can see from the results, not all valid identifiers get
> caught. For example, why isn't 'cdase' caught?
findall returns non-overlapping matches. there's only a single space
between "aadf" and "cdase", and the first match consumed it as its
trailing (\s|$) group, so nothing is left to satisfy the leading (\s|^)
in front of "cdase".
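
To see the consumed space directly, here is a minimal sketch that uses finditer and match spans on a shortened version of the sample string. The lookaround variant at the end is not from this thread; it is one assumed way to keep the whitespace-delimited idea without consuming the delimiters.

```python
import re

# the pattern from the question, unchanged
pattern = re.compile(r'(\s|^)([\w_][\w\._]*)(\s|$)')
text = 'aadf cdase b'

# record where each match starts and ends, plus the captured name
spans = [(m.span(), m.group(2)) for m in pattern.finditer(text)]
# the first match covers (0, 5), i.e. 'aadf' plus the following space,
# so the scan resumes at 'c' with no whitespace (and no ^) in front of it
print(spans)

# assumption: a zero-width variant, not part of the original reply;
# (?<!\S) and (?!\S) only look at the neighbours without consuming them
lookaround = re.compile(r'(?<!\S)([\w_][\w\._]*)(?!\S)')
print(lookaround.findall(text))
```

Because the lookarounds match without consuming anything, the single space between two names can serve as the delimiter for both of them.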
here's a better pattern:
pattern = re.compile(r'\b([a-zA-Z_]\w*)\b')
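
A quick check of that pattern against the sample string from the question, as a sketch (the variable names are illustrative only). Note that \b matches at any word boundary, so the pieces of "ad:aa" and "a.aa" come back as separate names, and the leading [a-zA-Z_] rejects anything that starts with a digit.

```python
import re

pattern = re.compile(r'\b([a-zA-Z_]\w*)\b')
text = 'aadf cdase b ad:aa aasa a.aa a@ aa _aa _aafr@ aa_aa aa__a?jk'

# \b is zero-width, so adjacent names no longer compete for the same space
names = pattern.findall(text)
print(names)

# a leading digit is not accepted as the start of a name
print(pattern.findall('1abc x1'))
```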
</F>