how to use pyparsing for identifiers that start with a constant string
Kent Johnson
kent37 at tds.net
Tue Jun 14 18:20:35 EDT 2005
phil_nospam_schmidt at yahoo.com wrote:
> I am scanning text that has identifiers with a constant prefix string
> followed by alphanumerics and underscores. I can't figure out, using
> pyparsing, how to match for this. The example expression below seems to
> be looking for whitespace between the 'atod' and the rest of the
> identifier.
>
> identifier_atod = 'atod' + pp.Word('_' + pp.alphanums)
>
> How can I get pyparsing to match 'atodkj45k' and 'atod_asdfaw', but not
> 'atgdkasdjfhlksj' and 'atod asdf4er', where the first four characters
> must be 'atod', and not followed by whitespace?
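Here is one way using pyparsing.Combine, which requires the matched tokens to be adjacent (no whitespace between them) and joins them into a single string: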
>>> from pyparsing import *
>>> tests = ['atodkj45k', 'atod_asdfaw', 'atgdkasdjfhlksj', 'atod asdf4er']
>>> ident = Combine(Literal('atod') + Word('_' + alphanums))
>>> for t in tests:
...     try:
...         print(ident.parseString(t))
...     except ParseException:
...         print('No match ' + t)
...
['atodkj45k']
['atod_asdfaw']
No match atgdkasdjfhlksj
No match atod asdf4er
>>>
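To see why the original expression accepted 'atod asdf4er', it may help to run the plain concatenation and the Combine version side by side. The following is only a sketch: the names body, loose, strict and try_parse are made up for illustration and are not part of pyparsing.

```python
from pyparsing import Combine, Literal, ParseException, Word, alphanums

# Hypothetical helper names; only the pyparsing classes are real.
body = Word('_' + alphanums)

# Plain concatenation: whitespace between the two tokens is skipped,
# and each token is reported separately.
loose = Literal('atod') + body

# Combine: the tokens must be adjacent and are joined into one string.
strict = Combine(Literal('atod') + body)


def try_parse(parser, text):
    """Return the matched tokens as a list, or None if there is no match."""
    try:
        return parser.parseString(text).asList()
    except ParseException:
        return None


loose_result = try_parse(loose, 'atod asdf4er')    # two separate tokens
strict_result = try_parse(strict, 'atod asdf4er')  # no match
joined_result = try_parse(strict, 'atod_asdfaw')   # one joined token
```

Note that parseString only has to match at the start of the input. If the whole string must be an identifier, passing parseAll=True to parseString should reject trailing text as well.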
Kent