pyparsing Combine without merging sub-expressions
Paul McGuire
ptmcg at austin.rr.com
Sun Jan 21 17:15:33 EST 2007
Steven Bethard wrote:
> Within a larger pyparsing grammar, I have something that looks like::
>
> wsj/00/wsj_0003.mrg
>
> When parsing this, I'd like to keep around both the full string, and the
> AAA_NNNN substring of it, so I'd like something like::
>
> >>> foo.parseString('wsj/00/wsj_0003.mrg')
> (['wsj/00/wsj_0003.mrg', 'wsj_0003'], {})
>
> How do I go about this? I was using something like::
>
> >>> digits = pp.Word(pp.nums)
> >>> alphas = pp.Word(pp.alphas)
> >>> wsj_name = pp.Combine(alphas + '_' + digits)
> >>> wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name +
> ... '.mrg')
>
> But of course then all I get back is the full path::
>
> >>> wsj_path.parseString('wsj/00/wsj_0003.mrg')
> (['wsj/00/wsj_0003.mrg'], {})
>
The tokens are what the tokens are, so if you want to replicate a
sub-field, then you'll need a parse action to insert it into the
returned tokens. BUT, if all you want is to be able to easily *access*
that sub-field, then why not give it a results name? Like this:

    wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")

Leave everything else the same, but now you can access the name field
independently from the rest of the combined tokens:

    result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
    print(result.dump())    # token list plus any named results
    print(result.name)      # the named sub-field, 'wsj_0003'
    print(result.asList())  # still just the combined path
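
If you really do want the sub-field replicated in the token list, the
parse action route mentioned above could look roughly like the sketch
below. It keeps the camelCase method names used in this thread (pyparsing
3 also accepts snake_case spellings such as parse_string), and
append_name is a helper name made up for illustration, not something
pyparsing provides.

```python
import pyparsing as pp

digits = pp.Word(pp.nums)
alphas = pp.Word(pp.alphas)

# Name the sub-field so it stays reachable after the outer Combine
# has merged everything into a single string.
wsj_name = pp.Combine(alphas + '_' + digits).setResultsName("name")
wsj_path = pp.Combine(alphas + '/' + digits + '/' + wsj_name + '.mrg')


def append_name(tokens):
    # Hypothetical helper: copy the named sub-field onto the end of the
    # token list. Returning None keeps the (modified) tokens as they are.
    tokens.append(tokens["name"])


wsj_path.setParseAction(append_name)

result = wsj_path.parseString('wsj/00/wsj_0003.mrg')
print(result.asList())
print(result["name"])
```

With the parse action attached, asList() should give the full path
followed by 'wsj_0003', which is the shape asked for in the original
question, and the "name" lookup keeps working as before.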
-- Paul