Problem using Optional pyparsing
Peter Otten
__peter__ at web.de
Thu Aug 16 03:57:02 EDT 2007
Nathan Harmston wrote:
> I know this isn't the pyparsing list, but it doesn't seem like there is
> one. I'm trying to use pyparsing to parse a file; however, I can't get
> the Optional keyword to work. My file generally looks like this:
>
> ALIGNMENT 1020 YS2-10a02.q1k chr09 1295 42 141045 142297 C 1254 95.06 1295 reject_bad_break 0
>
> or this:
>
> ALIGNMENT 36 YS2-10a08.q1k chrm 208 165 10745 10788 C 44 95.45 593 reject_low 10,14
>
> and my grammar works well for these lines; however, sometimes the row looks
> like:
> ALIGNMENT 53 YS2-10b03.p1k chr12 180 125 1067465 1067520 C 56 98.21 532|5,2 reject_low 25
>
> So I try to parse the 532 using
>
> from pyparsing import *
>
> integer = Word( nums )
> float = Word( nums+".")
> identifier = Word( alphanums+"-_." )
>
> alignment = Literal("ALIGNMENT ").suppress()
> row_1 = integer.setResultsName("row_1")#.setParseAction(make_int)
> src_id = identifier.setResultsName("src_id")
> dest_id = identifier.setResultsName("dest_id")
> src_start = integer.setResultsName("src_start")#.setParseAction(make_int)
> src_stop = integer.setResultsName("src_stop")#.setParseAction(make_int)
> dest_start = integer.setResultsName("dest_start")#.setParseAction(make_int)
> dest_stop = integer.setResultsName("dest_stop")#.setParseAction(make_int)
> row_8 = oneOf("F C").setResultsName("row_8")
> length = integer.setResultsName("length")#.setParseAction(make_int)
> percent_id = float.setResultsName("percent_id")#.setParseAction(make_float)
> row_11 = ( integer + Optional(Literal("|") + commaSeparatedList )
>          )#.setResultsName("row_11")#.setParseAction(make_int)
> result = Word(alphas+"_").setResultsName("result")
> row_13 = commaSeparatedList.setResultsName("row_13")
>
> def make_alilines_status_parser():
>     return (alignment + row_1 + src_id + dest_id + src_start + src_stop
>             + dest_start + dest_stop + row_8 + length + percent_id + row_11 +
>             result + row_13)
>
> def parse_alilines_status(ifile):
>     alilines = make_alilines_status_parser()
>     for l in ifile:
>         yield alilines.parseString( l )
>
> However, my parser always fails on lines of type 3. Does anyone know
> why the Optional part is not working?
The Optional part does match; the problem is that commaSeparatedList
swallows the rest of the line into its last item, leaving nothing for
result and row_13:
>>> commaSeparatedList.parseString("a,b c")
(['a', 'b c'], {})
You can fix this by defining your own delimitedList whose items cannot
contain whitespace, e.g.:
>>> delimitedList(Word(alphanums)).parseString("a,b c")
(['a', 'b'], {})
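Applied to the grammar in the question, that might look roughly like the
sketch below. It assumes that the numbers after "|" and in the last column
are always plain integers. The names real, int_list and parse_alignment are
made up for this illustration, and "ALIGNMENT" and "|" are suppressed here
for brevity:

```python
from pyparsing import (Optional, Suppress, Word, alphanums, alphas,
                       delimitedList, nums, oneOf)

integer = Word(nums)
real = Word(nums + ".")            # called `float` in the question; renamed so the builtin stays usable
identifier = Word(alphanums + "-_.")
int_list = delimitedList(integer)  # comma-separated integers; an item cannot contain whitespace

alilines = (
    Suppress("ALIGNMENT")
    + integer.setResultsName("row_1")
    + identifier.setResultsName("src_id")
    + identifier.setResultsName("dest_id")
    + integer.setResultsName("src_start")
    + integer.setResultsName("src_stop")
    + integer.setResultsName("dest_start")
    + integer.setResultsName("dest_stop")
    + oneOf("F C").setResultsName("row_8")
    + integer.setResultsName("length")
    + real.setResultsName("percent_id")
    + integer.setResultsName("row_11")
    + Optional(Suppress("|") + int_list)   # the optional "|5,2" suffix of column 11
    + Word(alphas + "_").setResultsName("result")
    + int_list                              # last column, e.g. "0" or "10,14"
)

def parse_alignment(line):
    """Return the flat list of tokens for one ALIGNMENT line."""
    return alilines.parseString(line).asList()

# The second and third sample rows from the question, each on one line.
plain = "ALIGNMENT 36 YS2-10a08.q1k chrm 208 165 10745 10788 C 44 95.45 593 reject_low 10,14"
piped = "ALIGNMENT 53 YS2-10b03.p1k chr12 180 125 1067465 1067520 C 56 98.21 532|5,2 reject_low 25"
```

Because int_list stops at the first whitespace, the Optional part consumes
only "|5,2" on the third kind of row, and result and the last column are
parsed as before.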
Peter