pyparsing: match empty line
Marek Kubica
marek at xivilization.net
Tue Sep 2 12:38:10 EDT 2008
Hi,
I am trying to get this working with pyparsing, but I keep failing.
I have a format which consists of three kinds of elements:
\d{4}M?-\d (four digits, an optional M, a dash, one more digit)
EMPTY (the <EMPTY> token)
[Empty line] (the <PAGEBREAK> token; the line may contain whitespace, but nothing else)
While the ``watchname`` and ``leaveempty`` were trivial, I cannot get
``pagebreak`` to work properly.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from pyparsing import (Word, Literal, Optional, Group, OneOrMore, Regex,
                       Combine, ParserElement, nums, LineStart, LineEnd, White,
                       replaceWith)

# Keep newlines significant: only skip spaces, tabs and carriage returns.
ParserElement.setDefaultWhitespaceChars(' \t\r')

watchseries = Word(nums, exact=4)
watchrev = Word(nums, exact=1)
watchname = Combine(watchseries + Optional('M') + '-' + watchrev)
leaveempty = Literal('EMPTY')

def breaks(s, loc, tokens):
    print(repr(tokens[0]))
    #return ['<PAGEBREAK>' for token in tokens[0]]
    return ['<PAGEBREAK>']

# This is the part that does not work for me:
#pagebreak = Regex('^\s*$').setParseAction(breaks)
pagebreak = LineStart() + LineEnd().setParseAction(replaceWith('<PAGEBREAK>'))

parser = OneOrMore(watchname ^ pagebreak ^ leaveempty)

tests = [
    "2134M-2",
    """3245-3
3456M-5""",
    # this test has an empty line between the two names
    """3256-4

4563-4""",
    """4562M-6
EMPTY
3246-5"""
]

for test in tests:
    print(parser.parseString(test))
The output should be:
['2134M-2']
['3245-3', '3456M-5']
['3256-4', '<PAGEBREAK>', '4563-4']
['4562M-6', '<EMPTY>', '3246-5']
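To make the intended behaviour unambiguous, here is a rough line-by-line restatement of the format using only the standard `re` module. It is not a pyparsing solution, just a reference to compare the parser output against. The names `tokenize` and `WATCHNAME` are made up for this illustration, and the sketch assumes that every input line is exactly one of the three elements.

```python
import re

# Hypothetical names, chosen for this illustration only.
# Watch name: four digits, optional M, a dash, one more digit.
WATCHNAME = re.compile(r'\d{4}M?-\d')

def tokenize(text):
    """Map each input line to a watch name, '<EMPTY>' or '<PAGEBREAK>'."""
    tokens = []
    for line in text.split('\n'):
        stripped = line.strip()
        if not stripped:
            # Empty or whitespace-only line
            tokens.append('<PAGEBREAK>')
        elif stripped == 'EMPTY':
            tokens.append('<EMPTY>')
        elif WATCHNAME.fullmatch(stripped):
            tokens.append(stripped)
        else:
            # Assumption: anything else is an error
            raise ValueError('unexpected line: %r' % line)
    return tokens

print(tokenize("3256-4\n\n4563-4"))
```

Note that with this sketch a trailing newline at the end of the input counts as one more empty line.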
Thanks in advance!
regards,
Marek