Extracting values from text file

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri Jun 16 16:29:49 CEST 2006


"Preben Randhol" <randhol at bacchus.pvv.ntnu.no> wrote in message
news:slrne94rcg.2qnb.randhol at bacchus.pvv.ntnu.no...
> What I first thought of was whether it is possible to make a filter such as:
>
>   Apples (apples)
>   (ducks) Ducks
>   (butter) g butter
>
> The data can be put in a hash table.
>
> Or maybe there are better ways? I generally want something that is
> flexible so one can easily change the filter settings if the text file
> format changes.
>

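Here is a simple filter builder using pyparsing.  It uses pyparsing in two
passes: first to parse your filter patterns into a grammar, then to use that
generated grammar to parse some incoming source string.  A tag such as
(ducks:%) names a value to extract, and the optional suffix after the colon
picks its type: % for an integer, # for a float, $ for an alphabetic word,
and * (the default) for any group of printable characters.  Pyparsing comes
with a similar EBNF compiler, written by Seo Sanghyeon.  I'm sorry this is not
really a newbie example, but it does allow you to easily construct simple
filters, and the implementation will give you something to chew on... :)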

Pyparsing won't be as fast as re's, but I cobbled this filter compiler
together in about 3/4 of an hour, and it may serve as a decent prototype for a
more full-featured package.

-- Paul
Pyparsing's home Wiki is at http://pyparsing.wikispaces.com.


-----------------
from pyparsing import *

sourceText = """
  Apples 34
  56 Ducks

Some more text.

  0.5 g butter
"""

patterns = """\
    Apples (apples)
    (ducks:%) Ducks
    (butter:#) g butter"""

def compilePatternList(patternList, openTagChar="(", closeTagChar=")",
                       greedy=True):
    # map a type code from a tag to the expression that matches its value
    def compileType(s,l,t):
        return {
            "%" : Word(nums+"-",nums).setName("integer"),
            "#" : Combine(Optional("-")+Word(nums)+"."+
                          Optional(Word(nums))).setName("float"),
            "$" : Word(alphas).setName("alphabetic word"),
            "*" : Word(printables).setName("char-group")
        }[t[0]]
    # plain words in a pattern become literals to be matched as-is
    backgroundWord = Word(alphanums).setParseAction(
        lambda s,l,t: Literal(t[0]))
    # optional ":<code>" suffix inside a tag; defaults to "*"
    matchType = Optional(Suppress(":") + oneOf("% # $ *"),
                         default="*").setParseAction(compileType)
    matchPattern = Combine(openTagChar +
                           Word(alphas,alphanums).setResultsName("nam") +
                           matchType.setResultsName("typ") +
                           closeTagChar)
    # a tag becomes its value expression, named after the tag
    matchPattern.setParseAction(
        lambda s,l,t: (t.typ).setResultsName(t.nam) )
    # a whole pattern line becomes a sequence (And) of its pieces
    patternGrammar = OneOrMore( backgroundWord | matchPattern ).setParseAction(
        lambda s,l,t: And([expr for expr in t]))
    patterns = []
    for p in patternList:
        print p,
        pattExpr = patternGrammar.parseString(p)[0]
        print pattExpr
        patterns.append(pattExpr)
    altern = (greedy and Or or MatchFirst)
    return altern( patterns )

grammar = compilePatternList( patterns.split("\n") )
print grammar

allResults = ParseResults([])
for t,s,e in grammar.scanString(sourceText):
    print t
    allResults += t
print

print allResults.keys()
for k in allResults.keys():
    print k,allResults[k]

-----------------
Prints:
    Apples (apples) {"Apples" char-group}
    (ducks:%) Ducks {integer "Ducks"}
    (butter:#) g butter {float "g" "butter"}
{{"Apples" char-group} ^ {integer "Ducks"} ^ {float "g" "butter"}}
['Apples', '34']
['56', 'Ducks']
['0.5', 'g', 'butter']

['butter', 'apples', 'ducks']
butter 0.5
apples 34
ducks 56
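
For comparison with the re approach mentioned above, here is a minimal sketch
of the same filter idea using only the standard re module.  It is not part of
the pyparsing program and is not a drop-in replacement for it: the names
TYPE_CODES, TAG, compile_pattern and extract, the regex chosen for each type
code, and the conversion of matched text to int or float are all assumptions
made for illustration.  It is written for Python 3, and it collects the values
in a plain dict (the "hash table" from the original question).

```python
import re

# Type codes borrowed from the pyparsing version above.  The regex and the
# converter attached to each code are assumptions made for this sketch.
TYPE_CODES = {
    "%": (r"-?\d+", int),          # integer
    "#": (r"-?\d+\.\d*", float),   # float
    "$": (r"[A-Za-z]+", str),      # alphabetic word
    "*": (r"\S+", str),            # any run of non-blank characters
}

# A tag looks like (name) or (name:code).
TAG = re.compile(r"\((?P<name>[A-Za-z][A-Za-z0-9]*)(?::(?P<code>[%#$*]))?\)")


def compile_pattern(pattern):
    """Turn one filter line into (compiled regex, {name: converter})."""
    parts = []
    converters = {}
    for token in pattern.split():
        tag = TAG.fullmatch(token)
        if tag:
            name = tag.group("name")
            regex, convert = TYPE_CODES[tag.group("code") or "*"]
            parts.append("(?P<%s>%s)" % (name, regex))
            converters[name] = convert
        else:
            # background word: must appear literally in the source text
            parts.append(re.escape(token))
    # pieces of a pattern are separated by whitespace in the source text
    return re.compile(r"\s+".join(parts)), converters


def extract(patterns, text):
    """Apply every filter line to the text and collect the named values."""
    values = {}
    for pattern in patterns:
        regex, converters = compile_pattern(pattern)
        for match in regex.finditer(text):
            for name, raw in match.groupdict().items():
                values[name] = converters[name](raw)
    return values


source_text = """
  Apples 34
  56 Ducks

Some more text.

  0.5 g butter
"""

filters = ["Apples (apples)", "(ducks:%) Ducks", "(butter:#) g butter"]
print(extract(filters, source_text))
```

With the sample text from this post, the sketch collects apples as the string
'34' (no type code was given, so it is left as text), ducks as the integer 56,
and butter as the float 0.5.  Unlike the pyparsing version, it tries every
filter line independently rather than combining them into one grammar, so
there is no equivalent of the greedy switch.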




