[Tutor] Parsing Bible verses

John Fouhy john at fouhy.net
Fri May 22 03:03:28 CEST 2009


2009/5/22 Eduardo Vieira <eduardo.susan at gmail.com>:
> I will be looking for lines like these:
> Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23
>
> So, references in different chapters are separated by a semicolon. My
> main challenge would be make the program guess that 10:12 refers to
> the previous book. 15-20 means verses 15 thru 20 inclusive. I'm afraid
> that will take more than Regex and I never studied anything about
> parser tools, really.

Well, pyparsing is one of the most widely used Python parsing modules
(it's a third-party package, not part of the standard library).  It's
not that bad, really :-)

Here's some code I knocked out:

from pyparsing import *

# Verses: a single number, or a range such as 15-20; several of them
# are separated by commas.
SingleVerse = Word(nums)
VerseRange = SingleVerse + '-' + SingleVerse
Verse = VerseRange | SingleVerse
Verse = Verse.setResultsName('Verse').setName('Verse')
Verses = Verse + ZeroOrMore(Suppress(',') + Verse)
Verses = Verses.setResultsName('Verses').setName('Verses')

# A chapter is either "chapter:verses" or just a bare chapter number.
ChapterNum = Word(nums)
ChapterNum = ChapterNum.setResultsName('Chapter').setName('Chapter')
ChapVerses = ChapterNum + ':' + Verses
SingleChapter = Group(ChapVerses | ChapterNum)

# Chapters of the same book are separated by semicolons.
Chapters = SingleChapter + ZeroOrMore(Suppress(';') + SingleChapter)
Chapters = Chapters.setResultsName('Chapters').setName('Chapters')

# Parentheses let the alternatives continue onto the next line.
BookName = (CaselessLiteral('Acts') | CaselessLiteral('Psalm') |
            CaselessLiteral('John'))
BookName = BookName.setResultsName('Book').setName('Book')

# A book name followed by its chapters.  Books are also separated by
# semicolons, so a chapter with no book name in front of it (like 10:12)
# is collected under the preceding book.
Book = Group(BookName + Chapters)
Books = Book + ZeroOrMore(Suppress(';') + Book)
Books = Books.setResultsName('Books').setName('Books')

All = CaselessLiteral('Lesson Text:') + Books + LineEnd()

s = 'Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23'
res = All.parseString(s)

for b in res.Books:
    for c in b.Chapters:
        if c.Verses:
            for v in c.Verses:
                print('Book', b[0], 'Chapter', c[0], 'Verse', v)
        else:
            print('Book', b[0], 'Chapter', c[0])

######

Hopefully you can get the idea of most of it from looking at the code.

Suppress() means "parse this token, but don't include it in the results".
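
To see the difference, here's a small sketch that parses the same comma-separated list of verse numbers with and without Suppress() (the names number, kept and dropped are just for illustration):

```python
from pyparsing import Suppress, Word, ZeroOrMore, nums

number = Word(nums)
# Keeps the commas as tokens in the results.
kept = number + ZeroOrMore(',' + number)
# Matches the commas but leaves them out of the results.
dropped = number + ZeroOrMore(Suppress(',') + number)

print(kept.parseString('15, 25, 30').asList())     # ['15', ',', '25', ',', '30']
print(dropped.parseString('15, 25, 30').asList())  # ['15', '25', '30']
```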

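Group() is what keeps each chapter's tokens (and each book's chapters)
together as a sub-list you can loop over; without it, everything is
flattened into one long list.  You can experiment by taking it out and
seeing what you get.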
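
Here's a minimal version of that experiment, using a cut-down chapter:verse grammar (ref, flat and grouped are illustrative names, not part of the code above):

```python
from pyparsing import Group, Suppress, Word, ZeroOrMore, nums

# One "chapter:verse" reference; several are separated by ';'.
ref = Word(nums) + Suppress(':') + Word(nums)
flat = ref + ZeroOrMore(Suppress(';') + ref)
grouped = Group(ref) + ZeroOrMore(Suppress(';') + Group(ref))

s = '5:15; 10:12'
print(flat.parseString(s).asList())     # ['5', '15', '10', '12']
print(grouped.parseString(s).asList())  # [['5', '15'], ['10', '12']]
```

Without Group() there's no way to tell where one chapter ends and the next begins.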

Obviously you'll need to add more names to the BookName element.
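
Rather than chaining lots of CaselessLiteral()s, pyparsing's oneOf() can build the alternatives from a list.  A small sketch, with a hypothetical and deliberately short list of names:

```python
from pyparsing import oneOf

# Hypothetical, incomplete list of book names.
BOOK_NAMES = ['Acts', 'John', 'Psalm', 'Psalms', 'Genesis', 'Exodus']

# caseless=True ignores case in the input; oneOf also tries a longer name
# before a shorter one that is a prefix of it, so 'Psalms' isn't cut short
# to 'Psalm'.  The match comes back spelled as in the list.
BookName = oneOf(BOOK_NAMES, caseless=True)

print(BookName.parseString('psalms 23')[0])  # Psalms
print(BookName.parseString('JOHN 3:16')[0])  # John
```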

Obviously, there is also a bit more work to be done on Verses: as
written, a range like 15-20 comes back as three separate tokens, '15',
'-' and '20'.  You might want to look into the concept of "parse
actions".  A really simple parse action might be this:

def convertToNumber(string_, location, tokens):
    """ Used in setParseAction to make numeric parsers return numbers. """

    return [int(tokens[0])]

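# Attach these right after SingleVerse and ChapterNum are first defined,
# before they are combined into bigger expressions: setResultsName()
# returns a copy, so an action added afterwards may not reach the copies
# the grammar actually uses.
SingleVerse.setParseAction(convertToNumber)
ChapterNum.setParseAction(convertToNumber)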

That should get you Python integers instead of strings.  You can
probably do more with parse actions to, for instance, turn something
like '15-20' into [15, 16, 17, 18, 19, 20].
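
Putting those ideas together, here's a rough sketch, assuming a recent pyparsing on Python 3 and a hypothetical three-book list.  It expands ranges with a parse action and flattens the nested results into (book, chapter, verse) tuples, so 10:12 picks up the book it follows.  The names (to_int, expand_range, references, etc.) are made up for the example:

```python
from pyparsing import (CaselessLiteral, Group, Optional, Suppress, Word,
                       ZeroOrMore, nums, oneOf)

def to_int(tokens):
    # One-argument parse action: turn the matched digits into an int.
    return int(tokens[0])

def expand_range(tokens):
    # With the '-' suppressed, tokens is [start, end]; return every verse
    # number in the inclusive range, e.g. 15-20 -> [15, 16, ..., 20].
    start, end = tokens
    return list(range(start, end + 1))

# Parse actions are attached before the elements are combined further.
number = Word(nums).setParseAction(to_int)
verse_range = (number + Suppress('-') + number).setParseAction(expand_range)
verse = verse_range | number
verses = Group(verse + ZeroOrMore(Suppress(',') + verse))

# A chapter is a number, optionally followed by ':' and its verses.
chapter = Group(number('chapter') + Optional(Suppress(':') + verses('verses')))
chapters = chapter + ZeroOrMore(Suppress(';') + chapter)

# Hypothetical, incomplete list of book names.
book_name = oneOf('Acts John Psalm', caseless=True)
book_entry = Group(book_name('book') + Group(chapters)('chapters'))
books = book_entry + ZeroOrMore(Suppress(';') + book_entry)
lesson = Suppress(CaselessLiteral('Lesson Text:')) + books

def references(text):
    """Yield (book, chapter, verse) tuples; verse is None for a whole chapter."""
    for b in lesson.parseString(text, parseAll=True):
        for c in b.chapters:
            if c.verses:
                for v in c.verses:
                    yield (b.book, c.chapter, v)
            else:
                yield (b.book, c.chapter, None)

s = 'Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23'
for ref in references(s):
    print(ref)
```

For the sample line this should give ten tuples, running from ('Acts', 5, 15) to ('Psalm', 23, None), with ('Acts', 10, 12) in between.  Using parseAll=True instead of LineEnd() makes the parse fail loudly if anything is left over, such as a book name missing from the list.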

HTH!

-- 
John.

