[Tutor] Picking up citations

Kent Johnson kent37 at tds.net
Sat Feb 7 22:19:14 CET 2009


It turns out you can use alternatives joined with | (MatchFirst
expressions) to cause a kind of backtracking in Pyparsing. This is very
close to what you want:

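# Names this snippet uses; the original post left the import implicit.
from pyparsing import (CharsNotIn, Combine, Forward, OneOrMore,
                       ParseResults, Suppress, Word, alphanums, alphas, nums)

Name1 = Forward()
Name1 << Combine(Word(alphas) + Name1 | Word(alphas) + Suppress('v.'),
                 joinString=' ', adjacent=False).setResultsName('name1')
Name2 = Combine(OneOrMore(Word(alphas)), joinString=' ',
                adjacent=False).setResultsName('name2')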

Volume = Word(nums).setResultsName('volume')
Reporter = Word(alphas, alphanums+".").setResultsName('reporter')
Page = Word(nums).setResultsName('page')
Page2 = (',' + Word(nums)).setResultsName('page2')

VolumeCitation = (Volume + Reporter + Page).setResultsName(
    'volume_citation', listAllMatches=True)
VolumeCitations = Forward()
VolumeCitations << (
      Combine(VolumeCitation + Page2, joinString=' ',
              adjacent=False).setResultsName('volume_citation2')
        + Suppress(',') + VolumeCitations
    | VolumeCitation + Suppress(',') + VolumeCitations
    | Combine(VolumeCitation + Page2, joinString=' ',
              adjacent=False).setResultsName('volume_citation2')
    | VolumeCitation
)

Date = (Suppress('(')
        + Combine(CharsNotIn(')')).setResultsName('date')
        + Suppress(')'))

FullCitation = Name1 + Name2 + Suppress(',') + VolumeCitations + Date

# text is the string to scan for citations; it is not defined in this snippet.
for item in FullCitation.scanString(text):
    fc = item[0]
    # Uncomment the following to see the raw parse results
    # pp(fc)
    # print
    # print fc.name1
    # print fc.name2
    # for vc in fc.volume_citation:
    #     pp(vc)

    # If name1 is multiple words it is enclosed in a ParseResults
    name1 = fc.name1
    if isinstance(name1, ParseResults):
        name1 = name1[0]

    for vc in fc.volume_citation:
        print '%s v. %s, %s %s %s (%s)' % (
            name1, fc.name2, vc.volume, vc.reporter, vc.page, fc.date)

    for vc2 in fc.volume_citation2:
        print '%s v. %s, %s (%s)' % (name1, fc.name2, vc2, fc.date)
    print


Output:

Carter v. Jury Commission of Greene County, 396 U.S. 320 (1970)
Carter v. Jury Commission of Greene County, 90 S.Ct. 518 (1970)
Carter v. Jury Commission of Greene County, 24 L.Ed.2d 549 (1970)

Lathe Turner v. Fouche, 396 U.S. 346 (1970)
Lathe Turner v. Fouche, 90 S.Ct. 532 (1970)
Lathe Turner v. Fouche, 24 L.Ed.2d 567 (1970)

White v. Crook, 251 F.Supp. 401 (DCMD Ala.1966)

In John Doggone Williams v. Florida, 399 U.S. 78 (1970)
In John Doggone Williams v. Florida, 26 L.Ed.2d 446 (1970)
In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)
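
As a side note on the grammar rather than the output: the backtracking in
the Name1 rule can be pictured without Pyparsing at all. The sketch below
is only an illustration, not how Pyparsing is implemented. It assumes the
input has already been split on whitespace, and the function name
parse_name1 is made up for this example. It tries the recursive
alternative first and falls back to the "word followed by v." alternative
when that fails, which is the same order the | in Name1 uses.

```python
def parse_name1(tokens, pos=0):
    """Match one or more alphabetic words followed by the token 'v.'.

    Returns (words, next_pos) on success or None on failure.
    Hypothetical helper: a plain-Python picture of the Name1 rule.
    """
    # Both alternatives start with a word, so fail early without one.
    if pos >= len(tokens) or not tokens[pos].isalpha():
        return None

    # First alternative: a word followed by another Name1.
    rest = parse_name1(tokens, pos + 1)
    if rest is not None:
        words, next_pos = rest
        return [tokens[pos]] + words, next_pos

    # The first alternative failed, so back up to the same position and
    # try the second alternative: a word followed by the literal 'v.'.
    if pos + 1 < len(tokens) and tokens[pos + 1] == "v.":
        return [tokens[pos]], pos + 2

    return None


print(parse_name1("Lathe Turner v. Fouche".split()))
# (['Lathe', 'Turner'], 3)
```

Every word before "v." is accepted, so a leading word such as "In" is
swallowed into the name as well.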


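The output is correct except for the inclusion of "In" in the name and
the extra space before the comma separating the page numbers in the last
citation. "In" is picked up because Name1 accepts any run of alphabetic
words before "v."; the space appears because Combine joins every token,
including the ',' from Page2, with joinString=' '.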
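
One way to paper over both blemishes is to tidy each formatted line after
the fact instead of changing the grammar. This is a minimal sketch using
only the standard re module; the function name tidy_citation is made up
here, and it assumes "In" never legitimately starts a party name, which
is not true for captions such as "In re ...".

```python
import re


def tidy_citation(line):
    """Clean up one formatted citation line (hypothetical helper)."""
    # Assumption: a leading "In " is stray text, not part of the name.
    line = re.sub(r"^In\s+", "", line)
    # Remove whitespace that ended up in front of a comma.
    line = re.sub(r"\s+,", ",", line)
    return line


print(tidy_citation(
    "In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)"))
# John Doggone Williams v. Florida, 90 S.Ct. 1893, 234 (1970)
```

Lines that have neither problem pass through unchanged, so it should be
safe to apply to every line the loop prints.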

Don't ask me why I did this :-)
Kent

