[Tutor] Picking up citations
Kent Johnson
kent37 at tds.net
Sat Feb 7 22:19:14 CET 2009
It turns out you can use alternation ('|') expressions, combined with a
recursive Forward, to cause a kind of backtracking in pyparsing. This is
very close to what you want:
from pyparsing import (alphas, alphanums, nums, CharsNotIn, Combine, Forward,
                       OneOrMore, ParseResults, Suppress, Word)
from pprint import pprint as pp

# Name1 is recursive: if "word + more name" fails, the second alternative
# retries the same word followed by 'v.'
Name1 = Forward()
Name1 << Combine(Word(alphas) + Name1 | Word(alphas) + Suppress('v.'),
                 joinString=' ', adjacent=False).setResultsName('name1')

Name2 = Combine(OneOrMore(Word(alphas)), joinString=' ',
                adjacent=False).setResultsName('name2')

Volume = Word(nums).setResultsName('volume')
Reporter = Word(alphas, alphanums+".").setResultsName('reporter')
Page = Word(nums).setResultsName('page')
Page2 = (',' + Word(nums)).setResultsName('page2')

VolumeCitation = (Volume + Reporter +
                  Page).setResultsName('volume_citation', listAllMatches=True)

# Longer alternatives are listed first because '|' takes the first match
VolumeCitations = Forward()
VolumeCitations << (
    Combine(VolumeCitation + Page2, joinString=' ',
            adjacent=False).setResultsName('volume_citation2')
    + Suppress(',') + VolumeCitations
    | VolumeCitation + Suppress(',') + VolumeCitations
    | Combine(VolumeCitation + Page2, joinString=' ',
              adjacent=False).setResultsName('volume_citation2')
    | VolumeCitation
)

Date = (Suppress('(') +
        Combine(CharsNotIn(')')).setResultsName('date') + Suppress(')'))

FullCitation = Name1 + Name2 + Suppress(',') + VolumeCitations + Date
# 'text' is the input string to scan; it is not defined in this message
for item in FullCitation.scanString(text):
    fc = item[0]

    # Uncomment the following to see the raw parse results
    # pp(fc)
    # print()
    # print(fc.name1)
    # print(fc.name2)
    # for vc in fc.volume_citation:
    #     pp(vc)

    # If name1 is multiple words it is enclosed in a ParseResults
    name1 = fc.name1
    if isinstance(name1, ParseResults):
        name1 = name1[0]

    for vc in fc.volume_citation:
        print('%s v. %s, %s %s %s (%s)' % (name1, fc.name2, vc.volume,
                                           vc.reporter, vc.page, fc.date))
    for vc2 in fc.volume_citation2:
        print('%s v. %s, %s (%s)' % (name1, fc.name2, vc2, fc.date))
    print()
Output:
Carter v. Jury Commission of Greene County, 396 U.S. 320 (1970)
Carter v. Jury Commission of Greene County, 90 S.Ct. 518 (1970)
Carter v. Jury Commission of Greene County, 24 L.Ed.2d 549 (1970)
Lathe Turner v. Fouche, 396 U.S. 346 (1970)
Lathe Turner v. Fouche, 90 S.Ct. 532 (1970)
Lathe Turner v. Fouche, 24 L.Ed.2d 567 (1970)
White v. Crook, 251 F.Supp. 401 (DCMD Ala.1966)
In John Doggone Williams v. Florida, 399 U.S. 78 (1970)
In John Doggone Williams v. Florida, 26 L.Ed.2d 446 (1970)
In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)
The output is correct except for two things: "In" is included in the
first party's name, and there is an extra space before the comma that
separates the page numbers in the last citation.
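One way to deal with those two blemishes is to tidy each output line
after parsing rather than complicate the grammar further. The following
is only a sketch: the helper name tidy_citation is made up for this
example, and it assumes a leading "In" is always stray text (a case
styled "In re ..." would be damaged by it).

```python
import re


def tidy_citation(line):
    """Apply two cosmetic fixes to one formatted citation line."""
    # Drop a leading "In " that was picked up as part of the first name.
    # Assumption: a leading "In" is never a genuine part of the party name.
    line = re.sub(r"^In\s+", "", line)
    # Remove whitespace before a comma, e.g. "1893 , 234" -> "1893, 234".
    line = re.sub(r"\s+,", ",", line)
    return line


print(tidy_citation(
    "In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)"))
```

Lines that have neither problem pass through unchanged, so the helper
can be applied to every line the loop prints.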
Don't ask me why I did this :-)
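For anyone curious why the recursive Name1 matches where a plain
OneOrMore(Word(alphas)) followed by 'v.' would not, here is a rough
plain-Python analogue. It is not pyparsing code and does not show how
pyparsing is implemented; match_name1 and match_greedy are hypothetical
names, and the regular expressions only approximate Word(alphas) and
Suppress('v.').

```python
import re

WORD = re.compile(r"\s*([A-Za-z]+)")  # rough analogue of Word(alphas)
V_DOT = re.compile(r"\s*v\.")         # rough analogue of Suppress('v.')


def match_name1(text, pos=0):
    """Return (words, new_pos) if the recursive rule matches, else None.

    Mirrors: Name1 << (Word(alphas) + Name1 | Word(alphas) + Suppress('v.'))
    """
    word = WORD.match(text, pos)
    if word is None:
        return None
    # First alternative: Word + Name1 (recurse on the rest of the input).
    rest = match_name1(text, word.end())
    if rest is not None:
        words, end = rest
        return [word.group(1)] + words, end
    # The first alternative failed, so retry from the same position with
    # the second alternative: Word + 'v.'. This retry is the backtracking.
    v_dot = V_DOT.match(text, word.end())
    if v_dot is not None:
        return [word.group(1)], v_dot.end()
    return None


def match_greedy(text, pos=0):
    """Analogue of OneOrMore(Word(alphas)) + Suppress('v.'), with no retry."""
    words = []
    while True:
        word = WORD.match(text, pos)
        if word is None:
            break
        words.append(word.group(1))
        pos = word.end()
    # The loop has already swallowed the "v", so 'v.' cannot match here.
    v_dot = V_DOT.match(text, pos)
    if not words or v_dot is None:
        return None
    return words, v_dot.end()


print(match_name1("Lathe Turner v. Fouche"))
print(match_greedy("Lathe Turner v. Fouche"))
```

The greedy version consumes the "v" as an ordinary word and then has
nothing left to match 'v.' against, while the recursive version gives
the "v" back when the deeper match fails.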
Kent