[Tutor] Picking up citations
Dinesh B Vadhia
dineshbvadhia at hotmail.com
Sun Feb 8 00:15:47 CET 2009
Kent
It has just occurred to me that, as an initial attempt, the last name (of the name before the "v.") is sufficient, i.e. "Turner v. Fouche, 396 U.S. 346 (1970)" instead of "Lathe Turner v. Fouche, 396 U.S. 346 (1970)", since we are only using the citations internally and not displaying them publicly. That solves the first-name problem.
The remaining problem is picking up multiple pages in a citation, for example:
"John Doggone Williams v. Florida, 399 U.S. 78, 90 S.Ct. 1893, 234, 26 L.Ed.2d 446 (1970)"
... and a variation of this is:
"John Doe Agency v. John Doe Corp., 493 U.S. 146, 159-60 (1934)"
I didn't know about pyparsing, which appears to be very powerful, and I have joined their list. Thank you for your help.
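As a rough illustration of the two ideas above (keeping only the last name before the "v." and picking up extra page numbers such as ", 234" or ", 159-60"), here is a minimal sketch. It uses the standard `re` module rather than pyparsing, and the names `HEAD`, `CITE` and `parse_citation` are hypothetical, made up for this example. It assumes a single-word last name, a reporter abbreviation without spaces, and a date in parentheses at the end.

```python
import re

# Hypothetical patterns, not from the thread; a regex sketch, not pyparsing.
HEAD = re.compile(r"""
    \b(?P<name1>[A-Za-z]+)\s+v\.\s+   # only the last word before v.
    (?P<name2>.+?),\s+                # second party, up to the comma before a volume
    (?P<body>\d.*?)\s*                # reporter citations between the names and the date
    \((?P<date>[^)]*)\)               # parenthesised court and/or year
""", re.VERBOSE)

CITE = re.compile(r"""
    (?P<volume>\d+)\s+
    (?P<reporter>[A-Za-z][A-Za-z0-9.]*)\s+
    (?P<page>\d+)
    # Extra pages: ", 234" or ", 159-60", but not the volume of the next
    # reporter citation (a number followed by a reporter abbreviation).
    (?P<extra>(?:,\s*\d+(?:-\d+)?\b(?!\s+[A-Za-z]))*)
""", re.VERBOSE)


def parse_citation(text):
    """Return a dict describing the first citation in text, or None."""
    m = HEAD.search(text)
    if m is None:
        return None
    citations = []
    for c in CITE.finditer(m.group("body")):
        citations.append({
            "volume": c.group("volume"),
            "reporter": c.group("reporter"),
            "page": c.group("page"),
            "extra_pages": re.findall(r"\d+(?:-\d+)?", c.group("extra")),
        })
    return {
        "name1": m.group("name1"),
        "name2": m.group("name2"),
        "date": m.group("date"),
        "citations": citations,
    }


if __name__ == "__main__":
    samples = [
        "John Doggone Williams v. Florida, 399 U.S. 78, 90 S.Ct. 1893, 234, 26 L.Ed.2d 446 (1970)",
        "John Doe Agency v. John Doe Corp., 493 U.S. 146, 159-60 (1934)",
    ]
    for sample in samples:
        print(parse_citation(sample))
```

The extra-page rule is only a heuristic: a number after a comma is treated as a page unless it is followed by something that looks like a reporter abbreviation.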
Dinesh
From: Kent Johnson
Sent: Saturday, February 07, 2009 1:19 PM
To: Dinesh B Vadhia
Cc: tutor at python.org
Subject: Re: [Tutor] Picking up citations
It turns out you can use Or expressions to cause a kind of
backtracking in Pyparsing. This is very close to what you want:
from pyparsing import *

# `text` is the sample text being scanned; it is defined outside this snippet.

Name1 = Forward()
Name1 << Combine(Word(alphas) + Name1 | Word(alphas) + Suppress('v.'),
    joinString=' ', adjacent=False).setResultsName('name1')
Name2 = Combine(OneOrMore(Word(alphas)), joinString=' ',
    adjacent=False).setResultsName('name2')
Volume = Word(nums).setResultsName('volume')
Reporter = Word(alphas, alphanums+".").setResultsName('reporter')
Page = Word(nums).setResultsName('page')
Page2 = (',' + Word(nums)).setResultsName('page2')
VolumeCitation = (Volume + Reporter +
    Page).setResultsName('volume_citation', listAllMatches=True)
VolumeCitations = Forward()
VolumeCitations << (
    Combine(VolumeCitation + Page2, joinString=' ',
        adjacent=False).setResultsName('volume_citation2')
    + Suppress(',') + VolumeCitations
    | VolumeCitation + Suppress(',') + VolumeCitations
    | Combine(VolumeCitation + Page2, joinString=' ',
        adjacent=False).setResultsName('volume_citation2')
    | VolumeCitation
    )
Date = (Suppress('(') +
    Combine(CharsNotIn(')')).setResultsName('date') + Suppress(')'))
FullCitation = Name1 + Name2 + Suppress(',') + VolumeCitations + Date
for item in FullCitation.scanString(text):
    fc = item[0]
    # Uncomment the following to see the raw parse results
    # pp(fc)
    # print
    # print fc.name1
    # print fc.name2
    # for vc in fc.volume_citation:
    #     pp(vc)
    # If name1 is multiple words it is enclosed in a ParseResults
    name1 = fc.name1
    if isinstance(name1, ParseResults):
        name1 = name1[0]
    for vc in fc.volume_citation:
        print('%s v. %s, %s %s %s (%s)' % (name1, fc.name2, vc.volume,
            vc.reporter, vc.page, fc.date))
    for vc2 in fc.volume_citation2:
        print('%s v. %s, %s (%s)' % (name1, fc.name2, vc2, fc.date))
    print('')
Output:
Carter v. Jury Commission of Greene County, 396 U.S. 320 (1970)
Carter v. Jury Commission of Greene County, 90 S.Ct. 518 (1970)
Carter v. Jury Commission of Greene County, 24 L.Ed.2d 549 (1970)
Lathe Turner v. Fouche, 396 U.S. 346 (1970)
Lathe Turner v. Fouche, 90 S.Ct. 532 (1970)
Lathe Turner v. Fouche, 24 L.Ed.2d 567 (1970)
White v. Crook, 251 F.Supp. 401 (DCMD Ala.1966)
In John Doggone Williams v. Florida, 399 U.S. 78 (1970)
In John Doggone Williams v. Florida, 26 L.Ed.2d 446 (1970)
In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)
It is correct except for the inclusion of "In" in the name and the
extra space before the comma separating the page numbers in the last
citation.
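One possible way to patch up those two blemishes after the fact is a small clean-up pass over each printed citation. This is only a sketch: the function name `tidy` is hypothetical, and it assumes that a leading "In" is never part of a party's name, which will not hold for every case title.

```python
import re


def tidy(citation):
    """Clean up one formatted citation string (hypothetical helper)."""
    # Assumption: a leading "In " comes from the surrounding sentence,
    # not from the party name itself.
    citation = re.sub(r"^In\s+", "", citation)
    # Remove whitespace that ended up before a comma, e.g. "1893 , 234".
    citation = re.sub(r"\s+,", ",", citation)
    return citation


if __name__ == "__main__":
    print(tidy("In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)"))
```

Fixing the grammar itself would be cleaner, but a post-processing step like this leaves the parser untouched.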
Don't ask me why I did this :-)
Kent