[Tutor] Picking up citations
kent37 at tds.net
Sat Feb 7 18:43:05 CET 2009
On Sat, Feb 7, 2009 at 11:53 AM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> Wow Kent, what a great start!
> I found this
> http://mail.python.org/pipermail/python-list/2006-April/376149.html which
> lays out some patterns of legal citations ie.
Here is another good reference:
> 1. Two names, consisting of one or more words, separated by a "v."
> 2. One, two, or three citations, each of which has a volume number ("90")
> followed by a Reporter name ("U.S." or "S.Ct." or "L.Ed."), which consists
> of one or two words always ending with a ".", followed by a page number
According to the reference I cite above, the Reporter name does not
have to include periods; his examples include US and Tenn as
> 3. Each citation may contain a comma and a second page number (", 234 ")
> 4. Optionally, a parenthesized year ("(1970)") or optional information in
> parentheses ("(DCMD Ala.1966)")
> 5. An ending "."
Or comma; this seems to be a grammatical element of the enclosing
sentence rather than part of the citation.
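
As a rough illustration of points 2-4 with the adjustments noted above (periods in the Reporter name optional, trailing "." or "," left outside the match), here is a minimal regex sketch. The names REPORTER, CITE, PAREN and find_citations are my own, and the sample text is made up for the test; it is not from the sample data in this thread. The lookahead on the second page number is an assumption about how to keep ", 86" in "436, 86 S.Ct. 1602" from being taken as a page number when it is really the volume of the next citation.

```python
import re

# Reporter: one or two capitalized words, periods optional
# ("U.S.", "S.Ct.", "US" and "Tenn" all match).
REPORTER = r"[A-Z][A-Za-z.]*(?:\s[A-Z][A-Za-z.]*)?"

# One citation: volume, reporter, page, then an optional second page.
# The \b and lookahead reject a "second page" that is followed by a
# capitalized word, since that number is more likely the next volume.
CITE = r"\d+\s+" + REPORTER + r"\s+\d+(?:,\s*\d+\b(?!\s+[A-Z]))?"

# Optional parenthesized year or other information.
PAREN = r"(?:\s*\([^)]*\))?"

# One to three comma-separated citations plus the optional parenthetical.
CITATIONS = re.compile(CITE + r"(?:,\s*" + CITE + r"){0,2}" + PAREN)


def find_citations(text):
    """Return every run of one to three citations found in text."""
    return [m.group(0) for m in CITATIONS.finditer(text)]


# Made-up sample text, for illustration only.
sample = "Smith v. Jones, 90 U.S. 100, 234, 12 S.Ct. 55 (1970)."
print(find_citations(sample))
```

This only covers the citation part, not the party names, and the Reporter pattern is deliberately loose, so expect false positives on ordinary text such as "3 Main Street 12".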
> I was pondering the same issue about names ie. how do you know that "Page
> 500" is not part of "Carter". My thought was to start from the "v.", step
> backwards a word at a time, assume that the first name is valid, for all
> subsequent words check if the last character of a word contained the digits
> [0-9] or these punctuation marks [.,:;], if so, then it was unlikely to be
> part of the name.
That won't help with "In John Doggone Williams". I can imagine a name
with punctuation, also, e.g. St. John's Lumber.
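
To make the two objections concrete, here is a small sketch of the step-backwards heuristic as I understand it from the description above. The function name first_party and the sample sentences are hypothetical; only the names come from this thread.

```python
# Characters that, at the end of a word, suggest it is not part of a name.
STOP_CHARS = set("0123456789.,:;")


def first_party(text):
    """Guess the first party's name by walking backwards from 'v.'."""
    words = text.split()
    if "v." not in words:
        return None
    i = words.index("v.")
    if i == 0:
        return None
    # The word immediately before "v." is assumed to be part of the name.
    name = [words[i - 1]]
    # Keep stepping back until a word ends in a digit or punctuation.
    for word in reversed(words[:i - 1]):
        if word[-1] in STOP_CHARS:
            break
        name.append(word)
    return " ".join(reversed(name))


print(first_party("at Page 500. Carter v. Jewell"))
print(first_party("In John Doggone Williams v. Smith"))
print(first_party("See St. John's Lumber v. Smith"))
```

The first call stops correctly at "500.", but the second one sweeps "In" into the name because nothing marks where the name starts, and the third one cuts the name short at "St." because the period looks like a sentence boundary.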
> I've changed the sample text to include examples of multiple page
Actually you had one already.
> Okay, I'd better get to grips with pyparsing!
Pyparsing won't backtrack, which may be a disadvantage here in parsing
the extra page numbers. Here is some comment: