[Tutor] Picking up citations

Dinesh B Vadhia dineshbvadhia at hotmail.com
Tue Feb 10 18:42:16 CET 2009


Kent

Extracting the citation without the name is perfect (this appears to be how most citation parsers work). The test run shows two issues:

1. The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10, 45 L.Ed.2d 343 (1975) is split into:

422 U.S. 490 (1975)
499 n. 10 (1975)
95 S.Ct. 2197 (1975)
2205 n. 10 (1975)
45 L.Ed.2d 343 (1975)

instead of as:

422 U.S. 490, 499 n. 10 (1975)
95 S.Ct. 2197, 2205 n. 10 (1975)
45 L.Ed.2d 343 (1975)

i.e., when parsing the second-page (pinpoint) references, the parser should pick up all the alphanumeric characters between the commas.

2. It doesn't parse the last citation, i.e., 463 U.S. 29, 43, 103 S.Ct. 2856, 2867, 77 L.Ed.2d 443 (1983). I tested it on another sample text and it missed the last citation there too.
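Both points can be sketched with Python's `re` module as a rough cross-check, not as a change to Kent's PLY grammar (the patterns, `find_citations`, `split_parallel` and the sample sentence are all hypothetical; the sample just strings together the two citations from this message). For point 1, a number followed by a capitalised word is refused as a pinpoint, because in ", 95 S.Ct." the 95 is the volume of the next reporter. For point 2, `finditer` scans to the end of the string, so comparing its output with the parser's on the same text shows which citations the parser drops. The thread doesn't say why the PLY version misses the last one.

```python
import re

# Hypothetical patterns, not Kent's PLY grammar.
# A pinpoint: ", 499", ", 2205 n. 10" or a range like ", 1517-18". A number
# followed by a capitalised word is refused, because in ", 95 S.Ct." the 95
# is the volume of the next parallel reporter, not a pinpoint.
PIN = r",\s*\d+(?:-\d+)?(?!\d)(?!\s+[A-Z])(?:\s+n\.\s*\d+)?"
# One reporter citation: volume, reporter abbreviation, first page, pinpoints.
CORE = r"(\d+)\s+([A-Z][A-Za-z.\d]*)\s+(\d+)((?:" + PIN + r")*)"
# A whole parallel citation: cores separated by commas, then "(1975)" or
# "(7th Cir.1994)".
GROUP = re.compile(CORE + r"(?:,\s*" + CORE + r")*\s*\([^()]*?\d{4}\)")
CITE = re.compile(CORE)
YEAR = re.compile(r"\([^()]*?(\d{4})\)")


def find_citations(text):
    """Point 2: return every parallel citation in text, including the last one."""
    return [m.group(0) for m in GROUP.finditer(text)]


def split_parallel(citation):
    """Point 1: one string per reporter, pinpoints kept, year copied to each.

    Assumes citation came from find_citations, so it ends in a year.
    """
    year = YEAR.search(citation).group(1)
    return [f"{vol} {rep} {page}{pins} ({year})"
            for vol, rep, page, pins in CITE.findall(citation)]


sample = ("See 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10, "
          "45 L.Ed.2d 343 (1975); see also 463 U.S. 29, 43, "
          "103 S.Ct. 2856, 2867, 77 L.Ed.2d 443 (1983).")
for citation in find_citations(sample):
    for part in split_parallel(citation):
        print(part)
```

Run as is, it prints one reporter per line with its pinpoints and the year, starting with 422 U.S. 490, 499 n. 10 (1975), which is the layout asked for in point 1.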

Thanks!

Dinesh


 
From: Kent Johnson 
Sent: Tuesday, February 10, 2009 4:01 AM
To: Dinesh B Vadhia 
Cc: tutor at python.org 
Subject: Re: [Tutor] Picking up citations


On Mon, Feb 9, 2009 at 12:51 PM, Dinesh B Vadhia
<dineshbvadhia at hotmail.com> wrote:
> Kent / Emmanuel
>
> Below are the results of the PLY parser and regex versions on the
> attached 'sierra' data, which I think covers the common formats. Here are
> some 'fully unparsed' citations that the programs missed:
>
> Smith v. Wisconsin Dept. of Agriculture, 23 F.3d 1134, 1141 (7th Cir.1994)
>
> Indemnified Capital Investments, S.A. v. R.J. O'Brien & Assoc., Inc.,
> 12 F.3d 1406, 1409 (7th Cir.1993).
>
> Hunt v. Washington Apple Advertising Commn., 432 U.S. 333, 343,
> 97 S.Ct. 2434, 2441, 53 L.Ed.2d 383 (1977)
>
> Idaho Conservation League v. Mumma, 956 F.2d 1508, 1517-18 (9th Cir.1992)

A few issues here:
S.A. - this is hard to allow while still filtering out ordinary sentences
R.J. O'Brien, etc. - loosening the rules for the second party name would allow these
1517-18 - the grammar needs to allow page ranges

The name issues are getting to be too much for me. Attached is a PLY
version that just pulls out the citation without the name; at one
point you indicated that would work for you.
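Dropping the name helps with the examples quoted above because a pattern anchored on volume, reporter and page never has to look at "S.A." or "R.J. O'Brien" at all. A rough regex sketch of that idea, not Kent's attached PLY version (the pattern and `citation_without_name` are hypothetical), which also accepts page ranges such as 1517-18 and pulls the court out of a parenthetical like "(7th Cir.1994)":

```python
import re

# Hypothetical pattern: it anchors on "volume reporter page", so the party
# names in front (with "S.A.", "R.J. O'Brien" and the like) are never examined.
# Pinpoints may be pages, ranges ("1517-18") or footnotes ("n. 10"), and the
# closing parenthetical may hold a court as well as the year.
CITE = re.compile(
    r"(?P<volume>\d+)\s+(?P<reporter>[A-Z][A-Za-z.\d]*)\s+(?P<page>\d+)"
    r"(?P<pins>(?:,\s*\d+(?:-\d+)?(?!\d)(?!\s+[A-Z])(?:\s+n\.\s*\d+)?)*)"
    r"\s*\((?P<court>[^()]*?)\s*(?P<year>\d{4})\)"
)


def citation_without_name(sentence):
    """Return (citation, court, year) for the first citation in sentence, or None."""
    m = CITE.search(sentence)
    if m is None:
        return None
    return m.group(0), m.group("court"), m.group("year")


samples = [
    "Smith v. Wisconsin Dept. of Agriculture, 23 F.3d 1134, 1141 (7th Cir.1994)",
    "Indemnified Capital Investments, S.A. v. R.J. O'Brien & Assoc., Inc., "
    "12 F.3d 1406, 1409 (7th Cir.1993).",
    "Idaho Conservation League v. Mumma, 956 F.2d 1508, 1517-18 (9th Cir.1992)",
]
for s in samples:
    print(citation_without_name(s))
```

This only handles single-reporter citations: on the Hunt example it would return just 53 L.Ed.2d 383 (1977), since only the last reporter sits directly before the parenthetical, so parallel citations still need the grouping described in point 1 at the top of this message.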

Kent