[Tutor] Should I use Python for parsing text
Jay Mutter III
jmutter at uakron.edu
Sat Mar 10 17:10:30 CET 2007
I am using an Intel iMac with OS X 10.4.8.
It has Python 2.3.5.
My issue is that I have a lot of text (about 500 pages at the
moment) that I need to parse so that I can eliminate info I don't
need, break the remainder into fields, and put them in a database/spreadsheet.
Here is an example:
A.-C. Manufacturing Company. (See Sebastian, A. A.,
and Capes, assignors.)
A. G. A. Railway Light & Signal Co. (See Meden, Elof
A-N Company, The. (See Alexander and Nasb, as-
AN Company, The. (See Nash, It. J., and Alexander, as-
A/S. Arendal Smelteverk. (See Kaaten, Einar, assignor.)
A/S. Bjorgums Gevaei'kompani. (See Bjorguni, Nils, as-
A/S Mekano. (Sec Schepeler, Herman A., assignor.)
A/S Myrens Verkstad. (See Klling, Jens W. A., assignor.)
A/S Stordo Kisgruber. (See Nielsen, C., and Ilelleland,
A-Z Company, The. 'See llanmer, Laurence G., assignor.)
Aagaard, Carl L., Rockford, 111. Hand scraping tool. No.
1,345,058 ; July 6; v. 276 ; p. 05.
Aalborg, Christian, Wllkinsburg, Pa., assignor to Wcst-
inghouse Electric and Manufacturing Company. Trol-
ley. No. 1,334,943 ; Mar. 30 ; v. 272 ; p. 741.
Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
Jan. 27 ; v. 270 ; p. 554.
For instance, I would like to go to the end of each line and, if the last
character is a comma, semicolon, or hyphen, remove the CR.
Then move line by line through the file and delete everything after a
I am wondering whether Python would be a good tool and, if so, where I can
find information on how to accomplish this, or whether I would be better off
using something like the Unix tool awk or something else.
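
For reference, the first step described above (removing the line break after
a trailing comma, semicolon, or hyphen) might look roughly like this in
Python. This is only a sketch under some assumptions: the function name
join_continued_lines is made up for illustration, a single space is inserted
when joining after a comma or semicolon, a trailing hyphen is kept and the
next line is attached directly to it, and the code is written for a recent
Python rather than tested on 2.3.5.

```python
def join_continued_lines(lines):
    """Merge each line ending in ',', ';' or '-' with the line that follows it.

    Hypothetical helper: the name and the spacing rules are assumptions.
    """
    joined = []
    pending = ""  # text carried over from lines that ended in a continuation character
    for raw in lines:
        line = raw.rstrip()  # drop the line ending and any trailing spaces
        if not pending:
            current = line
        elif pending[-1:] == "-":
            # Assumption: a hyphenated break is rejoined with no space, hyphen kept.
            current = pending + line
        else:
            # Assumption: after a comma or semicolon, rejoin with a single space.
            current = pending + " " + line.lstrip()
        if current[-1:] in (",", ";", "-"):
            pending = current  # still incomplete, wait for the next line
        else:
            joined.append(current)
            pending = ""
    if pending:
        joined.append(pending)  # the input ended on a continuation line
    return joined


sample = [
    "Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;",
    "Jan. 27 ; v. 270 ; p. 554.",
]
for entry in join_continued_lines(sample):
    print(entry)
```

For a whole file, the same function could be given the result of
open(path).readlines(), where path is whatever file holds the text. Whether
to drop the hyphen when rejoining a split word is a separate decision that
this sketch does not make.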