[Tutor] Should I use python for parsing text
Jay Mutter III
jmutter at uakron.edu
Wed Mar 21 03:47:40 CET 2007
"Jay Mutter III" <jmutter at uakron.edu> wrote
> See example next:
> A.-C. Manufacturing Company. (See Sebastian, A. A.,
> and Capes, assignors.)
>Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
>Jan. 27 ; v. 270 ; p. 554.
> For instance, I would like to go to end of line and if last
> character is a comma or semicolon or hyphen then
> remove the CR.
It would look something like:
output = open('example.fixed','w')
for line in file('example.txt'):
    if line[-1] in ',;-':     # check last character
        line = line.strip()   # lose the C/R
        output.write(line)    # write to output
    else: output.write(line)  # append the next line complete with C/R
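One caveat on the snippet above: a line read from a file normally still ends with its newline, so `line[-1]` is usually '\n' rather than the last visible character. Here is a minimal Python 3 sketch of the same idea that strips the line ending before testing. The function name `join_continuations` and its `markers` parameter are made up for illustration, and it assumes that "remove the CR" means dropping the line ending without putting a space in its place.

```python
def join_continuations(lines, markers=",;-"):
    """Yield each line, dropping the line ending when the visible text
    ends in one of the marker characters."""
    endings = tuple(markers)
    for line in lines:
        stripped = line.rstrip("\r\n")      # remove only the line ending
        if stripped.endswith(endings):      # an empty line never matches
            yield stripped                  # the next line continues this one
        else:
            yield line                      # keep the line ending as it was


if __name__ == "__main__":
    sample = [
        "A.-C. Manufacturing Company. (See Sebastian, A. A.,\n",
        "and Capes, assignors.)\n",
    ]
    print("".join(join_continuations(sample)), end="")
```

Because the function accepts any iterable of lines, it can be fed an open file object directly and the results written out with `output.writelines(...)`.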
Working from the above suggestion (and thank you very much; I did
enjoy your online tutorial),
I came up with the following:
# The next 5 lines are so I have an idea of how many lines I started with in the file.
in_filename = raw_input('What is the COMPLETE name of the file you want to open: ')
in_file = open(in_filename, 'r')
text = in_file.read()
num_lines = text.count('\n')
print 'There are', num_lines, 'lines in the file', in_filename
output = open("cleandata.txt","a")  # file for writing data to after stripping newline character
# read file, copying each line to new file
for line in text:
    if line[:-1] in '-':
        line = line.rstrip()
print "Data written to cleandata.txt."
# close the files
The above ran with no errors and gave me the number of lines in my
original file, but when I opened the cleandata.txt file I found this:
Sebastian,䄀⸀ A., and䌀愀瀀攀猀Ⰰ assignors.) A.䜀⸀ A.刀
愀椀氀眀愀礀 Light☀ Signal䌀漀⸀ (See䴀攀搀攀渀
Ⰰ Elof H
assignor.) A-N䌀漀洀瀀愀渀礀Ⰰ The.⠀匀攀攀
Alexander愀渀搀 Nasb,愀猀ⴀ 猀椀最渀漀爀猀⸀㬀 䄀一
Company,吀栀攀⸀ (See一愀猀栀Ⰰ It.䨀⸀Ⰰ and䄀氀攀砀
So what did I do to cause all of the strange characters????
Also, since the output runs on like this, it is as if it removed every \n and not just
the ones after a hyphen, which I was using as my test case.
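Two things may be going on here, as far as can be told from the listing. First, `text` is a single string, so `for line in text` visits it one character at a time rather than one line at a time. Second, the strange characters look like UTF-16 text whose byte pairs have slipped out of step: 䄀 is U+4100, which is the two bytes of 'A' (0x41 0x00) read in the wrong order. The Python 3 sketch below reproduces both effects on small in-memory samples. That the input file is UTF-16 is an assumption, not something the post confirms.

```python
sample = "Capes,\nA. A.\n"

# 1) Iterating over a string yields single characters, not lines.
as_chars = list(sample)                      # one entry per character
as_lines = sample.splitlines(keepends=True)  # one entry per line

# 2) Assumption: the data is UTF-16, i.e. two bytes per character.
raw = "x\nA.".encode("utf-16-le")
damaged = raw.replace(b"\n", b"")            # drops only the single 0x0A byte
damaged = damaged[: len(damaged) // 2 * 2]   # keep whole byte pairs only
garbled = damaged.decode("utf-16-le")        # 'A.' now decodes as U+4100 U+2E00

# Decoding first and editing the text afterwards keeps the pairs aligned.
fixed = raw.decode("utf-16-le").replace("\n", "")

if __name__ == "__main__":
    print(len(as_chars), len(as_lines))
    print(repr(garbled), repr(fixed))
```

If the file really is UTF-16, opening it with `open(in_filename, encoding="utf-16")` in Python 3 (or `codecs.open` in Python 2) and looping over the file object, not over the result of `read()`, should avoid both problems.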
> Then move line by line through the file and delete everything
> after a numerical sequence
Slightly more tricky because you need to use a regular expression.
But if you know regex then only slightly.
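As a rough illustration of the regular-expression step, here is a minimal Python 3 sketch. It assumes the "numerical sequence" is the first run of digits on the line, optionally grouped with commas as in 1,329,155, and that everything after it should be dropped. The name `truncate_after_number` is invented for this example.

```python
import re

# Digits, optionally followed by comma-separated digit groups (e.g. 1,329,155).
NUMBER = re.compile(r"\d+(?:,\d+)*")


def truncate_after_number(line):
    """Return the line cut off right after its first number.

    Lines without any digits are returned unchanged.
    """
    match = NUMBER.search(line)
    if match is None:
        return line
    return line[: match.end()]


if __name__ == "__main__":
    entry = ("Aaron, Solomon E., Boston, Mass. Pliers. "
             "No. 1,329,155 ; Jan. 27 ; v. 270 ; p. 554.")
    print(truncate_after_number(entry))
```

Note that the cut also removes any trailing newline on a matching line, so one would need to be added back when writing the result to a file.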
> I am wondering if Python would be a good tool
Absolutely, it's one of the areas where Python excels.
> find information on how to accomplish this
You could check my tutorial on the three topics:
Also the standard python documentation for the general tutorial
(assuming you've done basic programming in some other language
before) plus the re module
> using something like the unix tool awk or something else??
awk or sed could both be used, but Python is more generally
useful so unless you already know awk I'd take the time to
learn the basics of Python (a few hours maybe) and use that.
Author of the Learn to Program web site