[Tutor] Should I use python for parsing text
Luke Paireepinart
rabidpoobear at gmail.com
Wed Mar 21 06:41:44 CET 2007
> # The next 5 lines are so I have an idea of how many lines I started with in the file.
>
> in_filename = raw_input('What is the COMPLETE name of the file you want to open: ')
> in_file = open(in_filename, 'r')
> text = in_file.read()
read() returns the whole file as a single string, not a list with one
element per line.
Use readlines() for that.
(E.g., for a file containing 'hello\nhoware\nyou?', read() returns that
same string, but readlines() returns ['hello\n','howare\n','you?'].)
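To see the difference concretely, here is a small sketch. It is written for Python 3 (the code in this thread is Python 2) and uses io.StringIO as a stand-in for an open file holding the example text above:

```python
import io

# Stand-in for the file, using the example contents from above.
contents = 'hello\nhoware\nyou?'

# read() gives back the whole file as one string.
whole = io.StringIO(contents).read()

# readlines() gives back a list with one string per line; each line keeps its '\n'.
lines = io.StringIO(contents).readlines()

print(repr(whole))  # 'hello\nhoware\nyou?'
print(lines)        # ['hello\n', 'howare\n', 'you?']
```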
> num_lines = text.count('\n')
Or just len(text), if you're using readlines(), since text is then a list of lines.
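A quick Python 3 check of both counting approaches, again with io.StringIO standing in for the file. One difference worth knowing: count('\n') misses a last line that has no trailing newline, while len() of the readlines() list still counts it:

```python
import io

contents = 'hello\nhoware\nyou?'

# Counting newline characters: the final line 'you?' has no '\n', so it is not counted.
by_count = contents.count('\n')

# Counting the entries returned by readlines(): every line counts, terminated or not.
by_len = len(io.StringIO(contents).readlines())

print(by_count, by_len)  # 2 3
```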
> print 'There are', num_lines, 'lines in the file', in_filename
>
> output = open("cleandata.txt","a") # file for writing data to after stripping newline character
You might want to open this file in write mode ('w') while you're testing,
so output left over from previous test runs doesn't confuse you.
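To illustrate, a Python 3 sketch that simulates two test runs in each mode against a temporary file. The run_test and contents_after_two_runs helpers are hypothetical, made up for this example:

```python
import os
import tempfile

def run_test(path, mode):
    """Hypothetical test run: writes one result line to path using the given mode."""
    with open(path, mode) as output:
        output.write('result\n')

def contents_after_two_runs(mode):
    """Run the simulated test twice against a fresh file and return what is left in it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'cleandata.txt')
        run_test(path, mode)
        run_test(path, mode)
        with open(path) as f:
            return f.read()

appended = contents_after_two_runs('a')     # 'a' keeps the old output and adds to it
overwritten = contents_after_two_runs('w')  # 'w' empties the file before each write

print(repr(appended))     # 'result\nresult\n'
print(repr(overwritten))  # 'result\n'
```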
>
> # read file, copying each line to new file
> for line in text:
Since read() returns a single string, this loop goes over every
character in the file, not every line.
>     if line[:-1] in '-':
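Because 'line' only ever holds a single character here, line[:-1] is
always the empty string '', and '' in '-' is always True. So the if branch
runs for every character, and rstrip() deletes every whitespace character,
including every space, not only the newlines.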
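A small Python 3 sketch of what the loop actually sees. The sample text is made up, and the hyphen check at the end is only a guess at what the original test was meant to do (find lines ending with '-'); the post doesn't say:

```python
import io

text = 'ab-\ncd\n'  # hypothetical file contents

# Looping over a string visits one character at a time, not one line.
chars = [ch for ch in text]
print(chars)  # ['a', 'b', '-', '\n', 'c', 'd', '\n']

# For a one-character string, ch[:-1] is the empty string...
print(repr('x'[:-1]))  # ''
# ...and the empty string is "in" every string, so this test is always True.
print(all(ch[:-1] in '-' for ch in text))  # True

# Assumed intent: find lines that end with a hyphen. readlines() yields whole
# lines (newline included), so strip the '\n' before checking the last character.
hyphenated = [line for line in io.StringIO(text).readlines()
              if line.rstrip('\n').endswith('-')]
print(hyphenated)  # ['ab-\n']
```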
>         line = line.rstrip()
>         output.write(line)
>     else: output.write(line)
>
> print "Data written to cleandata.txt."
>
> # close the files
> in_file.close()
> output.close()
>
> The above ran with no errors and gave me the number of lines in my original
> file, but then when I opened the cleandata.txt file I got:
>
> A.-C.䴀愀渀甀昀愀挀琀甀爀椀渀最 �Company.⠀匀攀攀�Sebastian,䄀⸀�A.,�and
> 䌀愀瀀攀猀Ⰰ�assignors.)�A.䜀⸀�A.刀愀椀氀眀愀礀 �Light☀�Signal䌀漀⸀�
> (See䴀攀搀攀渀Ⰰ�Elof�H
> assignor.)�A-N䌀漀洀瀀愀渀礀Ⰰ�The.⠀匀攀攀
> �Alexander愀渀搀�Nasb,愀猀ⴀ�猀椀最渀漀爀猀⸀㬀�䄀一�Company,吀栀攀⸀�
> (See一愀猀栀Ⰰ�It.䨀⸀Ⰰ�and䄀氀攀砀愀渀搀攀爀Ⰰ�as-�
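I'm not sure what caused all of those characters, but they look like
UTF-16 text read one byte out of step: 䴀愀渀甀昀愀挀琀甀爀椀渀最, for instance,
is 'Manufacturing'. If your input file is UTF-16 encoded, each character
takes two bytes, so stripping whitespace one byte at a time removes only
half of each space or newline and misaligns everything after it.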
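A Python 3 sketch that reproduces this effect, assuming (the post doesn't say) that the input file is UTF-16 little-endian. The sample line is modeled on the first line of the garbled output; the byte filter mimics what the original loop does to each byte:

```python
# Hypothetical sample line, encoded the way a UTF-16 (little-endian) file stores it:
# every character takes two bytes, e.g. ' ' is 0x20 0x00 and 'M' is 0x4D 0x00.
raw = 'A.-C. Manufacturing Company.'.encode('utf-16-le')

# Mimic the original loop: drop every single byte that rstrip() treats as whitespace.
# Only the 0x20 half of each two-byte space is removed; its 0x00 half stays behind.
whitespace = b' \t\n\r\x0b\x0c'
stripped = bytes(b for b in raw if b not in whitespace)

# Decoding in two-byte units is now one byte out of step after the first removed
# byte: the pair read where 'M' should be is 0x00 0x4D, i.e. U+4D00 ('䴀').
decoded = stripped.decode('utf-16-le')
print(decoded)
```

Here decoded begins with 'A.-C.䴀愀渀甀昀愀挀琀甀爀椀渀最', matching the output above, followed by a NUL character (displayed as �) where the second space brought the bytes back into step, and then 'Company.'. If the file really is UTF-16, decoding it before doing any text processing should avoid this, e.g. with io.open(in_filename, encoding='utf-16') (available in Python 2.6 and later as well as Python 3), which takes the byte order from the file's byte-order mark.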
HTH,
-Luke