[Tutor] Should I use python for parsing text

Luke Paireepinart rabidpoobear at gmail.com
Wed Mar 21 06:41:44 CET 2007

> # The next 5 lines are so I have an idea of how many lines i started 
> with in the file.
> in_filename = raw_input('What is the COMPLETE name of the file you 
> want to open:    ')
> in_file = open(in_filename, 'r')
> text = in_file.read()
read() returns the whole file as a single string, not a list with one 
element per line.
Use readlines() if you want a list of lines.
(E.g. for a file with contents 'hello\nhoware\nyou?', read() would return 
that same string, but
readlines() would return ['hello\n','howare\n','you?'].)
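Here's a small sketch of the difference, in case it helps. The temporary file and its contents are just stand-ins for your real data:

```python
import os
import tempfile

# Write a small sample file (made-up contents, same as the example above).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write('hello\nhoware\nyou?')

with open(path, 'r') as f:
    as_string = f.read()        # one string holding the whole file

with open(path, 'r') as f:
    as_lines = f.readlines()    # a list of strings, one per line

os.remove(path)                 # clean up the sample file
```

as_string is the single string 'hello\nhoware\nyou?', while as_lines is the three-element list.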
> num_lines = text.count('\n')
or just len(text) if you're using readlines() (that also counts a last line 
that has no trailing newline, which count('\n') would miss)
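The two counts can differ by one when the file doesn't end in a newline. A quick sketch, using a made-up string in place of the file and splitlines(True) as a stand-in for what readlines() would give for this text:

```python
text = 'hello\nhoware\nyou?'      # what read() would return for the sample file
lines = text.splitlines(True)     # keep line endings, like readlines() does

newline_count = text.count('\n')  # counts newline characters only
line_count = len(lines)           # counts lines, including the unterminated last one
```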
> print 'There are', num_lines, 'lines in the file', in_filename
> output = open("cleandata.txt","a")    # file for writing data to after 
> stripping newline character
You might want to open this file in 'write' mode while you're testing, 
so previous test results don't confuse you.
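To see why, here's a minimal sketch comparing the two modes over two pretend test runs. The temporary directory and the 'run\n' text are placeholders, not anything from your script:

```python
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'cleandata.txt')

# Two runs in append mode: the second run adds to what the first left behind.
for _ in range(2):
    with open(path, 'a') as out:
        out.write('run\n')
with open(path) as f:
    appended = f.read()

# Two runs in write mode: each run starts from an empty file.
for _ in range(2):
    with open(path, 'w') as out:
        out.write('run\n')
with open(path) as f:
    written = f.read()

shutil.rmtree(tmpdir)  # clean up
```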
> # read file, copying each line to new file
> for line in text:
Since read() returns a single string, you're looping over every 
character in the file, not every line.
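A tiny illustration with a made-up string, showing what the loop variable sees in each case:

```python
text = 'ab\ncd'   # pretend this came from read()

# Looping over a string yields one character at a time.
per_char = [item for item in text]

# Looping over a list of lines yields whole lines.
per_line = [item for item in text.splitlines(True)]
```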
>     if line[:-1] in '-':
Because your 'line' variable only contains a single character, line[:-1] is 
the empty string, and '' in '-' is always True, so this branch runs for 
every character.
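To check what that test does with one-character strings, here's a minimal sketch (the sample characters are arbitrary):

```python
results = {}
for char in ['a', '-', '\n', ' ']:
    # char[:-1] drops the only character, leaving the empty string,
    # and the empty string counts as "in" any string.
    results[char] = char[:-1] in '-'

# rstrip() on a lone whitespace character removes it entirely.
stripped_newline = '\n'.rstrip()
```

So every character passes the test, and any whitespace character gets stripped down to nothing before it's written.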
>         line = line.rstrip()
>         output.write(line)
>     else: output.write(line)
> print "Data written to cleandata.txt."
> # close the files
> in_file.close()
> output.close()
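Here's a rough sketch of how the loop might look when it works on lines instead of characters. I'm assuming the goal is to drop the line break after lines that end in a hyphen, which is a guess on my part. The name join_hyphenated and the sample lines are made up for illustration:

```python
def join_hyphenated(lines):
    """Strip trailing whitespace (including the newline) from lines ending in '-'."""
    cleaned = []
    for line in lines:
        if line.rstrip().endswith('-'):
            line = line.rstrip()   # remove the newline so the next line joins on
        cleaned.append(line)
    return cleaned

sample = ['as-\n', 'signors.\n', 'Company.\n']   # made-up input
result = join_hyphenated(sample)
joined = ''.join(result)
```

With a real file you would pass in_file.readlines() to it and write each returned line to output.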
> The above ran with no errors and gave me the number of lines in my original 
> file, but then when I opened the cleandata.txt file
> I got:
> A.-C.䴀愀渀甀昀愀挀琀甀爀椀渀最 �Company.⠀匀攀攀�Sebastian,䄀⸀�A.,�and 
> 䌀愀瀀攀猀Ⰰ�assignors.)�A.䜀⸀�A.刀愀椀氀眀愀礀 �Light☀�Signal䌀漀⸀� 
> (See䴀攀搀攀渀Ⰰ�Elof�H
> �Alexander愀渀搀�Nasb,愀猀ⴀ�猀椀最渀漀爀猀⸀㬀�䄀一�Company,吀栀攀⸀� 
> (See一愀猀栀Ⰰ�It.䨀⸀Ⰰ�and䄀氀攀砀愀渀搀攀爀Ⰰ�as-�
Not sure what caused all of those characters.
