[Tutor] file eats first character, film at 11

Jeff Shannon jeff@ccvcorp.com
Mon Jun 16 20:07:01 2003


Kirk Bailey wrote:

> ok, I wrote a program to strip out the ms-dos CRLF charpair, replacing 
> with a single \n char. Works fine.
>
> EXCEPT
> it eats the first character in the file.


It's already been pointed out that the tool you're using (strip()) is 
too broad for what you say you want to do, and that Unix already 
provides utilities for this job.  However, I thought I'd point out a 
few other specific issues with this code.

> index=0                                 # This strips out whitespace chars
> for line in filelines:                  # strip out all the trailing whitespace chars
>         filelines[index]=string.rstrip(filelines[index])
>         index=index+1
> [...] 
> linenumber=0
> for line in filelines:
>         filelines[linenumber]=string.strip(line)
> print filelines


First off, you're looping through the list of lines twice, first 
applying rstrip() to each line and then applying strip() to each line.  
Not only are you possibly removing more than the CR/LF, you're doing the 
work twice, and strip() removes any leading whitespace as well as any 
trailing whitespace.  (Just think about how bad this could be for a 
Python source file...)
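
To make the difference concrete, here is a small sketch.  The sample 
line is made up for illustration, and it assumes a Python version whose 
string methods accept a character argument; it uses the string methods 
rather than the string module functions.

```python
# A made-up indented line with trailing spaces and a DOS line ending.
line = "    return x  \r\n"

# strip() removes whitespace from both ends, so the indentation is lost.
stripped_both = line.strip()          # 'return x'

# rstrip() keeps the indentation but also removes the trailing spaces.
stripped_right = line.rstrip()        # '    return x'

# rstrip('\r\n') removes only trailing CR and LF characters.
stripped_eol = line.rstrip('\r\n')    # '    return x  '
```

Only the last form leaves everything except the line ending alone.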

More importantly, though, you're using a dangerous way of modifying your 
list contents.  Each loop is iterating directly over the contents of the 
list (for line in filelines), but then modifying the list based on a 
separately-maintained index variable.  It's hard to guarantee that this 
index remains in sync with the for-loop.  And in fact, this is where 
your problem comes from -- in your second loop, you fail to increment 
the index variable, so you assign the (stripped) contents of each line 
in turn to the first item of the list.  Since the last line of the file 
is blank, the last trip through this loop assigns an empty string to the 
first item of the list -- thus your lost character, which is actually a 
lost entire line.
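
The effect is easy to reproduce with a short list.  This is only a 
sketch: the three sample lines are invented, and they stand in for the 
file contents after the first loop has already run rstrip() on them.

```python
# Invented sample data: two lines of text followed by a blank last line.
filelines = ["first line", "second line", ""]

linenumber = 0
for line in filelines:
    # linenumber is never incremented, so every write lands in slot 0.
    filelines[linenumber] = line.strip()

# filelines is now ['', 'second line', '']
```

The first item ends up holding whatever the last line was, and the 
other items are never written to at all.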

If you need to step through a list based on an index variable, you're 
much better off doing that explicitly for *both* reads and writes -- 
i.e., code your loop this way instead:

for index in range(len(filelines)):
    filelines[index] = string.rstrip(filelines[index])

This way, you *know* that each modified line is going right back where 
it came from.
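
For comparison, here is a sketch of the same index-driven loop next to a 
version that builds a new list and so needs no index at all.  The name 
strip_line_endings and the sample lines are my own inventions for 
illustration, and both versions remove only trailing CR/LF characters 
rather than all whitespace.

```python
def strip_line_endings(lines):
    # Hypothetical helper: return a new list with only trailing CR/LF removed.
    return [line.rstrip('\r\n') for line in lines]

# Invented sample data with DOS line endings.
filelines = ["first line\r\n", "  indented line  \r\n", "\r\n"]

# Index-driven version: the same index is used for the read and the write.
by_index = list(filelines)
for index in range(len(by_index)):
    by_index[index] = by_index[index].rstrip('\r\n')

# Index-free version: build a new list instead of modifying in place.
by_comprehension = strip_line_endings(filelines)
```

Both approaches give the same result here; the second simply has no 
index to fall out of step.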

> OK, clues please?


Hopefully this will explain what went wrong with this code, even though 
I do believe you'll be happier using the existing *nix utilities instead 
of a Python script for this job.  :)

Jeff Shannon
Technician/Programmer
Credit International