[Tutor] file eats first character, film at 11
Jeff Shannon
jeff@ccvcorp.com
Mon Jun 16 20:07:01 2003
Kirk Bailey wrote:
> ok, I wrote a program to strip out the ms-dos CRLF charpair, replacing
> with a single \n char. Works fine.
>
> EXCEPT
> it eats the first character in the file.
It's already been pointed out that the tool you're using (strip()) is
too broad for what you say you want to do, and that Unix already
provides utilities for this job. However, I thought I'd point out a few
other specific issues with this code.
> index=0   # This strips out whitespace chars
> for line in filelines:   # strip out all the trailing whitespace chars
>     filelines[index]=string.rstrip(filelines[index])
>     index=index+1
> [...]
> linenumber=0
> for line in filelines:
>     filelines[linenumber]=string.strip(line)
> print filelines
First off, you're looping through the list of lines twice, once applying
rstrip() to each line, and then applying strip() to each line. Not only
are you possibly doing more than stripping off the CR/LF, you're doing
it twice, and removing any leading whitespace as well as any trailing
whitespace. (Just think about how bad this could be for a Python
source file, where leading whitespace is significant...)
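To make the difference concrete, here is a small sketch. It uses
present-day Python 3 string methods rather than the string module from
the quoted code, and the sample line is made up for illustration:

```python
# A made-up line from a Python source file with a DOS line ending.
line = "    return x\r\n"

# strip() removes whitespace from both ends, so the indentation is lost.
both_ends = line.strip()

# rstrip() with no argument keeps the indentation, but it would also
# remove any trailing spaces or tabs, not just the line ending.
right_only = line.rstrip()

# rstrip("\r\n") removes only trailing CR and LF characters.
newline_only = line.rstrip("\r\n")

print(repr(both_ends))     # 'return x'
print(repr(right_only))    # '    return x'
print(repr(newline_only))  # '    return x'
```

If the goal is only to turn CRLF into \n, the last form plus a "\n"
appended when writing the line back out is probably closest to what you
described.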
More importantly, though, you're using a dangerous way of modifying your
list contents. Each loop is iterating directly over the contents of the
list (for line in filelines), but then modifying the list based on a
separately maintained index variable. It's hard to guarantee that this
index remains in sync with the for-loop. And in fact, this is where
your problem comes from -- in your second loop, you fail to increment
the index variable, so you assign the (stripped) contents of each line
in turn to the first item of the list. Since the last line of the file
is blank, the last trip through this loop assigns an empty string to the
first item of the list -- thus your lost character, which is actually a
lost entire line.
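Here is a minimal reproduction of that second loop, written in
present-day Python 3 syntax. The list contents are hypothetical and
stand in for the lines as they look after your first loop has run:

```python
# Hypothetical contents after the first (rstrip) loop; the empty
# string at the end stands in for the blank last line of the file.
filelines = ["first line", "second line", ""]

linenumber = 0
for line in filelines:
    # linenumber is never incremented, so every write lands in slot 0.
    filelines[linenumber] = line.strip()

print(filelines)  # ['', 'second line', '']
```

The first element has been overwritten by the (empty) last one, while
every other element is untouched.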
If you need to step through a list based on an index variable, you're
much better off doing that explicitly for *both* reads and writes --
i.e., code your loop this way instead:
    for index in range(len(filelines)):
        filelines[index] = string.rstrip(filelines[index])
This way, you *know* that each modified line is going right back where
it came from.
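For comparison, here are two other ways to keep reads and writes
together, sketched in present-day Python 3. These are not what the
code above uses: they assume the built-in enumerate() is available, and
the sample data and the name "cleaned" are made up:

```python
# Made-up sample data with DOS line endings.
filelines = ["first line\r\n", "    indented line\r\n", "\r\n"]

# enumerate() yields each index together with the item at that index,
# so the index cannot drift away from the line being processed.
for index, line in enumerate(filelines):
    filelines[index] = line.rstrip("\r\n")

print(filelines)  # ['first line', '    indented line', '']

# If modifying the list in place is not required, a list comprehension
# builds a new list and needs no index at all.
cleaned = [line.rstrip("\r\n") for line in ["a\r\n", "b\r\n"]]
print(cleaned)  # ['a', 'b']
```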
> OK, clues please?
Hopefully this will explain what went wrong with this code, even though
I do believe you'll be happier using the existing *nix utilities instead
of a Python script for this job. :)
Jeff Shannon
Technician/Programmer
Credit International