[Tutor] Iterating through a list of strings
Andre Engels
andreengels at gmail.com
Mon May 3 09:35:52 CEST 2010
On Mon, May 3, 2010 at 7:16 AM, Thomas C. Hicks <paradox at pobox.com> wrote:
> I am using Python 2.6.4 in Ubuntu. Since I use Ubuntu (with its
> updates every 6 months) and want to learn Python, I have been working on a
> post-install script that would get my Ubuntu system up and running with
> my favorite packages quickly. Basically the script reads a text file,
> processes the lines in the file and then does an apt-get for each line
> that is a package name. The text file looks like this:
>
> %Comment introducing the next block of packages
> %Below are the packages for using Chinese on the system
> %Third line of comment because I am a verbose guy!
> ibus-pinyin
> ibus-table-wubi
> language-pack-zh-hans
>
> etc.
>
> I read the lines of the file into a list for processing. To strip
> out the comment lines I am using something like this:
>
> for x in lines:
>     if x.startswith('%'):
>         lines.remove(x)
>
> This works great for all instances of comments that are only one
> line. Sometimes I have blocks of comments that are more than one
> line, and I find that the odd-numbered lines are stripped from the list
> but not the even-numbered lines (i.e. in the above block the
> "%Below are the ..." line would not be stripped out of the
> list).
>
> Obviously there is something I don't understand about processing
> the items in the list and using the string method x.startswith() and
> the list method list.remove(). Interestingly, if I put in "print x"
> in place of the lines.remove(x) line, I get all the comment lines
> printed.
>
> Can anyone point me in the right direction?
Don't change the list that you are iterating over. As you have found,
it leads to results that most people don't expect. What's going on is
that Python first checks the first line, finds that it needs to be
deleted, deletes it, and then moves on to the second position. By that
time, however, the original second line has moved up into the first
position and the original third line has become the second, so the
original second line is never checked.
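To see the skipping concretely, here is a minimal sketch that runs the loop from the question on the sample lines quoted above. The list literal stands in for the file contents; how the lines were actually read from the file is an assumption.

```python
# The three comment lines and three package names from the question,
# typed in directly instead of being read from the file (an assumption).
lines = [
    '%Comment introducing the next block of packages',
    '%Below are the packages for using Chinese on the system',
    '%Third line of comment because I am a verbose guy!',
    'ibus-pinyin',
    'ibus-table-wubi',
    'language-pack-zh-hans',
]

# The loop from the question: it removes items from the same list it is
# iterating over, so after each removal the item that slides into the
# current position is never looked at.
for x in lines:
    if x.startswith('%'):
        lines.remove(x)

print(lines)
```

The first and third comment lines are removed, but the "%Below are the ..." line survives, exactly as described in the question.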
There are several ways to resolve this problem; to me the most obvious are:
1. Get the lines from a copy of the list rather than the list itself:
# use lines[:] rather than lines to actually make a copy rather than
# a new name for the same object
linesCopy = lines[:]
for x in linesCopy:
    if x.startswith('%'):
        lines.remove(x)
2. First get the lines to remove, and remove them afterward:
linesToDelete = []
for x in lines:
    if x.startswith('%'):
        linesToDelete.append(x)
for x in linesToDelete:
    lines.remove(x)
If that looks a bit clumsy, use a list comprehension:
linesToDelete = [x for x in lines if x.startswith('%')]
for x in linesToDelete:
    lines.remove(x)
which idiomatically should probably become:
for x in [y for y in lines if y.startswith('%')]:
    lines.remove(x)
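A further option, not among the ones listed above, is to skip remove() altogether and build a new list of the lines to keep in a single pass. The sketch below assumes a hypothetical helper named strip_comments. Assigning the result to lines[:] (the same slice notation as in option 1, but on the left-hand side) replaces the contents of the existing list object, so any other name bound to that list sees the filtered result too.

```python
def strip_comments(lines):
    """Return a new list without the '%' comment lines (hypothetical helper)."""
    return [x for x in lines if not x.startswith('%')]

lines = [
    '%Comment introducing the next block of packages',
    '%Below are the packages for using Chinese on the system',
    '%Third line of comment because I am a verbose guy!',
    'ibus-pinyin',
    'ibus-table-wubi',
    'language-pack-zh-hans',
]
alias = lines  # a second name for the same list object, not a copy

# Slice assignment changes the existing list in place, so `alias`
# refers to the filtered list as well.
lines[:] = strip_comments(lines)

print(lines)
print(alias is lines)
```

This prints the three package names followed by True. If nothing else refers to the list, plain `lines = strip_comments(lines)` is enough. Either way the list is walked once, rather than once per removed line as with repeated calls to remove().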
--
André Engels, andreengels at gmail.com