[Tutor] scratching my head - still

Wed Aug 5 09:35:09 CEST 2015

Cameron Simpson wrote:

> On 05Aug2015 12:46, Steven D'Aprano <steve at pearwood.info> wrote:
>>On Tue, Aug 04, 2015 at 05:52:15PM -0700, Clayton Kirkwood wrote:
>>> As seen below (closely), some filenames are not being removed while
>>> others are, such as in the first stanza, some pdfs are removed, some
>>> aren't. In the second stanza, Thumbs.db makes it through, but was caught
>>> in the first stanza. (Thanks to those who have proffered solutions to
>>> date!) I see no logic in the results. What am I missing???
>>
>>You are modifying the list of files while iterating over it, which plays
>>all sorts of hell with the process. Watch this:
> [... detailed explanation ...]
>>The lesson here is that you should never modify a list while iterating
>>over it. Instead, make a copy, and modify the copy.
> 
> What Steven said. Yes indeed.
> 
> Untested example suggestion:
> 
>   all_filenames = set(filenames)
>   for filename in filenames:
>     if test(filename):  # placeholder for your own test
>       all_filenames.remove(filename)
>   print(all_filenames)
> 
> You could use a list instead of a set and for small numbers of files be
> fine. With large numbers of files a set is far faster to remove things
> from.
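
To make the failure Steven describes concrete, here is a minimal sketch of 
what goes wrong. The filenames are made up for illustration and are not 
taken from Clayton's listing:

```python
# Hypothetical filenames, chosen only to demonstrate the skipping effect.
filenames = ["a.pdf", "b.pdf", "c.txt", "Thumbs.db"]

for name in filenames:
    if name.endswith(".pdf"):
        # Removing an item shifts every later item one slot to the left,
        # so the loop's internal index steps over the item that moved
        # into the slot just vacated.
        filenames.remove(name)

print(filenames)  # "b.pdf" is still in the list
```

After "a.pdf" is removed, "b.pdf" slides into index 0, but the loop has 
already moved on to index 1, so "b.pdf" is never tested.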

If the list is of manageable size, as is usually the case for the names of 
files in one directory, don't bother removing items at all. Just build a new 
list:

all_filenames = [...]
matching_filenames = [name for name in all_filenames if test(name)]

If the list is huge and you expect that most items will be kept, you might 
try reverse iteration:

for i in reversed(range(len(all_filenames))):
    name = all_filenames[i]
    if test(name):
        del all_filenames[i]

This avoids both copying the list and the linear search performed by 
list.remove().
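
The reverse loop above can be wrapped in a small helper. This is only a 
sketch: the name remove_matching() and the sample data are made up, and here 
test() returns true for the names to delete:

```python
def remove_matching(items, test):
    """Delete, in place, every item for which test(item) is true.

    Walking from the end means a deletion only shifts items that have
    already been visited, so no item is skipped.
    """
    for i in reversed(range(len(items))):
        if test(items[i]):
            del items[i]


# Made-up sample data standing in for a real directory listing.
all_filenames = ["a.pdf", "b.pdf", "c.txt", "Thumbs.db"]
remove_matching(all_filenames, lambda name: name.endswith(".pdf"))

print(all_filenames)
```

Each del still has to shift the items that follow it, which is why this 
pays off mainly when few items are deleted.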