[docs] [issue22167] iglob() has misleading documentation (does indeed store names internally)

R. David Murray report at bugs.python.org
Fri Aug 8 15:22:33 CEST 2014


R. David Murray added the comment:

IMO the documentation isn't *wrong*, just misleading :)

What it is saying is that *your program* doesn't have to store the full list returned by iglob before being able to use it (ie: iglob doesn't return a list).  It says nothing about what resources are used internally, other than an implied contract that there is *some* efficiency over calling glob; which, as explained above, there is.  The fact that the implementation uses lots of memory if any single directory is large is then a performance bug, which can theoretically be fixed in 3.5 using scandir.

The reason iglob was introduced, if you check the revision history, is that glob used to call itself recursively for each sub-directory, which meant it held *all* of the files in *all* of the scanned tree in memory at one time.  It is literally true that the difference between glob and iglob is that with iglob your program doesn't have to store the full list of matches from all subdirectories, but talking about "your program" is not something we typically do in python docs, it is implied.

Perhaps in 2.7/3.4 we can mention in the module docs that at most one directory's worth of data will be held in memory during the globbing process, but it feels a little weird to document an implementation detail like that.  Still, if someone can come up with improved wording for the docs, we can add it.

----------
nosy: +r.david.murray

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22167>
_______________________________________


More information about the docs mailing list