[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()

Random832 random832 at fastmail.us
Mon Nov 19 02:23:27 CET 2012


On 11/15/2012 4:50 PM, Ben Hoyt wrote:
> 1) You'd have to add a whole new way / set of constants / functions
> to test for the different values of d_type. Whereas there's already
> stuff (stat module) to test for st_mode values.
>
> 2) It'd make the typical use case more complex, for example, the
> straight "if st.st_mode is None ... else ..." I gave earlier becomes
> this:
>
> for filename, st in iterdir_stat(path):
>     if st.d_type is None:
>         if st.st_mode is None:
>             st = os.stat(os.path.join(path, filename))
>         is_dir = stat.S_ISDIR(st.st_mode)
>     else:
>         is_dir = st.d_type == DT_DIR
>
> -Ben
I actually meant adding d_type *everywhere*...

for filename, st in iterdir_stat(path):
    if st.d_type is None:
        # Hypothetical: os.stat() would also fill in d_type.
        st = os.stat(os.path.join(path, filename))
    is_dir = st.d_type == DT_DIR

Of course, your code would ultimately be more complex anyway, since when
followlinks=True you want to use isdir (which follows symlinks), and when
it's not you want to use lstat. And consider what information
iterdir_stat actually returns when the results are symlinks: with
readdir/d_type it's going to say "it's a symlink" and you need a second
call to follow the link; with WIN32_FIND_DATA you have the information
for both in principle, but the stat structure can only include one. Do
we need an iterdir_lstat? If so, should iterdir_stat return None if
d_type is DT_LNK, or DT_LNK itself?
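
To make the symlink problem concrete, here is a rough sketch using only
os.listdir, os.lstat and os.stat (not the proposed iterdir_stat). The
classify() helper and its return shape are made up for illustration; the
point is just that the lstat-style answer and the followlinks answer
differ for a symlink, so one stat structure can't carry both.

```python
import os
import stat


def classify(path, followlinks=False):
    """Return sorted (name, is_dir, is_link) tuples for entries in path.

    Hypothetical helper for illustration, not a proposed API.
    """
    results = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        # lstat describes the entry itself, much like readdir's d_type.
        lst = os.lstat(full)
        is_link = stat.S_ISLNK(lst.st_mode)
        st = lst
        if is_link and followlinks:
            try:
                # A second call is needed to learn about the link target.
                st = os.stat(full)
            except OSError:
                # Broken link: fall back to the lstat result.
                pass
        results.append((name, stat.S_ISDIR(st.st_mode), is_link))
    return results
```

With followlinks=False a symlink to a directory is reported as a
non-directory; with followlinks=True the same entry is a directory, at
the cost of the extra os.stat call.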

...and ultimately deprecating the S_IS___ stuff. It made sense in the
1970s, when there were real savings in packing your 4 bits of type
information and your 12 bits of permission information into a single
16-bit field; now it's just a historical artifact whose only apparent
justification is a vague "make os feel like C on Unix" principle.
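
For anyone who hasn't looked at the packing recently, a small sketch
with the existing stat module shows the two halves of st_mode. The
split_mode() name is just something I made up here; stat.S_IFMT and
stat.S_IMODE are the real functions doing the work.

```python
import stat


def split_mode(mode):
    """Split an st_mode value into (type_bits, permission_bits).

    Illustrative helper only: S_IFMT keeps the 4 file-type bits,
    S_IMODE keeps the low 12 permission bits.
    """
    return stat.S_IFMT(mode), stat.S_IMODE(mode)


# A directory with rwxr-xr-x permissions, as st_mode would encode it.
dir_mode = stat.S_IFDIR | 0o755
type_bits, perm_bits = split_mode(dir_mode)
is_dir = type_bits == stat.S_IFDIR  # same test stat.S_ISDIR performs
```

A separate d_type-style field would carry the type half directly,
leaving only the permission bits to be masked out of st_mode.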


