[Python-ideas] Speed up os.walk() 5x to 9x by using file attributes from FindFirst/NextFile() and readdir()
Random832
random832 at fastmail.us
Mon Nov 19 02:23:27 CET 2012
On 11/15/2012 4:50 PM, Ben Hoyt wrote:
> 1) You'd have to add a whole new way / set of constants / functions
> to test for the different values of d_type. Whereas there's already
> stuff (stat module) to test for st_mode values.
>
> 2) It'd make the typical use case more complex, for example, the
> straight "if st.st_mode is None ... else ..." I gave earlier becomes
> this:
>
> for filename, st in iterdir_stat(path):
>     if st.d_type is None:
>         if st.st_mode is None:
>             st = os.stat(os.path.join(path, filename))
>         is_dir = stat.S_ISDIR(st.st_mode)
>     else:
>         is_dir = st.d_type == DT_DIR
>
> -Ben
I actually meant adding d_type *everywhere*...

    if st.d_type is None:
        st = os.stat(os.path.join(path, filename))
    is_dir = st.d_type == DT_DIR
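
A rough sketch of how "d_type everywhere" might behave, purely as an illustration. Python's os.stat() does not return a d_type field, so StatWithType, stat_with_d_type() and the DT_* constants below are hypothetical stand-ins. The constants use the values found in <dirent.h> on Linux, and the type is derived the same way the C macro IFTODT does it.

```python
import os
import stat
from collections import namedtuple

# Hypothetical constants, mirroring the values in <dirent.h> on Linux.
DT_DIR = 4
DT_REG = 8
DT_LNK = 10

# Hypothetical result type: a d_type field next to the usual stat result.
StatWithType = namedtuple("StatWithType", ["d_type", "st"])


def stat_with_d_type(path):
    """Stand-in for an os.stat() that also reports d_type."""
    st = os.stat(path)
    # Same derivation as the C macro IFTODT: file-type bits shifted down by 12.
    return StatWithType(stat.S_IFMT(st.st_mode) >> 12, st)


def is_dir(path, filename, entry):
    """entry.d_type is None when the directory listing gave no type."""
    if entry.d_type is None:
        entry = stat_with_d_type(os.path.join(path, filename))
    return entry.d_type == DT_DIR
```

With that in place the caller only ever compares against DT_DIR, whether the type came from the directory listing or from the fallback stat call.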
Of course, your code would ultimately be more complex anyway, since when
followlinks=True you want to use isdir, and when it's not you want to
use lstat. And consider what information iterdir_stat actually returns
when the results are symlinks. If it's readdir/d_type, it's going to say
"it's a symlink" and you need to try again to follow the link. If it's
WIN32_FIND_DATA, you have the information for both in principle, but the
stat structure can only include one. Do we need an iterdir_lstat? If so,
should iterdir_stat report d_type as None for a symlink, or as DT_LNK?
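
To make the symlink problem concrete, here is a minimal sketch of the readdir/d_type case. The function name entry_is_dir and the DT_* constants (Linux <dirent.h> values) are assumptions for illustration, not part of any proposed API. It shows the second system call that is needed whenever the listing only says "it's a symlink" and the caller wants the target's type.

```python
import os
import stat

# Hypothetical constants, mirroring the values in <dirent.h> on Linux.
DT_DIR = 4
DT_LNK = 10


def entry_is_dir(path, name, d_type, followlinks):
    """Decide whether a directory entry is a directory.

    d_type is the type reported by the listing, or None if unknown.
    """
    full = os.path.join(path, name)
    if d_type is None or (d_type == DT_LNK and followlinks):
        # The listing cannot answer the question, so ask again:
        # os.stat follows the link, os.lstat describes the link itself.
        try:
            st = os.stat(full) if followlinks else os.lstat(full)
        except OSError:
            return False  # e.g. a dangling symlink
        return stat.S_ISDIR(st.st_mode)
    return d_type == DT_DIR
```

Only symlinks and entries of unknown type pay for the extra call; everything else is answered from the listing alone.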
...and ultimately deprecating the S_IS___ stuff. It made sense in the
1970s, when there were real savings in packing your 4 bits of type
information and your 12 bits of permission information into a single
16-bit field. Now it's just a historical artifact, and the only reason
for it seems to be a vague "make os feel like C on Unix" principle.
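
For reference, the packing in question can be pulled apart with the existing stat module. This is only a small illustration; split_mode is a hypothetical helper name, while stat.S_IFMT and stat.S_IMODE are the real standard-library functions.

```python
import stat


def split_mode(st_mode):
    """Split st_mode into (file type bits, permission bits)."""
    # S_IFMT keeps the top 4 bits (the type); S_IMODE keeps the low
    # 12 bits (rwx for user/group/other plus setuid, setgid, sticky).
    return stat.S_IFMT(st_mode), stat.S_IMODE(st_mode)
```

For a directory with mode 0o040755, this gives stat.S_IFDIR for the type and 0o755 for the permissions.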