
On Jun 28, 2014 12:49 PM, "Ben Hoyt" <benhoyt@gmail.com> wrote:
But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
What about FreeBSD, OpenBSD, NetBSD, Solaris, etc.? Don't they provide
readdir?
I guess it'd be better to say "Windows" and "Unix-based OSs" throughout the PEP? Because all of these (including Mac OS X) are Unix-based.
No, just say POSIX.
It looks like WIN32_FIND_DATA has a dwFileAttributes field, so we should mimic stat_result's recent addition: the new stat_result.st_file_attributes field. Add DirEntry.file_attributes, which would only be available on Windows.
The Windows structure also contains:

    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD nFileSizeHigh;
    DWORD nFileSizeLow;

It would be nice to expose them as well. I'm no longer surprised that the exact API of functions in the os module differs depending on the OS.
I think you've misunderstood how DirEntry.lstat() works on Windows -- it's basically a no-op, as Windows returns the full stat information with the original FindFirst/FindNext OS calls. This is fairly explicit in the PEP, but I'm sure I could make it clearer:
DirEntry.lstat(): "like os.lstat(), but requires no system calls on Windows"
So you can already get the dwFileAttributes for free by saying entry.lstat().st_file_attributes. You can also get all the other fields you mentioned for free via .lstat() with no additional OS calls on Windows, for example: entry.lstat().st_size.
Feel free to suggest changes to the PEP or scandir docs if this isn't clear. Note that is_dir()/is_file()/is_symlink() are free on all systems, but .lstat() is only free on Windows.
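To make the "free on Windows" point concrete, here is a minimal sketch that sums file sizes in one directory. It assumes the API as it was eventually released in the stdlib (Python 3.6+ for the ``with`` form), where the method discussed here as ``lstat()`` is spelled ``stat(follow_symlinks=False)``. The helper name ``total_file_size`` is made up for illustration.

```python
import os


def total_file_size(path):
    """Sum the sizes of regular files directly inside `path` (hypothetical helper)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # The type check can usually be answered from the directory listing itself.
            if entry.is_file(follow_symlinks=False):
                # On Windows this data comes from FindFirstFile/FindNextFile;
                # on POSIX systems it costs one extra stat call per entry.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

The same pattern reaches the Windows-only attribute via ``entry.stat(follow_symlinks=False).st_file_attributes``.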
Does your implementation use a free list to avoid the cost of memory allocation? A short free list of 10 or maybe just 1 may help. The free list may be stored directly in the generator object.
No, it doesn't. I might add this to the PEP under "possible improvements". However, I think the speed increase by removing the extra OS call and/or disk seek is going to be way more than memory allocation improvements, so I'm not sure this would be worth it.
Does it also support bytes filenames on UNIX?
Python now supports undecodable filenames thanks to PEP 383 (surrogateescape). I prefer to use the same type for filenames on Linux and Windows, so Unicode is better. But some users might prefer bytes for other reasons.
I forget exactly now what my scandir module does, but for os.scandir() I think this should behave exactly like os.listdir() does for Unicode/bytes filenames.
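As a rough illustration of "behave exactly like os.listdir()", the sketch below assumes the stdlib ``os.scandir()`` as eventually released: a ``str`` path yields ``str`` names and a ``bytes`` path yields ``bytes`` names. The helper name ``entry_names`` is hypothetical.

```python
import os


def entry_names(path):
    """Return the sorted entry names in `path`; their type follows the type of `path`."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def names_match_listdir(path):
    """Check that scandir() and listdir() agree for both str and bytes paths."""
    bytes_path = os.fsencode(path)
    return (entry_names(path) == sorted(os.listdir(path))
            and entry_names(bytes_path) == sorted(os.listdir(bytes_path)))
```

``os.fsencode()`` and ``os.fsdecode()`` convert between the two forms using the filesystem encoding with the surrogateescape handler from PEP 383.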
Crazy idea: would it be possible to "convert" a DirEntry object to a pathlib.Path object without losing the cache? I guess that pathlib.Path expects a full stat_result object.
The main problem is that pathlib.Path objects explicitly don't cache stat info (and Guido doesn't want them to, for good reason I think). There was a thread on python-dev about this earlier. I'll add it to a "Rejected ideas" section.
I don't understand how you can build a full lstat() result without really calling stat. I see that WIN32_FIND_DATA contains the size, but here you call lstat().
See above.
Do you plan to continue to maintain your module for Python < 3.5, but upgrade your module for the final PEP?
Yes, I intend to maintain the standalone scandir module for 2.6 <= Python < 3.5, at least for a good while. For integration into the Python 3.5 stdlib, the implementation will be integrated into posixmodule.c, of course.
Should there be a way to access the full path?
----------------------------------------------
Should ``DirEntry``'s have a way to get the full path without using ``os.path.join(path, entry.name)``? This is a pretty common pattern, and it may be useful to add pathlib-like ``str(entry)`` functionality. This functionality has also been requested in `issue 13`_ on GitHub.
.. _`issue 13`: https://github.com/benhoyt/scandir/issues/13
I think that it would be very convenient to store the directory name in the DirEntry. It should be light, it's just a reference.
And provide a fullname() method which would just return os.path.join(path, entry.name) without trying to resolve path to get an absolute path.
Yeah, fair suggestion. I'm still slightly on the fence about this, but I think an explicit fullname() is a good suggestion. Ideally I think it'd be better to mimic pathlib.Path.__str__() which is kind of the equivalent of fullname(). But how does pathlib deal with unicode/bytes issues if it's the str function which has to return a str object? Or at least, it'd be very weird if __str__() returned bytes. But I think it'd need to if you passed bytes into scandir(). Do others have thoughts?
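For reference, the proposed behaviour can be sketched as a free function. ``fullname`` here is a hypothetical helper, not part of any released API; it just joins the directory passed to ``scandir()`` with the entry name and does not resolve anything to an absolute path. (The stdlib version that eventually shipped exposes the same value as the ``DirEntry.path`` attribute.)

```python
import os


def fullname(dirpath, entry):
    """Join the scanned directory with the entry name, without resolving either."""
    return os.path.join(dirpath, entry.name)


def iter_fullnames(dirpath):
    """Yield the joined path for every entry in `dirpath`."""
    with os.scandir(dirpath) as entries:
        for entry in entries:
            yield fullname(dirpath, entry)
```

Because ``os.path.join()`` accepts either all-``str`` or all-``bytes`` arguments, the result is ``bytes`` whenever ``scandir()`` was given a ``bytes`` path, which is exactly the situation that makes a ``__str__()``-based design awkward.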
Would it be hard to implement the wildcard feature on UNIX to compare the performance of scandir('*.jpg') with and without the wildcard built into os.scandir?
It's a good idea. The problem is that the Windows wildcard implementation has a bunch of crazy edge cases where *.ext will catch more things than just a simple regex/glob. This was discussed on python-dev or python-ideas previously, so I'll dig it up and add it to a Rejected Ideas section. In any case, this could be added later if there's a way to iron out the Windows quirks.
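In the meantime, wildcard filtering can be layered on top of scandir() in pure Python. This is only a sketch: ``scandir_glob`` is a hypothetical helper, and it uses ``fnmatch`` semantics, which deliberately do not reproduce the Windows ``FindFirstFile`` wildcard quirks mentioned above.

```python
import fnmatch
import os


def scandir_glob(path, pattern):
    """Yield the entries of `path` whose names match an fnmatch-style pattern."""
    with os.scandir(path) as entries:
        for entry in entries:
            # fnmatch applies os.path.normcase(), so matching is
            # case-insensitive on Windows and case-sensitive on POSIX.
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry
```

For example, ``[e.name for e in scandir_glob('.', '*.jpg')]`` lists the JPEG files in the current directory while keeping the cached type information on each entry.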
-Ben