Extend the os.stat() result objects with methods like isfile() and isdir()

I propose adding methods like isfile(), isdir(), islink(), isfifo() and so on - basically everything that would currently be done via code like "stat.S_ISREG(s.st_mode)".
Please indicate support or not, so I can know whether to draft a PEP and work on implementation.
My motivation is twofold:
Firstly, it would make code that needs to interpret stat() results using the existing S_ISREG etc. functions in the stat module look cleaner, more Pythonic, and less like C code manipulating bitmasks.
Secondly, in a recent discussion on python-dev [1] the issue was raised that the stat() call can perform badly in certain situations, and that some form of caching of the result of stat() calls is therefore desirable.
This proposal makes it easier to do one form of caching stat() results: the kind where the result is manually cached by storing it in some variable.
Think of code such as:
    if os.path.isfile(f) or os.path.isdir(f):
        pass  # do something
This will indirectly cause two calls to stat().
Currently, if you want to manually cache that stat call, you'll need to write:
    s = os.stat(f)
    if stat.S_ISREG(s.st_mode) or stat.S_ISDIR(s.st_mode):
        pass  # do something
This not only looks more convoluted and requires an extra import of stat, but it also looks wildly different from the previous code even though it basically has the same semantics.
Under my proposal, this could become:
    s = os.stat(f)
    if s.isfile() or s.isdir():
        pass  # do something
This proposal is independent of the current PEP 428 Path object proposal. However, if accepted, users of PEP 428 Path objects will also benefit, since those can also return results of stat() calls.
-- Pieter Nagel
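A minimal sketch of what the proposal amounts to, assuming a hypothetical wrapper class StatResult and helper cached_stat() (neither is part of the os module): the wrapper stores one os.stat() result and answers isfile(), isdir(), islink() and isfifo() from it using the existing stat.S_IS* functions. Note that islink() can only be true for a result obtained without following links (os.lstat(), or os.stat(..., follow_symlinks=False)), because plain os.stat() follows symlinks.

```python
import os
import stat


class StatResult:
    """Hypothetical wrapper: one cached os.stat() result plus is*() predicates."""

    def __init__(self, st: os.stat_result) -> None:
        self._st = st

    def __getattr__(self, name):
        # Only called for names not found normally: delegate st_mode, st_size, ...
        return getattr(self._st, name)

    def isfile(self) -> bool:
        return stat.S_ISREG(self._st.st_mode)

    def isdir(self) -> bool:
        return stat.S_ISDIR(self._st.st_mode)

    def islink(self) -> bool:
        # Only meaningful for lstat()-style results; os.stat() follows links.
        return stat.S_ISLNK(self._st.st_mode)

    def isfifo(self) -> bool:
        return stat.S_ISFIFO(self._st.st_mode)


def cached_stat(path, follow_symlinks: bool = True) -> StatResult:
    # A single stat() system call; every predicate reuses its result.
    return StatResult(os.stat(path, follow_symlinks=follow_symlinks))
```

Used as s = cached_stat(f), the check s.isfile() or s.isdir() costs exactly one stat() call, where the os.path version can make two.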

On 02/05/2013 17:49, Pieter Nagel wrote:
I propose adding methods like isfile(), isdir(), islink(), isfifo() and so on - basically everything that would currently be done via code like "stat.S_ISREG(s.st_mode)".
+1 It also means not having to import the stat module to get the strangely-named (to me) constants (why the "S_" prefix? Yes, I do know why, BTW. :-)).

+1
On May 2, 2013 11:13 AM, "MRAB" <python@mrabarnett.plus.com> wrote:
+1
It also means not having to import the stat module to get the strangely-named (to me) constants (why the "S_" prefix? Yes, I do know why, BTW. :-)).

On 2 May 2013 17:49, Pieter Nagel <pieter@nagel.co.za> wrote:
I propose adding methods like isfile(), isdir(), islink(), isfifo() and so on - basically everything that would currently be done via code like "stat.S_ISREG(s.st_mode)".
Please indicate support or not, so I can know whether to draft a PEP and work on implementation.
+1 for all the reasons you mention. I would never think of using stat.S_ISREG(s.st_mode) - it looks too low level. But s.isfile() looks completely obvious. Paul

+1
On Thu, May 2, 2013 at 10:07 PM, Paul Moore <p.f.moore@gmail.com> wrote:
+1 for all the reasons you mention. I would never think of using stat.S_ISREG(s.st_mode) - it looks too low level. But s.isfile() looks completely obvious.
Paul
-- Thanks, Andrew Svetlov

Am 02.05.2013 18:49, schrieb Pieter Nagel:
Currently, if you want to manually cache that stat call, you'll need to write:
    s = os.stat(f)
    if stat.S_ISREG(s.st_mode) or stat.S_ISDIR(s.st_mode):
        pass  # do something
This not only looks more convoluted and requires an extra import of stat, but it also looks wildly different from the previous code even though it basically has the same semantics.
Under my proposal, this could become:
    s = os.stat(f)
    if s.isfile() or s.isdir():
        pass  # do something
This proposal is independent of the current PEP 428 Path object proposal. However, if accepted, users of PEP 428 Path objects will also benefit, since those can also return results of stat() calls.
Hi Pieter,
I like your proposal. We could take the opportunity now and push the proposal one or two steps further.
First step: drop the function call
stat_result.isfile() or stat_result.isdir() don't have to be functions. The feature can also be implemented with properties, e.g. stat_result.is_file. Or can somebody think of a reason why they have to be callables anymore?
Second step: get file type as string
A property stat_result.file_type that returns the type of the file as string makes checks like "s.is_dir or s.is_file" even easier:
    s = os.stat(f)
    if s.file_type in {'reg', 'dir'}:
        do_something()
We have to agree on a set of names, though. IMHO the abbreviations from stat.h are clear and distinct: {'fifo', 'chr', 'dir', 'blk', 'reg', 'lnk', 'sock', 'door', 'port'}. door and port are special file types on Solaris.
Christian
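A sketch of both steps, assuming a hypothetical StatView wrapper: the property names is_file, is_dir and file_type come from the message above, while the lookup table and the "unknown" fallback are illustrative choices. The file type is read from the S_IFMT() type bits and mapped to the stat.h abbreviations listed.

```python
import os
import stat

# Type bits -> stat.h-style abbreviations, as proposed above.
_TYPE_NAMES = {
    stat.S_IFIFO: "fifo",
    stat.S_IFCHR: "chr",
    stat.S_IFDIR: "dir",
    stat.S_IFBLK: "blk",
    stat.S_IFREG: "reg",
    stat.S_IFLNK: "lnk",
    stat.S_IFSOCK: "sock",
}
# Solaris-only types: the stat module may define these constants as 0 on
# other platforms (or not at all), so only register non-zero values.
for _const, _name in (("S_IFDOOR", "door"), ("S_IFPORT", "port")):
    _value = getattr(stat, _const, 0)
    if _value:
        _TYPE_NAMES[_value] = _name


class StatView:
    """Hypothetical property-based view over an os.stat_result."""

    def __init__(self, st: os.stat_result) -> None:
        self._st = st

    @property
    def is_file(self) -> bool:
        return stat.S_ISREG(self._st.st_mode)

    @property
    def is_dir(self) -> bool:
        return stat.S_ISDIR(self._st.st_mode)

    @property
    def file_type(self) -> str:
        # S_IFMT() strips the permission bits, leaving only the type bits;
        # "unknown" is an assumed fallback for types not in the table.
        return _TYPE_NAMES.get(stat.S_IFMT(self._st.st_mode), "unknown")
```

With this, the check becomes s = StatView(os.stat(f)) followed by s.file_type in {'reg', 'dir'}.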

On 5/2/2013 7:48 PM, Christian Heimes wrote:
Second step: get file type as string
A property stat_result.file_type that returns the type of the file as string makes checks like "s.is_dir or s.is_file" even easier:
    s = os.stat(f)
    if s.file_type in {'reg', 'dir'}:
        do_something()
We have to agree on a set of names, though. IMHO the abbreviations from stat.h are clear and distinct: {'fifo', 'chr', 'dir', 'blk', 'reg', 'lnk', 'sock', 'door', 'port'}. door and port are special file types on Solaris.
Seems like a use case for a flag-based enum!
-- Eric.
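One way to read the enum suggestion, as a sketch: a hypothetical FileType enum whose values are the S_IFMT type bits, so a mode converts with a single lookup. It deliberately uses a plain enum.Enum rather than a flag enum, because the S_IF* values overlap bitwise (S_IFLNK equals S_IFREG | S_IFCHR) and therefore cannot be combined as flags directly; a flag-based design would need its own bit assignments. The enum module itself arrived in Python 3.4 (PEP 435).

```python
import enum
import os
import stat


class FileType(enum.Enum):
    """Hypothetical enum of file types, valued by the S_IFMT type bits."""
    FIFO = stat.S_IFIFO
    CHR = stat.S_IFCHR
    DIR = stat.S_IFDIR
    BLK = stat.S_IFBLK
    REG = stat.S_IFREG
    LNK = stat.S_IFLNK
    SOCK = stat.S_IFSOCK

    @classmethod
    def from_mode(cls, mode: int) -> "FileType":
        # Raises ValueError for type bits not listed above (e.g. Solaris doors).
        return cls(stat.S_IFMT(mode))
```

The string check from the previous message then reads FileType.from_mode(os.stat(f).st_mode) in {FileType.REG, FileType.DIR}, and a typo in a member name fails loudly instead of silently never matching.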

From: Christian Heimes <christian@python.org> Sent: Thursday, May 2, 2013 4:48 PM
Am 02.05.2013 18:49, schrieb Pieter Nagel:
Under my proposal, this could become:
    s = os.stat(f)
    if s.isfile() or s.isdir():
        pass  # do something
+1
First step: drop the function call
stat_result.isfile() or stat_result.isdir() don't have to be functions. The feature can also be implemented with properties, e.g. stat_result.is_file. Or can somebody think of a reason why they have to be callables anymore?
Well, there's the fact that os.path.isfile is a callable. And I've actually seen code that uses isfile in a filter call, and operator.attrgetter('is_file') obviously isn't as nice. But then a genexp is probably nicer than filter here anyway. So, two very trivial downsides. I guess +0.
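To make the filter() point concrete, here is a small sketch comparing the three spellings. PathStat is a hypothetical property-based wrapper (not an existing API) that stats a path once and exposes is_file as a property.

```python
import operator
import os
import stat


class PathStat:
    """Hypothetical wrapper exposing is_file as a property, not a method."""

    def __init__(self, path):
        self.path = path
        self._st = os.stat(path)

    @property
    def is_file(self):
        return stat.S_ISREG(self._st.st_mode)


def files_via_filter(paths):
    # os.path.isfile is a plain callable, so it plugs straight into filter().
    return list(filter(os.path.isfile, paths))


def files_via_attrgetter(paths):
    # With a property, filter() needs operator.attrgetter("is_file") instead.
    views = filter(operator.attrgetter("is_file"), map(PathStat, paths))
    return [v.path for v in views]


def files_via_genexp(paths):
    # A comprehension needs no helper at all.
    return [v.path for v in map(PathStat, paths) if v.is_file]
```

All three return the same list; the difference is only how naturally the predicate slots in.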
Second step: get file type as string
A property stat_result.file_type that returns the type of the file as string makes checks like "s.is_dir or s.is_file" even easier:
    s = os.stat(f)
    if s.file_type in {'reg', 'dir'}:
        do_something()
If this is _in addition to_ the methods/attributes, +1. If it's in place of them, -1. There are cases where this will be simpler, but for the most common case, s.isdir is much nicer than s.file_type == 'dir'.
We have to agree on a set of names, though. IMHO the abbreviations from stat.h are clear and distinct: {'fifo', 'chr', 'dir', 'blk', 'reg', 'lnk', 'sock', 'door', 'port'}. door and port are special file types on Solaris.
This one's actually a problem. If os.path.isfile(name) is true, and so is s.isfile, but s.file_type=='file' is false, that's going to be confusing. Especially to novices and Windows programmers—the very people who write code like os.path.islink(f) or os.path.isdir(x) today because they're afraid of the stat module, who we're trying to help here. In fact, I suspect that, even after they learn that it's "reg" rather than "file", they're going to have a hard time remembering it. But calling it 'file' is confusing to everyone who _does_ know stat. Anything you can call stat on is a file. And I don't know of a good answer here.

On 03/05/2013 01:09, Andrew Barnert wrote:
From: Christian Heimes <christian@python.org>
Sent: Thursday, May 2, 2013 4:48 PM
Am 02.05.2013 18:49, schrieb Pieter Nagel:
Under my proposal, this could become:
    s = os.stat(f)
    if s.isfile() or s.isdir():
        pass  # do something
+1
First step: drop the function call
stat_result.isfile() or stat_result.isdir() don't have to be functions. The feature can also be implemented with properties, e.g. stat_result.is_file. Or can somebody think of a reason why they have to be callables anymore?
Well, there's the fact that os.path.isfile is a callable.
True.
And I've actually seen code that uses isfile in a filter call, and operator.attrgetter('is_file') obviously isn't as nice. But then a genexp is probably nicer than filter here anyway.
So, two very trivial downsides. I guess +0.
Second step: get file type as string
A property stat_result.file_type that returns the type of the file as string makes checks like "s.is_dir or s.is_file" even easier:
    s = os.stat(f)
    if s.file_type in {'reg', 'dir'}:
        do_something()
If this is _in addition to_ the methods/attributes, +1.
If it's in place of them, -1. There are cases where this will be simpler, but for the most common case, s.isdir is much nicer than s.file_type == 'dir'.
We have to agree on a set of names, though. IMHO the abbreviations from stat.h are clear and distinct: {'fifo', 'chr', 'dir', 'blk', 'reg', 'lnk', 'sock', 'door', 'port'}. door and port are special file types on Solaris.
This one's actually a problem.
If os.path.isfile(name) is true, and so is s.isfile, but s.file_type=='file' is false, that's going to be confusing. Especially to novices and Windows programmers—the very people who write code like os.path.islink(f) or os.path.isdir(x) today because they're afraid of the stat module, who we're trying to help here. In fact, I suspect that, even after they learn that it's "reg" rather than "file", they're going to have a hard time remembering it.
It wouldn't be """s.file_type=='file'""", but """'file' in s.file_type""". And I agree about 'reg'.
But calling it 'file' is confusing to everyone who _does_ know stat. Anything you can call stat on is a file.
...even if os.path.isfile(...) says it isn't.
And I don't know of a good answer here.
Maybe "file_type" is the wrong name for it.

Am 03.05.2013 03:56, schrieb MRAB:
It wouldn't be """s.file_type=='file'""", but """'file' in s.file_type""".
And I agree about 'reg'.
But calling it 'file' is confusing to everyone who _does_ know stat. Anything you can call stat on is a file.
...even if os.path.isfile(...) says it isn't.
And I don't know of a good answer here.
Maybe "file_type" is the wrong name for it.
It's the POSIX nomenclature and the Plan 9 concept "everything is a file". POSIX calls it "file type" all over the place, e.g. in the documentation of stat's st_mode field. A new term is going to confuse lots of Unix developers. Windows developers are only used to two kinds of files: regular files and directories. Even symlinks are rarely used on Windows. I agree that Windows developers are going to be confused by the concept of 'reg' or 'regular file'. Andrew's suggestion of an enum instead of strings has a nice benefit. We can have both concepts if file_types.FILE == file_types.REG. Christian
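Having both FILE and REG maps directly onto enum aliases: in Python's enum module, a second member name with an already-used value becomes an alias of the first rather than a new member. A minimal sketch with a hypothetical, deliberately truncated FileType enum:

```python
import enum
import stat


class FileType(enum.Enum):
    """Hypothetical enum where FILE is an alias of REG."""
    REG = stat.S_IFREG
    DIR = stat.S_IFDIR
    LNK = stat.S_IFLNK
    # Same value as REG, so enum makes FILE an alias, not a new member.
    FILE = stat.S_IFREG
```

Iteration and repr() report the canonical REG, so Unix developers see the stat.h name, while FileType.FILE is FileType.REG holds for everyone who thinks in terms of "files".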

From: Christian Heimes <christian@python.org> Sent: Thursday, May 2, 2013 4:48 PM Also:
We have to agree on a set of names, though. IMHO the abbreviations from stat.h are clear and distinct: {'fifo', 'chr', 'dir', 'blk', 'reg', 'lnk', 'sock', 'door', 'port'}. door and port are special file types on Solaris.
Does Python have stat.S_ISDOOR on Solaris? (It doesn't on other POSIX systems, and it's not mentioned in the docs.) Meanwhile, if we're going to add non-standard platform-specific flags, these aren't the only two. Mac and most other *BSD have WHT. (I believe recent linux/glibc doesn't expose it anymore, because it's treated as internal to certain unionfs implementations?) POSIX 1.b also defines MQ, SEM, and SHM (although these aren't required to be stored inside the S_IFMT bits of mode).

On Thu, 2013-05-02 at 17:20 -0700, Andrew Barnert wrote:
Does Python have stat.S_ISDOOR on Solaris? (It doesn't on other POSIX systems, and it's not mentioned in the docs.)
In principle I'm all for looking at missing platform-specific stat flags while that region of the stdlib is being worked on. In practice, though, I only have access to Linux when it comes to implementing this. Support for other platforms will most likely depend on the availability of volunteers when it comes to implementation. -- Pieter Nagel

Am 03.05.2013 07:22, schrieb Pieter Nagel:
On Thu, 2013-05-02 at 17:20 -0700, Andrew Barnert wrote:
Does Python have stat.S_ISDOOR on Solaris? (It doesn't on other POSIX systems, and it's not mentioned in the docs.)
In principle I'm all for looking at missing platform-specific stat flags while that region of the stdlib is being worked on.
In practice, though, I only have access to Linux when it comes to implementing this. Support for other platforms will most likely depend on the availability of volunteers when it comes to implementation.
You can ask Trent Nelson for snakebite.net access. He has lots of important operating systems in his setup. I can also help you if you need information or testing.
So far I was able to identify this set of file types:
S_ISDIR() S_ISCHR() S_ISBLK() S_ISREG() S_ISLNK() S_ISSOCK() S_ISFIFO()
# Solaris
S_ISDOOR() S_ISPORT()
# POSIX 1.b real-time extension
S_ISMSG() S_ISSEM() S_ISSHM()
# whiteout, translucent file systems
S_ISWHT
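To see which of the predicates listed above a given Python build exposes, probing the stat module by name is enough. CANDIDATES, available_predicates() and classify() are hypothetical helpers. Hedged caveat: in recent CPython versions the stat module defines S_ISDOOR(), S_ISPORT() and S_ISWHT() everywhere and simply returns False where the platform lacks those types, so the presence of a name does not prove platform support; names such as S_ISMSG() are not provided by the stat module, so the probe skips them.

```python
import stat

# File-type predicates named in the list above. Not every Python build
# defines all of them, so they are looked up by name.
CANDIDATES = [
    "S_ISDIR", "S_ISCHR", "S_ISBLK", "S_ISREG", "S_ISLNK", "S_ISSOCK", "S_ISFIFO",
    "S_ISDOOR", "S_ISPORT",           # Solaris
    "S_ISMSG", "S_ISSEM", "S_ISSHM",  # POSIX 1.b real-time extension
    "S_ISWHT",                        # whiteout, translucent file systems
]


def available_predicates(module=stat):
    """Names from CANDIDATES that the module actually exposes as callables."""
    return [name for name in CANDIDATES if callable(getattr(module, name, None))]


def classify(mode, module=stat):
    """Names of every available predicate that returns true for mode."""
    return [name for name in available_predicates(module) if getattr(module, name)(mode)]
```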

There also exist macros S_TYPEISMQ, S_TYPEISSEM, S_TYPEISSHM and S_TYPEISTMO; those take a struct stat as the argument and return whether it refers to a message queue, semaphore, shared memory segment or typed memory object (see http://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/stat.h.html). I don't know if there are systems where these macros can return a value other than 0 (both OSX and Linux always "return" 0 from these macros).
Ronald
On 3 May, 2013, at 13:57, Christian Heimes <christian@python.org> wrote:
So far I was able to identify this set of file types:
S_ISDIR() S_ISCHR() S_ISBLK() S_ISREG() S_ISLNK() S_ISSOCK() S_ISFIFO()
# Solaris S_ISDOOR() S_ISPORT()
# POSIX 1.b real-time extension S_ISMSG() S_ISSEM() S_ISSHM()
# whiteout, translucent file systems S_ISWHT

Am 03.05.2013 14:12, schrieb Ronald Oussoren:
There also exist macros S_TYPEISMQ, S_TYPEISSEM, S_TYPEISSHM and S_TYPEISTMO, those have a struct stat as the argument and return if it refers to a message queue, semaphore, shared memory segment or typed memory (see http://pubs.opengroup.org/onlinepubs/009604599/basedefs/sys/stat.h.html).
I don't know if there are systems where these macros can return a value other than 0 (both OSX and Linux always "return" 0 from these macros).
I've checked stat.h on some additional machines. Solaris 11 and AIX 7 have the macros but they always evaluate to 0. FreeBSD doesn't have the macros at all. I could not find typed memory object macros on any system. I guess we can safely ignore these file types as they aren't available on any supported platform. Christian

On Fri, May 3, 2013, at 7:57, Christian Heimes wrote:
You can ask Trent Nelson for snakebite.net access. He has lots important operation systems in his setup. I can also help you if you need information or testing.
So far I was able to identify this set of file types
Heirloom toolchest "ls" supports:
http://heirloom.cvs.sourceforge.net/viewvc/heirloom/heirloom/ls/ls.c?revision=1.9&view=markup
http://heirloom.cvs.sourceforge.net/viewvc/heirloom/heirloom/ls/ls.1?revision=1.5&view=markup
    S_IFNWK  HP-UX network special file
    S_IFNAM  XENIX special named file
    S_INSEM  XENIX semaphore subtype of IFNAM (looked up from s->rdev)
    S_INSHD  XENIX shared data subtype of IFNAM (looked up from s->rdev)
Of these, GNU coreutils ls only supports doors and whiteouts.
Chasing after a random hunch (something about AIX), I found these:
http://cd.textfiles.com/transameritech2/EXTRAS/JOVE-4.6/ASK.C
    S_ISHIDDEN  Hidden Directory [aix]
    S_ISCDF     Context Dependent Files [hpux]
    S_ISNWK     Network Special [hpux]
http://lists.gnu.org/archive/html/bug-gnulib/2012-12/msg00084.html
    S_ISMPX  AIX "MPX" file (multiplex device?)
https://github.com/gagern/gnulib/blob/master/tests/test-sys_stat.c has a massive pile of macros with no comments:
    S_ISCTG S_ISMPB S_ISMPX S_ISNAM S_ISNWK S_ISOFD S_ISOFL S_ISPORT
http://lists.gnu.org/archive/html/bug-gnulib/2004-08/msg00017.html
    S_ISOFD  Cray DMF (data migration facility): off line, with data
    S_ISOFL  Cray DMF (data migration facility): off line, with no data
    S_ISCTG  Contiguous
    (It's possible that these may not be file types.)
http://doiso.googlecode.com/svn/trunk/Source/mkisofs-1.12b5/include/statdefs...
    S_ISMPC   UNUSED multiplexed c
    S_ISNAM   Named file (XENIX)
    S_ISMPB   UNUSED multiplexed b
    S_ISCNT   Contiguous file
    S_ISSHAD  Solaris shadow inode
http://www.opensource.apple.com/source/gnutar/gnutar-450/gnutar/lib/sys_stat...
    S_ISMPB   /* V7 */
    S_ISPORT  /* Solaris 10 and up */
    S_TYPEISSEM S_TYPEISSHM - macros to check the XENIX IFNAM types mentioned above
    S_TYPEISMQ S_TYPEISTMO

On Fri, 2013-05-03 at 01:48 +0200, Christian Heimes wrote:
stat_result.isfile() or stat_result.isdir() don't have to be functions. The feature can also be implemented with properties, e.g. stat_result.is_file. Or can somebody think of a reason why they have to be callables anymore?
I lean towards keeping it a function call for symmetry with os.path.isfile() and friends.
    s = os.stat(f)
    if s.file_type in {'reg', 'dir'}:
        do_something()
If something like this were to be done, I wouldn't like doing it with magic string constants. I agree that the new enums would be better to do this with. This also raises the issue of whether, if there is a file type enumeration on the stat() result, there should be a symmetric os.path.file_type(f) call added. But I'll remain open to these kinds of discussions as the PEP is discussed. It seems there's enough support for the basic principle for me to go and work on the PEP. -- Pieter Nagel

First step: drop the function call
stat_result.isfile() or stat_result.isdir() don't have to be functions. The feature can also be implemented with properties, e.g. stat_result.is_file. Or can somebody think of a reason why they have to be callables anymore?
Callables sound more consistent.
Second step: get file type as string
A property stat_result.file_type that returns the type of the file as string makes checks like "s.is_dir or s.is_file" even easier:
    s = os.stat(f)
    if s.file_type in {'reg', 'dir'}:
        do_something()
Strings shouldn't be used for anything except text. It defeats the typing system, prevents static checking, offers poor performance, etc. This kind of attribute should ideally be an enum ;-) Note that you have to be careful when changing os.stat() return type: we absolutely don't want to break backward compatibility: for example, the returned object should look like a tuple (among other things, support indexing).
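The backward-compatibility constraint above can be written down as a small check. tuple_compatible() is a hypothetical helper; it tests the documented tuple behaviour of os.stat() results (at least ten items, indexable with the stat.ST_* constants), plus the fact that in CPython the result type is a tuple subclass.

```python
import os
import stat


def tuple_compatible(st) -> bool:
    """Hypothetical check for the classic tuple behaviour of stat results."""
    return (
        isinstance(st, tuple)                # CPython's stat_result subclasses tuple
        and len(st) >= 10                    # at least the ten classic fields
        and st[stat.ST_MODE] == st.st_mode   # index and attribute agree
        and st[stat.ST_INO] == st.st_ino
        and st[stat.ST_SIZE] == st.st_size
    )
```

An extended result type carrying isfile() and friends would need tuple_compatible(result) to stay true, for example by remaining a tuple subclass rather than becoming a plain object.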

To all the proponents of a file_type() attribute: can you please show some use-cases for this? I don't want to complicate the PEP just for some speculative nice-to-have. -- Pieter Nagel

On May 3, 2013, at 8:14, Pieter Nagel <pieter@nagel.co.za> wrote:
To all the proponents of a file_type() attribute: can you please show some use-cases for this?
There's one really obvious one for os.walk or similar functions:
    types_to_recurse = ('dir', 'link') if follow else ('dir',)
    # ...
    if s.file_type in types_to_recurse:
        try_to_recurse()
Meanwhile, it strikes me that if you just change this to is_type(types_to_check), it gives a parallel with isinstance and friends, which has two benefits.
First, it means you can handle synonyms. Just like isinstance can handle subclasses/ABC registration/etc. while type() cannot, is_type can handle both 'reg' and 'file' as the same type while file_type cannot.
Second, by following the usual Python pattern for pre-checking, it makes it blindingly obvious that you're violating EAFP, forcing you to think about whether you have a good reason to do so. In the example above, I do (I don't want to try recursing into symlinks if follow is false, even though it would work), but that may not be true for every use case.
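A sketch of the os.walk example combined with the is_type() idea. is_type(), walk_dirs() and the name table (including the 'file' and 'link' synonyms) are hypothetical. In the EAFP spirit, a followed symlink is recursed into by simply trying os.listdir() and skipping it on OSError, so a link to a regular file is silently ignored. There is no cycle detection; this is a toy, not a replacement for os.walk().

```python
import os
import stat

# Hypothetical name table; 'file' and 'link' are synonyms, as discussed above.
_TYPE_BITS = {
    "reg": stat.S_IFREG, "file": stat.S_IFREG,
    "dir": stat.S_IFDIR,
    "lnk": stat.S_IFLNK, "link": stat.S_IFLNK,
    "fifo": stat.S_IFIFO,
    "sock": stat.S_IFSOCK,
    "chr": stat.S_IFCHR,
    "blk": stat.S_IFBLK,
}


def is_type(st, types) -> bool:
    """isinstance()-style check: does st match any of the named types?"""
    if isinstance(types, str):
        types = (types,)
    return stat.S_IFMT(st.st_mode) in {_TYPE_BITS[t] for t in types}


def walk_dirs(top, follow=False):
    """Toy os.walk(): yield top and every directory reachable below it."""
    # listdir() runs before the first yield, so a non-directory yields nothing.
    names = sorted(os.listdir(top))
    yield top
    types_to_recurse = ("dir", "link") if follow else ("dir",)
    for name in names:
        path = os.path.join(top, name)
        # lstat() so that a symlink reports as 'lnk', not as its target's type.
        if is_type(os.lstat(path), types_to_recurse):
            try:
                yield from walk_dirs(path, follow)
            except OSError:
                # EAFP: a followed link to a file (or a dangling link) is skipped.
                pass
```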
I don't want to complicate the PEP just for some speculative nice-to-have.
-- Pieter Nagel
participants (12)
- Andrew Barnert
- Andrew Svetlov
- Charles-François Natali
- Christian Heimes
- David Mertz
- Eric V. Smith
- MRAB
- Nick Coghlan
- Paul Moore
- Pieter Nagel
- random832@fastmail.us
- Ronald Oussoren