[Patches] [ python-Patches-462296 ] Add attributes to os.stat results

noreply@sourceforge.net noreply@sourceforge.net
Tue, 05 Mar 2002 08:46:33 -0800


Patches item #462296, was opened at 2001-09-17 19:57
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470

Category: Library (Lib)
Group: None
Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Nick Mathewson (nickm)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Add attributes to os.stat results

Initial Comment:
See bug #111481, and PEP 0042.  Both suggest that the
return values for os.{stat,lstat,statvfs,fstatvfs}
ought to be struct-like objects rather than simple tuples. 

With this patch, the os module will modify the
aformentioned functions so that their results still
obey the previous tuple protocol, but now have
read-only attributes as well.  In other words,
"os.stat('filename')[0]" is now synonymous with
"os.stat('filename').st_mode.

The patch also modifies test_os.py to test the new
behavior.

In order to prevent old code from breaking, these new
return types extend tuple.  They also use the new
attribute descriptor interface. (Thanks for
PEP-025[23], Guido!)

Backward compatibility:  Code will only break if it
assumes that type(os.stat(...)) == TupleType, or if it
assumes that os.stat(...) has no attributes beyond
those defined in tuple.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-05 17:46

Message:
Logged In: YES 
user_id=21627

I'd not put the copyreg support into copy_reg, but into
os.py. Pickling would save a reference to
os._load_stat_result (or some such). When pickle tries to
restore the value, it would first restore
os.load_stat_result. For that, it would import os, which
would register the copy_reg support.

As for constructing structseq objects from dictionaries: it
would be a ValueError if fields within [:n_sequence_fields]
are not filled out; leaving out other fields is fine.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-03-05 17:32

Message:
Logged In: YES 
user_id=6656

Martin, I may not have been 100% clear in my last note, but
please run

cvs up Objects/structseq.c

structseq objects *do* now implement a __reduce__ method,
but it returns a tuple.  Using a dictionary would be more
complicated, and not solve the issue completely: what
happens when you go from a platform with less fields to one
with more?  What value does the not-prepared-for field have?

Hmm, the point about nt.stat_result is a good one.

Getting support into copy_reg.py leads to interesting
bootstrapping problems when using uninstalled builds,
unfortunately (site.py imports distutils imports re imports
copy_reg; try to import, say, time, and you can't, because
the whole reason to import distutils was to set up the path
to find dynamically linked libraries...).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-05 17:15

Message:
Logged In: YES 
user_id=21627

To support pickling, I think structseq objects should
implement a __reduce__ method, returning the type and a
dictionary. The type's tp_new should accept dictionaries,
and reconstruct the instance from the dictionary.

Alternatively, copy_reg could grow support for stat_result,
which seems desirable anyway, since os.stat returns a
'nt.stat_result' instance on Windows.

Furthermore, fixing the number of arguments does not help at
all in pickling; __reduce__ will return an argument tuple
which includes the original object; in turn, pickle will
recurse until the stack overflows.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-03-05 16:50

Message:
Logged In: YES 
user_id=6656

I'm not worried about cross version problems.

The problem with pickling is that stat_results (as of today)
get pickled as "os.stat_result" and a tuple of arguments. 
The number of arguments os.stat_result takes varies by
platform (it seems to be 10 on this NT box, but it's 13 on
the starship, f'ex).  So if a stat_result gets pickled on
the starship and shoved down a socket to an NT machine, it
can't be unpickled.  I don't know if this sort of thing ever
happens, but I could see it being surprising & annoying if I
ran into it.

If os.stat_result took 13 arguments everywhere, this problem
obviously wouldn't arise.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-05 16:22

Message:
Logged In: YES 
user_id=21627

Adding all fields is both difficult and undesirable. It is
difficult because you may not know in advance what fields
will be added in future versions, and it is undesirable
because applications may think that there is a value even
though the is none.

What problem does that cause for pickling, and why would a
complete list of all attributes solve this problem?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-03-05 14:45

Message:
Logged In: YES 
user_id=6656

I know this patch is closed, but it seems a vaguely sane
place to ask the question: why do we vary the number of
field of os.stat_result across platforms?  Wouldn't it be
better to let it always have the same values & fill in one's
that don't exists locally with -1 or something?

It's hard to pickle os.stat_results portably the way things
are at the moment...

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-11-29 21:49

Message:
Logged In: YES 
user_id=3066

This has been checked in, edited, and checked in again.

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-10-19 00:53

Message:
Logged In: YES 
user_id=499

Here's a documentation patch for libos.tex.  I don't know
the TeX macros well enough to write an analogous one for
libtime.tex; fortunately, it should be fairly easy to
extrapolate from the included patch.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-18 22:35

Message:
Logged In: YES 
user_id=6380

Thanks, Nick!  Good job.

Checked in, just in time for 2.2b1.  I'm passing this
tracker entry on to Fred for documentation.  (Fred, feel
free to pester Nick for docs.  Nick, feel free to upload
approximate patches to Doc/libos.tex and Doc/libtime.tex.
:-)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-18 21:24

Message:
Logged In: YES 
user_id=6380

I'm looking at this now.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-03 15:55

Message:
Logged In: YES 
user_id=6380

Patience, please. I'm behind reviewing this, probably won't
have time today either.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2001-10-03 15:51

Message:
Logged In: YES 
user_id=6656

If this goes in, I'd like to see it used for termios.tc
{get,set}attr too.

I could probably implement this (but not *right* now...).

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-10-02 03:56

Message:
Logged In: YES 
user_id=499

The fifth all-C (!) version, with changes as suggested by
Guido's comments via email.

Big changes: This version no longer subclasses tuple. 
Instead, it creates a general-purpose mechanism for making
struct/sequence hybrids in C.

It now includes a patch for timemodule.c as well.
Shortcomings:
(1) As before, macmodule and riscosmodule aren't tested.
(2) These new classes don't participate in GC and aren't
subclassable.  (Famous last words: "I don't think this will
matter." :) )
(3) This isn't a brand-new metaclass; it's just a quick bit
of C.  As such, you can't use this mechanism to create new
struct/tuple hybrids from Python.  (I claim this isn't a
drawback, since it's way easier to reimplement this in
python than it is to make it accessible from python.)

So, how's *this* one?


----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-10-01 17:37

Message:
Logged In: YES 
user_id=499

I've sent my email address to 'guido at python.org'.  For
reference, it's 'nickm at alum.mit.edu'.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-01 16:09

Message:
Logged In: YES 
user_id=6380

Nick, what's your real email? I have a bunch of feedback
related to your use of the new type stuff -- this is
uncharted territory for me too, and this SF box is too small
to type comfortably.

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-10-01 04:51

Message:
Logged In: YES 
user_id=499

I think this might be the one... or at least, the
next-to-last-one. This version of the patch:

(1) moves the shared C code into a new module, "_stat", for
internal use.
(2) updates macmodule and riscosmodule to use the new code.
(3) fixes a significant reference leak in previous versions.
(4) is immune to the __new__ and __init__ bugs in previous
versions.

Things to note:
(A) I've tried to make sure that my Mac/RISCOS code was
correct, but I don't have any way to compile or test it.
(B) I'm not sure my use of PyImport_ImportModule is legit.
(C) I've allowed users to construct instances of stat_result
with < or > 13 arguments.  When this happens, attempts to
get nonexistant attributes now raise AttributeError.
(D) When dealing with Mac.xstat and RISCOS.stat, I chose to
keep backward compatibility rather than enforcing the
10-tuple rule in the docs.

Because there are new files, I can't make 'cvs diff' get
everything.  I'm uploading a zip file that contains
_statmodule.c, _statmodule.h, and a unified diff.  Please
let me know if you'd prefer a different format.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-28 16:23

Message:
Logged In: YES 
user_id=6380

Another comment: we should move this to its own file so that
other os.stat() implementations (esp. MacOS, maybe RiscOS)
that aren't in posixmodule.c can also use it, rather than
having to maintain three separate versions of the code.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-28 16:18

Message:
Logged In: YES 
user_id=6380

One comment on the patch: beautiful use of the new type
stuff, but there's something funky with the constructors
going on. It seems that the built-in __new__ (inherited from
the tuple class) requires exactly one argument -- a sequence
to be tuplified -- but your __init__ requires 13 arguments.
So construction by using posix.stat_result(...) always
fails. It makes more sense to fix the init routine to
require a 13-tuple as argument. I would also recommend
overriding the tp_new slot to require a 13-tuple: right now,
I can cause an easy core dump as follows:

>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$ 



----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-28 06:20

Message:
Logged In: YES 
user_id=499

I've fixed it with the suggestions you made, and also 
   1) Added docstrings
   2) Fixed a nasty segfault bug that would be triggered by
        os.stat("/foo").__class__((10,)).st_size
      and added tests to keep it from reappearing.

I'm not sure I know how to cover Mac and RISCOS properly:
riscos.stat returns a 13-element tuple, and is hence already
incompatible with posix.stat; whereas mac.{stat|xstat}
return differing types. 

If somebody with experience with these modules could let
give me guidance as to the Right Thing, I'll be happy to
give it a shot... but my shot isn't likely to be half as
good as somebody who knew the modules better.  (For example,
I don't have the facilities to compile macmodule or
riscmodule at all, much less test them.)

I'd also be glad to make any changes that would help
maintainers of those modules.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-24 10:44

Message:
Logged In: YES 
user_id=21627

The patch looks good to me. Are you willing to revise it one
more time to cover all the stat implementations?
A few comments on the implementation:
- Why do you try to have your type participate in GC? they
will never be part of a cycle. If that ever becomes an
issue, you probably need to implement a traversal function
as well.
- I'd avoid declaring PosixStatResult, since the field
declarations are misleading. Instead, you should just add
the right number of additional in the type declaration. 


----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-21 22:07

Message:
Logged In: YES 
user_id=499

And here's an even better all-C version.  (This one doesn't
use a dictionary to store optional attributes.)

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-21 20:01

Message:
Logged In: YES 
user_id=499

Well, here's a posixmodule-only, all-C version.  If this
seems like a good approach, I'll add some better docstrings,
move it into whichever module you like, and make
riscosmodule.c and macmodule.c use it too.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-20 06:35

Message:
Logged In: YES 
user_id=6380

Or you could put it in modsupport.c, which is already a
grab-bag of handy stuff.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-19 20:36

Message:
Logged In: YES 
user_id=21627

There aren't actually so many copies of the module, since 
posixmodule implements "posix","nt", and "os2". I found 
alternative implementations in riscosmodule and macmodule.

Still, putting the support type into a shared C file is 
appropriate. I can think of two candidate places: 
tupleobject.c and fileobject.c. It may be actually 
worthwhile attempting to share the stat() implementations 
as well, but that could be an add-on.


----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-19 20:10

Message:
Logged In: YES 
user_id=499

I'm becoming more and more convinced that doing it in C is
the right thing, but I have issue with doing it in the posix
module.  The stat function is provided on (nearly?) all
platforms, and doing it in C will require minor changes to
all of these modules.  We can probably live with this, but I
don't think we should duplicate code between all of the os
modules.  

Is there some other appropriate place to put it in C?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-19 08:52

Message:
Logged In: YES 
user_id=21627

Using posix.stat is common, see

http://groups.yahoo.com/group/python-list/message/4349
http://www.washington.edu/computing/training/125/mkdoc.html
http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu

for examples. None of these would break with your change,
though, since they don't rely on the lenght of the tuple.

If you are going to implement the type in C, I'd put it in
the posix module. If you are going to implement it in Python
(and only use it from the Posix module), making it
general-purpose may be desirable. However, a number of
things would need to be considered, so a PEP might be
appropriate. If that is done, I'd propose an interface like

tuple_with_attrs((value-tuple), (tuple-of-field-names),
exposed-length-of-tuple))

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-18 23:11

Message:
Logged In: YES 
user_id=499

Ah!  Now I see.  I hadn't realized that anybody used the
posix module directly.  (People really do this?)

I'll try to write up a patch in C tonight or tomorrow
morning.  A couple of questions on which I could use advice: 
(1) Where is the proper place to put this kind of
tuple-with-fields hybrid? Modules? Objects?  In a new file
or an existing one?
(2) Should I try to make it general enough for non-stat use?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-18 09:54

Message:
Logged In: YES 
user_id=21627

The problem with your second and third patch is that it
includes an incompatibility for users of posix.stat (and
friends), since it changes the siye of the tuple. If you
want to continue to return a tuple (as the top-level data
structure), you'll break compatibility for applications
using the C module directly. An example of code that would
be broken is

mode, ino, dev, nlink, uid, gid, size, a, c, m =
posix.stat(filename)

To pass the additional fields, you already need your class
_StatResult available in C.
You may find a way to define it in Python and use it in C,
but that has proven to be very fragile in the past.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-18 03:54

Message:
Logged In: YES 
user_id=6380

Haven't had time to review the patch yet, but the idea of
providing a structure with fields that doubles as a tuple is
a good one. It's been tried before and can be done in pure
Python as well.

Regarding the field names: I think the field names should
keep their st_ prefix -- IMO this makes the code more
recognizable and hence readable.

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-18 02:32

Message:
Logged In: YES 
user_id=499

Here's the revised (*example only*) patch that takes the
more portable approach I mention below.

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-18 01:10

Message:
Logged In: YES 
user_id=499

On further consideration, the approach taken in the second
(*example only*) patch is indeed too fragile.  The C code
should not lengthen the tuple arbitrarily and depend on the
Python code to decode it; instead, it should return a
dictionary of extra fields.  I think that this approach uses
a minimum of C, is easily maintainable, and very extensible.

----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-18 00:53

Message:
Logged In: YES 
user_id=499

Martin: I'm not entirely sure what you mean here; while my
patch for extra fields requires a minor chunk of C (to
access the struct fields), the rest still works in pure
python.  I'm attaching this second version for reference.

I'm not sure it makes much sense to do this with pure C; it
would certainly take a lot more code, with little benefit I
can descern.  But you're more experienced than I; what am I
missing?

I agree that the field naming is suboptimal; I was taking my
lead from the stat and statvfs modules.  If people prefer,
we can name the fields whatever we like.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-18 00:24

Message:
Logged In: YES 
user_id=21627

I second the request for supporting additional fields 
where available. At the same time, it appears 
unimplementable using pure Python.

Consequently, I'd like to see this patch redone in C. The 
implementation strategy could probably remain the same, 
i.e. inherit from tuple for best compatibility; add the 
remaining fields as slots. It may be reasonable to 
implement attribute access using a custom getattr 
function, though. 

I have also my doubts about the naming of the fields. The 
st_ prefix originates from the time where struct fields 
were living in the global namespace (i.e. across different 
structures), so prefixing them for uniqueness was 
essential. I'm not sure whether we should inherit this 
into Python...


----------------------------------------------------------------------

Comment By: Nick Mathewson (nickm)
Date: 2001-09-17 22:58

Message:
Logged In: YES 
user_id=499

BTW, if this gets in, I have another patch that adds support
for st_blksize, st_blocks, and st_rdev on platforms that
support them.  It don't expose these new fields in the
tuple, as that would break all the old code that tries to
unpack all the fields of the tuple.  Instead, these fields
are only accessible as attributes. 

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470