[ python-Bugs-1729170 ] os.stat producing incorrect / invalid results
SourceForge.net
noreply at sourceforge.net
Sun Jun 3 10:02:08 CEST 2007
Bugs item #1729170, was opened at 2007-05-31 23:24
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1729170&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
>Status: Closed
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: Joe (joesalmeri)
Assigned to: Martin v. Löwis (loewis)
Summary: os.stat producing incorrect / invalid results
Initial Comment:
Python 2.51
Windows XP SP2 + all updates
*** NOTE all code has worked on previous python versions
I just upgraded from Python 2.4.2 to Python 2.5.1 and have found some
unexpected behavior that appears to be a bug in the os.stat module.
My OS is Windows XP SP2 + all updates.
I have several programs that have worked flawlessly on all previous Python
versions for years and they are now producing incorrect results in the code
that uses os.stat.
Searching through the 2.5.1 release notes I found the following:
Use Win32 API to implement os.stat/fstat. As a result, subsecond
timestamps
are reported, the limit on path name lengths is removed, and stat
reports
WindowsError now (instead of OSError).
*********************
* Overview of the problem:
*********************
Reviewing my code it seems that it should still work with the 2.5.1 os.stat
changes however that does not appear to be the case.
Timestamps reported by os.stat() are no longer correct and the results are
not even consistent.
In my first test case ALL 3 timestamps reported by Python are 1 hour less
than the actual timestamp on the file.
In my second test case the created and last accessed timestamps reported by
Python are correct but the last write timestamp is 1 hour less than the
actual timestamp on the file.
As a third test I looked at the last write timestamps on approximately
21,000 files.
Python determined wrong last write timestamp on approximately 1581 files.
Assuming there is no error in the following code that prints out the
timestamps using the new return value from os.stat() then it would appear
that the 2.5.1 os.stat changes have a bug.
print 'Creation Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S',
time.localtime(file_stats[stat.ST_CTIME]))
print 'Last Access Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S',
time.localtime(file_stats[stat.ST_ATIME]))
print 'Last Write Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S',
time.localtime(file_stats[stat.ST_MTIME]))
*********************
* Detailed test results
*********************
To demonstrate the problem I have created the following test.
Here are the files that will be used in my test and their associated
timestamps as reported the the dir command.
01/02/2003 12:34 PM 0 broke_test
03/06/2007 05:24 PM 3,497,177 broke_test2
05/31/2007 04:35 PM 254 runtest.cmd
05/31/2007 04:31 PM 513 broke.py
The file named broke_test has a timestamp of 01/02/2003 12:34:56 (dir does
not show seconds).
The runtest.cmd script shows the created, last accessed, and last write
timestamps as reported by the dir command (and also verified via right
clicking on the file and selecting properties in Explorer.
------ >>> START runtest.cmd script <<<<
@Echo Off
echo Create TimeStamp reported by dir command
dir /tc %1
echo Last Access TimeStamp reported by dir command
dir /ta %1
echo Last Write TimeStamp reported by dir command
dir /tw %1
echo Python 2.5.1 timestamp info
broke.py %1
------ >>> END runtest.cmd script <<<<
The broke.py script prints out the created, last accessed, last write
timestamps as Python sees them.
------ >>> START broke.py script <<<<
import sys
import os
import stat
import time
file_name = sys.argv[1]
file_stats = os.stat(file_name)
print
print 'File Name : %s' % (file_name)
print
print 'Creation Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S',
time.localtime(file_stats[stat.ST_CTIME]))
print 'Last Access Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S',
time.localtime(file_stats[stat.ST_ATIME]))
print 'Last Write Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S',
time.localtime(file_stats[stat.ST_MTIME]))
print
------ >>> END broke.py script <<<<
#
# Test 1 on file broke_test
#
runtest broke_test
Create TimeStamp reported by dir command
01/02/2003 12:34 PM 0 broke_test
Last Access TimeStamp reported by dir command
01/02/2003 12:34 PM 0 broke_test
Last Write TimeStamp reported by dir command
01/02/2003 12:34 PM 0 broke_test
Python 2.5.1 timestamp info
File Name : broke_test
Creation Time: 01/02/2003 11:34:56 -- Python results are WRONG hours
reported are 1 hour short
Last Access Time: 01/02/2003 11:34:56 -- Python results are WRONG hours
reported are 1 hour short
Last Write Time: 01/02/2003 11:34:56 -- Python results are WRONG hours
reported are 1 hour short
#
# Test 2 on file broke_test2
#
runtest broke_test2
Create TimeStamp reported by dir command
05/31/2007 04:26 PM 3,497,177 broke_test2
Last Access TimeStamp reported by dir command
05/31/2007 04:26 PM 3,497,177 broke_test2
Last Write TimeStamp reported by dir command
03/06/2007 05:24 PM 3,497,177 broke_test2
Python 2.5.1 timestamp info
File Name : broke_test2
Creation Time: 05/31/2007 16:26:36 -- Python results are correct
Last Access Time: 05/31/2007 16:26:38 -- Python results are correct
Last Write Time: 03/06/2007 16:24:21 -- Python results are WRONG hours
reported are 1 hour short
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2007-06-03 10:02
Message:
Logged In: YES
user_id=21627
Originator: NO
As discussed on comp.lang.python, the 1hour difference is just a display
issue resulting from daylight-saving time, not an issue with the os.stat
result. Closing this report as invalid.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2007-06-02 11:04
Message:
Logged In: YES
user_id=21627
Originator: NO
I think there are several things going on here, and I believe that Python
is behaving correctly.
On Win32, there are two ways to find out the time stamps of a file:
FindFirstFile reads it from the directory, and GetFileAttributesEx gets it
directly from the file (i.e. on NTFS, from the MFT record). When a file is
modified, NTFS normally modifies the file only, and updates the directory
only occasionally (due to hard links, there might be multiple directories
to update also).
In particular, the last access time is rarely written back to the
directory. You can easily see this for yourself with the following
procedure:
1. Inspect the last-access time with dir /ta, or in Explorer.
2. Open the file in an editor, but don't edit it; close the editor.
3. Inspect the last-access time with dir /ta, and with broke.py
4. Wait 10 minutes.
5. Repeat steps 2 and 3.
You should see that broke.py will report that the file was accessed,
whereas Explorer and dir/ta will report an "old" last-acecssed time.
I created a program that displays both the directory-stored and the
file-stored last-acecss time (attached). This program may produce outputs
such as
C:\Pyinst\25\python>..\..\unstop\pa\Debug\pa.exe .svn\entries
Last Access according to:
FindFirstFile: 128252459235000000 2007-6-2 8:18:43
GetFileAttributesEx: 128252466734062500 2007-6-2 8:31:13
Please refer to
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_behavior.mspx?mfr=true
There, they explain that the access time (if it is stored at all, which
can
be turned off) gets updated on disk only if the actual last-access time
and the stored last-access time differ by 1 hour. So if you perform steps
2 and 3 above after 1 hour again, you will indeed see that the last
access
time gets updated in explorer, too.
Python 2.4 did use the same (incorrect) source that dir/ta and the
explorer
uses; Python 2.5 has fixed that bug.
So I think we should remove the last-access time from our discussion.
File Added: pa.zip
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2007-06-01 07:39
Message:
Logged In: YES
user_id=33168
Originator: NO
Martin, was there a similar problem to this one? I looked in the log for
posixmodule.c and didn't see this fixed.
----------------------------------------------------------------------
Comment By: Joe (joesalmeri)
Date: 2007-06-01 03:19
Message:
Logged In: YES
user_id=1235002
Originator: YES
More details and a program that demonstrates the problem
I created a file and specifically set the created date, last accessed date
and last write date to
01/02/2003 12:34:56
After setting the date and time on the file I verified the following:
The dir /tc command shows the correct Create date of 01/02/2003 12:34
The dir /ta command shows the correct Access date of 01/02/2003 12:34
The dir /tw command shows the correct Write date of 01/02/2003 12:34
Looking at the properties of the file in Windows Explorer also shows that
all 3 dates match.
When Python 2.4.2 os.stat is used to get those timestamps all 3 are
CORRECT
When Python 2.5.1 os.stat is used to get those timestamps all 3 are WRONG
and they are off by exact 1 hour
01/02/2003 11:34:56 instead of 01/02/2003 12:34:56
In the case of my above test I know exactly what the timestamp on the file
is because I manually set it so that all 3 timestamps are the same.
Since Python 2.5.1 does not return the known values for that files
timestamps how can it not be a Python 2.5.1 bug?
Further testing shows that the results are somewhat inconsistent, many
times
the create and access date are correct but the Last Write timestamp is
wrong. It is generally off by one hour but I have seen situations where
it
was +1 hour and other situations where it was -1 hour.
I even found situations where the python timestamp was 1 minute later. (I
know about the 2 second timestamps on FAT, all my filesystems are NTFS).
I
just found a situation where the python timestamp was 02:51 PM and the
windows timestamp was 02:12 PM. DST or timezone changes are not going to
make the results be off by 39 minutes? (My timezone is GMT - 5:00).
As an additional test I wrote the following Python program which takes a
directory name as a paramater.
It retrieves all 3 dates for all files in all directories and
subdirectories
for the specified directory name.
First it gets the dates using Python and os.stat, then it runs the dir
/tc,
dir /ta, and dir /tw commands on the file.
Finally it compares the dates and times returned by Python against the
dates
and times that the Windows dir command is reporting.
If you run the program using Python 2.4.2 then ALL dates and times
returned
by Python 2.4.2 correctly match the dates and times that Windows reports.
If you run the program using Python 2.5.1 against the same directory then
you get intermittient results.
As a quick test I ran it against a directory on my home directory.
Python 2.4.2 all dates matched between what the Windows dir command
reported
and what Python 2.4.2 reported.
Python 2.5.1 had the following differences (more than 50% of the time it
was
wrong)
Total Files Processed : 149
Matched All 3 Dates : 70
Did NOT Match All 3 Dates: 79
Matched Create Date : 69
Did NOT Match Create Date: 10
Matched Access Date : 59
Did NOT Match Access Date: 20
Matched Write Date : 11
Did NOT Match Write Date : 68
Here's the source for the check_dates.py program. Run it for youself and
see what your results are.
import os
import stat
import sys
import time
if len(sys.argv) == 1:
print
print '%s path_to_check' % (sys.argv[0])
print
raise SystemExit
else:
path_to_check = sys.argv[1]
file_count = 0
file_match_count = 0
file_no_match_count = 0
create_match_count = 0
create_no_match_count = 0
access_match_count = 0
access_no_match_count = 0
write_match_count = 0
write_no_match_count = 0
for root, dirs, files in os.walk(r'%s' % path_to_check):
for file in files:
file_count = file_count + 1
file_name = os.path.join(root, file)
create_ts_match = False
access_ts_match = False
write_ts_match = False
file_stats = os.stat(file_name)
python_create_ts = time.strftime('%m/%d/%Y %I:%M %p',
time.localtime(file_stats[stat.ST_CTIME]))
python_access_ts = time.strftime('%m/%d/%Y %I:%M %p',
time.localtime(file_stats[stat.ST_ATIME]))
python_write_ts = time.strftime('%m/%d/%Y %I:%M %p',
time.localtime(file_stats[stat.ST_MTIME]))
win_create_ts = os.popen('dir /a /tc "%s"' %
(file_name)).readlines()[5][0:20]
win_access_ts = os.popen('dir /a /ta "%s"' %
(file_name)).readlines()[5][0:20]
win_write_ts = os.popen('dir /a /tw "%s"' %
(file_name)).readlines()[5][0:20]
if python_create_ts == win_create_ts: create_ts_match = True
if python_access_ts == win_access_ts: access_ts_match = True
if python_write_ts == win_write_ts: write_ts_match = True
if create_ts_match and access_ts_match and write_ts_match:
# All timestamps match
file_match_count = file_match_count + 1
else:
file_no_match_count = file_no_match_count + 1
print
print 'File Name : %s' % (file_name)
print
if create_ts_match:
create_match_count = create_match_count + 1
else:
create_no_match_count = create_no_match_count + 1
print 'python_create_ts: %s' % (python_create_ts)
print 'win_create_ts : %s' % (win_create_ts)
print
if access_ts_match:
access_match_count = access_match_count + 1
else:
access_no_match_count = access_no_match_count + 1
print 'python_access_ts: %s' % (python_access_ts)
print 'win_access_ts : %s' % (win_access_ts)
print
if write_ts_match:
write_match_count = write_match_count + 1
else:
write_no_match_count = write_no_match_count + 1
print 'python_write_ts : %s' % (python_write_ts)
print 'win_write_ts : %s' % (win_write_ts)
print
#
# Show Count Results
#
print 'Total Files Processed : %s' % (file_count)
print
print 'Matched All 3 Dates : %s' % (file_match_count)
print 'Did NOT Match All 3 Dates: %s' % (file_no_match_count)
print
print 'Matched Create Date : %s' % (create_match_count)
print 'Did NOT Match Create Date: %s' % (create_no_match_count)
print
print 'Matched Access Date : %s' % (access_match_count)
print 'Did NOT Match Access Date: %s' % (access_no_match_count)
print
print 'Matched Write Date : %s' % (write_match_count)
print 'Did NOT Match Write Date : %s' % (write_no_match_count)
print
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1729170&group_id=5470
More information about the Python-bugs-list
mailing list