I've run into a problem with large files using Python 2.1.2 and a
Linux 2.4.9 box. We've got a large file -- almost 6GB -- that Python
chokes on even though regular shell tools seem to be fine.
In particular, os.stat() of the file fails with EOVERFLOW and open()
of the file fails with EFBIG. The stat() failure is really bad
because it means os.path.exists() returns false.
strace tells me that other tools open the file passing O_LARGEFILE,
but Python does not. (They pass it even for small files.) I can't
find any succinct explanation of O_LARGEFILE, but Google turns up all
sorts of pages that mention it. It looks like the right way to open
large files, but it only seems to be defined in
> I've run into a problem with large files using Python 2.1.2 and a Linux 2.4.9 box. We've got a large file -- almost 6GB -- that Python chokes on even though regular shell tools seem to be fine.
Was this Python configured for large file support? I think you have to turn that on somehow, and then everything is automatic.
--Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum
writes:
>> I've run into a problem with large files using Python 2.1.2 and a Linux 2.4.9 box. We've got a large file -- almost 6GB -- that Python chokes on even though regular shell tools seem to be fine.
GvR> Was this Python configured for large file support? I think you
GvR> have to turn that on somehow, and then everything is automatic.

Indeed, I think my message ought to be mostly disregarded :-). I was told that Python had been built with large file support, but didn't test it myself.

However, I'm still unhappy with one thing related to large file support. If you've got a Python that doesn't have large file support and you try os.path.exists() on a large file, it will return false. This is really bad! Imagine you've got code that says, if the file doesn't exist open with mode "w+b" :-(.

I'd be happiest if os.path.exists() would work regardless of whether Python supported large files. I'd be satisfied with an exception that at least let me know something went wrong.

Jeremy
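The "check, then open with w+b" pattern Jeremy describes can clobber data whenever the existence check is wrong. A minimal sketch of one way to sidestep the check entirely is shown below. It assumes a modern Python 3, where the exclusive-creation mode "x" is available (it did not exist in the 2.1 release discussed here), and the helper name create_if_absent is hypothetical.

```python
import os
import tempfile


def create_if_absent(path):
    """Open path for binary read/write only if it does not already exist.

    Mode "x+b" asks the OS to create the file exclusively and raises
    FileExistsError when something is already there, so the caller never
    depends on a separate os.path.exists() answer.
    """
    return open(path, "x+b")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data.bin")
        with create_if_absent(target) as f:
            f.write(b"first")
        try:
            create_if_absent(target)
        except FileExistsError:
            print("refused to overwrite the existing file")
```

The design choice is to let the open call itself decide, rather than a preceding stat(), so a misleading "does not exist" answer cannot lead to truncating the file.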
> However, I'm still unhappy with one thing related to large file support. If you've got a Python that doesn't have large file support and you try os.path.exists() on a large file, it will return false. This is really bad! Imagine you've got code that says, if the file doesn't exist open with mode "w+b" :-(.
Wow, that sucks.
> I'd be happiest if os.path.exists() would work regardless of whether Python supported large files. I'd be satisfied with an exception that at least let me know something went wrong.
Is there an errno we can test for? stat() for a non-existent file raises one exception, stat() for a file in a directory you can't read raises a different one; maybe stat of a large file raises something else again? I think os.path.exists() ought to return True in this case.
--Guido van Rossum (home page: http://www.python.org/~guido/)
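Guido's question about distinguishing failures by errno can be explored with a small probe. The sketch below is written for a modern Python 3 rather than the 2.1 interpreter in the thread, and the helper name stat_errno_name is hypothetical. It only reports which symbolic error code os.stat() produced, if any.

```python
import errno
import os


def stat_errno_name(path):
    """Return None if os.stat() succeeds, else the symbolic errno name."""
    try:
        os.stat(path)
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'ENOENT'.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    for candidate in (os.getcwd(), os.path.join(os.getcwd(), "no-such-entry")):
        print(candidate, stat_errno_name(candidate))
```

Running it against a missing file, a file inside an unreadable directory, and an oversized file on a build without large file support would show whether the three cases really produce different codes on a given platform.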
"GvR" == Guido van Rossum
writes:
>> I'd be happiest if os.path.exists() would work regardless of whether Python supported large files. I'd be satisfied with an exception that at least let me know something went wrong.
GvR> Is there an errno we can test for? stat() for a non-existent
GvR> file raises one exception, stat() for a file in a directory you
GvR> can't read raises a different one; maybe stat of a large file
GvR> raises something else again? I think os.path.exists() ought to
GvR> return True in this case.

On the platform I tried (apparently RH 7.1) it raises EOVERFLOW. I can extend posixpath to treat that as "file exists" tomorrow.

Jeremy
> On the platform I tried (apparently RH 7.1) it raises EOVERFLOW. I can extend posixpath to treat that as "file exists" tomorrow.
OK. Be sure to check that the errno module and the value errno.EOVERFLOW exist before using them!
--Guido van Rossum (home page: http://www.python.org/~guido/)
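The idea agreed on above (treat EOVERFLOW from stat() as "the file exists", and guard the lookup of errno.EOVERFLOW) might look roughly like the following. This is only a sketch in modern Python 3, not the change that was actually made to posixpath. The name exists_tolerant and the stat parameter are hypothetical, the latter added so the overflow case can be simulated without a multi-gigabyte file.

```python
import errno
import os

# Guarded lookup, as suggested in the thread: a platform may not define EOVERFLOW.
_EOVERFLOW = getattr(errno, "EOVERFLOW", None)


def exists_tolerant(path, stat=os.stat):
    """Like os.path.exists(), but report True when stat() fails with EOVERFLOW.

    EOVERFLOW means the file is there but some field (typically its size)
    does not fit in the stat structure this build uses.
    """
    try:
        stat(path)
    except OSError as exc:
        if _EOVERFLOW is not None and exc.errno == _EOVERFLOW:
            return True
        return False
    return True


if __name__ == "__main__":
    def overflowing_stat(path):
        # Stand-in for stat() on a build without large file support.
        raise OSError(errno.EOVERFLOW, "Value too large for defined data type")

    print(exists_tolerant("/some/huge/file", stat=overflowing_stat))
    print(exists_tolerant(os.path.join(os.getcwd(), "no-such-entry")))
```

Every other error still maps to False, which matches the existing behaviour of os.path.exists() for missing files and unreadable directories.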
On Mon, Jun 17, 2002 at 04:28:23PM -0400, Jeremy Hylton wrote:
> On the platform I tried (apparently RH 7.1) it raises EOVERFLOW. I can extend posixpath to treat that as "file exists" tomorrow.
How about changing os.path.exists for posix to:
    def exists(path):
        return(os.access(path, os.F_OK))
I haven't done more than a few simple tests, but I believe that this would
provide similar functionality without relying on os.stat not breaking.
Plus, access is faster (on the order of 2x as fast when checking a quarter
million files on my laptop).
Sean
--
I have never been able to conceive how any rational being could propose
happiness to himself from the exercise of power over others. -- Jefferson
Sean Reifschneider, Inimitably Superfluous
> How about changing os.path.exists for posix to:
>     def exists(path): return(os.access(path, os.F_OK))
NO, NO, NOOOOOOO! access() does something different. It checks permissions as they would be for the effective user id. DO NOT USE access() TO CHECK FOR FILE PERMISSIONS UNLESS YOU HAVE A SET-UID MISSION!
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Jun 17, 2002 at 11:02:42PM -0400, Guido van Rossum wrote:
>> How about changing os.path.exists for posix to:
>>     def exists(path): return(os.access(path, os.F_OK))
> NO, NO, NOOOOOOO!
> access() does something different. It checks permissions as they
F_OK checks to see if the file exists. Am I misunderstanding something in
the following test:
[2] guin:tmp# cd /tmp
[2] guin:tmp# mkdir test
[2] guin:tmp# chmod 700 test
[2] guin:tmp# touch test/exists
[2] guin:tmp# chmod 700 test/exists
[2] guin:tmp# su -c '/tmp/showaccess /tmp/test/exists' jafo
access: 0 exists: 0
[2] guin:tmp# chmod 111 /tmp/test
[2] guin:tmp# su -c '/tmp/showaccess /tmp/test/exists' jafo
access: 1 exists: 1
[2] guin:tmp# chmod 000 test/exists
[2] guin:tmp# su -c '/tmp/showaccess /tmp/test/exists' jafo
access: 1 exists: 1
[2] guin:tmp# chmod 000 /tmp/test
[2] guin:tmp# su -c '/tmp/showaccess /tmp/test/exists' jafo
access: 0 exists: 0
[2] guin:tmp# su -c '/tmp/showaccess /tmp/test/noexists' jafo
access: 0 exists: 0
[2] guin:tmp# chmod 777 /tmp/test
[2] guin:tmp# su -c '/tmp/showaccess /tmp/test/noexists' jafo
access: 0 exists: 0
The above is run as root, with the su doing the test as non-root. The code
in showaccess simply does an os.access and then an os.path.exists and
displays the results:
[2] guin:tmp# cat /tmp/showaccess
#!/usr/bin/env python2
import os, sys
print 'access: %d exists: %d' % ( os.access(sys.argv[1], os.F_OK),
os.path.exists(sys.argv[1]))
Sean
--
/home is where your .heart is. -- Sean Reifschneider, 1999
Sean Reifschneider, Inimitably Superfluous
>>> def exists(path): return(os.access(path, os.F_OK))
>> NO, NO, NOOOOOOO!
>> access() does something different. It checks permissions as they
> F_OK checks to see if the file exists.
It is my understanding that if some directory along the path to the file is accessible to root but not to the effective user, access() for a file in that directory might return 0 while exists would return 1, on some operating systems. There's only one rule for access(): only use it if you have a set-uid mission.
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Jun 17, 2002 at 11:25:34PM -0400, Guido van Rossum wrote:
> It is my understanding that if some directory along the path to the file is accessible to root but not to the effective user, access() for a file in that directory might return 0 while exists would return 1,
I would be shocked if POSIX allowed a non-root user to probe file entries
under a root/700 directory...
What a paradox -- when I submitted the patch to add F_OK, you said that
exists() did the same thing. ;-)
Sean
--
"Your documents always look so good." "That's because I keep my laser-printer
set on ``stun''." -- Sean Reifschneider, 1998
Sean Reifschneider, Inimitably Superfluous
> I would be shocked if POSIX allowed a non-root user to probe file entries under a root/700 directory...
Exactly. If a program is written to use access(), and subsequently that program is used in a setuid(root) situation, access() will say you can't access the file, but exists() will say it exists. So access() cannot be used to emulate exists() -- they serve different purposes, and can return different results.
> What a paradox -- when I submitted the patch to add F_OK, you said that exists() did the same thing. ;-)
Given the widespread misunderstanding of what access() does, anything that makes using access() easier is a mistake IMO.
--Guido van Rossum (home page: http://www.python.org/~guido/)
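The setuid scenario above comes down to which user ID each call is evaluated against. POSIX access() performs its checks with the real user and group IDs, while stat(), which os.path.exists() relies on, is subject to the effective ones. The sketch below, in modern Python 3 on a POSIX system, simply gathers both answers next to both IDs so a divergence becomes visible. The helper name existence_report is hypothetical.

```python
import os


def existence_report(path):
    """Collect both existence checks plus the user IDs that govern them."""
    return {
        # access() is evaluated with the real uid/gid of the process.
        "access_F_OK": os.access(path, os.F_OK),
        # exists() calls stat(), which is subject to the effective uid/gid.
        "path_exists": os.path.exists(path),
        "real_uid": os.getuid(),
        "effective_uid": os.geteuid(),
    }


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:] or [os.getcwd()]:
        print(name, existence_report(name))
```

In an ordinary process the two IDs are equal and the two checks should agree, as in Sean's test. Only when they differ, as in a setuid program, can the results come apart in the way Guido describes.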
On Tue, Jun 18, 2002 at 08:05:35AM -0400, Guido van Rossum wrote:
> Given the widespread misunderstanding of what access() does, anything that makes using access() easier is a mistake IMO.
I obviously need to re-read my POSIX reference. I've submitted a docstring
and library documentation change for os.access() which should make it clear
what the issues are...
Sean
--
You know you're in Canada when: You see a flyer advertising a polka-fest
at the curling rink.
Sean Reifschneider, Inimitably Superfluous
jeremy@zope.com (Jeremy Hylton) writes:
> However, I'm still unhappy with one thing related to large file support. If you've got a Python that doesn't have large file support and you try os.path.exists() on a large file, it will return false. This is really bad!
I believe this is a pilot error. On a system that supports large files, it is the administrator's job to make sure the Python installation has large file support enabled, otherwise, strange things may happen. So yes, it is bad, but no, it is not really bad. Feel free to fix it, but be prepared to include work-arounds in many other places, too.
Regards,
Martin
> I believe this is a pilot error. On a system that supports large files, it is the administrator's job to make sure the Python installation has large file support enabled, otherwise, strange things may happen.
I'm not sure who to blame, but note that (at least for 2.1.2, which is the version that Jeremy said he was given to use) large file support must be configured manually. So this might be a common problem. Unfortunately that may mean that it's only worth fixing in 2.1.4...
--Guido van Rossum (home page: http://www.python.org/~guido/)
"MvL" == Martin v Loewis
writes:
MvL> jeremy@zope.com (Jeremy Hylton) writes:
>> However, I'm still unhappy with one thing related to large file support. If you've got a Python that doesn't have large file support and you try os.path.exists() on a large file, it will return false. This is really bad!
MvL> I believe this is a pilot error. On a system that supports
MvL> large files, it is the administrator's job to make sure the
MvL> Python installation has large file support enabled, otherwise,
MvL> strange things may happen.

We sure don't provide much help for such an administrator. (Happily, I am not one.) The instructions for Linux offer a configure recipe and say "it might work." If you build without large file support on a Linux system, the test suite gives no indication that something went wrong. So I think it is unreasonable to say the Python install is broken, despite the fact that it's possible to do better.

MvL> So yes, it is bad, but no, it is not really bad. Feel free to
MvL> fix it, but be prepared to include work-arounds in many other
MvL> places, too.

os.path.exists() is perhaps the most egregious. I think it's worth backporting the fix to the 2.1 branch, along with any other glaring errors. We might still see a 2.1.4.

Jeremy
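Jeremy's complaint is that nothing tells the administrator whether a build can handle large files. One rough way to probe a running interpreter is to try seeking past the 2 GiB mark, as sketched below for a modern Python 3. The function name supports_large_offsets is hypothetical, and this is an illustration rather than the check Python's own test suite performs.

```python
import tempfile

# One byte past what a signed 32-bit off_t can represent.
LARGE_OFFSET = 2**31 + 1


def supports_large_offsets():
    """Return True if this interpreter can seek past 2 GiB in a regular file.

    Only seek() and tell() are used, so nothing is written and no disk
    space is consumed by the probe.
    """
    with tempfile.TemporaryFile() as f:
        try:
            f.seek(LARGE_OFFSET)
            return f.tell() == LARGE_OFFSET
        except (OverflowError, OSError, ValueError):
            return False


if __name__ == "__main__":
    print("large offsets supported:", supports_large_offsets())
```

A False result would suggest the kind of build discussed in this thread, although it could also reflect a limit of the filesystem holding the temporary file.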
jeremy@zope.com (Jeremy Hylton) writes:
> We sure don't provide much help for such an administrator.
We do, but not in 2.1.
> os.path.exists() is perhaps the most egregious. I think it's worth backporting the fix to the 2.1 branch, along with any other glaring errors. We might still see a 2.1.4.
In that case, I recommend backporting the machinery that enables LFS from 2.2. If this machinery fails to detect LFS support on a system, there is a good chance that your processing of EOVERFLOW fails on that system as well.
Regards,
Martin
> In that case, I recommend to backport the machinery that enables LFS from 2.2. If this machinery fails to detect LFS support on a system, there is a good chance that your processing of EOVERFLOW fails on that system as well.
That sounds like a good plan, though painful (much configure.in hacking, and didn't we switch to a newer version of autoconf?). Can you help? 2.1 is still a popular release, and large files will become more and more common as it grows older...
--Guido van Rossum (home page: http://www.python.org/~guido/)
>> If you've got a Python that doesn't have large file support and you
>> try os.path.exists() on a large file, it will return false. This is
>> really bad!

Martin> I believe this is a pilot error. On a system that supports large
Martin> files, it is the administrator's job to make sure the Python
Martin> installation has large file support enabled, otherwise, strange
Martin> things may happen.

What about a networked environment? If machine A without large file support mounts an NFS directory from machine B that does support large files, what should a program running on A see if it attempts to stat a large file? Sounds like the EOVERFLOW thing would come in handy here.

Skip
Skip Montanaro
> What about a networked environment? If machine A without large file support mounts an NFS directory from machine B that does support large files, what should a program running on A see if it attempts to stat a large file?
I would have to read the specs to answer this question correctly, but I believe the answer would go like this:

case 1: Machine A only supports NFSv2, which does not support large files. When machine A accesses a large file on machine B (through the NFS GETATTR operation), it will see a truncated file. Notice that the exact behaviour depends on the NFSv2 implementation on machine B.

case 2: Machine A supports NFSv3, and the client NFS implementation correctly recognizes the large file. Now, you say "A has no large file support". That could either mean that the syscalls don't support that, or that the C library doesn't support that. If the kernel does not support it, it may be that it does not define EOVERFLOW, either. Most likely, you will again see the truncated value.

> Sounds like the EOVERFLOW thing would come in handy here.
It's not our choice whether the operating system reports EOVERFLOW, or a truncated file. My guess is that you likely see a truncated file, but you would need to specify a precise combination of (client C lib, client OS, wire NFS version, server OS) to find out what really happens. My guess is that if the system is not aware of large files, it likely won't work "correctly" when it sees one, with Python having no way to influence the outcome.
Regards,
Martin
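To make the two outcomes concrete, the arithmetic can be sketched as follows. This is purely illustrative: the thread only says the file is "almost 6GB", so the size of 6 * 10**9 bytes is an assumption, and so is the idea that a "truncated" size means keeping the low 32 bits. A real NFS client or C library might clamp or behave differently. The helper names are hypothetical.

```python
# Largest value a signed 32-bit off_t can hold.
INT32_MAX = 2**31 - 1


def fits_in_32_bit_off_t(size):
    """True if a file size can be represented in a signed 32-bit off_t."""
    return 0 <= size <= INT32_MAX


def wrapped_to_32_bits(size):
    """Size as it would appear if only the low 32 bits were kept (assumed behaviour)."""
    return size % 2**32


if __name__ == "__main__":
    assumed_size = 6 * 10**9  # stand-in for "almost 6GB"; not a figure from the thread
    print("fits in 32-bit off_t:", fits_in_32_bit_off_t(assumed_size))
    print("low 32 bits only:", wrapped_to_32_bits(assumed_size))
```

A size that does not fit is the situation in which stat() can report EOVERFLOW. A wrapped size is one way a caller could instead be shown a plausible but wrong, much smaller file.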
I remember that we had trouble several times compiling
Python 2.1 under Redhat with LF support. Also, the way described in the docs
did not work in all cases and we had to tweak the sources a bit
(I think it was posixmodule.c).
-aj
--On Monday, June 17, 2002 15:22 -0400 Jeremy Hylton
> I've run into a problem with large files using Python 2.1.2 and a Linux 2.4.9 box. We've got a large file -- almost 6GB -- that Python chokes on even though regular shell tools seem to be fine.
> In particular, os.stat() of the file fails with EOVERFLOW and open() of the file fails with EFBIG. The stat() failure is really bad because it means os.path.exists() returns false.
> strace tells me that other tools open the file passing O_LARGEFILE, but Python does not. (They pass it even for small files.) I can't find any succinct explanation of O_LARGEFILE, but Google turns up all sorts of pages that mention it. It looks like the right way to open large files, but it only seems to be defined in
> on the Linux box in question. I haven't had any luck searching for a decent way to invoke stat() and have it be prepared for a very large file.
> I think Python is definitely broken here. Can anyone offer any clues or pointers to documentation? Better yet, a fix. I'm happy to help integrate and test it.
> Jeremy
> _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev
--------------------------------------------------------------------- - Andreas Jung http://www.andreas-jung.com - - EMail: andreas at andreas-jung.com - - "Life is too short to (re)write parsers" - ---------------------------------------------------------------------
Andreas Jung
> I remember that we had several times trouble with compiling Python 2.1 under Redhat with LF support. Also the way described in the docs did not work in all cases and we had to tweak the sources a bit (I think it was posixmodule.c).
For the current 2.1 release, the docs are believed to be correct (the instructions used to be incorrect, as was the code). For 2.2, it is believed that no extra configuration is necessary on "most" systems (Windows, Linux, Solaris).
Regards,
Martin
Jeremy Hylton wrote:
> can't find any succinct explanation of O_LARGEFILE, but Google turns up all sorts of pages that mention it. It looks like the right way to open large files, but it only seems to be defined in
> on the Linux box in question.
Perhaps it is set by libc if the application is compiled with large file support.
Neil
participants (8)
- Andreas Jung
- Guido van Rossum
- Jeremy Hylton
- jeremy@zope.com
- martin@v.loewis.de
- Neil Schemenauer
- Sean Reifschneider
- Skip Montanaro