[Python-bugs-list] [ python-Bugs-451890 ] Building with Large File Support fails

noreply@sourceforge.net
Mon, 10 Sep 2001 06:35:43 -0700


Bugs item #451890, was opened at 2001-08-16 18:00
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=451890&group_id=5470

Category: Build
Group: Python 2.2
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Gerhard Häring (ghaering)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Building with Large File Support fails

Initial Comment:
(At least) on Linux, building 2.2-HEAD fails when
building with Large File Support. The error occurs in
Objects/fileobject.c, function _portable_ftell, line
262.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-10 06:35

Message:
Logged In: YES 
user_id=6380

The config-time test is already removed.

I've just checked in changes to test_largefile.py that make
it skip itself when the filesystem doesn't support files >
2GB.

Closing this now.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-10 00:27

Message:
Logged In: YES 
user_id=21627

Gerhard's comment is right on target: it certainly depends
on the file system. E.g. in case of NFS, you not only need
NFSv3, but also the remote file system must support large
files. So even on a single installation, large files may
work in some directory, but not in another.

Therefore, I recommend removing the configure-time test
that checks whether creating a large file is possible. Note
that your test may well consume an unreasonable amount of
disk space before failing on some broken systems.
With the configure-time test removed, test_largefile might
be skipped. This is no big deal; the same test may pass if
the binary is moved to another system.
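Since the answer can differ per directory, a per-path query is a natural fit. POSIX pathconf() with _PC_FILESIZEBITS reports how many bits are needed to represent the largest file size allowed in the filesystem holding a given path, and it writes nothing to disk. A minimal sketch with hypothetical helper names; some C libraries may return a conservative default (such as 32) for filesystems they do not recognize, so the result is a hint, not proof.

```c
#include <unistd.h>

/* Hypothetical helper: bits needed to represent the maximum file size
   in the filesystem containing `dir`. -1 means "no limit reported" or
   an error; callers should treat it as unknown. */
long file_size_bits(const char *dir)
{
    return pathconf(dir, _PC_FILESIZEBITS);
}

/* Hypothetical helper: 1 if `dir` reports more than 32 bits (files
   > 2 GB may be possible there), 0 if it reports 32 or fewer, and
   -1 if nothing usable was reported. */
int dir_may_hold_large_files(const char *dir)
{
    long bits = file_size_bits(dir);
    if (bits < 0)
        return -1;
    return bits > 32;
}
```

Calling this on a local directory and on an NFS mount point may well give different answers, which is the situation described above.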

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-09 19:35

Message:
Logged In: YES 
user_id=163326

Just a quick update. I've tested your latest CVS changes, and
I can now seek and write with offsets above sys.maxint just
fine, out of the box (on my Linux). The filesystem must
support LFS too, of course. Even reiserfs doesn't support
that without formatting the partition with "-v 2". I can't
speak for ext2, but I guess you must format the partition
with some special option to support files > 2 GB.

(Just FYI, to save some time: for just testing seek, you can
open "/dev/null" or "/dev/zero".)
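A seek-only check along those lines might look like this in C. It is a sketch under the assumption of glibc-style feature-test macros; the helper name is hypothetical. It checks only the return value of fseeko(), because a character device such as /dev/null may report position 0 afterwards, so ftello() is not a reliable witness there.

```c
#define _LARGEFILE_SOURCE 1     /* declare fseeko()/ftello() */
#define _FILE_OFFSET_BITS 64    /* 64-bit off_t even on 32-bit glibc */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: opens `path` read-only and asks stdio to seek
   to 3 GiB. Returns 1 if fseeko() accepted the offset, 0 otherwise. */
int seek_accepts_large_offset(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    off_t target = (off_t)3 * 1024 * 1024 * 1024;  /* beyond 2^31 - 1 */
    int ok = fseeko(f, target, SEEK_SET) == 0;
    fclose(f);
    return ok;
}
```

This exercises only the seek path. As the other comments in this report show, writing at such an offset is a separate question that depends on the filesystem.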


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-09 19:09

Message:
Logged In: YES 
user_id=6380

I've done this in CVS now, but now the largefile build even
triggers on systems where the kernel (or the filesystem?)
doesn't support large files, but glibc does. Seeking to a
position > 2GB works, but writing triggers an IOError
exception on flush() or close(). In some sense this is right
(the binary might be moved to another kernel). But on such a
system test_largefile now fails, because its test for
largefile "support" isn't good enough. What to do next? Put
some test for a largefile-supporting kernel in the configure
script, or in test_largefile?
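Because the seek succeeds and the error only surfaces on flush() or close(), a runtime check has to write and then inspect both fflush() and fclose(). A minimal sketch of such a probe, assuming that the temporary file from tmpfile() lives on the filesystem of interest (it may not, since the C library picks the directory) and that the filesystem supports sparse files, so only a block or so is allocated. The function name is hypothetical.

```c
#define _LARGEFILE_SOURCE 1
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical probe: returns 1 if one byte can be written just past
   the 2 GB boundary of a temporary file, 0 otherwise. */
int can_write_past_2gb(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;
    off_t target = (off_t)1 << 31;   /* first offset needing more than 31 bits */
    /* The seek can succeed on a kernel without LFS; the failure may
       only be reported when the buffered byte is flushed. */
    int ok = fseeko(f, target, SEEK_SET) == 0
          && fputc('x', f) != EOF
          && fflush(f) == 0;
    if (fclose(f) != 0)              /* or when the stream is closed */
        ok = 0;
    return ok;
}
```

A test could call this once and skip itself when it returns 0, rather than relying on a configure-time decision.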

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-09 01:44

Message:
Logged In: YES 
user_id=21627

I'd recommend always AC_DEFINE-ing _LARGEFILE_SOURCE and
_FILE_OFFSET_BITS=64. It would be very hard for a test to
determine exactly what they change. Instead, we should trust
that if they are recognized at all, they do the right
thing. If there is an early AC_DEFINE for them, they will
get into confdefs.h and influence the outcome of all later
tests (e.g. the one measuring off_t).
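Here is a small illustration of why the position of the definitions matters. The two defines play the role of the early AC_DEFINE: because they come before the first system header, every later declaration and every sizeof measurement in that translation unit sees the large-file types. This sketch assumes glibc semantics for these macros. On 64-bit platforms off_t is already 64 bits, so the defines change nothing there. The function name is hypothetical.

```c
/* Must precede every #include, just as an early AC_DEFINE lands in
   confdefs.h ahead of every later configure test program. */
#define _LARGEFILE_SOURCE 1
#define _FILE_OFFSET_BITS 64
#include <stddef.h>
#include <sys/types.h>

/* What a configure test that measures off_t would see here. */
size_t measured_sizeof_off_t(void)
{
    return sizeof(off_t);
}
```

If the same measurement ran in a program compiled without these defines, a 32-bit glibc system would report 4 instead of 8.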




----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-08 21:18

Message:
Logged In: YES 
user_id=6380

Interesting!  My test script for large files worked, so
_FILE_OFFSET_BITS and _LARGEFILE_SOURCE are defined in your
pyconfig.h, but apparently the test for
HAVE_LARGEFILE_SUPPORT failed, because that symbol is *not*
set in your pyconfig.h -- and everything else keys off it!

So the only symbol you really need to pass is
HAVE_LARGEFILE_SUPPORT, and as a workaround you can define
that yourself in pyconfig.h.

This symbol is defined by a bit of configure code that looks
like this in the m4 input:

AC_MSG_CHECKING(whether to enable large file support)
if test "$have_long_long" = yes -a \
	"$ac_cv_sizeof_off_t" -gt "$ac_cv_sizeof_long" -a \
	"$ac_cv_sizeof_long_long" -ge "$ac_cv_sizeof_off_t"; then
  AC_DEFINE(HAVE_LARGEFILE_SUPPORT)
  AC_MSG_RESULT(yes)
else
  AC_MSG_RESULT(no)
fi

Can you upload config.status? That should tell me which of
those symbols doesn't have the right value. My guess is that
off_t is measured at 32 bits because _FILE_OFFSET_BITS is
not defined as 64 at the point that the symbol is measured.
So I have to tweak more stuff...  Back to the drawing board.
:-(
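To cross-check the values in config.status, the same condition can be written as a C expression over the sizes the compiler actually sees. This is a hypothetical helper, not part of configure. It assumes the same feature-test macros are defined first, and it drops the have_long_long clause because C99 guarantees long long.

```c
#define _LARGEFILE_SOURCE 1
#define _FILE_OFFSET_BITS 64
#include <sys/types.h>

/* Mirrors the configure condition:
     have_long_long && sizeof(off_t) > sizeof(long)
                    && sizeof(long long) >= sizeof(off_t)  */
int wants_largefile_support(void)
{
    return sizeof(off_t) > sizeof(long)
        && sizeof(long long) >= sizeof(off_t);
}
```

On an LP64 system long is already 64 bits, so the condition is false and no special handling is needed. On 32-bit Linux it becomes true only if off_t is measured as 8 bytes, which is exactly what fails when _FILE_OFFSET_BITS is not yet 64 at measurement time.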

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-08 13:10

Message:
Logged In: YES 
user_id=163326

To find out the glibc version, you can invoke "glibcbug". 
My default bug report says:
...
Release:       libc-2.2.2
No, I don't get LFS support without manual work, with
CVS-HEAD and 2.2a3. I've uploaded my entire config.log file;
maybe you can make some sense of it. (It does find ftello and
fseeko, but my pyconfig.h doesn't define the needed macros.)
Come to think of it, I'll upload my pyconfig.h, too.
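Besides glibcbug, a program can ask glibc directly. <gnu/libc-version.h> declares gnu_get_libc_version(), which returns the run-time version string. This is a glibc-specific sketch, guarded so that it still compiles against other C libraries. The helper name is hypothetical.

```c
#include <stdio.h>   /* any glibc header pulls in <features.h>, defining __GLIBC__ */
#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Returns the C library version the program runs against (a string
   such as "2.2.2"), or a placeholder when not built against glibc. */
const char *libc_version_string(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown (not glibc)";
#endif
}
```

Printing this from a tiny program with printf("%s\n", libc_version_string()) avoids going through a bug-report template.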



----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-09-08 12:22

Message:
Logged In: NO 

(This is Guido, in a hurry, not logged in :-)

Gerhard, I'm surprised you still had to pass options to
make. It works without those for me. (How do I tell the
version of glibc I'm using?)

Can you tell me what config.log says after
"checking for CFLAGS to enable large files"?

Have you tried 2.2a3?

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-09-08 12:12

Message:
Logged In: YES 
user_id=163326

Guido, I can build the current CVS now with LFS, too (Linux
2.4, glibc 2.2). I saw you did a lot in the configure
script, but I still had to give options to the make command
(grabbed them from Sean's latest source RPMs).

This worked for me:
./configure
make OPT="-g -O3 -D_FILE_OFFSET_BITS=64 -DHAVE_LARGEFILE_SUPPORT" \
     CFLAGS="-g -O3 -D_FILE_OFFSET_BITS=64 -DHAVE_LARGEFILE_SUPPORT"

Shouldn't the feature define HAVE_LARGEFILE_SUPPORT be
automatically added to pyconfig.h?

It would perhaps be a good idea to add the info on how to
build with LFS to the build instructions.

Thanks,
Gerhard


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-09-05 11:36

Message:
Logged In: YES 
user_id=6380

Gerhard, can you try the current CVS? I've done a few things
to try and fix this. I can now build just fine on a pretty
recent Linux 2.4 kernel.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-03 02:23

Message:
Logged In: YES 
user_id=21627

To fix the bug at hand (building fails), the following
strategy might be sufficient:
- produce an autoconf test that checks whether fpos_t is
integral, and "large"; define this by default for MSVC
- use this test in portable_fseek/portable_ftell.

I also wonder why the order in which APIs are tried is
different in fseek and ftell (fseek tries fseeko first,
ftell tries ftello only second).
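The two suggestions can be combined: decide once whether fpos_t is a large integral type, then have both the seek and the tell path consult that decision in the same order. This is a minimal sketch, assuming a hypothetical HAVE_LARGE_INTEGRAL_FPOS_T macro produced by the proposed autoconf test (and set by hand for MSVC). Without that macro it falls back to POSIX fseeko()/ftello(). The names sketch_fseek and sketch_ftell are illustrative only, not the functions in Objects/fileobject.c.

```c
#define _LARGEFILE_SOURCE 1
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>

#if defined(HAVE_LARGE_INTEGRAL_FPOS_T)
typedef fpos_t big_off;   /* valid only if configure vouched for it */
#else
typedef off_t big_off;
#endif

/* Same preference order in both functions: the integral fpos_t path
   when configure detected it, otherwise fseeko()/ftello(). */
int sketch_fseek(FILE *fp, big_off offset, int whence)
{
#if defined(HAVE_LARGE_INTEGRAL_FPOS_T)
    fpos_t pos;
    switch (whence) {
    case SEEK_END:
        if (fseek(fp, 0, SEEK_END) != 0)
            return -1;
        /* fall through: offset is now relative to the end position */
    case SEEK_CUR:
        if (fgetpos(fp, &pos) != 0)
            return -1;
        offset += pos;
        break;
    /* SEEK_SET: offset is already absolute */
    }
    pos = offset;
    return fsetpos(fp, &pos);
#else
    return fseeko(fp, offset, whence);
#endif
}

big_off sketch_ftell(FILE *fp)
{
#if defined(HAVE_LARGE_INTEGRAL_FPOS_T)
    fpos_t pos;
    if (fgetpos(fp, &pos) != 0)
        return -1;
    return pos;
#else
    return ftello(fp);
#endif
}
```

On glibc, where fpos_t is a struct, the macro would stay undefined and only the fseeko()/ftello() branch would compile, so the build failure reported in this bug could not arise.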

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-20 13:19

Message:
Logged In: YES 
user_id=31435

By itself, adding opaque getpos/setpos sounds pretty easy 
(BTW, f{get,set}pos are std in C99).

Returning a usable 64-bit integer remains a x-platform 
mess.  The C99 rationale sez f{get,set}pos are the intended 
way to work with large files, but they provide no way to 
break the abstraction (Jeremy & I both looked in vain --
there is no defined way to extract the stream position from
an fpos_t object, nor to do arithmetic on one).

On Windows, f{get,set}pos are (currently) the only way to 
get a 64-bit stream position from the MS C library (and MS 
doesn't (currently) mix that in with a state encoding; the 
Win32 API has other ways to deal with this).
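The opaque approach needs nothing beyond standard C. fgetpos() saves a position, including any multibyte shift state, into an fpos_t, and fsetpos() restores it without the caller ever looking inside. A small sketch with a hypothetical helper name.

```c
#include <stdio.h>

/* Hypothetical helper: reads one character, remembers the position
   after it, reads `skip` more characters, then jumps back with
   fsetpos(). Returns the character at the restored position, or EOF
   on error. The fpos_t is only ever handed back to fsetpos(); it is
   never printed, compared, or used in arithmetic. */
int reread_after_skip(FILE *fp, int skip)
{
    fpos_t mark;
    if (fgetc(fp) == EOF || fgetpos(fp, &mark) != 0)
        return EOF;
    for (int i = 0; i < skip; i++)
        if (fgetc(fp) == EOF)
            return EOF;
    if (fsetpos(fp, &mark) != 0)
        return EOF;
    return fgetc(fp);
}
```

For a stream containing "abcdef", reading from the start with skip = 3 returns 'b', the character right after the mark.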

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-08-20 06:21

Message:
Logged In: YES 
user_id=6380

OK, so we need to add separate getpos() and setpos() methods
that return an opaque wrapper for an fpos_t. That sounds
like serious work, plus it will require changing Python apps
that use seek and tell.

So I think we should *also* continue to search for a way to
use a 64-bit seek offset for Python's seek() and tell()
methods -- I'm presuming this is hidden *somewhere* in the
fpos_t still, since the underlying OS certainly uses
lseek64(). If there's no way to extract it out of the
fpos_t, I propose calling lseek64() directly on the file
descriptor (after an fflush()).
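The fallback of flushing stdio and then asking the kernel for the descriptor's offset can be sketched with POSIX calls. With _FILE_OFFSET_BITS=64, glibc maps plain lseek() onto the 64-bit variant, so the sketch uses the portable name. It is shown on a stream that has just been written. For a stream with unread input in its buffer, the descriptor's offset runs ahead of the logical position unless fflush() resynchronizes it. POSIX specifies that behavior for seekable input streams, but older C libraries may not honor it. The helper name is hypothetical.

```c
#define _LARGEFILE_SOURCE 1
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the 64-bit position of `fp` by pushing
   stdio's buffered data to the descriptor and querying the kernel's
   offset, or -1 on error. */
off_t tell_via_fd(FILE *fp)
{
    if (fflush(fp) != 0)
        return -1;
    return lseek(fileno(fp), 0, SEEK_CUR);
}
```

This sidesteps fpos_t entirely, at the cost of an extra system call per tell().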

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-19 22:24

Message:
Logged In: YES 
user_id=31435

Noting that C99 *requires* fpos_t values to hold all the 
info in an mbstate_t, in addition to stream position info.  
So we have to expect others to follow glibc in this, and 
eventually everyone.  fpos_t cannot resolve to an array 
type, but anything else is fair (in particular it need not 
map to an integral type -- and probably won't anymore).

We have to give up belief that fpos_t is a number, because 
it's not.  We can believe that ftell returns a number, 
because it does <wink> -- but ftell isn't suitable for 
large file support.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-08-17 06:13

Message:
Logged In: YES 
user_id=21627

This started in glibc 2.2, I believe, so it would appear in
Redhat 7, SuSE 7, etc.
To see the problem, you have to ./configure with
CFLAGS="-D_FILE_OFFSET_BITS=64" OPT="-O2 $(CFLAGS)"; see
pyconfig.h.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-08-17 03:55

Message:
Logged In: YES 
user_id=6380

Whoa.  Interesting. Which Linux version is this?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-08-17 00:21

Message:
Logged In: YES 
user_id=21627

This fails because in glibc, fpos_t contains an mb_state 
field, so that on restoring the file position, the 
multibyte encoding state of the file can be restored.

I see two solutions here:
- Python could give up the guarantee that the ftell result 
is a number, and return an object that embeds the fpos_t.
- Python could give up the guarantee that ftell/fseek
works in all cases, and only use ftell(o), which should
always return a number (at least in POSIX). If that
approach is taken, an additional fgetpos/fsetpos call may
be appropriate.


----------------------------------------------------------------------
