RE: memory-mapped Numeric arrays: arrayfrombuffer version 2

I have verified that this package seems to work on Windows. I say "seems" only because I didn't test enough to uncover anything subtle.

Unless or until we are convinced as a community that (a) this is the right way to do this and (b) the package is portable, it would not be wise to put it in the main distribution. I would like to hear from the community about this so that I will know whether or not to add this package as a separate SourceForge 'package' within the Numerical Python area. In the meantime I will add a link to the web page.

-----Original Message-----
From: python-announce-list-admin@python.org [mailto:python-announce-list-admin@python.org] On Behalf Of Kragen Sitaker
Sent: Wednesday, January 23, 2002 9:40 PM
To: python-announce-list@python.org
Subject: memory-mapped Numeric arrays: arrayfrombuffer version 2

The 'arrayfrombuffer' package features support for Numerical Python arrays whose contents are stored in buffer objects, including memory-mapped files. This has the following advantages:

- loading your array from a file is easy --- a module import and a single function call --- and doesn't use excessive amounts of memory.
- loading your array is quick; it doesn't need to be copied from one part of memory to another in order to be loaded.
- your array gets demand-loaded; parts you aren't using don't need to be in memory or in swap.
- under memory-pressure conditions, your array doesn't use up swap, and parts of it you haven't modified can be evicted from RAM without the need for a disk write
- your arrays can be bigger than your physical memory
- when you modify your array, only the parts you modify get written back out to disk

This is something that's been requested on the Numpy list a few times a year since 1999.

arrayfrombuffer lives at http://pobox.com/~kragen/sw/arrayfrombuffer/

The current version is version 2; it is released under the X11 license (the BSD license without the advertising clause).

<kragen@pobox.com>

<P><A HREF="http://pobox.com/~kragen/sw/arrayfrombuffer/">arrayfrombuffer 2</A> - creates Numeric arrays from memory-mapped files. (23-Jan-02)

--
http://mail.python.org/mailman/listinfo/python-announce-list
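For readers who want to see the idea behind the advantages listed in the announcement, here is a minimal sketch in present-day Python 3 using only the standard library. It is not the arrayfrombuffer API and does not use Numeric: it assumes a file of native-endian C doubles and uses `mmap` plus `memoryview.cast` as a stand-in for a buffer-backed array. The temporary file and the variable names are illustrative only.

```python
import mmap
import os
import struct
import tempfile

# Assumption: the file holds nothing but native-endian C doubles.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("4d", 1.0, 2.0, 3.0, 4.0))

# Map the whole file (length 0 means "entire file") for reading and writing.
with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
    # View the mapped bytes as doubles without copying them.
    with memoryview(m) as raw, raw.cast("d") as values:
        values[2] = 30.0          # modifies the mapped page in place
        total = sum(values)       # pages are read from disk on demand
    m.flush()                     # push modified pages back to the file

# Re-read the file the ordinary way to confirm the change reached disk.
with open(path, "rb") as f:
    result = struct.unpack("4d", f.read())
os.remove(path)
```

Only the page containing the modified element is dirtied, which is the "only the parts you modify get written back" behaviour the announcement describes.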

Since I have a 2.4GB data file handy, I thought I'd try this package with it. (Normally I process this data file by reading it in a chunk at a time, which is perfectly adequate.) Not surprisingly, it chokes:

  File "/home/eric/lib/python2.2/site-packages/maparray.py", line 15, in maparray
    m = mmap.mmap(fn, os.fstat(fn)[stat.ST_SIZE])
OverflowError: memory mapped size is too large (limited by C int)

(details: Python 2.2, numpy 20.3, Pentium III, Debian Woody, Linux kernel 2.4.13, gcc 2.95.4)

I'm not a big C programmer, but I wonder if there is some way for this package to overcome the 2GB limit on 32-bit systems. That could be useful in some situations.

Eric

On Fri, Jan 25, 2002 at 09:40:21AM -0800, Paul F. Dubois wrote:
--
********************************
Eric Nodwell
Ph.D. candidate
Department of Physics
University of British Columbia
tel: 604-822-5425
fax: 604-822-4750
nodwell@physics.ubc.ca
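One possible way around a whole-file mapping limit, sketched here in present-day Python 3 rather than the Python 2.2 used above, is to map the file one window at a time. This relies on the `offset` argument of `mmap.mmap`, which was added in a later Python release (2.6) and was not available in Python 2.2. The function name `windowed_sum` and the demo file are hypothetical, and the sketch assumes the file contains only native-endian doubles. It combines the chunk-at-a-time approach with mapping; it is not something arrayfrombuffer is known to do.

```python
import mmap
import os
import tempfile
from array import array

def windowed_sum(path, window_bytes):
    """Sum the doubles in `path`, mapping at most `window_bytes` at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    # Offsets passed to mmap must be multiples of the allocation granularity,
    # so round the window size down to such a multiple (at least one unit).
    window_bytes = max(gran, window_bytes - window_bytes % gran)
    size = os.path.getsize(path)
    total = 0.0
    windows = 0
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window_bytes, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                # View this window as doubles without copying it.
                with memoryview(m) as raw, raw.cast("d") as values:
                    total += sum(values)
            offset += length
            windows += 1
    return total, windows

# Demo on a small file: two full windows plus five extra doubles.
n = 2 * mmap.ALLOCATIONGRANULARITY // 8 + 5
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    array("d", [1.0] * n).tofile(f)
total, windows = windowed_sum(path, mmap.ALLOCATIONGRANULARITY)
os.remove(path)
```

Each window still has to fit in the process's address space, and on a 32-bit system the interpreter and operating system would also need large-file support for offsets beyond 2GB, so this is a sketch of the approach rather than a guaranteed fix for that setup.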
