From cgp at star.le.ac.uk Mon Mar 4 06:47:31 2002
From: cgp at star.le.ac.uk (Clive Page)
Date: Mon, 4 Mar 2002 11:47:31 +0000 (GMT)
Subject: [AstroPy] PyFITS 0.6.2 available
In-Reply-To:
Message-ID:

On Thu, 28 Feb 2002, Perry Greenfield wrote:

> I'm not sure I can go into a detailed comparison but I will outline
> many of the reasons we decided to go a different way (albeit more costly
> in development costs). I haven't used pCFITSIO so I may inadvertently
> misrepresent some things; I hope Nor will correct me in such cases.

Perry

Thanks for your very instructive posting. I now fully understand your
motivation. I have been happily using CFITSIO for many years and have
made some attempts to put a higher level interface on it, but with limited
success. But you face the problem that CFITSIO already has support for a
host of features that people already use in their C and Fortran programs,
and will expect in PyFITS. For example:

- scaling using BZERO/BSCALE in images and tables (I note that's on your
  list of things to do soon)

- full support for null values in all data types (and IEEE special values
  in floating-point cases)

- support for vector columns and variable length arrays in BIN tables

- almost transparent access to ASCII and BINARY tables (I wish I didn't
  have to put that "almost" there).

- support for long string keywords (comments flowing over several lines)

- support for units in keyword comments

- support for arrays of strings (rA:SSTRw/nn convention and an earlier
  one)

- syntax to filter tables from command-line using row expressions etc

- access to FITS files over the net using FTP and HTTP

- automatic data type conversion in many cases

- support for HIERARCH keyword convention.

- support for automatic decompression of compressed files

Now I have only ever used about half of these, but I suspect other FITS
programmers have used a different subset, so PyFITS will need to support
rather a lot to keep the majority of users happy.
That could be quite a lot of work.

> 3) It is not possible to memory map FITS files in CFITSIO.

I can see that memory mapping is desirable in many areas. But I have some
reservations. I have past experience of using the Starlink HDS library,
which relied on memory mapping. But there is a downside if you run out of
physical memory - the system reads your file and then pages it to disc,
usually another disc. So each logical read turns into a physical read, a
write, and another read, and efficiency drops catastrophically. There is
another problem: here in the XMM-Newton project we use a FITS
infrastructure (built on top of CFITSIO) which always reads each input
file to memory before doing anything with it. Just last month we came
across our first FITS file which exceeded the memory addressing limits of
Solaris 2.6 (2 GB I think); our only solution may be to move to a 64-bit
version of the operating system.

Regards

--
Clive Page, Dept of Physics & Astronomy,
University of Leicester, Tel +44 116 252 3551
Leicester, LE1 7RH, U.K. Fax +44 116 252 3311

_____________________________________________________
AstroPy mailing list - astropy at stsci.edu
http://lheawww.gsfc.nasa.gov/~bridgman/AstroPy/

From perry at stsci.edu Mon Mar 4 12:15:39 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon, 4 Mar 2002 12:15:39 -0500
Subject: [AstroPy] PyFITS 0.6.2 available
In-Reply-To:
Message-ID:

Clive Page writes

> Thanks for your very instructive posting. I now fully understand your
> motivation. I have been happily using CFITSIO for many years and have
> made some attempts to put a higher level interface on it, but with limited
> success. But you face the problem that CFITSIO already has support for a
> host of features that people already use in their C and Fortran programs,
> and will expect in PyFITS.
> For example:
>
> - scaling using BZERO/BSCALE in images and tables (I note that's on your
> list of things to do soon)
>
> - full support for null values in all data types (and IEEE special values
> in floating-point cases)
>
These will be supported as well.

> - support for vector columns and variable length arrays in BIN tables
>
I believe vector columns are already supported. Variable length arrays
will be as well (though efficiency is an issue since array objects
must be created for each variable length entry).

> - almost transparent access to ASCII and BINARY tables (I wish I didn't
> have to put that "almost" there).
>
I'm not sure what you mean by that. Do you mean not having to worry
about whether it is of one type or the other?

> - support for long string keywords (comments flowing over several lines)
>
Not sure what we will do about this.

> - support for units in keyword comments
>
Or this. I imagine we can add this if there is demand.

> - support for arrays of strings (rA:SSTRw/nn convention and an earlier
> one)
>
We will have to look at this.

> - syntax to filter tables from command-line using row expressions etc
>
We will certainly add functionality like this.

> - access to FITS files over the net using FTP and HTTP
>
This shouldn't be hard to add.

> - automatic data type conversion in many cases
>
I'm not sure this applies to a Python interface. I'd need to see a
more specific request of what automatic conversion behavior was
desired. There is the issue of whether types are converted for
keyword values, table columns, or image data and in what context.

> - support for HIERARCH keyword convention.
>
Not sure we would support this. I think there are much better ways
of doing this. But if it is heavily demanded we might consider it.

> - support for automatic decompression of compressed files
>
Compressed at the file level or extension level?
This is something we do want to support (we were involved in the
definition of the extension compression conventions after all :-).
Decompressing at the file level should be fairly easy to support.

> Now I have only ever used about half of these, but I suspect other FITS
> programmers have used a different subset, so PyFITS will need to support
> rather a lot to keep the majority of users happy. That could be quite a
> lot of work.
>
A FITS library is a lot of work :-) This is unavoidable by its nature.
We were aware of that when we started.

> > 3) It is not possible to memory map FITS files in CFITSIO.
>
> I can see that memory mapping is desirable in many areas. But I have some
> reservations. I have past experience of using the Starlink HDS library,
> which relied on memory mapping. But there is a downside if you run out of
> physical memory - the system reads your file and then pages it to disc,
> usually another disc. So each logical read turns into a physical read, a

Memory mapping is not a panacea for memory issues. It does make it
easier to manage memory and can provide a means for significant
improvements in accessing large data sets. But I don't think it means
one can forget about the fact that one is dealing with large data
sets--far from it. It is particularly useful if one only needs to
access a subset of the data (thus most of it is never read into memory).

In a similar vein, we are going to see if it provides the basic
mechanism for reading data a subsection at a time (memory map it and
then use array slicing to access the subsets rather than do explicit
I/O on array subsets). If that proves impractical, we can always fall
back to providing methods to read subsets of arrays (it's not as nice
an interface though).

But I agree, if you memory map large arrays and do anything that
results in copies of the array being generated (e.g., from ufuncs)
you will run out of virtual memory or page heavily.
On the other hand, if you perform "compressive" operations (doing
statistics on the image or reduction operations) then memory mapping
can be a big win.

> write, and another read, and efficiency drops catastrophically. There is
> another problem: here in the XMM-Newton project we use a FITS
> infrastructure (built on top of CFITSIO) which always reads each input
> file to memory before doing anything with it. Just last month we came
> across our first FITS file which exceeded the memory addressing limits of
> Solaris 2.6 (2 GB I think); our only solution may be to move to a 64-bit
> version of the operating system.
>
This is a problem in its own right!

Anyway, please do provide us feedback.

Thanks, Perry

From perry at stsci.edu Tue Mar 5 16:11:54 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue, 5 Mar 2002 16:11:54 -0500
Subject: [AstroPy] PyFITS 0.6.2 available
In-Reply-To:
Message-ID:

> I would be interested in knowing whether there is any big gain to be had
> from memory-mapping FITS datasets on little-endian machines. Surely in
> order to access any of the data it has to be copied from the FITS standard
> big-endian numeric format to a new array in little-endian format, thereby
> negating any advantage from memory-mapping. Or is there something I am
> missing?
>
> David
>
Actually, there is a gain to memory mapping FITS files in PyFITS. We
developed numarray so that it could transparently handle byte-swapped
data. The data must be copied to a non-byteswapped representation,
but not for the whole array at one time (it's done in fairly small
blocks). So yes, you do pay the price in copying the data in memory,
but it doesn't cost you much memory.
For example, if I had a 4Kx4K memory-mapped image referred to by the
variable "im" that happened to be byteswapped on disk, the following
would not result in a large temporary 4Kx4K array being created to
handle the byteswapping:

    im += 2 # add 2 to image "in-place"

Perry Greenfield

From perry at stsci.edu Tue Mar 12 17:33:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue, 12 Mar 2002 17:33:05 -0500
Subject: [AstroPy] ANN: numarray-0.3
Message-ID:

Previously I mentioned there were some problems installing numarray
on Solaris using gcc. This new version of numarray fixes those
problems.

Perry

-----Original Message-----
From: numpy-discussion-admin at lists.sourceforge.net
[mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of
Todd Miller
Sent: Tuesday, March 12, 2002 5:21 PM
To: numpy-discussion at lists.sourceforge.net
Subject: [Numpy-discussion] ANN: numarray-0.3

Numarray 0.3
------------

Numarray is a Numeric replacement which features c-code generated from
python template scripts, the capacity to operate directly on arrays in
files, and improved type promotion semantics.

Numarray-0.3 incorporates safety checks to prevent crashing Python when
a user accidentally changes private variables in numarray. The new
safety checks ensure that:

1. Numarray C-functions are called with properly sized buffers.

2. Numarray C-functions are called with properly aligned buffers.

3. Parameters match the C-function in count and i/o direction.

4. The correct generic function wrapper is used to call each C-function.

5. All indices implied by the array strides are valid.

Failed checks result in python exceptions.

A new memory object fixes an unfortunate limitation of the python buffer
object, namely the lack of guaranteed double aligned storage.
The largest generated source module, _ufuncmodule.c, has been
partitioned by data type into several smaller, more gcc-friendly
modules, e.g. _ufuncFloat64module.c.

The sort and argsort functions are fixed. The dot function is fixed
for 1D arrays. Transpose, swapaxes, and reshape once again return views.

WHERE
-----------

Numarray-0.3 Windows executable installers and a source code tarball
are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts
Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are
at the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3 requires Python 2.0 or greater.

AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC
Hsu, Paul Barrett, and Phil Hodge at the Space Telescope Science
Institute. Numarray is made available under a BSD-style License. See
LICENSE.txt in the source distribution for details.

--
Todd Miller jmiller at stsci.edu
STSCI / SSG (410) 338 4576

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
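
Note on the byte-swapping exchange earlier in this thread: the claim
that a memory-mapped, byteswapped image can be updated in place while
converting only a small block at a time can be sketched as below. This
is a rough illustration using only the Python standard library (mmap
and struct), not the actual numarray or PyFITS implementation. The
names add_in_place and demo, the block size, and the assumption that
the file holds nothing but big-endian 16-bit integers (no FITS header)
are all hypothetical choices made for the example.

```python
import mmap
import os
import struct
import tempfile


def add_in_place(path, value, block_items=4096):
    """Add `value` to every big-endian int16 in `path`, block by block.

    Hypothetical helper: assumes a headerless file of big-endian int16
    values and results that still fit in 16 bits.
    """
    item_size = struct.calcsize(">h")  # 2 bytes per big-endian int16
    block_bytes = block_items * item_size
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        for start in range(0, len(mm), block_bytes):
            stop = min(start + block_bytes, len(mm))
            count = (stop - start) // item_size
            fmt = ">%dh" % count
            # Only this block is converted to native integers.
            block = struct.unpack_from(fmt, mm, start)
            # Write the updated block back in big-endian order.
            struct.pack_into(fmt, mm, start, *[v + value for v in block])
        mm.flush()


def demo():
    """Write a small test file, add 2 in place, and verify the result."""
    data = list(range(10000))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(struct.pack(">%dh" % len(data), *data))
    try:
        add_in_place(tmp.name, 2)
        with open(tmp.name, "rb") as f:
            out = struct.unpack(">%dh" % len(data), f.read())
    finally:
        os.remove(tmp.name)
    return list(out) == [v + 2 for v in data]
```

At any moment only one block of values exists as native Python
integers, so the temporary storage depends on block_items rather than
on the size of the file, which is the property described for
"im += 2" above.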