Difference in the number of elements in a fromfile() between Windows and Linux

Hi, I'm trying to test my code on several platforms, Windows and Linux, and I'm using some data files that were saved with tofile(sep=' ') under Linux. Those files load without a problem under Linux, but under Windows with the latest numpy the data cannot be loaded: some values (e.g. +inf or -inf) are not recognised. Is this a known behaviour? How could I load these files correctly on both platforms? (I don't want to save them in binary form, as I'm using the files for other purposes.) - Matthieu

Matthieu Brucher wrote:
Hi, I'm trying to test my code on several platforms, Windows and Linux, and I'm using some data files that were saved with tofile(sep=' ') under Linux. Those files load without a problem under Linux, but under Windows with the latest numpy the data cannot be loaded: some values (e.g. +inf or -inf) are not recognised.
tofile is using pickling, right? If you dump to a text file, there may be a problem because of end of line?
Matthieu Brucher wrote:
Is this a known behaviour? How could I load these files correctly on both platforms? (I don't want to save them in binary form, as I'm using the files for other purposes.)
Personally, I always use pytables: the file format (hdf5) is binary, but it has a standard C API (and C++/Java as well), which means you can access those files pretty much anywhere, and it is designed to be cross-platform.
David

David wrote:
tofile is using pickling, right? If you dump to a text file, there may be a problem because of end of line?
I'm not using the binary form, so it's not pickling. Example of the first line of my data file:
0.0 inf 13.9040914426 14.7406669444 inf 4.41783247603 inf inf 6.05071515635 inf inf inf 15.6925185021 inf inf inf inf inf inf inf
David wrote:
Personally, I always use pytables: the file format (hdf5) is binary, but the file format has a standard C api (and C++/java as well), which means you can access those files pretty much anywhere, and is designed to be cross platform.
Well, that would be a solution, but for the moment I can't use it. I use a lot of computers, and I can't install the packages I want, nor their latest versions (for instance, Feisty only has numpy 1.0.1 and scipy 0.5.1). Besides, pure beginners who do not know of the excellent pytables will use a text format, since it makes it simple to check the results visually, and will run into this behaviour :(
Matthieu

Matthieu Brucher wrote:
Example of the first line of my data file : 0.0 inf 13.9040914426 14.7406669444 inf 4.41783247603 inf inf 6.05071515635 inf inf inf 15.6925185021 inf inf inf inf inf inf inf
I'm pretty sure fromfile() is using the standard C fscanf(). That means that whether it understands "inf" depends on the C lib. I'm guessing that the MS libc doesn't understand the same spelling of "inf" that the gcc one does. There may indeed be no literal for the IEEE Inf. Indeed, the Python built-in "float" relies on libc too, and on OS-X (Apple's libc), I get:
>>> float("inf")
inf
On Windows (standard python.org build, compiled with MSC), I get:
ValueError: invalid literal for float(): inf
"Inf" gives me the same thing. It's too bad that C isn't just a little bit more standardized! In short, I don't know that this is a bug. It is a missing feature, but it may be hard to get someone to write the code to account for the limited fscanf() in fromfile().
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
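For anyone who wants to repeat this check on their own interpreter, here is a minimal sketch. The helper name probe_float_spellings and the list of spellings are made up for illustration. It only reports what float() accepts on the machine where it runs. Python 2.6 and later parse "inf" and "nan" themselves rather than deferring to libc, so the platform difference described above should mainly show up on older interpreters.

```python
def probe_float_spellings(spellings):
    """Map each spelling to True if float() accepts it on this interpreter."""
    accepted = {}
    for text in spellings:
        try:
            float(text)
        except ValueError:
            accepted[text] = False
        else:
            accepted[text] = True
    return accepted


# Hypothetical list of spellings to try; extend it as needed.
results = probe_float_spellings(
    ["inf", "Inf", "-inf", "nan", "NaN", "1.#INF", "1.#QNAN"]
)
for text, ok in results.items():
    print(f"{text!r:>10}: {'accepted' if ok else 'rejected'}")
```

Running the same script on each platform and comparing the two listings is a quick way to see which spellings are safe to write into a text file.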

On Fri, May 04, 2007 at 09:44:02AM -0700, Christopher Barker wrote:
Matthieu Brucher wrote:
Example of the first line of my data file : 0.0 inf 13.9040914426 14.7406669444 inf 4.41783247603 inf inf 6.05071515635 inf inf inf 15.6925185021 inf inf inf inf inf inf inf
I'm pretty sure fromfile() is using the standard C fscanf(). That means that whether it understands "inf" depends on the C lib. I'm guessing that the MS libc doesn't understand the same spelling of "inf" that the gcc one does. There may indeed be no literal for the IEEE Inf.
It would be interesting to see how Inf and NaN (vs. inf and nan) are interpreted under Windows. Are there any free fscanf implementations out there that we can include with numpy?
Cheers
Stéfan

Stefan van der Walt wrote:
On Fri, May 04, 2007 at 09:44:02AM -0700, Christopher Barker wrote:
Matthieu Brucher wrote:
Example of the first line of my data file : 0.0 inf 13.9040914426 14.7406669444 inf 4.41783247603 inf inf 6.05071515635 inf inf inf 15.6925185021 inf inf inf inf inf inf inf
I'm pretty sure fromfile() is using the standard C fscanf(). That means that whether it understands "inf" depends on the C lib. I'm guessing that the MS libc doesn't understand the same spelling of "inf" that the gcc one does. There may indeed be no literal for the IEEE Inf.
It would be interesting to see how Inf and NaN (vs. inf and nan) are interpreted under Windows.
I'm pretty sure that they are also rejected. "1.#INF" and "1.#QNAN" might be accepted though since that's what ftoa() gives for those quantities.
Are there any free fscanf implementations out there that we can include with numpy?
This might be easy enough to adapt: http://www.python.org/ftp/python/contrib-09-Dec-1999/Misc/sscanfmodule.c.Z
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

Robert Kern wrote:
"1.#INF" and "1.#QNAN" might be accepted though since that's what ftoa() gives for those quantities.
Not in Python: float("1.#INF") gives a ValueError. That may or may not reflect what *scanf does, but I suspect it does.
-Chris

On Fri, May 04, 2007 at 05:43:00PM -0500, Robert Kern wrote:
Stefan van der Walt wrote:
On Fri, May 04, 2007 at 09:44:02AM -0700, Christopher Barker wrote:
Matthieu Brucher wrote:
Example of the first line of my data file : 0.0 inf 13.9040914426 14.7406669444 inf 4.41783247603 inf inf 6.05071515635 inf inf inf 15.6925185021 inf inf inf inf inf inf inf
I'm pretty sure fromfile() is using the standard C fscanf(). That means that whether it understands "inf" depends on the C lib. I'm guessing that the MS libc doesn't understand the same spelling of "inf" that the gcc one does. There may indeed be no literal for the IEEE Inf.
It would be interesting to see how Inf and NaN (vs. inf and nan) are interpreted under Windows.
I'm pretty sure that they are also rejected. "1.#INF" and "1.#QNAN" might be accepted though since that's what ftoa() gives for those quantities.
So, from some googling, here are the "special" strings for floats, as regular expressions. The case of the letters doesn't seem to matter.
positive infinity:
  [+]?inf
  [+]?Infinity
  1\.#INF
negative infinity:
  -Inf
  -1\.#INF
  -Infinity
not a number:
  s?NaN[0-9]*  (The 's' is for signalling NaNs, the digits are for diagnostic information. See the decimal spec at http://www2.hursley.ibm.com/decimal/daconvs.html)
  -1\.#IND
  1\.#QNAN  (Windows quiet NaN?)
There may be more. If we wish to support these, then writing our own parser for them is probably the only way. I'll do it, I just need a complete list of what we want to accept :-)
On the other side of the coin, I'd argue the string representations of our float scalars should also be platform-agnostic (standardising on Inf and NaN would be best, I think).
--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm@physics.mcmaster.ca
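To make the list above concrete, here is a rough sketch of how those spellings could be recognised with Python's re module. This is only an illustration, not numpy's parser. The name parse_special is hypothetical, matching is case-insensitive as suggested above, and the diagnostic digits after NaN are treated as optional, which is an assumption.

```python
import math
import re

# Spellings taken from the list above; '#' is a literal character here
# because the patterns are not compiled in verbose mode.
_POS_INF = re.compile(r"[+]?(?:inf|infinity)|1\.#INF", re.IGNORECASE)
_NEG_INF = re.compile(r"-(?:inf|infinity|1\.#INF)", re.IGNORECASE)
# Assumption: the diagnostic digits after NaN are optional.
_NAN = re.compile(r"s?nan[0-9]*|-1\.#IND|1\.#QNAN", re.IGNORECASE)


def parse_special(token):
    """Return inf, -inf or nan for a recognised spelling, else None."""
    text = token.strip()
    if _POS_INF.fullmatch(text):
        return math.inf
    if _NEG_INF.fullmatch(text):
        return -math.inf
    if _NAN.fullmatch(text):
        return math.nan
    return None


for token in ["inf", "+Infinity", "-1.#INF", "1.#QNAN", "13.9040914426"]:
    print(token, "->", parse_special(token))
```

Ordinary numbers come back as None, so a caller can fall through to its usual numeric conversion for those.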

Stefan van der Walt wrote:
It would be interesting to see how Inf and NaN (vs. inf and nan) are interpreted under Windows.
Neither works. I suspect there is no literal that works with the lib the python.org Python is built with. (By the way, I'm testing with 2.4, but I think that's the same compiler as 2.5.)
-Chris

On Sat, May 05, 2007 at 12:34:30AM +0200, Stefan van der Walt wrote:
On Fri, May 04, 2007 at 09:44:02AM -0700, Christopher Barker wrote:
Matthieu Brucher wrote:
Example of the first line of my data file : 0.0 inf 13.9040914426 14.7406669444 inf 4.41783247603 inf inf 6.05071515635 inf inf inf 15.6925185021 inf inf inf inf inf inf inf
I'm pretty sure fromfile() is using the standard C fscanf(). That means that whether it understands "inf" depends on the C lib. I'm guessing that the MS libc doesn't understand the same spelling of "inf" that the gcc one does. There may indeed be no literal for the IEEE Inf.
It would be interesting to see how Inf and NaN (vs. inf and nan) are interpreted under Windows.
Are there any free fscanf implementations out there that we can include with numpy?
There's no need; all that fscanf is being used for is with the single format string "%d" (and variants for each type). So that's easily replaced with type-specific functions (strtol, strtod, etc.). For the floating-point types, checking first if the string matches inf or nan patterns would be sufficient.
There's a bug in fromfile anyway: because it passes the separator directly to fscanf to skip over it, using a % in your separator will not work.
--
David M. Cooke
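The approach described here (check for the special spellings first, then hand the rest to a plain numeric conversion, and treat the separator as literal text rather than as part of a format string) could be sketched in Python roughly as follows. This is not the fromfile implementation. parse_field and parse_text are hypothetical names, float() stands in for strtod, and only a handful of special spellings are covered.

```python
import math

# Small, deliberately incomplete table of special spellings (an assumption).
_SPECIALS = {
    "inf": math.inf,
    "+inf": math.inf,
    "-inf": -math.inf,
    "nan": math.nan,
}


def parse_field(token):
    """Check the special spellings first, then fall back to float()."""
    key = token.strip().lower()
    if key in _SPECIALS:
        return _SPECIALS[key]
    return float(key)  # stands in for strtod()


def parse_text(text, sep=" "):
    """Split on the literal separator; it is never used as a format string."""
    # A whitespace-only separator means "any run of whitespace".
    fields = text.split(sep) if sep.strip() else text.split()
    return [parse_field(field) for field in fields if field.strip()]


print(parse_text("0.0 inf 13.9040914426 -inf"))
print(parse_text("1.5%inf%2", sep="%"))
```

Because the separator is only ever compared as text, a "%" in it needs no special handling.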

Hi Matthieu
On Fri, May 04, 2007 at 09:16:34AM +0200, Matthieu Brucher wrote:
I'm trying to test my code on several platforms, Windows and Linux, and I'm using some data files that were saved with tofile(sep=' ') under Linux. Those files load without a problem under Linux, but under Windows with the latest numpy the data cannot be loaded: some values (e.g. +inf or -inf) are not recognised. Is this a known behaviour? How could I load these files correctly on both platforms? (I don't want to save them in binary form, as I'm using the files for other purposes.)
Please file a ticket at http://projects.scipy.org/scipy/numpy/newticket along with a short code snippet to reproduce the problem. That way we won't forget about it.
Cheers
Stéfan
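A reproduction along these lines might look like the sketch below. It assumes numpy is installed, and roundtrip_text is a hypothetical helper name. It writes a few values with tofile(sep=' '), reads them back with fromfile(sep=' '), and prints both element counts. On a platform whose text parser does not understand "inf", the second count would be expected to come out smaller than the first.

```python
import os
import tempfile

import numpy as np


def roundtrip_text(values, sep=" "):
    """Write values with tofile(sep=...) and read them back with fromfile()."""
    original = np.asarray(values, dtype=float)
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        original.tofile(path, sep=sep)
        with open(path) as handle:
            text = handle.read()
        loaded = np.fromfile(path, dtype=float, sep=sep)
    finally:
        os.remove(path)
    return original, text, loaded


original, text, loaded = roundtrip_text([0.0, np.inf, 13.9040914426, -np.inf])
print("file contents:   ", text)
print("elements written:", original.size)
print("elements read:   ", loaded.size)
```

Attaching the printed output from both Windows and Linux to the ticket would show the difference in the number of elements directly.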

Please file a ticket at
http://projects.scipy.org/scipy/numpy/newticket
along with a short code snippet to reproduce the problem. That way we won't forget about it.
Cheers Stéfan
Thank you, I didn't know whether it was known or not. I'll post a ticket as soon as possible.
Matthieu

participants (6)
- Christopher Barker
- David Cournapeau
- David M. Cooke
- Matthieu Brucher
- Robert Kern
- Stefan van der Walt