annoying numpy string to float conversion behaviour
Hi,

Is there a reason for numpy.float not to convert its own string representation correctly?

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
>>> import numpy
>>> numpy.__version__
'1.0.3'
>>> numpy.float("1.0")
1.0
>>> numpy.nan
-1.#IND
>>> numpy.float("-1.#IND")
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    numpy.float("-1.#IND")
ValueError: invalid literal for float(): -1.#IND

Also, nan and -nan are represented differently by different float-to-string conversion methods. I guess the added zeros are a bug somewhere.

>>> str(nan)
'-1.#IND'
>>> "%f" % nan
'-1.#IND00'
>>> str(-nan)
'1.#QNAN'
>>> "%f" % -nan
'1.#QNAN0'

This is a problem when floats are stored in text files that are later read back for numerical processing. For now I use the following to convert the numbers:

from numpy import inf, nan  # assumed; the session above uses a bare nan

special_numbers = dict([('-1.#INF', -inf), ('1.#INF', inf),
                        ('-1.#IND', nan), ('-1.#IND00', nan),
                        ('1.#QNAN', -nan), ('1.#QNAN0', -nan)])

def string_to_number(x):
    # Special values first, then float or int depending on the spelling.
    if x in special_numbers:
        return special_numbers[x]
    return float(x) if ("." in x) or ("e" in x) else int(x)

Is there a simpler way that I missed?

Best Regards,

//Torgil
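For readers hitting the same files today, here is a minimal sketch of a slightly more tolerant parser, written for current Python 3. The name `parse_number` and the prefix table are hypothetical; the only spellings it knows are the ones reported in this thread, and it matches by prefix so that the zero-padded "%f" forms are accepted too. It does not try to preserve the sign convention of the original lookup table for NaNs, since the sign of a NaN rarely matters.

```python
import math

# Windows C-runtime spellings seen in this thread (an assumption: other
# runtimes or versions may print something else).
_SPECIAL_PREFIXES = (
    ("1.#INF", math.inf),
    ("1.#IND", math.nan),
    ("1.#QNAN", math.nan),
)

def parse_number(text):
    """Parse an int, a float, or one of the special spellings above."""
    s = text.strip()
    sign = -1.0 if s.startswith("-") else 1.0
    body = s.lstrip("+-")
    for prefix, value in _SPECIAL_PREFIXES:
        # Prefix match so that '-1.#IND00' is handled like '-1.#IND'.
        if body.startswith(prefix):
            return math.copysign(value, sign)
    try:
        return int(s)
    except ValueError:
        return float(s)
```

Trying `int` first and falling back to `float` avoids inspecting the string for "." or "e", which also covers upper-case exponents such as "1E3".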
Hi,

This was discussed some time ago (I started that thread because I had exactly the same problem). numpy is not responsible for this; Python is. Python relies on the C standard library, and Microsoft's C library can print NaN and Inf but cannot read them back from a string, which is the behaviour you are seeing. Wait for Python 3k...

Matthieu
Do you have any specific insight into Python 3k regarding this? I assume 3k means the next millennium. Maybe they can accept patches before that.

//Torgil
Torgil Svensson wrote:
> Do you have any specific insight into Python 3k regarding this? I assume 3k means the next millennium. Maybe they can accept patches before that.

Work on Python 3.0 is going on now. An alpha should be out this year. The deadline for proposing major changes requiring a PEP has passed, but this particular issue isn't big enough to be a major concern for 3.0. It has even been discussed for Python 2.6. Making float types parse/emit standard string representations for NaNs and infs could probably go in if you were to provide an implementation and work out all of the bugs and cross-platform issues.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
 -- Umberto Eco
On 6/22/07, Robert Kern <robert.kern@gmail.com> wrote:
> Making float types parse/emit standard string representations for NaNs and infs could probably go in if you were to provide an implementation and work out all of the bugs and cross-platform issues.

The float types already emit string representations of NaNs and infs but don't know how to parse them back. This parsing step should be trivial to implement.

I cannot see any cross-platform issues with this. If the floats aren't binary compatible across platforms, we'll have to face these issues regardless of the string representation (I think they are compatible, except for endianness).

If cross-platform issues include string representations from sources other than Python 3.0, things get trickier. I think that Python should handle its own string representation; others could always be handled with subclassing. At a minimum, "float(str(nan))==nan" should evaluate as True.

//Torgil
Torgil Svensson wrote:
> The float types already emit string representations of NaNs and infs but don't know how to parse them back. This parsing step should be trivial to implement.
>
> I cannot see any cross-platform issues with this.

Well, the string representation that is (currently) emitted is not cross-platform, so you will have to add that back to the list.

> If the floats aren't binary compatible across platforms, we'll have to face these issues regardless of the string representation (I think they are compatible, except for endianness).

NaNs and infs are IEEE-754 concepts. Python does run on non-IEEE-754 platforms, and I don't think that python-dev will want to entirely exclude them. You will have to do *something* about those platforms. Possibly, they just won't support NaNs and infs at all, but you'd have to make sure that the bit pattern that is a NaN on IEEE-754 systems won't be misinterpreted as a NaN on the non-IEEE-754 systems.

> If cross-platform issues include string representations from sources other than Python 3.0, things get trickier. I think that Python should handle its own string representation; others could always be handled with subclassing. At a minimum, "float(str(nan))==nan" should evaluate as True.

Then go for it.

--
Robert Kern
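To make the "bit pattern" remark concrete, here is a small sketch of what NaN and infinity look like as IEEE-754 double-precision bit patterns. The helper names are hypothetical. It relies on the standard `struct` module, whose `">d"` format packs a float as a big-endian IEEE-754 binary64 value.

```python
import struct

def double_bits(x):
    """Return the 64-bit IEEE-754 binary64 pattern of x as an integer."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits

def is_nan_pattern(bits):
    """True if the pattern has an all-ones exponent and a nonzero fraction."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction != 0

def is_inf_pattern(bits):
    """True if the pattern has an all-ones exponent and a zero fraction."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction == 0

nan_bits = double_bits(float("nan"))
inf_bits = double_bits(float("inf"))
```

On a platform with a different native float format, the same 64 bits would simply decode to some ordinary number, which is the misinterpretation the message warns about.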
On 6/25/07, Robert Kern <robert.kern@gmail.com> wrote:
> NaNs and infs are IEEE-754 concepts. Python does run on non-IEEE-754 platforms, and I don't think that python-dev will want to entirely exclude them. You will have to do *something* about those platforms. Possibly, they just won't support NaNs and infs at all, but you'd have to make sure that the bit pattern that is a NaN on IEEE-754 systems won't be misinterpreted as a NaN on the non-IEEE-754 systems.

Sounds like some clever #ifdefs are needed here. How does isnan() deal with this?

//Torgil
Torgil Svensson wrote:
> Sounds like some clever #ifdefs are needed here. How does isnan() deal with this?

It doesn't. numpy does require IEEE-754.

--
Robert Kern
On 6/25/07, Torgil Svensson <torgil.svensson@gmail.com> wrote:
> Sounds like some clever #ifdefs are needed here.

I took a quick glance at the Python code. On the positive side, there are ways to detect whether a platform is IEEE-754, and the code errors out if special values are unpacked on non-IEEE-754 platforms (we can do the same for strings). So far it looks straightforward.

The problem is which strings to expect: the string generation uses OS-specific routines (probably the C library; I haven't looked). I think Python should be consistent about this across platforms, but I don't know whether different C libraries generate different strings for special numbers. Anyone?

If they do, this might get political (should the string representation follow the system or be unified?), and the implementation would have to live in either OS-common code or OS-specific code.

//Torgil
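The detection mentioned here is also reachable from Python itself. The sketch below uses `float.__getformat__`, which exists in CPython but is documented as mainly intended for Python's own test suite, so treat it as an implementation detail rather than a stable API. The function names are hypothetical, and the spellings it reports are those of the interpreter it runs on.

```python
def platform_uses_ieee754_doubles():
    """Report whether this interpreter stores floats as IEEE-754 doubles."""
    # CPython returns 'IEEE, little-endian', 'IEEE, big-endian' or 'unknown'.
    return float.__getformat__("double").startswith("IEEE")

def special_value_spellings():
    """Show how this interpreter spells inf, -inf and nan via repr()."""
    return tuple(repr(float(s)) for s in ("inf", "-inf", "nan"))

ieee = platform_uses_ieee754_doubles()
spellings = special_value_spellings()
```

On a current Python 3 the spellings come out as `'inf'`, `'-inf'` and `'nan'` regardless of the operating system, because the interpreter no longer delegates this formatting to the C library.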
Torgil Svensson wrote:
> OS-specific routines (probably the c-library, haven't looked). I think python should be consistent regarding this across platforms but I don't know if different c-libraries generates different strings for special numbers. Anyone?

Windows and Linux certainly generate different strings for special numbers in current Python, and I guess the origin is the libc on those platforms.

But, as Python is moving away from the libc for file IO in Python 3K, perhaps the string representation of floats would be considered, too. (In fact, for all I know, perhaps it has already been considered.) Maybe you should email the python-3k-dev list?

-Andrew
On 6/26/07, Andrew Straw <strawman@astraw.com> wrote:
> But, as Python is moving away from the libc for file IO in Python 3K, perhaps the string representation of floats would be considered, too. (In fact, for all I know, perhaps it has already been considered.) Maybe you should email the python-3k-dev list?

Good idea. I found this thread on python-dev:
http://mail.python.org/pipermail/python-dev/2007-June/073625.html

There's also an interesting bug related to this:
1732212: repr of 'nan' floats not parseable
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1732212&group_id=5470

This seems to indicate that float('nan') works on some platforms but that the output of str(nan) isn't parseable. Is this true on Linux? Could anyone confirm this? What about float('inf') and repr(inf) on Linux?

This may also mean that I have an easy long-term way out: move to Linux and follow the resolution of this bug. Windows is troublesome in several areas anyway and may not be worth the extra effort.

//Torgil
Torgil Svensson wrote:
> This seems to indicate that float('nan') works on some platforms but that the output of str(nan) isn't parseable. Is this true on Linux? Could anyone confirm this? What about float('inf') and repr(inf) on Linux?

On Ubuntu Feisty (amd64) Linux (but this behavior has been the same for at least the 6 years I can remember):

$ python
Python 2.5.1 (r251:54863, May  2 2007, 16:27:44)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float('nan')
nan
>>> float('inf')
inf
>>> import numpy
>>> repr(numpy.inf)
'inf'
>>> repr(numpy.nan)
'nan'
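For comparison, the text-file round trip that started this thread can be sketched on a current Python 3, where `float()` accepts the spellings that `repr()` produces on every platform. The sketch uses an in-memory `io.StringIO` buffer as a stand-in for a real data file; the variable names are illustrative only.

```python
import io
import math

values = [1.5, float("inf"), float("-inf"), float("nan")]

# Write one repr() per line, as one might for a plain-text data file.
buffer = io.StringIO()
for v in values:
    buffer.write(repr(v) + "\n")

# Read the lines back; float() ignores the surrounding whitespace.
buffer.seek(0)
restored = [float(line) for line in buffer]
```

The NaN entry has to be checked with `math.isnan` rather than `==`, since a NaN never compares equal to anything.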
I should have guessed this and tried it earlier, but on Windows the odds of such an experiment paying off are too low to be worth the effort. I think I'll put this in the bag of annoyances that will go away along with Windows, and just live with it in the meantime.

Thanks for this report and all the other good feedback from the list!

//Torgil
On Mon, 25 Jun 2007, Torgil Svensson wrote:
> handled with sub-classing. At a minimum "float(str(nan))==nan" should evaluate as True.

False. No NaN should ever compare equal to anything, even itself. But if the system is 754-compliant, it won't.

"isnan(float(str(nan))) == True" would be nice, though.

w
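A short sketch of this point on a current Python 3, where the string round trip itself works: equality is the wrong test for a NaN, and `math.isnan` is the right one. The variable names are illustrative only.

```python
import math

nan = float("nan")
round_tripped = float(str(nan))

# Under IEEE-754, a NaN compares unequal to everything, itself included.
nan_equals_itself = (nan == nan)
round_trip_equal = (round_tripped == nan)

# The meaningful round-trip check is whether the result is still a NaN.
round_trip_is_nan = math.isnan(round_tripped)
```

The same holds for containers compared element by element with `==`, so any round-trip test over a data set needs to treat NaN entries separately.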
On 6/25/07, Warren Focke <focke@slac.stanford.edu> wrote:
> False. No NaN should ever compare equal to anything, even itself. But if the system is 754-compliant, it won't.
>
> "isnan(float(str(nan))) == True" would be nice, though.

Good point. Does this also hold true for quiet NaNs?

//Torgil
Torgil Svensson wrote:
> Good point. Does this also hold true for quiet NaNs?

Yes.

--
Robert Kern
On Jun 20, 2007, at 04:35, Torgil Svensson wrote:
> Is there a reason for numpy.float not to convert its own string representation correctly?

numpy.float is the Python float type, so there's nothing we can do. I am working on adding NaN and Inf support for numpy dtypes, though, so that, for instance, numpy.float64('-1.#IND') would work as expected. I'll put it higher on my priority list :-)

--
David M. Cooke
http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
On 6/21/07, David M. Cooke <cookedm@physics.mcmaster.ca> wrote:
> numpy.float is the Python float type, so there's nothing we can do. I am working on adding NaN and Inf support for numpy dtypes, though, so that, for instance, numpy.float64('-1.#IND') would work as expected. I'll put it higher on my priority list :-)

Great news. Thanks!

//Torgil
participants (6)
- Andrew Straw
- David M. Cooke
- Matthieu Brucher
- Robert Kern
- Torgil Svensson
- Warren Focke