annoying numpy string to float conversion behaviour
Hi

Is there a reason for numpy.float not to convert its own string representation correctly?

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
>>> import numpy
>>> numpy.__version__
'1.0.3'
>>> numpy.float("1.0")
1.0
>>> numpy.nan
1.#IND
>>> numpy.float("1.#IND")
Traceback (most recent call last):
  File "", line 1, in <module>
    numpy.float("1.#IND")
ValueError: invalid literal for float(): 1.#IND

Also, nan and -nan are represented differently for different float to string conversion methods. I guess the added zeros are a bug somewhere.

>>> str(nan)
'1.#IND'
>>> "%f" % nan
'1.#IND00'
>>> str(-nan)
'1.#QNAN'
>>> "%f" % -nan
'1.#QNAN0'

This is a problem when floats are stored in text files that are later read back for numerical processing. For now I use the following to convert the number.

from numpy import inf, nan

# Strings that the Windows C library emits for special values
special_numbers = dict([('1.#INF', inf), ('-1.#INF', -inf),
                        ('1.#IND', nan), ('1.#IND00', nan),
                        ('1.#QNAN', nan), ('1.#QNAN0', nan)])

def string_to_number(x):
    if x in special_numbers:
        return special_numbers[x]
    # Anything with a decimal point or exponent is a float, otherwise an int
    return float(x) if ("." in x) or ("e" in x) else int(x)

Is there a simpler way that I missed?

Best Regards,

//Torgil
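A more tolerant variant of the lookup table above can be sketched with a regular expression. This is only an illustration, not something posted in the thread: the name `parse_number` and the pattern are assumptions, and the sketch assumes the special values always look like `1.#INF`, `1.#IND` or `1.#QNAN`, optionally signed and optionally padded with zeros by `%f`.

```python
import math
import re

# Hypothetical pattern for MSVC-style special values such as
# '1.#INF', '-1.#INF', '1.#IND00' or '1.#QNAN0'.
_SPECIAL = re.compile(r"^([+-]?)1\.#(INF|IND|QNAN)0*$")


def parse_number(text):
    """Parse an int, a float, or an MSVC-style inf/nan string."""
    text = text.strip()
    match = _SPECIAL.match(text)
    if match:
        sign, kind = match.groups()
        if kind == "INF":
            return -math.inf if sign == "-" else math.inf
        # IND and QNAN are both NaNs; the sign is ignored here.
        return math.nan
    try:
        return int(text)
    except ValueError:
        # Not an integer literal, so let float() decide.
        return float(text)
```

Trying `int()` first and falling back to `float()` avoids inspecting the string for "." or "e", so inputs such as "1E5" are also handled.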
Hi,
This was discussed some time ago (I started that thread because I had exactly
the same problem). numpy is not responsible for this, Python is. Python uses
the C standard library, and with Microsoft's C library NaN and Inf can be
displayed but not read back from a string, which is the behaviour you see here.
Wait for Python 3k...
Matthieu
_______________________________________________
Numpy-discussion mailing list
Numpydiscussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpydiscussion
Do you have any specific insight into Python 3k regarding this? I assume 3k
is the next millennium. Maybe they can accept patches before that.
//Torgil
Torgil Svensson wrote:
> Do you have any specific insight into Python 3k regarding this? I assume 3k is the next millennium. Maybe they can accept patches before that.

Work on Python 3.0 is going on now. An alpha should be out this year. The deadline for proposing major changes requiring a PEP has passed, but this particular issue isn't big enough to be a major concern for 3.0. It has been discussed for Python 2.6, even. Making float types parse/emit standard string representations for NaNs and infs could probably go in if you were to provide an implementation and work out all of the bugs and cross-platform issues.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
 -- Umberto Eco
On 6/22/07, Robert Kern wrote:
> Making float types parse/emit standard string representations for NaNs and infs could probably go in if you were to provide an implementation and work out all of the bugs and cross-platform issues.

The float types already emit string representations of NaNs and infs but don't know how to parse them back. This parsing step should be trivial to implement. I cannot see any cross-platform issues with this. If the floats aren't binary compatible across platforms, we'll have to face these issues regardless of the string representation (I think they are, except for endianness). If cross-platform issues include string representations from sources other than Python 3.0, things get trickier. I think that Python should handle its own string representation; others could always be handled with subclassing. At a minimum "float(str(nan))==nan" should evaluate as True.

//Torgil
Torgil Svensson wrote:
> On 6/22/07, Robert Kern wrote:
>> Making float types parse/emit standard string representations for NaNs and infs could probably go in if you were to provide an implementation and work out all of the bugs and cross-platform issues.
>
> The float types already emit string representations of NaNs and infs but don't know how to parse them back. This parsing step should be trivial to implement.
> I cannot see any cross-platform issues with this.

Well, the string representation that is (currently) emitted is not cross-platform, so you will have to add that back to the list.

> If the floats aren't binary compatible across platforms, we'll have to face these issues regardless of the string representation (I think they are, except for endianness).

NaNs and infs are IEEE-754 concepts. Python does run on non-IEEE-754 platforms, and I don't think that python-dev will want to entirely exclude them. You will have to do *something* about those platforms. Possibly, they just won't support NaNs and infs at all, but you'd have to make sure that the bit pattern that is a NaN on IEEE-754 systems won't be misinterpreted as a NaN on the non-IEEE-754 systems.

> If cross-platform issues include string representations from sources other than Python 3.0, things get trickier. I think that Python should handle its own string representation; others could always be handled with subclassing. At a minimum "float(str(nan))==nan" should evaluate as True.

Then go for it.

--
Robert Kern
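The remark about bit patterns can be made concrete. In the IEEE-754 double format, a value is a NaN when its 11 exponent bits are all ones and its 52-bit mantissa is non-zero; the same exponent with a zero mantissa encodes an infinity. The sketch below is an illustration only (the name `classify_bits` is made up here). It relies on the `struct` module, whose standard ">d" format packs doubles as IEEE-754 binary64.

```python
import struct


def classify_bits(value):
    """Classify a float as 'finite', 'inf' or 'nan' from its bit pattern."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    exponent = (bits >> 52) & 0x7FF      # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)    # 52 mantissa bits
    if exponent != 0x7FF:
        return "finite"
    return "inf" if mantissa == 0 else "nan"
```

The sign bit is ignored, so positive and negative infinities are both reported as "inf".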
On 6/25/07, Robert Kern wrote:
> NaNs and infs are IEEE-754 concepts. Python does run on non-IEEE-754 platforms, and I don't think that python-dev will want to entirely exclude them. You will have to do *something* about those platforms. Possibly, they just won't support NaNs and infs at all, but you'd have to make sure that the bit pattern that is a NaN on IEEE-754 systems won't be misinterpreted as a NaN on the non-IEEE-754 systems.

Sounds like some clever #ifdefs are needed here. How does isnan() deal with this?

//Torgil
Torgil Svensson wrote:
> On 6/25/07, Robert Kern wrote:
>> NaNs and infs are IEEE-754 concepts. Python does run on non-IEEE-754 platforms, and I don't think that python-dev will want to entirely exclude them. You will have to do *something* about those platforms. Possibly, they just won't support NaNs and infs at all, but you'd have to make sure that the bit pattern that is a NaN on IEEE-754 systems won't be misinterpreted as a NaN on the non-IEEE-754 systems.
>
> Sounds like some clever #ifdefs are needed here. How does isnan() deal with this?

It doesn't. numpy does require IEEE-754.

--
Robert Kern
On 6/25/07, Torgil Svensson wrote:
> On 6/25/07, Robert Kern wrote:
>> NaNs and infs are IEEE-754 concepts. Python does run on non-IEEE-754 platforms, and I don't think that python-dev will want to entirely exclude them. You will have to do *something* about those platforms. Possibly, they just won't support NaNs and infs at all, but you'd have to make sure that the bit pattern that is a NaN on IEEE-754 systems won't be misinterpreted as a NaN on the non-IEEE-754 systems.
>
> Sounds like some clever #ifdefs are needed here.

I took a quick glance at the Python code. On the positive side, there are ways to detect whether a platform is IEEE-754, and the code errors out if special values are unpacked on non-IEEE-754 platforms (we can do the same for strings). So far it looks straightforward. The problem is what strings to expect: the string generation uses OS-specific routines (probably the C library, I haven't looked). I think Python should be consistent regarding this across platforms, but I don't know whether different C libraries generate different strings for special numbers. Anyone? If they do, this might go political (should the string representation follow the system or be unified?), and the implementation would have to go in either OS-common or OS-specific code.

//Torgil
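From Python code, one way to ask the interpreter which float format it was built for is `float.__getformat__`. The sketch below is a small illustration and not necessarily the check referred to in the C sources; the wrapper name `platform_is_ieee754` is hypothetical, and `__getformat__` itself is a CPython implementation detail.

```python
def platform_is_ieee754():
    """Return True if C doubles use an IEEE-754 layout on this build."""
    # CPython reports 'IEEE, little-endian', 'IEEE, big-endian' or 'unknown'.
    return float.__getformat__("double").startswith("IEEE")
```

A parser for special values could refuse to produce NaN or inf when this returns False, mirroring the error raised when unpacking special values.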
Torgil Svensson wrote:
> OS-specific routines (probably the C library, I haven't looked). I think Python should be consistent regarding this across platforms, but I don't know whether different C libraries generate different strings for special numbers. Anyone?

Windows and Linux certainly generate different strings for special numbers from current Python, and I guess the origin is the libc on those platforms. But, as Python is moving away from the libc for file IO in Python 3K, perhaps string representation of floats would be considered, too. (In fact, for all I know, perhaps it has already been considered.) Maybe you should email the python3kdev list?

Andrew
On 6/26/07, Andrew Straw wrote:
> But, as Python is moving away from the libc for file IO in Python 3K, perhaps string representation of floats would be considered, too. (In fact, for all I know, perhaps it has already been considered.) Maybe you should email the python3kdev list?

Good idea. I found this thread on python-dev:
http://mail.python.org/pipermail/pythondev/2007June/073625.html

There's also an interesting bug report related to this:
1732212: repr of 'nan' floats not parseable
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1732212&group_id=5470

This seems to indicate that float('nan') works on some platforms but that the output of str(nan) isn't parseable. Is this true on Linux? Could anyone confirm this? What about float('inf') and repr(inf) on Linux?

This may also mean that I have an easy long-term way out: move to Linux and follow the resolution of this bug. Windows is trouble in several areas anyway and may not be worth the extra effort.

//Torgil
Torgil Svensson wrote:
> This seems to indicate that float('nan') works on some platforms but that the output of str(nan) isn't parseable. Is this true on Linux? Could anyone confirm this? What about float('inf') and repr(inf) on Linux?

On Ubuntu Feisty (amd64) Linux (this behavior has been the same for at least the 6 years I can remember):

$ python
Python 2.5.1 (r251:54863, May 2 2007, 16:27:44)
[GCC 4.1.2 (Ubuntu 4.1.20ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float('nan')
nan
>>> float('inf')
inf
>>> import numpy
>>> repr(numpy.inf)
'inf'
>>> repr(numpy.nan)
'nan'
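Where float() accepts the 'nan' and 'inf' spellings, as in the Linux session above, special values survive a trip through text. A minimal sketch, assuming a current Python 3 interpreter and using arbitrary sample values:

```python
import math

values = [1.5, float("inf"), float("-inf"), float("nan")]

# Write the values as whitespace-separated text, then read them back.
text = " ".join(repr(v) for v in values)
restored = [float(token) for token in text.split()]

# NaN never compares equal, so it has to be checked with isnan().
finite_and_inf_match = restored[:3] == values[:3]
nan_survived = math.isnan(restored[3])
```

Both `finite_and_inf_match` and `nan_survived` should come out True on such a platform.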
I should have guessed this and tried it earlier. The odds that effort spent on Windows pays off in these cases are too low, so I'll put this in the bag of annoyances/problems that will go away along with Windows and just live with it in the meantime. Thanks for this report and all the other good feedback from the list!

//Torgil
On Mon, 25 Jun 2007, Torgil Svensson wrote:
> handled with subclassing. At a minimum "float(str(nan))==nan" should evaluate as True.

False. No NaN should ever compare equal to anything, even itself. But if the system is 754-compliant, it won't. "isnan(float(str(nan))) == True" would be nice, though.

w
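The distinction between the two tests can be checked directly. A short sketch, assuming an interpreter where float() can parse the output of str(nan):

```python
import math

nan = float("nan")
round_tripped = float(str(nan))

# Under IEEE-754 rules a NaN is unequal to everything, including itself.
equal_to_itself = (nan == nan)
round_trip_equal = (round_tripped == nan)

# isnan() is the test that can actually succeed after a round trip.
round_trip_is_nan = math.isnan(round_tripped)
```

Here the two equality comparisons are False, while `round_trip_is_nan` is True.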
Torgil Svensson wrote:
> On 6/25/07, Warren Focke wrote:
>> False. No NaN should ever compare equal to anything, even itself. But if the system is 754-compliant, it won't.
>> "isnan(float(str(nan))) == True" would be nice, though.
>
> Good point. Does this also hold true for the quiet nan's?

Yes.

--
Robert Kern
On Jun 20, 2007, at 04:35, Torgil Svensson wrote:
> Hi
> Is there a reason for numpy.float not to convert its own string representation correctly?

numpy.float is the Python float type, so there's nothing we can do. I am working on adding NaN and Inf support for numpy dtypes, though, so that, for instance, numpy.float64('1.#IND') would work as expected. I'll put it higher on my priority list :)

--
David M. Cooke
http://arbutus.physics.mcmaster.ca/dmc/
cookedm@physics.mcmaster.ca
On 6/21/07, David M. Cooke wrote:
> On Jun 20, 2007, at 04:35, Torgil Svensson wrote:
>> Hi
>> Is there a reason for numpy.float not to convert its own string representation correctly?
>
> numpy.float is the Python float type, so there's nothing we can do. I am working on adding NaN and Inf support for numpy dtypes, though, so that, for instance, numpy.float64('1.#IND') would work as expected. I'll put it higher on my priority list :)

Great news. Thanks!

//Torgil
participants (6)
Andrew Straw
David M. Cooke
Matthieu Brucher
Robert Kern
Torgil Svensson
Warren Focke