[Numpy-discussion] major bug in fromstring, ascii mode

Eric Firing efiring at hawaii.edu
Sun Jan 27 14:40:18 EST 2008


Pauli Virtanen wrote:
> On Sun, 2008-01-27 at 01:16 -0700, Charles R Harris wrote:
>>
>> On Jan 26, 2008 11:30 PM, Eric Firing <efiring at hawaii.edu> wrote:
>>         In the course of trying to parse ascii times, I ran into a
>>         puzzling bug.
>>         Sometimes it works as expected:
>>         
>>         In [31]:npy.fromstring('23:19:01', dtype=int, sep=':')
>>         Out[31]:array([23, 19,  1])
>>         
>>         But sometimes it doesn't:
>>         
>>         In [32]:npy.fromstring('23:09:01', dtype=int, sep=':')
>>         Out[32]:array([23,  0])
>>         
>>         In [33]:npy.__version__
>>         Out[33]:'1.0.5.dev4742'
>>
>> Works here.
> 
> I think it's that some numbers work, and some don't. Consider:
> 
>>>> npy.fromstring('23:06:01', dtype=int, sep=':')
> array([23,  6,  1])
>>>> npy.fromstring('23:07:01', dtype=int, sep=':')
> array([23,  7,  1])
>>>> npy.fromstring('23:08:01', dtype=int, sep=':')
> array([23,  0])
>>>> npy.fromstring('23:09:01', dtype=int, sep=':')
> array([23,  0])
> 
> and
> 
>>>> npy.fromstring('23:010:01', dtype=int, sep=':')
> array([23,  8,  1])
>>>> npy.fromstring('23:011:01', dtype=int, sep=':')
> array([23,  9,  1])
> 
> and
> 
>>>> npy.fromstring('23:0xff:01', dtype=int, sep=':')
> array([ 23, 255,   1])
> 
> Smells like some scanf function is interpreting numbers beginning with
> zero as octal, and recognizing also hexadecimals.

That is it exactly. The code in core/src/arraytypes.inc.src is using
scanf, and scanf accepts integers written in several notations,
including octal (leading 0) and hexadecimal (leading 0x).  So, what
caught me is a feature, not a bug, and I should have recognized it as
such right away.  The bug was in my expectations, not in the code.
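
As a rough illustration of the prefix rule described above, the
following Python sketch mimics how a C-style base-detecting conversion
would classify each field. The helper name c_style_base is hypothetical,
and this is not the code NumPy uses; it only reproduces the rule
(leading 0x means hexadecimal, a leading 0 means octal, anything else
is decimal).

```python
def c_style_base(token):
    """Return the base a C-style prefix rule would pick for an unsigned token.

    Hypothetical helper for illustration only; not part of NumPy.
    """
    lowered = token.lower()
    if lowered.startswith("0x"):
        return 16  # hexadecimal prefix
    if lowered.startswith("0") and len(lowered) > 1:
        return 8   # a leading zero selects octal
    return 10      # everything else is decimal


# Fields taken from the examples quoted in this thread.
for token in ["19", "07", "010", "011", "0xff"]:
    print(token, "->", int(token, c_style_base(token)))
# 19 -> 19, 07 -> 7, 010 -> 8, 011 -> 9, 0xff -> 255
```

A field such as '08' is classified as octal, but 8 is not an octal
digit, so Python's int() raises ValueError for it. A C parser would
instead stop at the first invalid digit, which is consistent with the
truncated array([23, 0]) results quoted above.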

> 
> This is a bit surprising, and whether this is the desired behavior is
> questionable.
> 
From a user's standpoint it would be nice to be able to have numbers 
with leading zeros interpreted as base 10 instead of octal, since this 
turns up any time one converts date and time-of-day strings, and can 
occur in many other contexts also. (Outside of computer science octal is 
rare, as far as I know.) It looks like supporting this would require 
quite a bit of change in the code, however.  I suspect it would have to 
go in as a kwarg that would be propagated through several layers of C 
function calls.  Otherwise, if octal conversion support were simply 
dropped, I suspect someone else's code would break, and equally 
reasonable expectations would be violated.
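
Until something like that exists, one workaround is to do the splitting
and conversion in Python with an explicit base. A minimal sketch,
assuming the fields are plain integers; parse_int_fields and its base
parameter are hypothetical names, not part of NumPy:

```python
def parse_int_fields(text, sep=":", base=10):
    """Split text on sep and convert every field using one explicit base.

    Hypothetical helper; with the default base=10 a leading zero is
    just a decimal digit, so '09' becomes 9.
    """
    return [int(field, base) for field in text.split(sep)]


print(parse_int_fields("23:09:01"))       # [23, 9, 1]
print(parse_int_fields("23:08:01"))       # [23, 8, 1]
print(parse_int_fields("17:ff", base=16)) # [23, 255]
```

The resulting list can be passed to numpy.array() if an array is needed.
Making the base an explicit argument with a decimal default keeps the
common date and time case simple while still allowing other bases on
request.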

Eric



