[Numpy-discussion] Fixing #736 and possible memory leak

Thu Apr 24 21:51:21 EDT 2008

On Thu, Apr 24, 2008 at 7:11 PM, Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
>
> On Thu, Apr 24, 2008 at 5:58 PM, Robert Kern <robert.kern at gmail.com>
> wrote:
>
>> On Thu, Apr 24, 2008 at 5:37 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> > Hi,
>> >
>> > I've been looking into ticket #736 and playing with some things. In
>> > arrayobject.c starting at line 8534 I added a check for strings.
>> >
>> >         if (PyString_Check(op)) {
>> >             r = Array_FromPyScalar(op, newtype);
>> >         }
>> >         if (PySequence_Check(op)) {
>> >             PyObject *thiserr = NULL;
>> >
>> >             /* necessary but not sufficient */
>> >             Py_INCREF(newtype);
>> >             r = Array_FromSequence(op, newtype, flags & FORTRAN,
>> >                                     min_depth, max_depth);
>> >             if (r == NULL && (thiserr=PyErr_Occurred())) {
>> >                 if (PyErr_GivenExceptionMatches(thiserr,
>> >                                                 PyExc_MemoryError)) {
>> >                     return NULL;
>> >                 }
>> >
>> > I think there may be a failure to decrement the reference to newtype
>> > unless Array_FromSequence does that (a nasty side effect).
>> >
>> > Anyway, the added check for a string fixes the conversion problem for
>> > such things as int32('123'). There remains a problem with array('123',
>> > dtype=int32) and with array(['123','123'], dtype=int32), but I think I
>> > can track those down. The question is whether changing the current
>> > behavior so that strings get converted to numbers will cause problems
>> > with other programs out there. I suspect I also need to check that
>> > strings are converted this way only when the type is explicitly given,
>> > not detected.
>>
>> Seems to work for me.
>>
>> In [5]: array([124, '123', '123'])
>> Out[5]:
>> array(['124', '123', '123'],
>>      dtype='|S4')
>
>
> Sure, but you didn't specify the type, so numpy determined that it was a
> numpy string type. Wrong test. Try:
>
> In [1]: array(['123'], dtype=int32)
> Out[1]: array([[1, 2, 3]])
>
> In [2]: a = ones(3, dtype=int32)
>
> In [3]: a[...] = '123'
>
> In [4]: a
> Out[4]: array([1, 2, 3])
>
> In [5]: a[...] = int32('123')
>
> In [6]: a
> Out[6]: array([123, 123, 123])
>
> And so on and so forth. The problem is this bit of code (among others):
>
>     stop_at_string = ((type == PyArray_OBJECT) ||
>                       (type == PyArray_STRING &&
>                        typecode->type == PyArray_STRINGLTR) ||
>                       (type == PyArray_UNICODE) ||
>                       (type == PyArray_VOID));
>
>
> The question is, how do we interpret a string when the type is specified? I
> think in that case we should try to convert the string to the relevant type,
> just as we cast numbers to the relevant type. So we should always stop at
> string.
>

Setting stop_at_string = 1 fixes all the problems I can see, but introduces
one new one:

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/numpy/core/tests/test_regression.py", line 537, in check_numeric_carray_compare
    assert_equal(np.array([ 'X' ], 'c'),'X')
ValueError: setting an array element with a sequence

On the other hand, array(['X']) == 'X' works fine, so I don't know what's
going on with the 'c' type.
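
A toy pure-Python model of the effect may help (a sketch only, not numpy's
actual C code; discover_shape and convert are hypothetical names made up for
illustration). It assumes the only thing stop_at_string controls is whether a
string counts as one scalar or as a sequence of characters:

```python
def discover_shape(obj, stop_at_string):
    """Toy shape discovery for nested lists/tuples that may contain strings."""
    if isinstance(obj, str):
        # With stop_at_string set, a string is a single scalar (no dimension);
        # otherwise it contributes a dimension of one entry per character.
        return () if stop_at_string else (len(obj),)
    if isinstance(obj, (list, tuple)):
        if not obj:
            return (0,)
        # Assume a regular nesting and take the shape from the first item.
        return (len(obj),) + discover_shape(obj[0], stop_at_string)
    return ()


def convert(obj, stop_at_string):
    """Convert nested input to nested lists of int under the same toy rule."""
    if isinstance(obj, str):
        if stop_at_string:
            return int(obj)                # '123' -> 123
        return [int(ch) for ch in obj]     # '123' -> [1, 2, 3]
    if isinstance(obj, (list, tuple)):
        return [convert(item, stop_at_string) for item in obj]
    return int(obj)


# Descending into the string mirrors array(['123'], dtype=int32) above.
print(discover_shape(['123'], False), convert(['123'], False))
# Stopping at the string gives the conversion being proposed.
print(discover_shape(['123'], True), convert(['123'], True))
```

In the quoted expression the STRING case only stops when typecode->type ==
PyArray_STRINGLTR, so a guess is that the 'c' typecode is meant to fall in the
other branch and descend into the string; forcing stop_at_string = 1 would
remove that distinction.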

Chuck