[Numpy-discussion] Fixing #736 and possible memory leak

Thu Apr 24 21:11:27 EDT 2008

On Thu, Apr 24, 2008 at 5:58 PM, Robert Kern <robert.kern at gmail.com> wrote:

> On Thu, Apr 24, 2008 at 5:37 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
> > Hi,
> >
> > I've been looking into ticket #736 and playing with some things. In
> > arrayobject.c starting at line 8534 I added a check for strings.
> >
> >         if (PyString_Check(op)) {
> >             r = Array_FromPyScalar(op, newtype);
> >         }
> >         if (PySequence_Check(op)) {
> >             PyObject *thiserr = NULL;
> >
> >             /* necessary but not sufficient */
> >             Py_INCREF(newtype);
> >             r = Array_FromSequence(op, newtype, flags & FORTRAN,
> >                                     min_depth, max_depth);
> >             if (r == NULL && (thiserr=PyErr_Occurred())) {
> >                 if (PyErr_GivenExceptionMatches(thiserr,
> >                                                 PyExc_MemoryError)) {
> >                     return NULL;
> >                 }
> >
> > I think there may be a failure to decrement the reference to newtype
> > unless Array_FromSequence does that (nasty side effect).
> >
> > Anyway, the added check for a string fixes the conversion problem for
> > such things as int32('123'). There remains a problem with
> > array('123', dtype=int32) and with array(['123','123'], dtype=int32),
> > but I think I can track those down. The question is, will changing the
> > current behavior so that strings get converted to numbers cause
> > problems with other programs out there? I suspect I also need to check
> > that strings are converted this way only when the type is explicitly
> > given, not detected.
>
> Seems to work for me.
>
> In [5]: array([124, '123', '123'])
> Out[5]:
> array(['124', '123', '123'],
>      dtype='|S4')

Sure, but you didn't specify the type, so numpy inferred a numpy string
type. That's the wrong test. Try:

In [1]: array(['123'], dtype=int32)
Out[1]: array([[1, 2, 3]])

In [2]: a = ones(3, dtype=int32)

In [3]: a[...] = '123'

In [4]: a
Out[4]: array([1, 2, 3])

In [5]: a[...] = int32('123')

In [6]: a
Out[6]: array([123, 123, 123])

And so on. The problem is this bit of code (among others):

    stop_at_string = ((type == PyArray_OBJECT) ||
                      (type == PyArray_STRING &&
                       typecode->type == PyArray_STRINGLTR) ||
                      (type == PyArray_UNICODE) ||
                      (type == PyArray_VOID));

The question is, how do we interpret a string when the type is specified? I
think in that case we should try to convert the string to the relevant type,
just as we cast numbers to the relevant type. So we should always stop at
string.
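
To make the two readings concrete, here is a rough pure-Python sketch. It is
not the arrayobject.c logic, just an illustration: convert_nested is a
made-up helper, and int() stands in for the cast to the requested dtype.

```python
def convert_nested(obj, stop_at_string):
    """Hypothetical helper: convert nested sequences to nested lists of ints.

    stop_at_string=False descends into a string as a sequence of characters;
    stop_at_string=True treats the whole string as one scalar to be cast.
    """
    if isinstance(obj, str):
        if stop_at_string:
            # Proposed reading: the string is a single value, cast it whole.
            return int(obj)
        # Reading shown in the session above: each character is an element.
        return [int(ch) for ch in obj]
    if isinstance(obj, (list, tuple)):
        return [convert_nested(item, stop_at_string) for item in obj]
    # Anything else is already a scalar; int() stands in for the dtype cast.
    return int(obj)


# Descending into the string reproduces the shape of array([[1, 2, 3]]).
print(convert_nested(['123'], stop_at_string=False))         # [[1, 2, 3]]
# Stopping at the string gives the result I would expect instead.
print(convert_nested(['123', '123'], stop_at_string=True))   # [123, 123]
```

With stop_at_string always true, a string is cast the same way a number
would be, which is the behavior I am arguing for when the type is given.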

Chuck