[Numpy-discussion] Status of NumPy and Python 3.3

Sat Jul 28 18:31:11 EDT 2012

On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.certik at gmail.com> wrote:
> Many of the failures in
> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
> are of the type:
>
> ======================================================================
> FAIL: Check byteorder of single-dimensional objects
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
>     self.assertTrue(ua[0] != ua2[0])
> AssertionError: False is not true
>
>
> and those are caused by the following minimal example:
>
> Python 3.2:
>
>>>> from numpy import array
>>>> a = array(["abc"])
>>>> b = a.newbyteorder()
>>>> a.dtype
> dtype('<U3')
>>>> b.dtype
> dtype('>U3')
>>>> a[0].dtype
> dtype('<U3')
>>>> b[0].dtype
> dtype('<U6')
>>>> a[0] == b[0]
> False
>>>> a[0]
> 'abc'
>>>> b[0]
> 'ៀ\udc00埀\udc00韀\udc00'
>
>
> Python 3.3:
>
>
>>>> from numpy import array
>>>> a = array(["abc"])
>>>> b = a.newbyteorder()
>>>> a.dtype
> dtype('<U3')
>>>> b.dtype
> dtype('>U3')
>>>> a[0].dtype
> dtype('<U3')
>>>> b[0].dtype
> dtype('<U3')
>>>> a[0] == b[0]
> True
>>>> a[0]
> 'abc'
>>>> b[0]
> 'abc'
>
>
> So somehow the newbyteorder() method doesn't change the dtype of the
> elements in our new code.
> This method is implemented in numpy/core/src/multiarray/descriptor.c
> (I think), but so far I don't see where the problem could be.
>
> Any ideas?
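For reference, here is a stdlib-only sketch of what newbyteorder() means for one '<U3' element. It assumes NumPy stores each character of a U dtype as one 4-byte UCS4 code unit in the dtype's byte order. The 12 bytes stay the same and only their interpretation flips, so every big-endian reading lands above U+10FFFF. That is why the test expects ua[0] != ua2[0]. It also suggests that the Python 3.3 result 'abc' comes from ignoring the dtype's byte order when the scalar is built.

```python
import struct

# What '<U3' holds for "abc": three little-endian 4-byte code units.
raw = "abc".encode("utf-32-le")
n_units = len(raw) // 4  # itemsize // 4 == 3

# newbyteorder() keeps these bytes and only relabels them as big-endian.
as_little = struct.unpack(f"<{n_units}I", raw)
as_big = struct.unpack(f">{n_units}I", raw)

print([hex(cp) for cp in as_little])  # ['0x61', '0x62', '0x63']
print([hex(cp) for cp in as_big])     # ['0x61000000', '0x62000000', '0x63000000']

# Every reinterpreted code point exceeds U+10FFFF, the largest valid one.
print(all(cp > 0x10FFFF for cp in as_big))  # True
```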

OK, after some investigation, I think we need to do something along these lines:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                return PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize / 4, 4);  /* n = number of 4-byte units */
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            /* PyBytes_FromStringAndSize copied the data, so release it now */
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }

This particular implementation still fails, though:


>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
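The UnicodeDecodeError itself can be reproduced with the standard library alone. The sketch below swaps each of the itemsize // 4 code units, which assumes that byte_swap_vector's second argument is likewise a unit count and not a byte count. It then decodes strictly as little-endian UTF-32, the native order of the machine in the session (a.dtype is '<U3'). The first swapped unit is 0x61000000, which is above U+10FFFF, so the decoder rejects bytes 0-3. The helper name swap_ucs4_units is hypothetical.

```python
import struct

def swap_ucs4_units(raw: bytes) -> bytes:
    """Hypothetical helper: reverse the bytes of each 4-byte UCS4 code unit,
    a Python analogue of the memcpy + byte_swap_vector step in the patch."""
    if len(raw) % 4:
        raise ValueError("UCS4 data must be a multiple of 4 bytes")
    n_units = len(raw) // 4  # number of code units, not number of bytes
    units = struct.unpack(f"<{n_units}I", raw)
    return struct.pack(f">{n_units}I", *units)

raw = "abc".encode("utf-32-le")   # the 12 bytes shared by a and b
swapped = swap_ucs4_units(raw)    # what the swap branch hands to the decoder

error = None
try:
    swapped.decode("utf-32-le")   # strict decode in the little-endian host order
except UnicodeDecodeError as exc:
    error = exc                   # keep it; `exc` is unbound after the block

print(error.start, error.end)  # 0 4
print(error.reason)            # exact wording differs between Python versions
```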



But I think we simply need to take the "swap" flag into account.
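One possible way to honour the swap flag without the malloc/memcpy/free steps is to pick the UTF-32 decoder's byte order from the flag. In C, the CPython API function PyUnicode_DecodeUTF32(s, size, errors, &byteorder) accepts byteorder = -1 for little-endian and 1 for big-endian. The Python sketch below shows only the selection logic. The function decode_ucs4 is hypothetical and is not NumPy's API.

```python
import sys

def decode_ucs4(raw: bytes, swap: bool) -> str:
    """Hypothetical helper: decode UCS4 data that is in the host's byte
    order, or in the opposite order when `swap` is true, by choosing the
    decoder's byte order instead of building a swapped copy first."""
    host_little = sys.byteorder == "little"
    data_little = host_little != swap  # swap flips the assumed order
    return raw.decode("utf-32-le" if data_little else "utf-32-be")

native = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"
foreign = "utf-32-be" if native == "utf-32-le" else "utf-32-le"

print(decode_ucs4("abc".encode(native), swap=False))  # abc
print(decode_ucs4("abc".encode(foreign), swap=True))  # abc

# The case from the session: native bytes relabelled as foreign.
failed = False
try:
    decode_ucs4("abc".encode(native), swap=True)
except UnicodeDecodeError:
    failed = True
print(failed)  # True
```

Either way, native-order bytes read with the opposite order are not valid UTF-32. So a strict decode of b[0] fails in this case whether the swap is done by copying or by choosing the decoder's byte order.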

Ondrej