[Numpy-discussion] Status of NumPy and Python 3.3
Ondřej Čertík
ondrej.certik at gmail.com
Sat Jul 28 18:31:11 EDT 2012
On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík <ondrej.certik at gmail.com> wrote:
> Many of the failures in
> https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
> are of the type:
>
> ======================================================================
> FAIL: Check byteorder of single-dimensional objects
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD
>     self.assertTrue(ua[0] != ua2[0])
> AssertionError: False is not true
>
>
> and those are caused by the following minimal example:
>
> Python 3.2:
>
> >>> from numpy import array
> >>> a = array(["abc"])
> >>> b = a.newbyteorder()
> >>> a.dtype
> dtype('<U3')
> >>> b.dtype
> dtype('>U3')
> >>> a[0].dtype
> dtype('<U3')
> >>> b[0].dtype
> dtype('<U6')
> >>> a[0] == b[0]
> False
> >>> a[0]
> 'abc'
> >>> b[0]
> 'ៀ\udc00埀\udc00韀\udc00'
>
>
> Python 3.3:
>
>
> >>> from numpy import array
> >>> a = array(["abc"])
> >>> b = a.newbyteorder()
> >>> a.dtype
> dtype('<U3')
> >>> b.dtype
> dtype('>U3')
> >>> a[0].dtype
> dtype('<U3')
> >>> b[0].dtype
> dtype('<U3')
> >>> a[0] == b[0]
> True
> >>> a[0]
> 'abc'
> >>> b[0]
> 'abc'
>
>
> So somehow the newbyteorder() method doesn't change the dtype of the
> elements in our new code.
> This method is implemented in numpy/core/src/multiarray/descriptor.c
> (I think), but so far I don't see
> where the problem could be.
>
> Any ideas?
Ok, after some investigating, I think we need to do something along these lines:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/s
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            // We have to deallocate this later, otherwise we get a segfault...
+            //free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
This particular implementation still fails though:
>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
But I think that we simply need to take into account the "swap" flag.
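
To make the UnicodeDecodeError above a bit more concrete, here is a rough pure-Python sketch (standard library only, not NumPy's C code). It assumes the array stores "abc" as little-endian UTF-32 and that the swap reverses the bytes of each 4-byte unit. The helper name swap_ucs4_units is made up for this illustration. After the swap, the first unit reads as 0x61000000 in native order, which is above the largest valid code point, so a native-order UTF-32 decode has to fail.

```python
import struct


def swap_ucs4_units(raw: bytes) -> bytes:
    """Reverse the byte order of every 4-byte unit in raw (hypothetical helper)."""
    if len(raw) % 4:
        raise ValueError("length must be a multiple of 4")
    count = len(raw) // 4  # number of 4-byte units, not number of bytes
    units = struct.unpack("<%dI" % count, raw)
    return struct.pack(">%dI" % count, *units)


# Assumption: this is what a '<U3' array holds for "abc".
native = "abc".encode("utf-32-le")
swapped = swap_ucs4_units(native)

# Read the first swapped unit back in little-endian (native) order.
first_unit = struct.unpack("<I", swapped[:4])[0]

try:
    swapped.decode("utf-32-le")
    decoded_ok = True
except UnicodeDecodeError:
    # The swapped unit is not a valid code point in native order.
    decoded_ok = False

print(hex(first_unit), decoded_ok)
```

The same swapped bytes decode cleanly as "utf-32-be", which suggests the swap and the byte order used for decoding need to be chosen together.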
Ondrej