Hey all,

I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly improved in Python 3.3, and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now.

It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in tackling that little challenge? If someone has already tried it, it would be nice to hear your experience.

-Travis
On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant wrote:
Given that we're late with 1.7, I would suggest passing this to the next release, unless the fix is simple (just a change of API).

cheers,
David
On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau wrote:
I took a brief look at it, and from the errors I have seen, one is cosmetic; the other is a bit more involved (rewriting PyArray_Scalar unicode support). While it is not difficult in nature, the current code has multiple #ifdefs of Py_UNICODE_WIDE, meaning it would require multiple configurations on multiple Python versions to be tested.

I don't think Python 3.3 support is critical - people who want to play with beta interpreters can build numpy themselves from master, so I am -1 on integrating this into 1.7.

I may have a fix for it by tonight, though.

David
On Fri, Jul 27, 2012 at 10:43 AM, David Cournapeau wrote:
There are 2 tickets about this:

http://projects.scipy.org/numpy/ticket/2145
http://projects.scipy.org/numpy/ticket/1471

Ralf
On Fri, Jul 27, 2012 at 6:47 AM, Ralf Gommers wrote:
I am currently working on a PR trying to fix the unicode failures:

https://github.com/numpy/numpy/pull/366

It's a work in progress; I still have some little issues - see the PR for up-to-date details.

Ondrej
Ondřej Čertík wrote:

> I took a brief look at it, and from the errors I have seen, one is cosmetic, the other one is a bit more involved (rewriting PyArray_Scalar unicode support). While it is not difficult in nature, the current code has multiple #ifdefs of Py_UNICODE_WIDE, meaning it would require multiple configurations on multiple Python versions to be tested.
The cleanest way might be to leave the existing code in place and write completely new and independent code for Python 3.3.
> https://github.com/numpy/numpy/pull/366
>
> It's a work in progress, I still have some little issues, see the PR for up-to-date details.
I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether. What should matter in 3.3 is the maximum character in a Unicode string, which determines the kind of the string:

    PyUnicode_1BYTE_KIND -> Py_UCS1
    PyUnicode_2BYTE_KIND -> Py_UCS2
    PyUnicode_4BYTE_KIND -> Py_UCS4

So Py_UNICODE_WIDE should not matter, as all builds support PyUnicode_4BYTE_KIND. That's why I /think/ it's possible to drop Py_UNICODE altogether. For instance, the line in https://github.com/certik/numpy/commit/d02e36e5c85d5ee444614254643037aafc8de... should probably be:

    itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)

Stefan Krah
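[For readers unfamiliar with PEP 393: the "kind" rule Stefan describes can be sketched in pure Python. `pep393_kind` below is an illustrative helper, not a CPython API - it just reproduces the mapping from the maximum code point to the per-character storage width.]

```python
# The widest code point in a string determines its per-character storage
# (1, 2, or 4 bytes), but any string can always be represented in the
# fixed 4-byte UCS4/UTF-32 form, so itemsize = length * 4 always works
# for the widest kind.
def pep393_kind(s):
    """Return the PEP 393 kind (bytes per character) the string s needs."""
    m = max(map(ord, s), default=0)
    return 1 if m < 0x100 else (2 if m < 0x10000 else 4)

for s in ["abc", "caf\u00e9", "\u0394x", "\U0001F40D"]:
    ucs4 = s.encode("utf-32-le")          # fixed 4 bytes per code point
    assert len(ucs4) == len(s) * 4
    print(repr(s), "kind:", pep393_kind(s), "ucs4 bytes:", len(ucs4))
```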
Stefan,
On Sat, Jul 28, 2012 at 2:36 AM, Stefan Krah wrote:

> I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether.
I think so too.
> itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)
Yes, I think that's it. I've changed it and pushed the change into the PR.
I am now seeing failures like these:
======================================================================
ERROR: test_rmul (test_defchararray.TestOperations)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_defchararray.py",
line 592, in test_rmul
Ar = np.array([[A[0,0]*r, A[0,1]*r],
File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/defchararray.py",
line 1916, in __getitem__
if issubclass(val.dtype.type, character) and not _len(val) == 0:
AttributeError: 'str' object has no attribute 'dtype'
Here is the code in defchararray.py:
1911         if not _globalvar and self.dtype.char not in 'SUbc':
1912             raise ValueError("Can only create a chararray from string data.")
1913
1914     def __getitem__(self, obj):
1915         val = ndarray.__getitem__(self, obj)
1916 ->      if issubclass(val.dtype.type, character) and not _len(val) == 0:
1917             temp = val.rstrip()
1918             if _len(temp) == 0:
1919                 val = ''
1920             else:
1921                 val = temp
and here is some debugging info:
(Pdb) p self
(Pdb) p obj
(0, 0)
(Pdb) p val
'abc'
(Pdb) p type(val)
On Sat, Jul 28, 2012 at 7:58 AM, Ondřej Čertík wrote:
Python 3.2:
(Pdb) p self
chararray([['abc', '123'],
['789', 'xyz']],
dtype='
On Sat, Jul 28, 2012 at 8:04 AM, Ondřej Čertík wrote:
So I think there might be some conversion issue in the chararray: instead of using numpy.str_, it uses Python's str. Weird.
Ok, found this minimal example of the problem. Python 3.3:
>>> from numpy import array
>>> a = array(["123", "abc"])
>>> a
array(['123', 'abc'], dtype='
Python 3.2:
>>> from numpy import array
>>> a = array(["123", "abc"])
>>> a
array(['123', 'abc'], dtype='
So at some point, the strings get converted to numpy strings in 3.2, but not in 3.3.

Ondrej
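[The failing check above reduces to the difference between a plain str and a str subclass carrying a `.dtype` attribute. A minimal model - `NumpyStr` here is a hypothetical stand-in for `numpy.str_`, not the real scalar type:]

```python
# Minimal model of why chararray.__getitem__ breaks: it expects the
# scalar returned by ndarray.__getitem__ to be a numpy string scalar,
# i.e. a str subclass carrying a .dtype; a plain Python str has none.
class NumpyStr(str):          # hypothetical stand-in for numpy.str_
    dtype = "<U3"             # numpy scalars expose their dtype

ok = NumpyStr("abc")          # what Python 3.2 builds return
broken = "abc"                # what the unported 3.3 code returned

assert hasattr(ok, "dtype")
assert not hasattr(broken, "dtype")   # -> AttributeError in __getitem__
```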
Ondřej Čertík wrote:

> So at some point, the strings get converted to numpy strings in 3.2, but not in 3.3.
PyArray_Scalar() must return a subtype of PyUnicodeObject. I'm boldly assuming that the data is in UTF-32. If so, then this unoptimized version should work:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index 2e255c0..c134aed 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -643,7 +643,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
     }
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
-        return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, data, itemsize/4);
+        PyObject *b, *args;
+        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (b == NULL) {
+            return NULL;
+        }
+        args = Py_BuildValue("(Os)", b, "utf-32");
+        if (args == NULL) {
+            Py_DECREF(b);
+            return NULL;
+        }
+        obj = type->tp_new(type, args, NULL);
+        Py_DECREF(b);
+        Py_DECREF(args);
+        return obj;
     }
 #endif
     if (type->tp_itemsize != 0) {

Stefan Krah
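[In Python terms, what the patch does is roughly this: calling the scalar type's tp_new with (bytes, "utf-32") is like constructing a str subclass from UTF-32 data plus an encoding, which is what preserves the subtype. `Unicode32` below is a hypothetical stand-in for the numpy unicode scalar type:]

```python
# Rough Python model of the C patch: constructing a str subclass from
# (bytes, encoding) goes through str.__new__, which decodes the bytes
# and returns an instance of the subtype rather than a plain str.
class Unicode32(str):   # hypothetical stand-in for the numpy scalar type
    pass

data = "abc".encode("utf-32-le")       # 4 bytes per code point
scalar = Unicode32(data, "utf-32-le")  # like tp_new(type, (b, "utf-32"))
assert isinstance(scalar, Unicode32)   # subtype preserved
assert scalar == "abc"
```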
On Sat, Jul 28, 2012 at 11:19 AM, Stefan Krah wrote:

> PyArray_Scalar() must return a subtype of PyUnicodeObject. I'm boldly assuming that the data is in UTF-32. If so, then this unoptimized version should work:
>
> -        return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, data, itemsize/4);
Why doesn't PyUnicode_FromKindAndData return a subtype of PyUnicodeObject? http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromKindAndData
Nice!! I pushed your patch into the PR, and now it works great in Python 3.3. There are still other failures:

https://gist.github.com/3194707

But this particular bug is fixed. Thanks for your help!

Ondrej
Many of the failures in https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71 are of the type:
======================================================================
FAIL: Check byteorder of single-dimensional objects
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
line 286, in test_valuesSD
self.assertTrue(ua[0] != ua2[0])
AssertionError: False is not true
and those are caused by the following minimal example:
Python 3.2:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('>U3')
>>> a[0] == b[0]
False
>>> a[0]
'abc'
>>> b[0]
'ៀ\udc00埀\udc00韀\udc00'

Python 3.3:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
True
>>> a[0]
'abc'
>>> b[0]
'abc'

So somehow the newbyteorder() method doesn't change the dtype of the elements in our new code. This method is implemented in numpy/core/src/multiarray/descriptor.c (I think), but so far I don't see where the problem could be.

Any ideas?

Ondrej
On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík wrote:
Ok, after some investigating, I think we need to do something along these lines:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            // We have to deallocate this later, otherwise we get a segfault...
+            //free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
This particular implementation still fails though:
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

But I think that we simply need to take into account the "swap" flag.
Ondrej
On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík wrote:
Ok, so first of all, I tried to disable the swapping in Python 3.2:

    if (swap) {
        byte_swap_vector(buffer, itemsize >> 2, 4);
    }

And then it behaves *exactly* as in Python 3.3. So I am pretty sure that the problem is right there, and something along the lines of my patch above should fix it. I had a few bugs there; here is the correct version:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..bed73f7 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize >> 2, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }

That works well, except that it gives the UnicodeDecodeError:

>>> b[0].dtype
NULL
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

This error is actually triggered by this line:

    obj = type->tp_new(type, args, NULL);

in the patch by Stefan above. So I think what is happening is that it simply tries to convert the bytes to a string and fails. That makes sense. The question is why it doesn't fail in exactly the same way in Python 3.2 - I think it's because the conversion check is bypassed somehow.

Stefan, I think we need to swap it after the object is created. I am still experimenting with this.

Ondrej
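[The decode failure can be reproduced in pure Python: byte-swap UTF-32 data and then decode it with the original byte order. `byte_swap_utf32` below is an illustrative model of what `byte_swap_vector(buffer, itemsize >> 2, 4)` does, not NumPy code:]

```python
import struct

def byte_swap_utf32(data):
    """Model of byte_swap_vector(buffer, itemsize >> 2, 4): reverse
    the byte order of each 4-byte code unit."""
    n = len(data) // 4
    return struct.pack(">%dI" % n, *struct.unpack("<%dI" % n, data))

native = "abc".encode("utf-32-le")
swapped = byte_swap_utf32(native)

# The swapped bytes decode fine with the opposite byte order...
assert swapped.decode("utf-32-be") == "abc"

# ...but decoding them with the original order reads e.g. 0x61000000,
# which is above the Unicode limit 0x10FFFF:
try:
    swapped.decode("utf-32-le")
except UnicodeDecodeError as e:
    assert "not in range(0x110000)" in str(e)
```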
On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík wrote:
Well, I simply went to the Python sources and then implemented a solution that works with this patch:

https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...

So now the PR actually seems to work. The rest of the failures are here:

https://gist.github.com/3195520

and they seem to be unrelated. Can somebody please review this PR?

https://github.com/numpy/numpy/pull/366

I will squash the commits after it's reviewed (I want to keep the history there for now).

Ondrej
On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
Thank you. I backported the PR to numpy 1.6.2 and it works for me on win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures, of the kind:

    AssertionError:
    Items are not equal:
     ACTUAL: ()
     DESIRED: None

Christoph
On 7/28/2012 6:17 PM, Christoph Gohlke wrote:
On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík
wrote: On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík
wrote: On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík
wrote: Many of the failures in https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71 are of the type:
====================================================================== FAIL: Check byteorder of single-dimensional objects ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD self.assertTrue(ua[0] != ua2[0]) AssertionError: False is not true
and those are caused by the following minimal example:
Python 3.2:
>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
False
>>> a[0]
'abc'
>>> b[0]
'ៀ\udc00埀\udc00韀\udc00'

Python 3.3:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
True
>>> a[0]
'abc'
>>> b[0]
'abc'

So somehow the newbyteorder() method doesn't byte-swap the element values in our new code. This method is implemented in numpy/core/src/multiarray/descriptor.c (I think), but so far I don't see where the problem could be.
Any ideas?
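To see concretely why the swapped data misbehaves, here is a small pure-Python model (illustrative only, not numpy code) of what byte_swap_vector() does to a UCS-4 item:

```python
import struct

# "abc" stored as a little-endian '<U3' numpy item: three UCS-4 words.
data = "abc".encode("utf-32-le")          # 12 bytes
assert struct.unpack("<3I", data) == (0x61, 0x62, 0x63)

# What a 4-byte-wide byte swap does: reverse each 4-byte word in place.
swapped = b"".join(data[i:i + 4][::-1] for i in range(0, 12, 4))

# Reading the swapped buffer back as little-endian UCS-4 yields "code
# points" far beyond 0x10FFFF, which is exactly the UnicodeDecodeError
# ("codepoint not in range(0x110000)") seen in this thread.
code_points = struct.unpack("<3I", swapped)
assert code_points == (0x61000000, 0x62000000, 0x63000000)
```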
Ok, after some investigating, I think we need to do something along these lines:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            // We have to deallocate this later, otherwise we get a segfault...
+            //free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
This particular implementation still fails though:
>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)

But I think that we simply need to take into account the "swap" flag.
Ok, so first of all, I tried to disable the swapping in Python 3.2:
if (swap) {
    byte_swap_vector(buffer, itemsize >> 2, 4);
}
And then it behaves *exactly* as in Python 3.3. So I am pretty sure that the problem is right there and something along the lines of my patch above should fix it. I had a few bugs there, here is the correct version:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..bed73f7 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize >> 2, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
That works well, except that it gives the UnicodeDecodeError:
>>> b[0].dtype
NULL
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
This error is actually triggered by this line:
obj = type->tp_new(type, args, NULL);
in the patch by Stefan above. So I think what is happening is that it simply tries to convert it from bytes to a string and fails. That makes great sense. The question is why doesn't it fail in exactly the same way in Python 3.2? I think it's because the conversion check is bypassed somehow. Stefan, I think we need to swap it after the object is created. I am still experimenting with this.
Well, I simply went to the Python sources and then implemented a solution that works with this patch:
https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...
So now the PR actually seems to work. The rest of the failures are here:
https://gist.github.com/3195520
and they seem to be unrelated. Can somebody please review this PR?
https://github.com/numpy/numpy/pull/366
I will squash the commits after it's reviewed (I want to keep the history there for now).
Ondrej
Thank you. I backported the PR to numpy 1.6.2 and it works for me on win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures of the kind:
AssertionError:
Items are not equal:
 ACTUAL: ()
 DESIRED: None
Christoph
Pull request #367 should fix the NewBufferProtocol test failures. https://github.com/numpy/numpy/pull/367 Christoph
Ondřej Čertík
Well, I simply went to the Python sources and then implemented a solution that works with this patch:
https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...
Nice! I hit the same problem yesterday: unicode_new() does not accept byte-swapped input with an encoding, since the input is not valid. But your solution circumvents the validation.

I'm not sure what the use case is for byte-swapped (invalid?) unicode strings, but the approach looks good to me in the sense that it does the same thing as the Py_UNICODE_WIDE path in 3.2.

In PyArray_Scalar() I only have these comments, two of which are stylistic:

- I think the 'size' parameter in PyUnicode_New() refers to the number of code points (UCS4 in this case), so: PyUnicode_New(itemsize >> 2, max_char)

- The 'b' variable could be renamed to 'u' now.

- PyArray_Scalar() is beginning to look a little crowded. Perhaps the whole PY_VERSION_HEX >= 0x03030000 block could go into a separate function such as:

NPY_NO_EXPORT PyObject *
get_unicode_scalar_3_3(PyTypeObject *type, void *data, Py_ssize_t itemsize, int swap);

Then there's another problem in numpy.test() if Python 3.3 is compiled --with-pydebug:

.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper:
Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted

Stefan Krah
Stefan Krah
Then there's another problem in numpy.test() if Python 3.3 is compiled --with-pydebug:
.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper:
Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted
This also occurs with Python 3.2, so it's unrelated to the Unicode changes: http://projects.scipy.org/numpy/ticket/2193 Stefan Krah
Stefan Krah
.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper:
Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted
This also occurs with Python 3.2, so it's unrelated to the Unicode changes:
I've uploaded a patch for the issue. Stefan Krah
On Sun, Jul 29, 2012 at 3:40 AM, Stefan Krah
Ondřej Čertík
wrote: Well, I simply went to the Python sources and then implemented a solution that works with this patch:
https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...
Nice! I hit the same problem yesterday: unicode_new() does not accept byte-swapped input with an encoding, since the input is not valid. But your solution circumvents the validation.
I'm not sure what the use case is for byte-swapped (invalid?) unicode strings, but the approach looks good to me in the sense that it does the same thing as the Py_UNICODE_WIDE path in 3.2.
In PyArray_Scalar() I only have these comments, two of which are stylistic:
- I think the 'size' parameter in PyUnicode_New() refers to the number of code points (UCS4 in this case), so:
PyUnicode_New(itemsize >> 2, max_char)
Right. Done.
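To spell out the arithmetic (a small illustrative sketch in plain Python, not the numpy C code): a '<U3' item is stored as UCS-4, so its itemsize is 3 * 4 = 12 bytes, and the code-point count passed to PyUnicode_New() is itemsize >> 2.

```python
# A 3-character NPY_UNICODE item is UCS-4: 4 bytes per code point.
itemsize = len("abc".encode("utf-32-le"))  # 12 bytes on disk/in the array
code_points = itemsize >> 2                # the 'size' PyUnicode_New() wants
assert itemsize == 12
assert code_points == 3
```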
- The 'b' variable could be renamed to 'u' now.
Done.
- PyArray_Scalar() is beginning to look a little crowded. Perhaps the whole PY_VERSION_HEX >= 0x03030000 block could go into a separate function such as:
NPY_NO_EXPORT PyObject *
get_unicode_scalar_3_3(PyTypeObject *type, void *data, Py_ssize_t itemsize, int swap);
I didn't do this, as I think the function is fine as it is. If further refactoring is needed, then one should probably create 3 functions, one for 3.3, one for <3.3-wide and one for <3.3-narrow.

I've also rebased and squashed the commits, so now it is ready to be merged:

https://github.com/numpy/numpy/pull/366

Thanks Stefan for your help. Can somebody with push access please review it?

Ondrej
Ondřej Čertík

Using Python 3.3 compiled --with-pydebug it appears to be impossible to fool the new Unicode implementation with byte-swapped data. Apply the patch from:

http://projects.scipy.org/numpy/ticket/2193

Then:

Python 3.3.0b1 (default:68e2690a471d+, Jul 29 2012, 15:28:41)
[GCC 4.4.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy import array
[206376 refs]
>>> a = array(["abc"])
[206382 refs]
>>> b = a.newbyteorder()
[206387 refs]
>>> b
python3.3: Objects/unicodeobject.c:401: _PyUnicode_CheckConsistency: Assertion `maxchar <= 0x10ffff' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb)

This should be expected since the byte-swapped strings aren't valid.

Stefan Krah
On Sun, Jul 29, 2012 at 6:56 AM, Stefan Krah
This should be expected since the byte-swapped strings aren't valid.
Exactly, I am aware that my solution is a hack. So is the Python 3.2 solution, except that Python 3.2 doesn't seem to have the _PyUnicode_CheckConsistency() function, so no checks are done. As such, I think that my PR simply extends the numpy approach to Python 3.3. A separate issue is that the swapping thing is a hack -- Travis, what is the purpose of the newbyteorder() and the need to swap the internals of the unicode object? Ondrej
Ondřej Čertík
This should be expected since the byte-swapped strings aren't valid.
Exactly, I am aware that my solution is a hack. So is the Python 3.2 solution, except that Python 3.2 doesn't seem to have the _PyUnicode_CheckConsistency() function, so no checks are done. As such, I think that my PR simply extends the numpy approach to Python 3.3.
Absolutely, I also think that using invalid Unicode strings in 3.2 looks kind of hackish. -- Nothing wrong with your 3.3 implementation, it's the general concept that I don't understand. Stefan Krah
Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
So now the PR actually seems to work. The rest of the failures are here:
I wanted to have a look at the import errors in your previous gist. How did you get rid of them? I can't even install numpy on 3.3 as setup.py chokes on 'import numpy.distutils.core':

(py33)ronan@ronan-desktop:~/dev/numpy$ python setup.py install
Converting to Python3 via 2to3...
Running from numpy source directory.
/home/ronan/dev/numpy/py33/lib/python3.3/distutils/__init__.py:16: ResourceWarning: unclosed file <_io.TextIOWrapper name='/usr/local/lib/python3.3/distutils/__init__.py' mode='r' encoding='UTF-8'>
  exec(open(os.path.join(distutils_path, '__init__.py')).read())
Traceback (most recent call last):
  File "setup.py", line 214, in <module>
    setup_package()
  File "setup.py", line 191, in setup_package
    from numpy.distutils.core import setup
  File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/core.py", line 25, in <module>
    from numpy.distutils.command import config, config_compiler, \
  File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/command/__init__.py", line 17, in <module>
    __import__('distutils.command',globals(),locals(),distutils_all)
ImportError: No module named 'distutils.command.install_clib'

Actually, I don't even understand how this __import__() call can work on earlier versions, nor what it's trying to achieve.
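For what it's worth, the fromlist semantics can be sketched in pure Python (illustrative only; on current CPython a missing fromlist name is silently skipped, while the early 3.3 betas propagated the ImportError, which is the failure in the traceback above):

```python
import os

# __import__ with a non-empty fromlist imports the package, then tries to
# import each fromlist name as a submodule of it. A name that is already
# an attribute is left alone; on current CPython a name that cannot be
# imported is silently ignored rather than raising.
mod = __import__("os", globals(), locals(), ["path", "does_not_exist"])

assert mod is os                           # the module itself is returned
assert mod.path is os.path                 # real submodule is resolved
assert not hasattr(mod, "does_not_exist")  # bogus name is skipped, no error
```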
Hi Ronan!
On Sun, Jul 29, 2012 at 2:27 PM, Ronan Lamy
Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
So now the PR actually seems to work. The rest of the failures are here:
I wanted to have a look at the import errors in your previous gist. How did you get rid of them? I can't even install numpy on 3.3 as setup.py
Do you mean this gist: https://gist.github.com/3194707/482382fb6fd6f0d756128d97ea6c892ddb31fff9 ? I had run the tests from the wrong directory and numpy was picking up the wrong files to import --- I think either from the numpy directory directly (there is a check for this though), or from numpy/core or something, I don't remember anymore. So I then ran the tests from /tmp and posted the correct result into the same gist as a new commit: https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
chokes on 'import numpy.distutils.core':
Actually, I don't even understand how this __import__() call can work on earlier versions, nor what it's trying to achieve.
That's weird, I've never seen this error before. Try to install numpy using your regular Python like this:

python setup.py install --prefix /tmp

let's say. If it works, then something is wrong with your Python 3.3 installation. If you want to reproduce my setup, check out my repo:

https://github.com/certik/python-3.3

and from inside it, run:

SPKG_LOCAL=`pwd`/xx MAKEFLAGS="-j4" sh spkg-install

(adjust the "-j4" flag, or remove it). You need a few packages installed, like zlib1g-dev and so on. Then install virtualenv by downloading the tar.gz and from inside it doing "/path/to/my/python-3.3/xx/bin/python3.3 setup.py install". Add the file /path/to/my/python-3.3/xx/bin/virtualenv-3.3 into your $PATH. Then:

rm -rf $HOME/py33
virtualenv-3.3 $HOME/py33
. $HOME/py33/bin/activate

go to your numpy directory and do "python setup.py install". To run the tests, you also need to:

TMPDIR=/tmp/numpy-env
rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR
tar xzf $tarballs/nose-1.1.2.tar.gz
cd nose-1.1.2
python setup.py install

using the virtualenv environment. When I tried to install nose into the Python installation in python-3.3/xx, it failed...

Ondrej
Le dimanche 29 juillet 2012 à 14:45 -0700, Ondřej Čertík a écrit :
Hi Ronan!
On Sun, Jul 29, 2012 at 2:27 PM, Ronan Lamy
wrote: Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
So now the PR actually seems to work. The rest of the failures are here:
I wanted to have a look at the import errors in your previous gist. How did you get rid of them? I can't even install numpy on 3.3 as setup.py
Do you mean this gist:
https://gist.github.com/3194707/482382fb6fd6f0d756128d97ea6c892ddb31fff9
? I have incorrectly run the tests from the wrong directory and numpy was picking up the wrong files to import --- I think either from the numpy directory directly (there is a check for this though), or from numpy/core or something, I don't remember anymore. So I then run the tests from /tmp and posted the correct result into the same gist as a new commit:
https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
Ah, OK. False alarm, then. I'm on the lookout for import errors with Python 3.3, as the import system has been completely rewritten and anything that relied on undocumented behaviour is likely to break.
chokes on 'import numpy.distutils.core':
Actually, I don't even understand how this __import__() call can work on earlier versions, nor what it's trying to achieve.
That's weird, I've never seen this error before. Try to install numpy using your regular Python like this:
python setup.py install --prefix /tmp
let's say. If it works, then something is wrong with your Python 3.3
I simply used a virtualenv (you might need to get the latest from PyPI), roughly as follows:

virtualenv -p python3.3 py33
py33/bin/python setup.py install

It worked fine with 3.2 and 2.7, but not with 3.3.
installation. If you want to reproduce my setup, checkout my repo:
https://github.com/certik/python-3.3
and from inside it, run:
SPKG_LOCAL=`pwd`/xx MAKEFLAGS="-j4" sh spkg-install
(adjust the "-j4" flag, or remove it). You need a few packages installed like zlib1g-dev and so on. Then install virtualenv by downloading the tar.gz and from inside it doing "/path/to/my/python-3.3/xx/bin/python3.3 setup.py install". Add the file /path/to/my/python-3.3/xx/bin/virtualenv-3.3 into your $PATH.
Then:
rm -rf $HOME/py33
virtualenv-3.3 $HOME/py33
. $HOME/py33/bin/activate
go to your numpy directory and do "python setup.py install". To run tests, you also need to:
TMPDIR=/tmp/numpy-env
rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR
tar xzf $tarballs/nose-1.1.2.tar.gz
cd nose-1.1.2
python setup.py install
using the virtualenv environment. When I tried to install nose into the Python installation in python-3.3/xx, it failed...
Installing nose from a git checkout works fine for me. Maybe nose-1.1.2 isn't really compatible with Python 3.3? Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!
Le lundi 30 juillet 2012 à 02:00 +0100, Ronan Lamy a écrit :
Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!
And the cause of these errors is that running the test suite somehow corrupts Python's internal cache of bytes objects, causing the following:
>>> b'\x01XXX'[0:1]
b'\xbb'
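The cache in question is CPython's table of the 256 one-byte bytes objects; here is a short sketch (a CPython implementation detail, not numpy code) of why one stray write from C poisons every later b'\x01' in the process:

```python
# CPython hands back the same cached object for every one-byte bytes
# value, so two independently computed slices are identical objects.
x = b"\x01XXX"[0:1]
y = b"\x02\x01"[1:2]

assert x == y == b"\x01"
assert x is y  # one shared, cached object (CPython detail)

# If C code writes through that shared buffer (as the unpickling bug
# did), every future b'\x01' produced from the cache is corrupted too.
```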
Le lundi 30 juillet 2012 à 04:57 +0100, Ronan Lamy a écrit :
Le lundi 30 juillet 2012 à 02:00 +0100, Ronan Lamy a écrit :
Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!
And the cause of these errors is that running the test suite somehow corrupts Python's internal cache of bytes objects, causing the following:
>>> b'\x01XXX'[0:1]
b'\xbb'
The culprit is test_pickle_string_overwrite() in test_regression.py. The test actually tries to check for that kind of problem, but on Python 3, it only manages to trigger it without detecting it. Here's a simple way to reproduce the issue:
>>> a = numpy.array([1], 'b')
>>> b = pickle.loads(pickle.dumps(a))
>>> b[0] = 77
>>> b'\x01 '[0:1]
b'M'
Actually, this problem is probably quite old: I can see it in 1.6.1 w/ Python 3.2.3. 3.3 only makes it more visible. I'll open an issue on GitHub ASAP.
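A pure-Python sketch of the hazard and of the copy-before-write pattern that avoids it (illustrative only, not the actual numpy change):

```python
import pickle

# The numpy bug: the unpickled array aliased the pickle's immutable bytes
# buffer instead of copying it, so writing an element mutated a bytes
# object that CPython may share and cache.
payload = pickle.dumps(b"\x01")
restored = pickle.loads(payload)

# bytes are immutable; a correct consumer copies into a mutable buffer
# before writing, leaving the original object untouched.
buf = bytearray(restored)  # the copy is the fix
buf[0] = 77                # ord('M')

assert restored == b"\x01"   # original bytes object is unharmed
assert bytes(buf) == b"M"    # the write landed in the private copy
```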
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy
Le lundi 30 juillet 2012 à 17:10 +0100, Ronan Lamy a écrit :
The culprit is test_pickle_string_overwrite() in test_regression.py. The test actually tries to check for that kind of problem, but on Python 3, it only manages to trigger it without detecting it. Here's a simple way to reproduce the issue:
>>> a = numpy.array([1], 'b')
>>> b = pickle.loads(pickle.dumps(a))
>>> b[0] = 77
>>> b'\x01 '[0:1]
b'M'
Actually, this problem is probably quite old: I can see it in 1.6.1 w/ Python 3.2.3. 3.3 only makes it more visible.
I'll open an issue on GitHub ASAP.
Thanks Ronan, nice work! Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.) Ondrej
Le lundi 30 juillet 2012 à 11:07 -0700, Ondřej Čertík a écrit :
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy
wrote: Le lundi 30 juillet 2012 à 17:10 +0100, Ronan Lamy a écrit :
I'll open an issue on GitHub ASAP.
Thanks Ronan, nice work!
Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.)
Pauli found out how to fix the code, so I'll try to send a PR tonight.
On Mon, Jul 30, 2012 at 5:00 PM, Ronan Lamy
Le lundi 30 juillet 2012 à 11:07 -0700, Ondřej Čertík a écrit :
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy
Thanks Ronan, nice work!
Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.)
Pauli found out how to fix the code, so I'll try to send a PR tonight.
So this PR is now in and the issue is fixed.

As for the unicode byte-swapping issue, I finally understand what is going on and I posted my current understanding into the Python tracker issue (http://bugs.python.org/issue15540), which was recently created for this same issue:

http://bugs.python.org/msg167280

but it was determined that it is not a bug in Python, so it is closed now. Finally, I have submitted a reworked version of my patch here:

https://github.com/numpy/numpy/pull/372

It implements things in a clean way.

Ondrej
On Fri, Aug 3, 2012 at 8:03 AM, Ondřej Čertík
Final update: the patch is in, so NumPy now passes all tests in Python 3.3. There seems to be a better way to support unicode and that is discussed in another thread. Ondrej
Ronan Lamy
ImportError: No module named 'distutils.command.install_clib'
I'm seeing the same with Python 3.3.0b1 (68e2690a471d+) and this patch solves the problem:

diff --git a/numpy/distutils/command/__init__.py b/numpy/distutils/command/__init__.py
index f8f0884..b9f0d09 100644
--- a/numpy/distutils/command/__init__.py
+++ b/numpy/distutils/command/__init__.py
@@ -7,13 +7,13 @@ __revision__ = "$Id: __init__.py,v 1.3 2005/05/16 11:08:49 pearu Exp $"
 
 distutils_all = [ #'build_py',
                   'clean',
-                  'install_clib',
                   'install_scripts',
                   'bdist',
                   'bdist_dumb',
                   'bdist_wininst',
                 ]
 
+from numpy.distutils.command import install_clib
 __import__('distutils.command',globals(),locals(),distutils_all)
 
 __all__ = ['build',

Stefan Krah
Le dimanche 29 juillet 2012 à 23:55 +0200, Stefan Krah a écrit :
Ronan Lamy
wrote: ImportError: No module named 'distutils.command.install_clib'
I'm seeing the same with Python 3.3.0b1 (68e2690a471d+) and this patch solves the problem:
That does indeed solve the problem, thanks. However, I'm quite sure that 'rm numpy/distutils/command/__init__.py && touch numpy/distutils/command/__init__.py' works just as well - or probably better, in fact, as it allows 'from numpy.distutils.command import *' to run without error.
Ondřej Čertík
Why doesn't PyUnicode_FromKindAndData return a subtype of PyUnicodeObject?
http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromKindAndData
Well, it would need a PyTypeObject * parameter to do that. I agree that many C-API functions would be more useful if they did this. Stefan Krah
On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau
On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant
wrote: Hey all,
I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now.
It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in tackling that little challenge? If someone has already tried it would be nice to hear your experience.
Given that we're late with 1.7, I would suggest passing this to the next release, unless the fix is simple (just a change of API).
IMO, it's not a regression so it's not a release blocker. Of course we should release the fix whenever it's ready (in 1.7 if it's ready by then, else in 1.7.1), but we shouldn't hold up the release for it. -n
participants (8)
- Christoph Gohlke
- David Cournapeau
- Nathaniel Smith
- Ondřej Čertík
- Ralf Gommers
- Ronan Lamy
- Stefan Krah
- Travis Oliphant