Hey all,

I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly improved in Python 3.3, and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now.

It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in tackling that little challenge? If someone has already tried it, it would be nice to hear your experience.

-Travis
On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant wrote:
Given that we're late with 1.7, I would suggest passing this to the next release, unless the fix is simple (just a change of API).

cheers,
David
On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau wrote:
I took a brief look at it, and from the errors I have seen, one is cosmetic; the other is a bit more involved (rewriting PyArray_Scalar unicode support). While it is not difficult in nature, the current code has multiple #ifdefs of Py_UNICODE_WIDE, meaning it would require multiple configurations on multiple Python versions to be tested.

I don't think Python 3.3 support is critical - people who want to play with beta interpreters can build numpy themselves from master, so I am -1 on integrating this into 1.7.

I may have a fix for it by tonight, though.

David
On Fri, Jul 27, 2012 at 10:43 AM, David Cournapeau wrote:
There are 2 tickets about this:

http://projects.scipy.org/numpy/ticket/2145
http://projects.scipy.org/numpy/ticket/1471

Ralf
On Fri, Jul 27, 2012 at 6:47 AM, Ralf Gommers wrote:
I am currently working on a PR trying to fix the unicode failures:

https://github.com/numpy/numpy/pull/366

It's a work in progress; I still have some little issues - see the PR for up-to-date details.

Ondrej
Ondřej Čertík wrote:

> I took a brief look at it, and from the errors I have seen, one is cosmetic, the other one is a bit more involved (rewriting PyArray_Scalar unicode support). While it is not difficult in nature, the current code has multiple #ifdefs of Py_UNICODE_WIDE, meaning it would require multiple configurations on multiple Python versions to be tested.
The cleanest way might be to leave the existing code in place and write completely new and independent code for Python 3.3.
> https://github.com/numpy/numpy/pull/366
>
> It's a work in progress, I still have some little issues, see the PR for up-to-date details.
I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether. What should matter in 3.3 is the maximum character in a Unicode string, which determines the kind of the string:

    PyUnicode_1BYTE_KIND -> Py_UCS1
    PyUnicode_2BYTE_KIND -> Py_UCS2
    PyUnicode_4BYTE_KIND -> Py_UCS4

So Py_UNICODE_WIDE should not matter, as all builds support PyUnicode_4BYTE_KIND. That's why I /think/ it's possible to drop Py_UNICODE altogether. For instance, the line in https://github.com/certik/numpy/commit/d02e36e5c85d5ee444614254643037aafc8de... should probably be:

    itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)

Stefan Krah
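[For readers unfamiliar with PEP 393: the "kind" rule Stefan describes can be sketched in pure Python. `pep393_kind` below is an illustrative helper, not a CPython API - it just reproduces the mapping from the maximum code point to the per-character storage width.]

```python
# The widest code point in a string determines its per-character storage
# (1, 2, or 4 bytes), but any string can always be represented in the
# fixed 4-byte UCS4/UTF-32 form, so itemsize = length * 4 always works
# for the widest kind.
def pep393_kind(s):
    """Return the PEP 393 kind (bytes per character) the string s needs."""
    m = max(map(ord, s), default=0)
    return 1 if m < 0x100 else (2 if m < 0x10000 else 4)

for s in ["abc", "caf\u00e9", "\u0394x", "\U0001F40D"]:
    ucs4 = s.encode("utf-32-le")          # fixed 4 bytes per code point
    assert len(ucs4) == len(s) * 4
    print(repr(s), "kind:", pep393_kind(s), "ucs4 bytes:", len(ucs4))
```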
Stefan,
On Sat, Jul 28, 2012 at 2:36 AM, Stefan Krah wrote:

> I'm not a Unicode expert, but I think it's best to avoid Py_UNICODE altogether.
I think so too.
> itemsize = PyUnicode_GetLength(robj) * PyUnicode_KIND(robj)
Yes, I think that's it. I've changed it and pushed the change into the PR.
I am now seeing failures like these:
======================================================================
ERROR: test_rmul (test_defchararray.TestOperations)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_defchararray.py",
line 592, in test_rmul
Ar = np.array([[A[0,0]*r, A[0,1]*r],
File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/defchararray.py",
line 1916, in __getitem__
if issubclass(val.dtype.type, character) and not _len(val) == 0:
AttributeError: 'str' object has no attribute 'dtype'
Here is the code in defchararray.py:
1911         if not _globalvar and self.dtype.char not in 'SUbc':
1912             raise ValueError("Can only create a chararray from string data.")
1913
1914     def __getitem__(self, obj):
1915         val = ndarray.__getitem__(self, obj)
1916 ->      if issubclass(val.dtype.type, character) and not _len(val) == 0:
1917             temp = val.rstrip()
1918             if _len(temp) == 0:
1919                 val = ''
1920             else:
1921                 val = temp
and here is some debugging info:
(Pdb) p self
(Pdb) p obj
(0, 0)
(Pdb) p val
'abc'
(Pdb) p type(val)
On Sat, Jul 28, 2012 at 7:58 AM, Ondřej Čertík wrote:
Python 3.2:
(Pdb) p self
chararray([['abc', '123'],
['789', 'xyz']],
dtype='
On Sat, Jul 28, 2012 at 8:04 AM, Ondřej Čertík wrote:
So I think there might be some conversion issue in the chararray: instead of using numpy.str_, it uses Python's str. Weird.
Ok, found this minimal example of the problem. Python 3.3:
>>> from numpy import array
>>> a = array(["123", "abc"])
>>> a
array(['123', 'abc'], dtype='
Python 3.2:
>>> from numpy import array
>>> a = array(["123", "abc"])
>>> a
array(['123', 'abc'], dtype='
So at some point, the strings get converted to numpy strings in 3.2, but not in 3.3.

Ondrej
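[The failing check above reduces to the difference between a plain str and a str subclass carrying a `.dtype` attribute. A minimal model - `NumpyStr` here is a hypothetical stand-in for `numpy.str_`, not the real scalar type:]

```python
# Minimal model of why chararray.__getitem__ breaks: it expects the
# scalar returned by ndarray.__getitem__ to be a numpy string scalar,
# i.e. a str subclass carrying a .dtype; a plain Python str has none.
class NumpyStr(str):          # hypothetical stand-in for numpy.str_
    dtype = "<U3"             # numpy scalars expose their dtype

ok = NumpyStr("abc")          # what Python 3.2 builds return
broken = "abc"                # what the unported 3.3 code returned

assert hasattr(ok, "dtype")
assert not hasattr(broken, "dtype")   # -> AttributeError in __getitem__
```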
Ondřej Čertík wrote:

> So at some point, the strings get converted to numpy strings in 3.2, but not in 3.3.
PyArray_Scalar() must return a subtype of PyUnicodeObject. I'm boldly assuming that the data is in UTF-32. If so, then this unoptimized version should work:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index 2e255c0..c134aed 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -643,7 +643,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
     }
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
-        return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, data, itemsize/4);
+        PyObject *b, *args;
+        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (b == NULL) {
+            return NULL;
+        }
+        args = Py_BuildValue("(Os)", b, "utf-32");
+        if (args == NULL) {
+            Py_DECREF(b);
+            return NULL;
+        }
+        obj = type->tp_new(type, args, NULL);
+        Py_DECREF(b);
+        Py_DECREF(args);
+        return obj;
     }
 #endif
     if (type->tp_itemsize != 0) {

Stefan Krah
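[In Python terms, what the patch does is roughly this: calling the scalar type's tp_new with (bytes, "utf-32") is like constructing a str subclass from UTF-32 data plus an encoding, which is what preserves the subtype. `Unicode32` below is a hypothetical stand-in for the numpy unicode scalar type:]

```python
# Rough Python model of the C patch: constructing a str subclass from
# (bytes, encoding) goes through str.__new__, which decodes the bytes
# and returns an instance of the subtype rather than a plain str.
class Unicode32(str):   # hypothetical stand-in for the numpy scalar type
    pass

data = "abc".encode("utf-32-le")       # 4 bytes per code point
scalar = Unicode32(data, "utf-32-le")  # like tp_new(type, (b, "utf-32"))
assert isinstance(scalar, Unicode32)   # subtype preserved
assert scalar == "abc"
```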
On Sat, Jul 28, 2012 at 11:19 AM, Stefan Krah wrote:

> PyArray_Scalar() must return a subtype of PyUnicodeObject. I'm boldly assuming that the data is in UTF-32. If so, then this unoptimized version should work:
>
> -        return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, data, itemsize/4);
Why doesn't PyUnicode_FromKindAndData return a subtype of PyUnicodeObject? http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromKindAndData
Nice!! I pushed your patch into the PR, and now it works great in Python 3.3. There are still other failures:

https://gist.github.com/3194707

But this particular bug is fixed. Thanks for your help!

Ondrej
Many of the failures in https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71 are of the type:
======================================================================
FAIL: Check byteorder of single-dimensional objects
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py",
line 286, in test_valuesSD
self.assertTrue(ua[0] != ua2[0])
AssertionError: False is not true
and those are caused by the following minimal example:
Python 3.2:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('>U3')
>>> a[0] == b[0]
False
>>> a[0]
'abc'
>>> b[0]
'ៀ\udc00埀\udc00韀\udc00'

Python 3.3:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
True
>>> a[0]
'abc'
>>> b[0]
'abc'

So somehow the newbyteorder() method doesn't change the dtype of the elements in our new code. This method is implemented in numpy/core/src/multiarray/descriptor.c (I think), but so far I don't see where the problem could be.

Any ideas?

Ondrej
On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík wrote:
Ok, after some investigating, I think we need to do something along these lines:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            // We have to deallocate this later, otherwise we get a segfault...
+            //free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
This particular implementation still fails though:
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

But I think that we simply need to take into account the "swap" flag.
Ondrej
On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík wrote:
Ok, so first of all, I tried to disable the swapping in Python 3.2:

    if (swap) {
        byte_swap_vector(buffer, itemsize >> 2, 4);
    }

And then it behaves *exactly* as in Python 3.3. So I am pretty sure that the problem is right there, and something along the lines of my patch above should fix it. I had a few bugs there; here is the correct version:

diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..bed73f7 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *base)
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize >> 2, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }

That works well, except that it gives the UnicodeDecodeError:

>>> b[0].dtype
NULL
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3:
codepoint not in range(0x110000)

This error is actually triggered by this line:

    obj = type->tp_new(type, args, NULL);

in the patch by Stefan above. So I think what is happening is that it simply tries to convert the bytes to a string and fails. That makes sense. The question is why it doesn't fail in exactly the same way in Python 3.2 - I think it's because the conversion check is bypassed somehow.

Stefan, I think we need to swap it after the object is created. I am still experimenting with this.

Ondrej
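[The decode failure can be reproduced in pure Python: byte-swap UTF-32 data and then decode it with the original byte order. `byte_swap_utf32` below is an illustrative model of what `byte_swap_vector(buffer, itemsize >> 2, 4)` does, not NumPy code:]

```python
import struct

def byte_swap_utf32(data):
    """Model of byte_swap_vector(buffer, itemsize >> 2, 4): reverse
    the byte order of each 4-byte code unit."""
    n = len(data) // 4
    return struct.pack(">%dI" % n, *struct.unpack("<%dI" % n, data))

native = "abc".encode("utf-32-le")
swapped = byte_swap_utf32(native)

# The swapped bytes decode fine with the opposite byte order...
assert swapped.decode("utf-32-be") == "abc"

# ...but decoding them with the original order reads e.g. 0x61000000,
# which is above the Unicode limit 0x10FFFF:
try:
    swapped.decode("utf-32-le")
except UnicodeDecodeError as e:
    assert "not in range(0x110000)" in str(e)
```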
On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík wrote:
Well, I simply went to the Python sources and then implemented a solution that works with this patch:

https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...

So now the PR actually seems to work. The rest of the failures are here:

https://gist.github.com/3195520

and they seem to be unrelated. Can somebody please review this PR?

https://github.com/numpy/numpy/pull/366

I will squash the commits after it's reviewed (I want to keep the history there for now).

Ondrej
On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
Thank you. I backported the PR to numpy 1.6.2 and it works for me on win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures, of the kind:

    AssertionError:
    Items are not equal:
     ACTUAL: ()
     DESIRED: None

Christoph
On 7/28/2012 6:17 PM, Christoph Gohlke wrote:
On 7/28/2012 6:09 PM, Ondřej Čertík wrote:
On Sat, Jul 28, 2012 at 5:09 PM, Ondřej Čertík
wrote: On Sat, Jul 28, 2012 at 3:31 PM, Ondřej Čertík
wrote: On Sat, Jul 28, 2012 at 3:04 PM, Ondřej Čertík
wrote: Many of the failures in https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71 are of the type:
====================================================================== FAIL: Check byteorder of single-dimensional objects ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/ondrej/py33/lib/python3.3/site-packages/numpy/core/tests/test_unicode.py", line 286, in test_valuesSD self.assertTrue(ua[0] != ua2[0]) AssertionError: False is not true
and those are caused by the following minimal example:
Python 3.2:
>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
False
>>> a[0]
'abc'
>>> b[0]
'ៀ\udc00埀\udc00韀\udc00'

Python 3.3:

>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
dtype('<U3')
>>> a[0] == b[0]
True
>>> a[0]
'abc'
>>> b[0]
'abc'

So somehow the newbyteorder() method doesn't byte-swap the element values in our new code. This method is implemented in numpy/core/src/multiarray/descriptor.c (I think), but so far I don't see where the problem could be.
Any ideas?
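To see concretely why the swapped data misbehaves, here is a small pure-Python model (illustrative only, not numpy code) of what byte_swap_vector() does to a UCS-4 item:

```python
import struct

# "abc" stored as a little-endian '<U3' numpy item: three UCS-4 words.
data = "abc".encode("utf-32-le")          # 12 bytes
assert struct.unpack("<3I", data) == (0x61, 0x62, 0x63)

# What a 4-byte-wide byte swap does: reverse each 4-byte word in place.
swapped = b"".join(data[i:i + 4][::-1] for i in range(0, 12, 4))

# Reading the swapped buffer back as little-endian UCS-4 yields "code
# points" far beyond 0x10FFFF, which is exactly the UnicodeDecodeError
# ("codepoint not in range(0x110000)") seen in this thread.
code_points = struct.unpack("<3I", swapped)
assert code_points == (0x61000000, 0x62000000, 0x63000000)
```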
Ok, after some investigating, I think we need to do something along these lines:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..daf7fc4 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,20 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            // We have to deallocate this later, otherwise we get a segfault...
+            //free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
This particular implementation still fails though:
>>> from numpy import array
>>> a = array(["abc"])
>>> b = a.newbyteorder()
>>> a.dtype
dtype('<U3')
>>> b.dtype
dtype('>U3')
>>> a[0].dtype
dtype('<U3')
>>> b[0].dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0] == b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
>>> a[0]
'abc'
>>> b[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)

But I think that we simply need to take into account the "swap" flag.
Ok, so first of all, I tried to disable the swapping in Python 3.2:
if (swap) {
    byte_swap_vector(buffer, itemsize >> 2, 4);
}
And then it behaves *exactly* as in Python 3.3. So I am pretty sure that the problem is right there and something along the lines of my patch above should fix it. I had a few bugs there, here is the correct version:
diff --git a/numpy/core/src/multiarray/scalarapi.c b/numpy/core/src/multiarray/scalarapi.c
index c134aed..bed73f7 100644
--- a/numpy/core/src/multiarray/scalarapi.c
+++ b/numpy/core/src/multiarray/scalarapi.c
@@ -644,7 +644,19 @@ PyArray_Scalar(void *data, PyArray_Descr *descr, PyObject *
 #if PY_VERSION_HEX >= 0x03030000
     if (type_num == NPY_UNICODE) {
         PyObject *b, *args;
-        b = PyBytes_FromStringAndSize(data, itemsize);
+        if (swap) {
+            char *buffer;
+            buffer = malloc(itemsize);
+            if (buffer == NULL) {
+                PyErr_NoMemory();
+            }
+            memcpy(buffer, data, itemsize);
+            byte_swap_vector(buffer, itemsize >> 2, 4);
+            b = PyBytes_FromStringAndSize(buffer, itemsize);
+            free(buffer);
+        } else {
+            b = PyBytes_FromStringAndSize(data, itemsize);
+        }
         if (b == NULL) {
             return NULL;
         }
That works well, except that it gives the UnicodeDecodeError:
>>> b[0].dtype
NULL
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)
This error is actually triggered by this line:
obj = type->tp_new(type, args, NULL);
in the patch by Stefan above. So I think what is happening is that it simply tries to convert it from bytes to a string and fails. That makes great sense. The question is why doesn't it fail in exactly the same way in Python 3.2? I think it's because the conversion check is bypassed somehow. Stefan, I think we need to swap it after the object is created. I am still experimenting with this.
Well, I simply went to the Python sources and then implemented a solution that works with this patch:
https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...
So now the PR actually seems to work. The rest of the failures are here:
https://gist.github.com/3195520
and they seem to be unrelated. Can somebody please review this PR?
https://github.com/numpy/numpy/pull/366
I will squash the commits after it's reviewed (I want to keep the history there for now).
Ondrej
Thank you. I backported the PR to numpy 1.6.2 and it works for me on win-amd64-py3.3 with the msvc10 compiler. I get the same 5 test failures of the kind:
AssertionError:
Items are not equal:
 ACTUAL: ()
 DESIRED: None
Christoph
Pull request #367 should fix the NewBufferProtocol test failures. https://github.com/numpy/numpy/pull/367 Christoph
Ondřej Čertík
Well, I simply went to the Python sources and then implemented a solution that works with this patch:
https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...
Nice! I hit the same problem yesterday: unicode_new() does not accept byte-swapped input with an encoding, since the input is not valid. But your solution circumvents the validation.

I'm not sure what the use case is for byte-swapped (invalid?) unicode strings, but the approach looks good to me in the sense that it does the same thing as the Py_UNICODE_WIDE path in 3.2.

In PyArray_Scalar() I only have these comments, two of which are stylistic:

- I think the 'size' parameter in PyUnicode_New() refers to the number of code points (UCS4 in this case), so: PyUnicode_New(itemsize >> 2, max_char)

- The 'b' variable could be renamed to 'u' now.

- PyArray_Scalar() is beginning to look a little crowded. Perhaps the whole PY_VERSION_HEX >= 0x03030000 block could go into a separate function such as:

NPY_NO_EXPORT PyObject *
get_unicode_scalar_3_3(PyTypeObject *type, void *data, Py_ssize_t itemsize, int swap);

Then there's another problem in numpy.test() if Python 3.3 is compiled --with-pydebug:

.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper:
Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted

Stefan Krah
Stefan Krah
Then there's another problem in numpy.test() if Python 3.3 is compiled --with-pydebug:
.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper:
Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted
This also occurs with Python 3.2, so it's unrelated to the Unicode changes: http://projects.scipy.org/numpy/ticket/2193 Stefan Krah
Stefan Krah
.python3.3: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper:
Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted
This also occurs with Python 3.2, so it's unrelated to the Unicode changes:
I've uploaded a patch for the issue. Stefan Krah
On Sun, Jul 29, 2012 at 3:40 AM, Stefan Krah
Ondřej Čertík
wrote: Well, I simply went to the Python sources and then implemented a solution that works with this patch:
https://github.com/certik/numpy/commit/36fcd1327746a3d0ad346ce58ffbe00506e27...
Nice! I hit the same problem yesterday: unicode_new() does not accept byte-swapped input with an encoding, since the input is not valid. But your solution circumvents the validation.
I'm not sure what the use case is for byte-swapped (invalid?) unicode strings, but the approach looks good to me in the sense that it does the same thing as the Py_UNICODE_WIDE path in 3.2.
In PyArray_Scalar() I only have these comments, two of which are stylistic:
- I think the 'size' parameter in PyUnicode_New() refers to the number of code points (UCS4 in this case), so:
PyUnicode_New(itemsize >> 2, max_char)
Right. Done.
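To spell out the arithmetic (a small illustrative sketch in plain Python, not the numpy C code): a '<U3' item is stored as UCS-4, so its itemsize is 3 * 4 = 12 bytes, and the code-point count passed to PyUnicode_New() is itemsize >> 2.

```python
# A 3-character NPY_UNICODE item is UCS-4: 4 bytes per code point.
itemsize = len("abc".encode("utf-32-le"))  # 12 bytes on disk/in the array
code_points = itemsize >> 2                # the 'size' PyUnicode_New() wants
assert itemsize == 12
assert code_points == 3
```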
- The 'b' variable could be renamed to 'u' now.
Done.
- PyArray_Scalar() is beginning to look a little crowded. Perhaps the whole PY_VERSION_HEX >= 0x03030000 block could go into a separate function such as:
NPY_NO_EXPORT PyObject *
get_unicode_scalar_3_3(PyTypeObject *type, void *data, Py_ssize_t itemsize, int swap);
I didn't do this, as I think the function is fine as it is. If further refactoring is needed, then one should probably create 3 functions, one for 3.3, one for <3.3-wide and one for <3.3-narrow.

I've also rebased and squashed the commits, so now it is ready to be merged:

https://github.com/numpy/numpy/pull/366

Thanks Stefan for your help. Can somebody with push access please review it?

Ondrej
Ondřej Čertík

Using Python 3.3 compiled --with-pydebug it appears to be impossible to fool the new Unicode implementation with byte-swapped data. Apply the patch from:

http://projects.scipy.org/numpy/ticket/2193

Then:

Python 3.3.0b1 (default:68e2690a471d+, Jul 29 2012, 15:28:41)
[GCC 4.4.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy import array
[206376 refs]
>>> a = array(["abc"])
[206382 refs]
>>> b = a.newbyteorder()
[206387 refs]
>>> b
python3.3: Objects/unicodeobject.c:401: _PyUnicode_CheckConsistency: Assertion `maxchar <= 0x10ffff' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff71e6a75 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb)

This should be expected since the byte-swapped strings aren't valid.

Stefan Krah
On Sun, Jul 29, 2012 at 6:56 AM, Stefan Krah
This should be expected since the byte-swapped strings aren't valid.
Exactly, I am aware that my solution is a hack. So is the Python 3.2 solution, except that Python 3.2 doesn't seem to have the _PyUnicode_CheckConsistency() function, so no checks are done. As such, I think that my PR simply extends the numpy approach to Python 3.3. A separate issue is that the swapping thing is a hack -- Travis, what is the purpose of the newbyteorder() and the need to swap the internals of the unicode object? Ondrej
Ondřej Čertík
This should be expected since the byte-swapped strings aren't valid.
Exactly, I am aware that my solution is a hack. So is the Python 3.2 solution, except that Python 3.2 doesn't seem to have the _PyUnicode_CheckConsistency() function, so no checks are done. As such, I think that my PR simply extends the numpy approach to Python 3.3.
Absolutely, I also think that using invalid Unicode strings in 3.2 looks kind of hackish. -- Nothing wrong with your 3.3 implementation, it's the general concept that I don't understand. Stefan Krah
Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
So now the PR actually seems to work. The rest of the failures are here:
I wanted to have a look at the import errors in your previous gist. How did you get rid of them? I can't even install numpy on 3.3 as setup.py chokes on 'import numpy.distutils.core':

(py33)ronan@ronan-desktop:~/dev/numpy$ python setup.py install
Converting to Python3 via 2to3...
Running from numpy source directory.
/home/ronan/dev/numpy/py33/lib/python3.3/distutils/__init__.py:16: ResourceWarning: unclosed file <_io.TextIOWrapper name='/usr/local/lib/python3.3/distutils/__init__.py' mode='r' encoding='UTF-8'>
  exec(open(os.path.join(distutils_path, '__init__.py')).read())
Traceback (most recent call last):
  File "setup.py", line 214, in <module>
    setup_package()
  File "setup.py", line 191, in setup_package
    from numpy.distutils.core import setup
  File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/core.py", line 25, in <module>
    from numpy.distutils.command import config, config_compiler, \
  File "/home/ronan/dev/numpy/build/py3k/numpy/distutils/command/__init__.py", line 17, in <module>
    __import__('distutils.command',globals(),locals(),distutils_all)
ImportError: No module named 'distutils.command.install_clib'

Actually, I don't even understand how this __import__() call can work on earlier versions, nor what it's trying to achieve.
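For what it's worth, the fromlist semantics can be sketched in pure Python (illustrative only; on current CPython a missing fromlist name is silently skipped, while the early 3.3 betas propagated the ImportError, which is the failure in the traceback above):

```python
import os

# __import__ with a non-empty fromlist imports the package, then tries to
# import each fromlist name as a submodule of it. A name that is already
# an attribute is left alone; on current CPython a name that cannot be
# imported is silently ignored rather than raising.
mod = __import__("os", globals(), locals(), ["path", "does_not_exist"])

assert mod is os                           # the module itself is returned
assert mod.path is os.path                 # real submodule is resolved
assert not hasattr(mod, "does_not_exist")  # bogus name is skipped, no error
```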
Hi Ronan!
On Sun, Jul 29, 2012 at 2:27 PM, Ronan Lamy
Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
So now the PR actually seems to work. The rest of the failures are here:
I wanted to have a look at the import errors in your previous gist. How did you get rid of them? I can't even install numpy on 3.3 as setup.py
Do you mean this gist: https://gist.github.com/3194707/482382fb6fd6f0d756128d97ea6c892ddb31fff9 ? I had run the tests from the wrong directory and numpy was picking up the wrong files to import --- I think either from the numpy directory directly (there is a check for this though), or from numpy/core or something, I don't remember anymore. So I then ran the tests from /tmp and posted the correct result into the same gist as a new commit: https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
chokes on 'import numpy.distutils.core':
Actually, I don't even understand how this __import__() call can work on earlier versions, nor what it's trying to achieve.
That's weird, I've never seen this error before. Try to install numpy using your regular Python like this:

python setup.py install --prefix /tmp

let's say. If it works, then something is wrong with your Python 3.3 installation. If you want to reproduce my setup, check out my repo:

https://github.com/certik/python-3.3

and from inside it, run:

SPKG_LOCAL=`pwd`/xx MAKEFLAGS="-j4" sh spkg-install

(adjust the "-j4" flag, or remove it). You need a few packages installed, like zlib1g-dev and so on. Then install virtualenv by downloading the tar.gz and from inside it doing "/path/to/my/python-3.3/xx/bin/python3.3 setup.py install". Add the file /path/to/my/python-3.3/xx/bin/virtualenv-3.3 into your $PATH. Then:

rm -rf $HOME/py33
virtualenv-3.3 $HOME/py33
. $HOME/py33/bin/activate

go to your numpy directory and do "python setup.py install". To run the tests, you also need to:

TMPDIR=/tmp/numpy-env
rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR
tar xzf $tarballs/nose-1.1.2.tar.gz
cd nose-1.1.2
python setup.py install

using the virtualenv environment. When I tried to install nose into the Python installation in python-3.3/xx, it failed...

Ondrej
Le dimanche 29 juillet 2012 à 14:45 -0700, Ondřej Čertík a écrit :
Hi Ronan!
On Sun, Jul 29, 2012 at 2:27 PM, Ronan Lamy
wrote: Le samedi 28 juillet 2012 à 18:09 -0700, Ondřej Čertík a écrit :
So now the PR actually seems to work. The rest of the failures are here:
I wanted to have a look at the import errors in your previous gist. How did you get rid of them? I can't even install numpy on 3.3 as setup.py
Do you mean this gist:
https://gist.github.com/3194707/482382fb6fd6f0d756128d97ea6c892ddb31fff9
? I have incorrectly run the tests from the wrong directory and numpy was picking up the wrong files to import --- I think either from the numpy directory directly (there is a check for this though), or from numpy/core or something, I don't remember anymore. So I then run the tests from /tmp and posted the correct result into the same gist as a new commit:
https://gist.github.com/3194707/5696c8d3091b16ba8a9f00a921d512ed02e94d71
Ah, OK. False alarm, then. I'm on the lookout for import errors with Python 3.3, as the import system has been completely rewritten and anything that relied on undocumented behaviour is likely to break.
chokes on 'import numpy.distutils.core':
Actually, I don't even understand how this __import__() call can work on earlier versions, nor what it's trying to achieve.
That's weird, I've never seen this error before. Try to install numpy using your regular Python like this:
python setup.py install --prefix /tmp
let's say. If it works, then something is wrong with your Python 3.3
I simply used a virtualenv (you might need to get the latest from PyPI), roughly as follows:

virtualenv -p python3.3 py33
py33/bin/python setup.py install

It worked fine with 3.2 and 2.7, but not with 3.3.
installation. If you want to reproduce my setup, checkout my repo:
https://github.com/certik/python-3.3
and from inside it, run:
SPKG_LOCAL=`pwd`/xx MAKEFLAGS="-j4" sh spkg-install
(adjust the "-j4" flag, or remove it). You need a few packages installed like zlib1g-dev and so on. Then install virtualenv by downloading the tar.gz and from inside it doing "/path/to/my/python-3.3/xx/bin/python3.3 setup.py install". Add the file /path/to/my/python-3.3/xx/bin/virtualenv-3.3 into your $PATH.
Then:
rm -rf $HOME/py33
virtualenv-3.3 $HOME/py33
. $HOME/py33/bin/activate
go to your numpy directory and do "python setup.py install". To run tests, you also need to:
TMPDIR=/tmp/numpy-env
rm -rf $TMPDIR
mkdir $TMPDIR
cd $TMPDIR
tar xzf $tarballs/nose-1.1.2.tar.gz
cd nose-1.1.2
python setup.py install
using the virtualenv environment. When I tried to install nose into the Python installation in python-3.3/xx, it failed...
Installing nose from a git checkout works fine for me. Maybe nose-1.1.2 isn't really compatible with Python 3.3? Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!
Le lundi 30 juillet 2012 à 02:00 +0100, Ronan Lamy a écrit :
Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!
And the cause of these errors is that running the test suite somehow corrupts Python's internal cache of bytes objects, causing the following:
>>> b'\x01XXX'[0:1]
b'\xbb'
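The cache in question is CPython's table of the 256 one-byte bytes objects; here is a short sketch (a CPython implementation detail, not numpy code) of why one stray write from C poisons every later b'\x01' in the process:

```python
# CPython hands back the same cached object for every one-byte bytes
# value, so two independently computed slices are identical objects.
x = b"\x01XXX"[0:1]
y = b"\x02\x01"[1:2]

assert x == y == b"\x01"
assert x is y  # one shared, cached object (CPython detail)

# If C code writes through that shared buffer (as the unpickling bug
# did), every future b'\x01' produced from the cache is corrupted too.
```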
Le lundi 30 juillet 2012 à 04:57 +0100, Ronan Lamy a écrit :
Le lundi 30 juillet 2012 à 02:00 +0100, Ronan Lamy a écrit :
Anyway, I managed to compile (by blanking numpy/distutils/command/__init__.py) and to run the tests. I only see the 2 pickle errors from your latest gist. So that's all good!
And the cause of these errors is that running the test suite somehow corrupts Python's internal cache of bytes objects, causing the following:
>>> b'\x01XXX'[0:1]
b'\xbb'
The culprit is test_pickle_string_overwrite() in test_regression.py. The test actually tries to check for that kind of problem, but on Python 3, it only manages to trigger it without detecting it. Here's a simple way to reproduce the issue:
>>> a = numpy.array([1], 'b')
>>> b = pickle.loads(pickle.dumps(a))
>>> b[0] = 77
>>> b'\x01 '[0:1]
b'M'
Actually, this problem is probably quite old: I can see it in 1.6.1 w/ Python 3.2.3. 3.3 only makes it more visible. I'll open an issue on GitHub ASAP.
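A pure-Python sketch of the hazard and of the copy-before-write pattern that avoids it (illustrative only, not the actual numpy change):

```python
import pickle

# The numpy bug: the unpickled array aliased the pickle's immutable bytes
# buffer instead of copying it, so writing an element mutated a bytes
# object that CPython may share and cache.
payload = pickle.dumps(b"\x01")
restored = pickle.loads(payload)

# bytes are immutable; a correct consumer copies into a mutable buffer
# before writing, leaving the original object untouched.
buf = bytearray(restored)  # the copy is the fix
buf[0] = 77                # ord('M')

assert restored == b"\x01"   # original bytes object is unharmed
assert bytes(buf) == b"M"    # the write landed in the private copy
```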
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy
Le lundi 30 juillet 2012 à 17:10 +0100, Ronan Lamy a écrit :
The culprit is test_pickle_string_overwrite() in test_regression.py. The test actually tries to check for that kind of problem, but on Python 3, it only manages to trigger it without detecting it. Here's a simple way to reproduce the issue:
>>> a = numpy.array([1], 'b')
>>> b = pickle.loads(pickle.dumps(a))
>>> b[0] = 77
>>> b'\x01 '[0:1]
b'M'
Actually, this problem is probably quite old: I can see it in 1.6.1 w/ Python 3.2.3. 3.3 only makes it more visible.
I'll open an issue on GitHub ASAP.
Thanks Ronan, nice work! Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.) Ondrej
Le lundi 30 juillet 2012 à 11:07 -0700, Ondřej Čertík a écrit :
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy
wrote: Le lundi 30 juillet 2012 à 17:10 +0100, Ronan Lamy a écrit :
I'll open an issue on GitHub ASAP.
Thanks Ronan, nice work!
Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.)
Pauli found out how to fix the code, so I'll try to send a PR tonight.
On Mon, Jul 30, 2012 at 5:00 PM, Ronan Lamy
Le lundi 30 juillet 2012 à 11:07 -0700, Ondřej Čertík a écrit :
On Mon, Jul 30, 2012 at 10:04 AM, Ronan Lamy
Thanks Ronan, nice work!
Since you looked into this -- do you know a way to fix this? (Both NumPy and the test.)
Pauli found out how to fix the code, so I'll try to send a PR tonight.
So this PR is now in and the issue is fixed.

As for the unicode byte-swapping issue, I finally understand what is going on and I posted my current understanding into the Python tracker issue (http://bugs.python.org/issue15540), which was recently created for this same issue:

http://bugs.python.org/msg167280

but it was determined that it is not a bug in Python, so it is closed now. Finally, I have submitted a reworked version of my patch here:

https://github.com/numpy/numpy/pull/372

It implements things in a clean way.

Ondrej
On Fri, Aug 3, 2012 at 8:03 AM, Ondřej Čertík
Final update: the patch is in, so NumPy now passes all tests in Python 3.3. There seems to be a better way to support unicode and that is discussed in another thread. Ondrej
Ronan Lamy
ImportError: No module named 'distutils.command.install_clib'
I'm seeing the same with Python 3.3.0b1 (68e2690a471d+) and this patch solves the problem:

diff --git a/numpy/distutils/command/__init__.py b/numpy/distutils/command/__init__.py
index f8f0884..b9f0d09 100644
--- a/numpy/distutils/command/__init__.py
+++ b/numpy/distutils/command/__init__.py
@@ -7,13 +7,13 @@ __revision__ = "$Id: __init__.py,v 1.3 2005/05/16 11:08:49 pearu Exp $"
 
 distutils_all = [ #'build_py',
                   'clean',
-                  'install_clib',
                   'install_scripts',
                   'bdist',
                   'bdist_dumb',
                   'bdist_wininst',
                 ]
 
+from numpy.distutils.command import install_clib
 __import__('distutils.command',globals(),locals(),distutils_all)
 
 __all__ = ['build',

Stefan Krah
Le dimanche 29 juillet 2012 à 23:55 +0200, Stefan Krah a écrit :
Ronan Lamy
wrote: ImportError: No module named 'distutils.command.install_clib'
I'm seeing the same with Python 3.3.0b1 (68e2690a471d+) and this patch solves the problem:
That does indeed solve the problem, thanks. However, I'm quite sure that 'rm numpy/distutils/command/__init__.py && touch numpy/distutils/command/__init__.py' works just as well - or probably better, in fact, as it allows 'from numpy.distutils.command import *' to run without error.
Ondřej Čertík
Why doesn't PyUnicode_FromKindAndData return a subtype of PyUnicodeObject?
http://docs.python.org/dev/c-api/unicode.html#PyUnicode_FromKindAndData
Well, it would need a PyTypeObject * parameter to do that. I agree that many C-API functions would be more useful if they did this. Stefan Krah
On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau
On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant
wrote: Hey all,
I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now.
It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in tackling that little challenge? If someone has already tried it would be nice to hear your experience.
Given that we're late with 1.7, I would suggest passing this to the next release, unless the fix is simple (just a change of API).
IMO, it's not a regression so it's not a release blocker. Of course we should release the fix whenever it's ready (in 1.7 if it's ready by then, else in 1.7.1), but we shouldn't hold up the release for it. -n
participants (8)
- Christoph Gohlke
- David Cournapeau
- Nathaniel Smith
- Ondřej Čertík
- Ralf Gommers
- Ronan Lamy
- Stefan Krah
- Travis Oliphant