using python-ldap under pypy

Hi
I'm trying to run an application under PyPy that authenticates users with LDAP.
It uses the python-ldap module, and it fails to look up users. The problem is in python-ldap's C extension code: when it converts the LDAP search query from Python format to C, parts of the query are corrupted.
Is python-ldap supposed to work under PyPy? How compatible is the Python C API between CPython and PyPy?
Right now I can't figure out whether this is a bug in the python-ldap code or an incompatibility with the PyPy C API.
Regards, Elmir Jagudin

Hi Elmir.
I would say that it should work; however, subtle bugs are somewhat expected.
I'm happy to help you debug it, let me know how I can reproduce it.
On Fri, Dec 11, 2015 at 4:07 PM, Elmir Jagudin elmir@unity3d.com wrote:
> Hi
> I'm trying to run an application under PyPy that authenticates users with LDAP.
> It uses the python-ldap module, and it fails to look up users. The problem is in python-ldap's C extension code: when it converts the LDAP search query from Python format to C, parts of the query are corrupted.
> Is python-ldap supposed to work under PyPy? How compatible is the Python C API between CPython and PyPy?
> Right now I can't figure out whether this is a bug in the python-ldap code or an incompatibility with the PyPy C API.
> Regards, Elmir Jagudin
pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev

On Fri, Dec 11, 2015 at 3:39 PM, Maciej Fijalkowski fijall@gmail.com wrote:
> Hi Elmir.
> I would say that it should work; however, subtle bugs are somewhat expected.
Cool! We should try to fix the bug!
> I'm happy to help you debug it, let me know how I can reproduce it.
The bug is pretty simple to reproduce. Basically, running this query will show it:
    l = ldap.initialize(SERVER)
    l.simple_bind()
    res = l.search_s(BASE_DN, ldap.SCOPE_SUBTREE, FILTER, ["uid", "cn"])  # <-- these strings will be mangled
Here is the complete script which shows the bug: https://gist.github.com/elmirjagudin/6d7aadaa1825901ed73d
The error happens in the python-ldap C code that converts the ["uid", "cn"] array to char **.
In this file: http://python-ldap.cvs.sourceforge.net/viewvc/python-ldap/python-ldap/Module...
in the function attrs_from_List() there is this code (lines 289-290):
    289: attrs[i] = PyString_AsString(item);
    290: Py_DECREF(item);
On line 289 the assigned string is correct; however, after executing line 290, the string is corrupted.
I have noticed that under CPython, the refcount for 'item' is larger than 1. However, under PyPy it is always 1, and I guess that after decrementing it, 'item' is freed and the attrs[i] pointer becomes invalid.
I don't know enough about the Python extension C API to know whether this is a problem in the python-ldap C code or in the PyPy code. Any help is appreciated!
A general question: does PyPy strive to be compatible with the API defined here: https://docs.python.org/2/c-api/ ?
Thanks in advance, Elmir
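The lifetime problem described in this message can be illustrated without any C code. The following is only an analogy in pure Python, not the python-ldap code: the `Attr` class and the `is_alive_after_drop` helper are made up for this sketch, and a weak reference stands in for the raw pointer stored in attrs[i]. An object fetched from a list is still owned by the list after our own reference is dropped, while an object created on the fly by `__getitem__` has no other owner.

```python
import gc
import weakref


class Attr(object):
    """Hypothetical stand-in for an attribute name (a plain str cannot be weakly referenced)."""

    def __init__(self, name):
        self.name = name


class CustomSeq(object):
    """Sequence that builds a fresh object on every access, so nothing else owns it."""

    def __getitem__(self, i):
        return Attr(str(i))

    def __len__(self):
        return 2


def is_alive_after_drop(seq, i):
    """Fetch seq[i], drop our only reference, and report whether the object survived."""
    item = seq[i]
    ref = weakref.ref(item)  # plays the role of the pointer kept in attrs[i]
    del item                 # plays the role of Py_DECREF(item)
    gc.collect()             # PyPy does not use refcounting, so force a collection
    return ref() is not None


list_item_survives = is_alive_after_drop([Attr("uid"), Attr("cn")], 0)
custom_item_survives = is_alive_after_drop(CustomSeq(), 0)
print(list_item_survives, custom_item_survives)
```

The list case survives only because the list itself holds a second reference; the sketch says nothing about how PyPy's C API emulation manages the objects it hands to extension code.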

Hi Elmir,
On Sun, Dec 13, 2015 at 10:19 PM, Elmir Jagudin elmir@unity3d.com wrote:
> I have noticed that under CPython, the refcount for 'item' is larger than 1. However, under PyPy it is always 1, and I guess that after decrementing it, 'item' is freed and the attrs[i] pointer becomes invalid.
Ok. However, the sentence "under CPython the refcount for 'item' is larger than 1" is not true in all cases. It is true for simple lists or tuples, but not for more complex types. That means that you can probably get already-freed strings under CPython too. Try for example:
    class CustomSeq(object):
        def __getitem__(self, i):
            return str(i)  # returns a refcount=1 result
        def __len__(self):
            return 2
    res = l.search_s(BASE_DN, ldap.SCOPE_SUBTREE, FILTER, CustomSeq())
So it means it's really a bug in python-ldap, which just happens to crash more often on PyPy than on CPython. It should be fixed there.
A bientôt,
Armin.
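One common way to avoid this kind of dangling pointer is to keep an owning reference to every string for as long as the char ** array is in use. The sketch below shows that idea with `ctypes` instead of the C API. `build_attrs` is a hypothetical helper written for this illustration; it is not how python-ldap implements or fixed attrs_from_List().

```python
import ctypes


class CustomSeq(object):
    """The sequence from the message above: every access returns a brand-new str."""

    def __getitem__(self, i):
        return str(i)

    def __len__(self):
        return 2


def build_attrs(seq):
    """Return a NULL-terminated char** array plus the list of bytes objects backing it."""
    # Index explicitly: CustomSeq never raises IndexError, so plain iteration would not stop.
    owners = [seq[i].encode("utf-8") for i in range(len(seq))]
    attrs = (ctypes.c_char_p * (len(owners) + 1))()
    for i, owner in enumerate(owners):
        attrs[i] = owner
    attrs[len(owners)] = None  # NULL terminator
    return attrs, owners


attrs, owners = build_attrs(CustomSeq())
print([attrs[i] for i in range(len(owners))])
```

Returning `owners` next to `attrs` makes the intended lifetime explicit: the caller drops both together, which mirrors what C code has to arrange by hand.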

Hi again,
On Mon, Dec 14, 2015 at 10:01 AM, Armin Rigo arigo@tunes.org wrote:
> So it means it's really a bug of python-ldap, which just happens to crash more often on PyPy than on CPython. It should be fixed there.
Actually it's a known issue. See the comment on line 255:
    XXX the strings should live longer than the resulting attrs pointer.
A bientôt,
Armin.

On Mon, Dec 14, 2015 at 10:09 AM, Armin Rigo arigo@tunes.org wrote:
> Actually it's a known issue. See the comment on line 255:
> XXX the strings should live longer than the resulting attrs pointer.
This bug has been fixed in the python-ldap package as of version 2.4.25:
http://python-ldap.cvs.sourceforge.net/viewvc/python-ldap/python-ldap/CHANGE...
Thanks again for the info regarding this problem.
/Elmir
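Since the fix is tied to a release number, an application that runs under PyPy could check the installed python-ldap version at startup. The following is a minimal sketch under assumptions: `parse_version` and `has_attrs_fix` are hypothetical helpers, the parsing is deliberately naive, and the only fact taken from this thread is the version 2.4.25.

```python
import re

FIXED_IN = (2, 4, 25)  # release named in the message above


def parse_version(text):
    """Turn '2.4.25' into (2, 4, 25).

    Uses the leading digits of each dot-separated component and stops at the
    first component that has none.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_attrs_fix(version_text):
    """Report whether a python-ldap version string is at least 2.4.25."""
    return parse_version(version_text) >= FIXED_IN


print(has_attrs_fix("2.4.19"), has_attrs_fix("2.4.25"))
```

In a real application the argument would presumably be `ldap.__version__`; that import is left out here so the sketch runs without python-ldap installed.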

great! thanks for letting us know
On Wed, Jan 20, 2016 at 11:05 AM, Elmir Jagudin elmir@unity3d.com wrote:
> This bug has been fixed in the python-ldap package as of version 2.4.25:
> http://python-ldap.cvs.sourceforge.net/viewvc/python-ldap/python-ldap/CHANGE...
> Thanks again for the info regarding this problem.
> /Elmir

On Mon, Dec 14, 2015 at 10:01 AM, Armin Rigo arigo@tunes.org wrote:
> So it means it's really a bug of python-ldap, which just happens to crash more often on PyPy than on CPython. It should be fixed there.
Yep, you are right. The following version of the code above clearly shows that it's broken under CPython as well:
    class CustomSeq(object):
        def __getitem__(self, i):
            return str(i)  # returns a refcount=1 result
        def __len__(self):
            return 20
The resulting query sent over the network will be wrong.
Thanks for the clarification.
/Elmir
participants (3)
- Armin Rigo
- Elmir Jagudin
- Maciej Fijalkowski