[Python-Dev] PyUnicode_EncodeDecimal

Victor Stinner victor.stinner at haypocalc.com
Tue Nov 22 02:02:05 CET 2011


On Monday 21 November 2011 21:39:53, Victor Stinner wrote:
> I'm trying to rewrite PyUnicode_EncodeDecimal() to upgrade it to the new
> Unicode API. The problem is that the function is neither accessible from
> Python nor tested.

I added tests for this function in Python 2.7, 3.2 and 3.3.

> PyUnicode_EncodeDecimal() goes into an infinite loop if there is more than
> one unencodable character. It's a known bug and there is a patch:
> http://bugs.python.org/issue13093

I fixed this issue. I was wrong: it was not possible to DoS Python; the bug was 
not an infinite loop (but there was a bug in the error handling).

> PyUnicode_EncodeDecimal() requires an output buffer but has no argument
> for the size of the output buffer. It is unsafe: it leads to a buffer
> overflow if the buffer is too small.

This function is broken by design if an error handler is specified: the caller 
has to allocate the output buffer, but cannot know how large it must be.
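To make the sizing problem concrete, here is a small standalone C sketch (not CPython code). It assumes an "xmlcharrefreplace"-style handler that writes "&#NNNN;" for each unencodable character, and uses plain uint32_t code points instead of Py_UNICODE; the helper names xmlcharref_len(), strict_bound() and xmlcharref_size() are hypothetical. In strict mode length + 1 bytes always suffice, but with a replacement handler the required size depends on the values of the characters themselves, and a user-registered error handler can return a replacement of any length.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes an "xmlcharrefreplace"-style handler writes
   for one code point: "&#", the decimal digits, then ";". */
size_t xmlcharref_len(uint32_t cp)
{
    size_t digits = 1;
    while (cp >= 10) {
        cp /= 10;
        digits++;
    }
    return 2 + digits + 1;
}

/* Strict mode: at most one byte per input character, plus the NUL. */
size_t strict_bound(size_t length)
{
    return length + 1;
}

/* Size needed when every unencodable character is replaced by a
   character reference. Simplification (assumption): only ASCII 1..127
   counts as encodable here; the real function also maps non-ASCII
   decimal digits and whitespace to single ASCII bytes. */
size_t xmlcharref_size(const uint32_t *s, size_t length)
{
    size_t size = 1; /* terminating NUL */
    for (size_t i = 0; i < length; i++)
        size += (s[i] > 0 && s[i] < 128) ? 1 : xmlcharref_len(s[i]);
    return size;
}
```

For the two-character input "1" followed by U+20AC EURO SIGN, strict_bound() gives 3 bytes, while the replacement output "1&#8364;" needs 9 bytes including the NUL.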

I propose to raise an error if an error handler other than "strict" is 
specified, and to make this change in Python 2.7, 3.2 and 3.3.
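A minimal sketch of the proposed behaviour, under stated assumptions: the name encode_decimal_checked(), the explicit output_size parameter and the -1 return value are hypothetical (the real function takes no buffer size and sets a Python exception on error). Only errors == NULL or "strict" is accepted; any other handler is rejected before anything is written. For brevity, only ASCII and the Arabic-Indic digits U+0660..U+0669 are treated as encodable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical strict-only encoder. Returns 0 on success, -1 on error
   (where the real function would set a Python exception instead). */
int encode_decimal_checked(const uint32_t *s, size_t length,
                           char *output, size_t output_size,
                           const char *errors)
{
    /* Proposed rule: only NULL or "strict" is accepted. */
    if (errors != NULL && strcmp(errors, "strict") != 0)
        return -1;
    /* Each accepted character yields exactly one byte, plus the NUL. */
    if (output_size < length + 1)
        return -1;
    for (size_t i = 0; i < length; i++) {
        uint32_t ch = s[i];
        if (ch >= 0x0660 && ch <= 0x0669)   /* ARABIC-INDIC DIGIT ZERO..NINE */
            output[i] = (char)('0' + (ch - 0x0660));
        else if (ch > 0 && ch < 128)        /* ASCII passes through */
            output[i] = (char)ch;
        else
            return -1;                      /* unencodable: strict fails */
    }
    output[length] = '\0';
    return 0;
}
```

Because every accepted input character produces exactly one output byte, length + 1 bytes is a safe buffer size once non-strict handlers are rejected.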

In the Python 2.7 code base, PyUnicode_EncodeDecimal() is always called with 
errors=NULL. In Python 3.x, the function is no longer called.

> Should we document and test it, leave it unchanged and
> deprecate it, or simply remove it?

If we change PyUnicode_EncodeDecimal() to reject error handlers other than 
"strict", we can keep this function for a few releases and deprecate it. The 
function is already deprecated because it uses the deprecated Py_UNICODE type.

Victor

