Hi,

I'm trying to rewrite PyUnicode_EncodeDecimal() to upgrade it to the new Unicode API. The problem is that the function is neither accessible from Python nor tested. Should we document and test it, leave it unchanged and deprecate it, or simply remove it?

--

Python has a PyUnicode_EncodeDecimal() function. It was used in Python 2 by the int, long and complex constructors. In Python 3, the function is no longer used: it has been replaced by PyUnicode_TransformDecimalToASCII() in Python <= 3.2 and by _PyUnicode_TransformDecimalAndSpaceToASCII() in Python 3.3.

PyUnicode_EncodeDecimal() goes into an infinite loop if there is more than one unencodable character. It's a known bug and there is a patch:
http://bugs.python.org/issue13093

PyUnicode_EncodeDecimal() is undocumented and not tested:
http://bugs.python.org/issue8646

Stefan Krah uses PyUnicode_EncodeDecimal() in his cdecimal project.

See also the "Malformed error message from float()" issue:
http://bugs.python.org/issue10557

Python 3.3 now has 3 decimal encoders:

 - PyUnicode_EncodeDecimal()
 - PyUnicode_TransformDecimalToASCII()
 - _PyUnicode_TransformDecimalAndSpaceToASCII() (new in 3.3)

_PyUnicode_TransformDecimalAndSpaceToASCII() also replaces Unicode spaces with ASCII spaces.

PyUnicode_EncodeDecimal() and PyUnicode_TransformDecimalToASCII() take Py_UNICODE* strings.

PyUnicode_EncodeDecimal() requires an output buffer but has no argument for the size of that buffer. It is unsafe: it leads to a buffer overflow if the buffer is too small.

Victor
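For readers who want to see what "encoding to decimal" means in practice, here is a rough pure-Python sketch of the two transform functions described above. The function names and the use of unicodedata are only illustrative assumptions; this is not the C implementation, and it only mirrors the behaviour stated in this message (decimal digits become ASCII digits, and the 3.3 variant also turns Unicode spaces into ASCII spaces).

```python
import unicodedata


def transform_decimal_to_ascii(text):
    """Hypothetical stand-in: map Unicode decimal digits to ASCII digits.

    Characters that are not decimal digits are left unchanged.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII characters are copied as they are.
            out.append(ch)
            continue
        digit = unicodedata.decimal(ch, None)
        out.append(str(digit) if digit is not None else ch)
    return "".join(out)


def transform_decimal_and_space_to_ascii(text):
    """Hypothetical stand-in: same as above, plus Unicode spaces -> ' '."""
    out = []
    for ch in transform_decimal_to_ascii(text):
        if ord(ch) >= 128 and ch.isspace():
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGITS ONE, TWO, THREE
    print(transform_decimal_to_ascii(arabic_indic))
    print(int(arabic_indic))  # the int constructor accepts such digits
```

The int() call at the end shows the visible effect from Python: the constructor accepts non-ASCII decimal digits, which is the job these C helpers do internally.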
On Monday 21 November 2011 21:39:53, Victor Stinner wrote:
I'm trying to rewrite PyUnicode_EncodeDecimal() to upgrade it to the new Unicode API. The problem is that the function is neither accessible from Python nor tested.
I added tests for this function in Python 2.7, 3.2 and 3.3.
PyUnicode_EncodeDecimal() goes into an infinite loop if there is more than one unencodable character. It's a known bug and there is a patch: http://bugs.python.org/issue13093
I fixed this issue. I was wrong: it was not possible to DoS Python. The bug was not an infinite loop, but there was a bug in the error handling.
PyUnicode_EncodeDecimal() requires an output buffer but has no argument for the size of that buffer. It is unsafe: it leads to a buffer overflow if the buffer is too small.
This function is broken by design if an error handler is specified: the caller has to allocate the output buffer, but cannot know how large it must be.

I propose to raise an error if an error handler other than "strict" is specified, and to make this change in Python 2.7, 3.2 and 3.3.

In the Python 2.7 code base, PyUnicode_EncodeDecimal() is always called with errors=NULL. In Python 3.x, the function is no longer called.
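To illustrate why the buffer size cannot be known in advance, here is a small sketch using str.encode() with the standard codec error handlers. It is only an analogy: PyUnicode_EncodeDecimal() is not accessible from Python, and the assumption here is that its non-strict handlers expand or shrink the output in a similar way. The helper name encoded_sizes is made up for the example.

```python
def encoded_sizes(text):
    """Return the encoded length of text for several codec error handlers.

    Analogy only: this uses the ASCII codec, not PyUnicode_EncodeDecimal().
    """
    handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
    return {name: len(text.encode("ascii", name)) for name in handlers}


if __name__ == "__main__":
    text = "12\u20ac"  # two digits followed by EURO SIGN, which is unencodable
    print("input length:", len(text))
    for name, size in encoded_sizes(text).items():
        print(name, size)
```

With the strict handler the output has at most one byte per input character, so a caller can size the buffer from the input length. With a handler such as xmlcharrefreplace, one input character can become several output bytes, and the caller has no way to pass or learn the required size.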
Should we document and test it, leave it unchanged and deprecate it, or simply remove it?
If we change PyUnicode_EncodeDecimal() to reject error handlers other than strict, we can keep this function for a few releases and deprecate it. The function is already deprecated because it uses the deprecated Py_UNICODE type.

Victor
Victor Stinner wrote:
Should we document and test it, leave it unchanged and deprecate it, or simply remove it?
If we change PyUnicode_EncodeDecimal() to reject error handlers other than strict, we can keep this function for a few releases and deprecate it. The function is already deprecated because it uses the deprecated Py_UNICODE type.
I'd be fine with removing the function in 3.4. For consistency, it might be better to remove it in 4.0 together with all the other deprecated functions (at least I understood that this was the plan).

Stefan Krah
On Tuesday 22 November 2011 02:02:05, Victor Stinner wrote:
This function is broken by design if an error handler is specified: the caller has to allocate the output buffer, but cannot know how large it must be.
I propose to raise an error if an error handler other than "strict" is specified, and to make this change in Python 2.7, 3.2 and 3.3.
In the Python 2.7 code base, PyUnicode_EncodeDecimal() is always called with errors=NULL. In Python 3.x, the function is no longer called.
I opened the following issue for this point:
http://bugs.python.org/issue13452

Victor
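As a rough illustration of the proposal, here is a Python sketch of an encoder that rejects any error handler other than "strict". The function name, the exception types, the error messages and the rule that only ASCII characters pass through unchanged are all assumptions made for the example; the real function is C code and may differ in these details.

```python
import unicodedata


def encode_decimal_strict(text, errors=None):
    """Hypothetical sketch: encode decimal digits and spaces to ASCII bytes.

    Only errors=None or errors="strict" is accepted, so the output always
    has exactly one byte per input character.
    """
    if errors not in (None, "strict"):
        # Proposed behaviour: refuse handlers that could change the size.
        raise ValueError("only the 'strict' error handler is supported")
    out = bytearray()
    for index, ch in enumerate(text):
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(ord("0") + digit)
        elif ch.isspace():
            out.append(ord(" "))
        elif ord(ch) < 128:
            # Simplification: only ASCII is copied unchanged in this sketch.
            out.append(ord(ch))
        else:
            raise UnicodeEncodeError(
                "decimal", text, index, index + 1,
                "invalid decimal Unicode string")
    return bytes(out)


if __name__ == "__main__":
    print(encode_decimal_strict("\u0661\u0662.5"))
```

Because every accepted character maps to exactly one byte, a caller could allocate a buffer of len(text) bytes and never overflow it, which is the property the non-strict handlers break.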