
Hi,
In Python 3.2, PyUnicode_Resize() expects a number of Py_UNICODE units, whereas Python 3.3 expects a number of characters.
It is tricky to convert a number of Py_UNICODE units to a number of characters, so it is difficult to provide a backward compatibility PyUnicode_Resize() function taking a number of Py_UNICODE units in Python 3.3.
Should we rename PyUnicode_Resize() in Python 3.3 to avoid surprising bugs?
The issue only concerns Windows with non-BMP characters, so a very rare use case.
The easiest solution is to do nothing in Python 3.3: the API changed, but it doesn't really matter. Developers just have to be careful on this particular issue (which is not well documented today).
Victor
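To illustrate why the conversion is tricky: on Windows, Py_UNICODE is a 16-bit wchar_t, so a non-BMP character occupies two units (a surrogate pair). The sketch below is a hypothetical helper, not part of the CPython API, that counts characters in a buffer of n such units. Its result depends on the buffer's contents, not only on n, so a bare size argument cannot be converted on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython API): count the code points in a
   buffer of n UTF-16 units, as held by a 16-bit Py_UNICODE on Windows.
   A high surrogate immediately followed by a low surrogate is one
   non-BMP character; a lone surrogate counts as one code point. */
static size_t utf16_units_to_code_points(const uint16_t *buf, size_t n)
{
    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        uint16_t u = buf[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n
            && buf[i + 1] >= 0xDC00 && buf[i + 1] <= 0xDFFF) {
            i += 2; /* surrogate pair: two units, one character */
        } else {
            i += 1; /* BMP character or lone surrogate */
        }
        count++;
    }
    return count;
}
```

For the units {0x0061, 0xD83D, 0xDE00, 0x0062} ("a", U+1F600, "b"), n is 4 but the function returns 3.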

2011/11/22 Victor Stinner <victor.stinner@haypocalc.com>
Hi,
In Python 3.2, PyUnicode_Resize() expects a number of Py_UNICODE units, whereas Python 3.3 expects a number of characters.
It is tricky to convert a number of Py_UNICODE units to a number of characters, so it is difficult to provide a backward compatibility PyUnicode_Resize() function taking a number of Py_UNICODE units in Python 3.3.
Should we rename PyUnicode_Resize() in Python 3.3 to avoid surprising bugs?
The issue only concerns Windows with non-BMP characters, so a very rare use case.
The easiest solution is to do nothing in Python 3.3: the API changed, but it doesn't really matter. Developers just have to be careful on this particular issue (which is not well documented today).
+1. A note in the "Porting C code" section of whatsnew/3.3 should be enough.
-- 
Amaury Forgeot d'Arc

In Python 3.2, PyUnicode_Resize() expects a number of Py_UNICODE units, whereas Python 3.3 expects a number of characters.
Is that really the case? If the string is not ready (i.e. the kind is WCHAR_KIND), then it does count Py_UNICODE units, no? Callers are supposed to call PyUnicode_Resize only while the string is under construction, i.e. when it is not ready. If they resize it after it has been readied, changes to the Py_UNICODE representation wouldn't be reflected in the canonical representation, anyway.
Should we rename PyUnicode_Resize() in Python 3.3 to avoid surprising bugs?
IIUC (and please correct me if I'm wrong), this issue won't cause memory corruption: if they specify a new size assuming it's in Py_UNICODE units, but it is interpreted as code points, then the actual Py_UNICODE buffer can only be larger than expected, right? If so, callers could happily play with the Py_UNICODE representation. It won't have the desired effect if the string was ready, but it won't crash Python, either.
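Martin's argument rests on a simple bound: every code point needs at least one and at most two 16-bit units, so a string of n characters needs between n and 2n Py_UNICODE units on Windows. A size given in units but interpreted as a number of characters therefore cannot produce a buffer smaller than the caller expected. The sketch below uses a hypothetical helper (not a CPython function) to compute that unit count; it models only the size relation, not PyUnicode_Resize() itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a CPython API): number of 16-bit Py_UNICODE
   units needed on Windows to hold the code points cps[0..n-1].
   Non-BMP code points (above U+FFFF) need a surrogate pair. */
static size_t utf16_units_needed(const uint32_t *cps, size_t n)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++)
        units += (cps[i] > 0xFFFF) ? 2 : 1;
    /* Always n <= units <= 2 * n, so sizing for n code points
       gives at least n units of Py_UNICODE storage. */
    return units;
}
```

For the code points {0x61, 0x1F600, 0x62}, n is 3 and the helper returns 4, which lies within [3, 6].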
The easiest solution is to do nothing in Python 3.3: the API changed, but it doesn't really matter. Developers just have to be careful on this particular issue (which is not well documented today).
See above. I think there actually is no issue in the first place. Please do correct me if I'm wrong.
Regards,
Martin
participants (3)
- Martin v. Löwis
- Amaury Forgeot d'Arc
- Victor Stinner