Modifying the PyUnicode_FromUnicode result
Currently, a number of routines assume that the result of PyUnicode_FromUnicode can be modified, i.e. they mutate the resulting unicode object. Look at unicodeobject.c:fixup for an example, and assume that fixfct is fixtitle (*). This is different from PyString_FromStringAndSize, whose result can only be modified if the str argument is NULL. These routines broke after I applied my caching patch, since now PyUnicode_FromUnicode may return an existing string. Is that difference intentional? My feeling is that it is an error to modify a unicode object, unless it is known not to be initialized. Regards, Martin P.S. This was actually the first failure case when running test_unicodedata under my patch.
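To make the failure concrete, here is a minimal C sketch of how a cache turns "create from existing data, then mutate" into a bug. Everything in it is hypothetical: `ustr`, `ustr_from_units` and `broken_upper` are simplified stand-ins for a Unicode object, PyUnicode_FromUnicode and a fixup-style caller, reference counting is left out, and the cache policy (one shared object per single character below 256) is an assumption for illustration, not necessarily what the caching patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Unicode object: a length plus a buffer of
   16-bit code units. Not CPython's real PyUnicodeObject layout. */
typedef unsigned short uchar;

typedef struct {
    size_t length;
    uchar *str;
} ustr;

/* Hypothetical cache: one shared object per single character below 256. */
static ustr *char_cache[256];

/* Allocate a fresh, zero-filled object of the given length. */
static ustr *ustr_alloc(size_t length)
{
    ustr *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->str = calloc(length + 1, sizeof *s->str);
    if (s->str == NULL) {
        free(s);
        return NULL;
    }
    s->length = length;
    return s;
}

/* Hypothetical analogue of PyUnicode_FromUnicode(u, size):
   u == NULL -> always a fresh object the caller may fill in;
   u != NULL -> may return a cached object that other callers share. */
static ustr *ustr_from_units(const uchar *u, size_t size)
{
    if (u != NULL && size == 1 && u[0] < 256) {
        ustr **slot = &char_cache[u[0]];
        if (*slot == NULL) {
            *slot = ustr_alloc(1);
            if (*slot == NULL)
                return NULL;
            (*slot)->str[0] = u[0];
        }
        assert((*slot)->length == 1);
        return *slot; /* shared: callers must treat it as read-only */
    }
    ustr *s = ustr_alloc(size);
    if (s != NULL && u != NULL)
        memcpy(s->str, u, size * sizeof *u);
    return s;
}

/* The pattern that breaks under caching: build from existing data, then
   mutate the result in place, in the style of fixup() with fixtitle. */
static ustr *broken_upper(const uchar *u, size_t size)
{
    ustr *s = ustr_from_units(u, size);
    if (s == NULL)
        return NULL;
    for (size_t i = 0; i < s->length; i++)
        if (s->str[i] >= 'a' && s->str[i] <= 'z')
            s->str[i] = (uchar)(s->str[i] - ('a' - 'A'));
    return s; /* may be the cached object, now changed for every holder */
}
```

If one caller obtains "a" via `ustr_from_units` and another then calls `broken_upper` on "a", both get the same pointer, and the first caller's string silently reads "A".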
"Martin v. Loewis" wrote:
Currently, a number of routines assume that the result of PyUnicode_FromUnicode can be modified, i.e. they mutate the resulting unicode object. Look at unicodeobject.c:fixup for an example, and assume that fixfct is fixtitle (*).
This is true for the APIs in unicodeobject.c: as long as the newly created object hasn't "left" the Unicode implementation, the APIs in there are free to modify the otherwise immutable object.
This is different from PyString_FromStringAndSize, whose result can only be modified if the str argument is NULL.
These routines broke after I applied my caching patch, since now PyUnicode_FromUnicode may return an existing string.
Is that difference intentional? My feeling is that it is an error to modify a unicode object, unless it is known not to be initialized.
It is an error, but only for code outside the implementation, i.e. programs using the public API may only modify the contents when calling PyUnicode_FromUnicode() with NULL as the u argument. Sorry for not remembering this when reviewing your patch on SF. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Pages: http://www.lemburg.com/python/
This is true for the APIs in unicodeobject.c: as long as the newly created object hasn't "left" the Unicode implementation, the APIs in there are free to modify the otherwise immutable object.
That means that PyUnicode_FromUnicode does give a guarantee to return a fresh object, right? Then I cannot understand why it only gives this guarantee to callers inside unicodeobject.c, and not to other callers... Regards, Martin
"Martin v. Loewis" wrote:
This is true for the APIs in unicodeobject.c: as long as the newly created object hasn't "left" the Unicode implementation, the APIs in there are free to modify the otherwise immutable object.
That means that PyUnicode_FromUnicode does give a guarantee to return a fresh object, right?
Let's put it this way: the internals in unicodeobject.c are allowed to modify the contents of the object even if it was prefilled with data that came from an initializer. External callers are not allowed to do this, though, unless u is set to NULL (just like in the corresponding string call).
Then I cannot understand why it only gives this guarantee to callers inside unicodeobject.c, and not to other callers...
Because I want to reserve the right to change the semantics *inside* unicodeobject.c at some later point. Note that currently no caching of Unicode objects takes place, but this could change in the future, and indeed your patch is a step in this direction. -- Marc-Andre Lemburg
Because I want to reserve the right to change the semantics *inside* unicodeobject.c at some later point. Note that currently no caching of Unicode objects takes place, but this could change in the future, and indeed your patch is a step in this direction.
So would you accept a patch that corrects all calls to PyUnicode_FromUnicode which modify the result they get, without having passed a NULL str argument? Regards, Martin
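The correction proposed here amounts to always taking the NULL path when the result will be mutated: allocate a fresh object, copy the source data in explicitly, and only then modify it. Below is a minimal C sketch of a fixup-style routine written that way. `ustr`, `ustr_new`, `fix_upper` and this `fixup` signature are hypothetical simplifications (no reference counting, no real PyUnicodeObject), not the actual unicodeobject.c code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified Unicode object (no reference counting). */
typedef unsigned short uchar;

typedef struct {
    size_t length;
    uchar *str;
} ustr;

/* Hypothetical analogue of PyUnicode_FromUnicode(NULL, length): always a
   fresh, zero-filled object that the caller owns and may write to. */
static ustr *ustr_new(size_t length)
{
    ustr *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->str = calloc(length + 1, sizeof *s->str);
    if (s->str == NULL) {
        free(s);
        return NULL;
    }
    s->length = length;
    return s;
}

/* Hypothetical fix-up callback in the style of fixtitle(): edits s in
   place and returns nonzero if it changed anything. */
typedef int (*fixfunc)(ustr *s);

static int fix_upper(ustr *s)
{
    int changed = 0;
    for (size_t i = 0; i < s->length; i++) {
        if (s->str[i] >= 'a' && s->str[i] <= 'z') {
            s->str[i] = (uchar)(s->str[i] - ('a' - 'A'));
            changed = 1;
        }
    }
    return changed;
}

/* Corrected fixup(): take the "NULL initializer" path, copy the source
   data in explicitly, and only then let fixfct mutate the private copy. */
static ustr *fixup(const ustr *self, fixfunc fixfct)
{
    assert(self != NULL && fixfct != NULL);
    ustr *u = ustr_new(self->length);
    if (u == NULL)
        return NULL;
    memcpy(u->str, self->str, self->length * sizeof *u->str);
    fixfct(u); /* safe: no other code can hold a reference to u yet */
    return u;
}
```

Because the object comes from the allocate-then-fill path, no cache can have handed the same storage to another caller, so mutating it cannot affect anyone else.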
"Martin v. Loewis" wrote:
Because I want to reserve the right to change the semantics *inside* unicodeobject.c at some later point. Note that currently no caching of Unicode objects takes place, but this could change in the future, and indeed your patch is a step in this direction.
So would you accept a patch that corrects all calls to PyUnicode_FromUnicode which modify the result they get, without having passed a NULL str argument?
Yes :) -- Marc-Andre Lemburg
participants (2)
- M.-A. Lemburg
- Martin v. Loewis