_PyUnicode_New/PyUnicode_Resize

should be exported as part of the unicode object API. Otherwise, external C codec developers have to jump through some useless and silly hoops in order to construct a PyUnicode object. Additionally, you mentioned to Andrew that the decoders don't have to return a tuple anymore. That doesn't match what's currently in CVS: Python\codecs.c:PyCodec_Decode() currently requires, but ignores, the integer returned in the tuple. Should this be fixed, or must codecs return the integer as Misc\unicode.txt says? Thanks, Bill

Bill Tutt wrote:
should be exported as part of the unicode object API.
Otherwise, external C codec developers have to jump through some useless and silly hoops in order to construct a PyUnicode object.
Hmm, resize would be useful, agreed. The reason I haven't made these public is that the internal allocation logic could be changed in some future version to more elaborate and faster techniques. Keeping the _PyUnicode_* API private makes these changes possible without breaking external C code.
E.g. say Unicode gets interned someday, then resize will need to take care not to resize a Unicode object which is already stored in the interning dict. Perhaps a wrapper with additional checks around _PyUnicode_Resize() would be useful.
Note that you don't really need _PyUnicode_New(): call PyUnicode_FromUnicode() with a NULL argument and then fill in the buffer using PyUnicode_AS_UNICODE()... this works just like PyString_FromStringAndSize() with a NULL argument.
Additionally, you mentioned to Andrew that the decoders don't have to return a tuple anymore. That doesn't match what's currently in CVS: Python\codecs.c:PyCodec_Decode() currently requires, but ignores, the integer returned in the tuple. Should this be fixed, or must codecs return the integer as Misc\unicode.txt says?
That was a misunderstanding on my part: I was thinking of the .read()/.write() methods, which are now in sync with the other file objects. Previously, .read() returned a tuple and .write() an integer. .encode() and .decode() must return a tuple.
-- Marc-Andre Lemburg
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

E.g. say Unicode gets interned someday, then resize will need to take care not to resize a Unicode object which is already stored in the interning dict.
Note that string objects deal with this by requiring that the reference count is 1 when a string is resized. This effectively enforces that resizes are only used when the original creator is still working on the string.
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
E.g. say Unicode gets interned someday, then resize will need to take care not to resize a Unicode object which is already stored in the interning dict.
Note that string objects deal with this by requiring that the reference count is 1 when a string is resized. This effectively enforces that resizes are only used when the original creator is still working on the string.
Nice trick ;-) The new PyUnicode_Resize() will have the same interface as _PyString_Resize() since this seems to be the most flexible way to implement it without giving away possibilities for future optimizations...
-- Marc-Andre Lemburg
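Editorial sketch: the resize rule discussed above (refuse to resize unless the reference count is exactly 1, and take a pointer to the object pointer because reallocation may move the object) can be illustrated with a toy model in plain C. Everything here is hypothetical: ToyUnicode, toy_new(), toy_resize() and toy_demo() are illustration-only names, not part of the Python C API, and the error handling does not try to mirror what _PyString_Resize() actually does on failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a variable-sized, reference-counted object.
   This is NOT the real PyUnicodeObject layout. */
typedef struct {
    size_t refcnt;          /* number of owners of this object */
    size_t length;          /* number of code units in data[] */
    unsigned short data[];  /* trailing buffer, allocated with the header */
} ToyUnicode;

/* Allocate an object with an uninitialized buffer of `length` code units;
   the caller fills data[] afterwards. */
ToyUnicode *toy_new(size_t length)
{
    ToyUnicode *u = malloc(sizeof(ToyUnicode) + length * sizeof(unsigned short));
    if (u == NULL)
        return NULL;
    u->refcnt = 1;
    u->length = length;
    return u;
}

/* Resize in place. Returns 0 on success, -1 on failure.
   Takes ToyUnicode ** because realloc() may move the block, in which
   case *pu is updated to the new address. */
int toy_resize(ToyUnicode **pu, size_t new_length)
{
    ToyUnicode *u = *pu;
    ToyUnicode *moved;

    /* Anyone else holding a reference (e.g. an interning dict) would be
       left with a stale pointer, so refuse unless we are the sole owner. */
    if (u == NULL || u->refcnt != 1)
        return -1;

    moved = realloc(u, sizeof(ToyUnicode) + new_length * sizeof(unsigned short));
    if (moved == NULL)
        return -1;          /* toy choice: the original block stays valid */

    moved->length = new_length;
    *pu = moved;
    return 0;
}

/* Small usage walk-through; returns 0 if everything behaved as expected. */
int toy_demo(void)
{
    ToyUnicode *u = toy_new(3);
    if (u == NULL)
        return -1;
    u->data[0] = 'a'; u->data[1] = 'b'; u->data[2] = 'c';

    /* Sole owner: resizing is allowed and the old contents survive. */
    assert(toy_resize(&u, 5) == 0);
    assert(u->length == 5 && u->data[0] == 'a' && u->data[2] == 'c');

    /* Simulate a second owner: resizing must now be refused. */
    u->refcnt = 2;
    assert(toy_resize(&u, 8) == -1);
    assert(u->length == 5);

    free(u);
    return 0;
}
```

The reference-count check is what keeps the door open for later optimizations such as interning: a shared object is never resized behind its other owners' backs.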