[issue10435] Document unicode C-API in reST
report at bugs.python.org
Tue Nov 23 14:46:26 CET 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Alexander Belopolsky wrote:
> Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
> On Wed, Nov 17, 2010 at 5:20 PM, Marc-Andre Lemburg
> <report at bugs.python.org> wrote:
>> -/* Encodes a Unicode object and returns the result as Python string
>> +/* Encodes a Unicode object and returns the result as Python bytes
>> object. */
>> PyUnicode_AsEncodedObject() encodes the Unicode object to
>> whatever the codec returns, so the "bytes" is wrong in the
>> above line.
> The above line describes PyUnicode_AsEncodedString(), not
> PyUnicode_AsEncodedObject(). The former has PyBytes_Check(v) after
> calling v = PyCodec_Encode(..). As far as I can tell this is the
> only difference that makes PyUnicode_AsEncodedObject() not redundant.
In that case, the change is fine.
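
For illustration only, here is a rough Python-level analogue of that distinction. It is a sketch, not the C implementation: it assumes that codecs.encode() behaves like the generic "return whatever the codec returns" path, and that str.encode() behaves like the variant that insists on a bytes result. The rot_13 codec is used only because it maps str to str in current Python 3 releases.

```python
import codecs

text = "abc"

# Generic path: the result is whatever the codec produces.
# rot_13 is a str-to-str codec, so a str comes back.
generic_result = codecs.encode(text, "rot_13")

# Strict path: str.encode() only accepts codecs that yield bytes,
# so the same codec is rejected here. The exception type has varied
# between Python 3 releases, hence both are caught.
try:
    text.encode("rot_13")
    strict_rejected = False
except (LookupError, TypeError):
    strict_rejected = True

print(type(generic_result).__name__, strict_rejected)
```

With an ordinary text encoding such as "utf-8", both paths return a bytes object, which is why the difference only shows up with unusual codecs.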
>> +.. c:function:: PyObject* PyUnicode_AsDecodedObject(PyObject *unicode, const char *encoding, const char *errors)
>> + Create a Unicode object by decoding the encoded Unicode object
>> + *unicode*.
>> The function does not guarantee that a Unicode object will be
>> returned. It merely passes a Unicode object to a codec's
>> decode function and returns whatever the codec returns.
> Good point. I am changing "Unicode object" to "Python object".
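
A small sketch of the point that the decode path returns whatever the codec returns. This uses the Python-level codecs.decode() as a stand-in for the C function, which is an assumption made for illustration. The "demo_csv" codec and its helper functions are hypothetical and exist only in this example.

```python
import codecs


def _demo_encode(obj, errors="strict"):
    # Hypothetical encoder: join a list of strings into one str.
    return ",".join(obj), len(obj)


def _demo_decode(text, errors="strict"):
    # Hypothetical decoder: takes a str and returns a list, not a str.
    return text.split(","), len(text)


def _demo_search(name):
    # Codec search function: only answers for the made-up codec name.
    if name == "demo_csv":
        return codecs.CodecInfo(
            encode=_demo_encode, decode=_demo_decode, name="demo_csv"
        )
    return None


codecs.register(_demo_search)

# A str goes in; the type coming out is decided entirely by the codec.
as_text = codecs.decode("nop", "rot_13")
as_list = codecs.decode("a,b,c", "demo_csv")

print(type(as_text).__name__, type(as_list).__name__)
```

The caller therefore cannot assume a particular return type without knowing which codec was looked up.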
>> + Note that Python codecs do not accept Unicode objects for decoding,
>> + so this method is only useful with user or 3rd party codecs.
>> Please strike the last sentence. The codecs that were wrongly removed
>> from Python3 will get added back and provide such functionality.
> Would it be acceptable to keep this note, but add "as of version 3.2"
> or something like that? I don't think there is a chance that these
> codecs will be added in 3.2 given the current schedule.
Please remove the sentence or change it to:
Note that most Python codecs only accept Unicode objects for
>> This should read:
>> Decodes a Unicode object by passing the given Unicode object
>> *unicode* to the codec for *encoding*.
>> *encoding* and *errors* have the same meaning as the
>> parameters of the same name in the :func:`unicode` built-in
>> function. The codec to be used is looked up using the Python codec
>> registry. Return *NULL* if an exception was raised by the codec.
> Is the following better?
> Decodes a Unicode object by passing the given Unicode object
> *unicode* to the codec for *encoding*. *encoding* and *errors*
> have the same meaning as the parameters of the same name in the
> :func:`unicode` built-in function. The codec to be used is
> looked up using the Python codec registry. Return *NULL* if an
> exception was raised by the codec.
> As of Python 3.2, this method is only useful with user or 3rd
> party codecs that encode strings into something other than bytes.
Same as above.
> For encoding to bytes, use :c:func:`PyUnicode_AsEncodedString`
>> +.. c:function:: void PyUnicode_Append(PyObject **pleft, PyObject *right)
>> +.. c:function:: void PyUnicode_AppendAndDel(PyObject **pleft, PyObject *right)
>> Please don't document these two obscure APIs. Instead we should
>> make them private functions by prepending them with an underscore.
>> If you look at the implementations of those two APIs, they
>> are little more than macros around PyUnicode_Concat().
> I don't agree that they are obscure. Python uses them in multiple
> places and developers seem to know about them. See patches submitted
> to issue4113 and issue7584.
I found these references:
so you're right: they are already in use in the wild. Too bad...
Please add these porting notes to the documentation:
PyUnicode_Append() works like PyString_Concat(), while
PyUnicode_AppendAndDel() works like PyString_ConcatAndDel().
>> 3rd party extensions should use PyUnicode_Concat() to achieve
>> the same effect.
> Hmm. I would not be surprised if current 3rd party extensions used
> PyUnicode_AppendAndDel() more often than PyUnicode_Concat(). (I know
> that I learned about PyUnicode_AppendAndDel() before
Certainly not more often. PyUnicode_Concat() has been around much
longer than the other two APIs which are only available in Python3.
> Is there anything that makes PyUnicode_AppendAndDel() undesirable? I
> don't mind adding a recommendation to use PyUnicode_Concat() if there
> is a practical reason for it or even a warning that
> PyUnicode_AppendAndDel() may be deprecated in the future, but renaming
> it to _PyUnicode_AppendAndDel() seems premature.
Both APIs are just slight variants of the PyUnicode_Concat()
API. They change parameters in-place which is rather uncommon
for the Unicode API and don't return their result - in fact the
error reporting is somewhat broken: APIs which do in-place
modifications usually return an integer for error reporting.
These APIs set the *pleft to NULL instead.
Finally, the naming of PyUnicode_AppendAndDel() is not ideal.
"Del" would suggest that an object is deleted, but in reality
it is only decrefed. It is also not clear that the second argument
is affected, but not the first one.
>> [PyUnicode_InternImmortal(PyObject **p)]
>> I don't think it's a good idea to make this a public API.
>> 3rd party extensions should not need to make use of such
>> Instead, we should make this a private API.
> I agree, but isn't it prudent to document it as deprecated for 3rd
> party use first?
I don't think that's needed in this case. The API is not used
outside Python3, it seems. If people complain in beta phase,
we can always add a deprecation function wrapper instead.
Python tracker <report at bugs.python.org>