[issue2799] Remove PyUnicode_AsString(), rework PyUnicode_AsStringAndSize(), add PyUnicode_AsChar()

Thu Jun 5 23:07:00 CEST 2008

Marc-Andre Lemburg <mal at egenix.com> added the comment:

On 2008-06-05 22:50, Martin v. Löwis wrote:
>> Note that the function *must* check the UTF-8 buffer for embedded
>> NUL bytes and then raise an exception if it finds one. Otherwise,
>> the API would silently cause truncations.
> 
> PyString_AsString doesn't check for null bytes, either, and will also
> silently truncate. This has never been a problem, so I fail to see why
> it is a problem for Unicode strings.

Just because a bug hasn't surfaced yet doesn't make it a non-issue.

The problem is also somewhat different for Unicode:

Unlike PyString_AsString(), a Unicode API PyUnicode_UTF8() would not
provide easy access to the length of the returned char*.

And there is no equivalent of PyString_GET_SIZE() that you could use to
quickly verify that the UTF-8 buffer contains no embedded NULs.

This is why using PyUnicode_AsStringAndSize() is the better and safer
solution overall.
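A minimal plain-C sketch of the point above, assuming the caller already holds a UTF-8 buffer together with its real byte length (the pair a size-returning API such as PyUnicode_AsStringAndSize() is meant to provide). This is not CPython code: has_embedded_nul() and as_c_string_checked() are hypothetical helpers, and returning NULL stands in for raising an exception.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Python C API): returns 1 if the
   first `size` bytes of `buf` contain a NUL byte, 0 otherwise. This
   check is only possible because the real length is known. */
static int has_embedded_nul(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical: hands out the buffer as a C string only if strlen()
   would see all `size` bytes. Returns NULL otherwise; a real C API
   would set an exception here instead of truncating silently. */
static const char *as_c_string_checked(const char *buf, size_t size)
{
    if (has_embedded_nul(buf, size))
        return NULL;
    return buf;
}
```

For the bytes "abc\0def", strlen() reports 3 while the buffer actually holds 7 bytes. A char*-only API would silently hand out the truncated string, whereas a caller that has the size can detect the mismatch and reject it.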

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2799>
_______________________________________