[Python-bugs-list] [ python-Bugs-734170 ] PyArg_ParseTuple("u") inconsistency

SourceForge.net noreply@sourceforge.net
Thu, 08 May 2003 08:28:12 -0700


Bugs item #734170, was opened at 2003-05-07 22:02
Message generated for change (Comment added) made by ronaldoussoren
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=734170&group_id=5470

Category: Unicode
Group: Python 2.3
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Ronald Oussoren (ronaldoussoren)
Assigned to: M.-A. Lemburg (lemburg)
Summary: PyArg_ParseTuple("u") inconsistency

Initial Comment:
PyArg_ParseTuple behaves strangely with the format
character "u" when a plain string is passed as the
argument. It seems to perform the conversion
value.decode("utf-16"), which is very surprising and
inconsistent with the behaviour of the Python
expression unicode(value), which is equivalent to
value.decode("ascii").

The current behaviour will confuse users that pass
plain strings to extension functions expecting unicode
arguments.
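
The difference described above can be sketched as follows. This is only
an illustration in Python 3 syntax (the report itself concerns Python 2.3),
and it assumes a build with 2-byte, little-endian Py_UNICODE; the sample
value b"abcd" is made up for the example.

```python
# Illustrative value only; any plain (byte) string of even length works.
value = b"abcd"

# What unicode(value) does in Python 2: decode with the ASCII codec.
as_ascii = value.decode("ascii")

# What the "u" format seems to do: treat the raw bytes as 16-bit code units.
# Assumption: 2-byte Py_UNICODE stored in little-endian order.
as_utf16 = value.decode("utf-16-le")

print(repr(as_ascii))                      # four characters
print([hex(ord(ch)) for ch in as_utf16])   # two unrelated code points
```

Four ASCII characters collapse into two code points that have nothing to
do with the original text, which is the "mangling" a caller would see.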

----------------------------------------------------------------------

>Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2003-05-08 17:28

Message:

The conversion for plain strings is not mentioned in the
documentation (section 5.5 of the in-development
documentation at python.org). Based on that documentation
I'd assume that plain strings will either not be accepted or
be converted to unicode objects.

As a user of functions that use the 'u' format specifier I
was very surprised when my plain string argument seemingly
got mangled. Only after checking closely did I notice that the
string is interpreted as raw UTF-16 data.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2003-05-07 22:37

Message:

Why is the current behaviour surprising? The "u" parser
marker reads a buffer-like object and interprets it
as a Py_UNICODE array. It is very different from
what unicode(value) does. The "es" marker takes
care of unicode(value, encoding) kind of conversions.
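
A rough Python 3 model of the distinction drawn here, offered as a sketch
rather than a description of the actual C implementation. The helper names
reinterpret_as_code_units and decode_with_codec are hypothetical, and the
first one assumes a 2-byte, little-endian Py_UNICODE.

```python
import struct


def reinterpret_as_code_units(buf):
    """Model of "u": view the buffer as 16-bit units, with no decoding step.

    Assumption: 2-byte Py_UNICODE, little-endian. A trailing odd byte is
    ignored in this sketch.
    """
    count = len(buf) // 2
    return list(struct.unpack("<%dH" % count, buf[:count * 2]))


def decode_with_codec(buf, encoding="ascii"):
    """Model of a unicode(value, encoding) style conversion via a codec."""
    return buf.decode(encoding)


units = reinterpret_as_code_units(b"abcd")
text = decode_with_codec(b"abcd")
print([hex(u) for u in units], repr(text))
```

The first helper never consults a codec, so the caller's bytes are taken
at face value; the second goes through an explicit encoding, which is what
a caller passing a plain string would normally expect.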


----------------------------------------------------------------------