[ python-Bugs-1758804 ] unicode(None,charset) raise TypeError

Mon Jul 23 16:32:31 CEST 2007

Bugs item #1758804, was opened at 2007-07-23 12:39
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1758804&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.5
Status: Pending
Resolution: Works For Me
Priority: 5
Private: No
Submitted By: Guillaume (guillaumb)
Assigned to: M.-A. Lemburg (lemburg)
Summary: unicode(None,charset) raise TypeError

Initial Comment:
The unicode() builtin behaves differently with None as the first argument depending on whether the optional second argument (the encoding) is given.

>>> unicode(None)
u'None'
>>> unicode(None,'ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, NoneType found

This is confusing.

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2007-07-23 16:32

Message:

I agree that it's confusing.

The reason is that unicode() with only one argument uses
PyObject_Unicode() for the conversion (which applies some extra logic
to turn a non-string argument into a string), while the variant with an
encoding calls PyUnicode_FromEncodedObject(), which only works
for strings and character buffers.
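
The two code paths can be seen side by side in a short script. This is only
a sketch written for Python 3, where str() stands in for the Python 2
unicode() builtin discussed here; the helper name convert() is made up for
illustration and is not part of any Python API.

```python
# Python 3 sketch: str() plays the role of the Python 2 unicode() builtin.
# convert() is a hypothetical helper used only to capture the outcome.

def convert(obj, encoding=None):
    """Call str() with or without an encoding and report what happened."""
    try:
        if encoding is None:
            # One-argument form: generic object-to-text conversion.
            return str(obj)
        # Two-argument form: decoding, which needs a bytes-like object.
        return str(obj, encoding)
    except TypeError as exc:
        return "TypeError: %s" % exc


one_arg = convert(None)             # generic conversion succeeds
two_arg = convert(None, "ascii")    # decoding path rejects None
decoded = convert(b"abc", "ascii")  # decoding path accepts bytes

print(one_arg)
print(two_arg)
print(decoded)
```

The one-argument call yields 'None', the two-argument call with None ends
in a TypeError, and the two-argument call with a byte string decodes
normally.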

I think that all three APIs, PyUnicode_FromObject(),
PyUnicode_FromEncodedObject() and PyObject_Unicode(), should be unified to
use the same logic when converting an object to Unicode. In
light of Py3k, it's probably best to then go with the PyObject_Unicode()
approach.
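
One possible reading of that unification, sketched in Python 3 rather than
in the C API: decode when the object is a byte string, and otherwise fall
back to the generic conversion even if an encoding was passed. The name
to_text() and the fallback rule are assumptions made for illustration; the
thread does not specify how the unified logic would treat the encoding
argument for non-string objects.

```python
# Hypothetical helper, not an existing Python or C API function.
# Assumption: non-bytes objects ignore the encoding and use str(obj).

def to_text(obj, encoding=None, errors="strict"):
    """Convert obj to text using one rule for both call forms."""
    if isinstance(obj, (bytes, bytearray)):
        # Byte strings are decoded; default to ASCII if no encoding given.
        return str(obj, encoding or "ascii", errors)
    # Everything else goes through the generic object-to-text conversion.
    return str(obj)


print(to_text(None))
print(to_text(None, "ascii"))
print(to_text(b"abc", "ascii"))
```

With this rule, to_text(None) and to_text(None, "ascii") both give 'None',
so the presence of the encoding argument no longer changes whether the call
succeeds.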

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2007-07-23 13:53

Message:

Maybe, but it is at least documented:
http://docs.python.org/lib/built-in-funcs.html

"If encoding and/or errors are given, unicode() will decode the object
which can either be an 8-bit string or a character buffer using the codec
for encoding."

----------------------------------------------------------------------
