[issue4328] "à" in u"foo" raises a misleading error

Sun Nov 16 04:50:58 CET 2008

Ezio Melotti <ezio.melotti at gmail.com> added the comment:

Usually, when you do operations involving unicode and normal strings,
the latter are coerced to unicode using the default encoding. If the
codec is not able to decode the string a UnicodeDecodeError is raised. E.g.:
>>> 'à' + u'foo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0:
ordinal not in range(128)
The same error is raised with u'%s' % 'à'.

I think that 'à' in u'foo' should behave in the same way (i.e. try to
decode the string and possibly raise a UnicodeDecodeError). This is
probably the most coherent and backward-compatible solution, at least in
Python2.x. In Python2.x normal and unicode strings are often mixed and
having 'f' in u'foo' that raises a TypeError will probably break lot of
code.

In Python3.x it could make sense, the strings are unicode by default and
you are not supposed to mix byte strings and unicode strings so we may
require an explicit decoding.

The behavior should be consistent for all the operations, if we decide
to raise a TypeError with 'in' it should be raised with '+' and '%' (and
possibly others) as well.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4328>
_______________________________________