different encodings for unicode() and u''.encode(), bug?

Wed Jan 2 05:57:17 EST 2008

On Jan 2, 10:44 am, John Machin <sjmac... at lexicon.net> wrote:
>
> Two things for you to do:
>
> (1) Try these at the Python interactive prompt:
>
> unicode('', 'latin1')
> unicode('', 'mbcs')
> unicode('', 'raboof')
> unicode('abc', 'latin1')
> unicode('abc', 'mbcs')
> unicode('abc', 'raboof')

$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('', 'mbcs')
u''
>>> unicode('abc', 'mbcs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: mbcs
>>>

Hmmn, strange. Same behaviour for "raboof".

> (2) Read what the manual (Library Reference -> codecs module ->
> standard encodings) has to say about mbcs.

Page at http://docs.python.org/lib/standard-encodings.html says that
mbcs "purpose":
Windows only: Encode operand according to the ANSI codepage (CP_ACP)

Do not know what the implications of encoding according to "ANSI
codepage (CP_ACP)" are. Windows only seems clear, but why does it only
complain when decoding a non-empty string (or when encoding the empty
unicode string) ?

mario