different encodings for unicode() and u''.encode(), bug?
mario
mario at ruggier.org
Wed Jan 2 05:57:17 EST 2008
On Jan 2, 10:44 am, John Machin <sjmac... at lexicon.net> wrote:
>
> Two things for you to do:
>
> (1) Try these at the Python interactive prompt:
>
> unicode('', 'latin1')
> unicode('', 'mbcs')
> unicode('', 'raboof')
> unicode('abc', 'latin1')
> unicode('abc', 'mbcs')
> unicode('abc', 'raboof')
$ python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('', 'mbcs')
u''
>>> unicode('abc', 'mbcs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: mbcs
>>>
Hmm, strange. Same behaviour for "raboof": the empty string decodes
without complaint, but 'abc' raises LookupError.
> (2) Read what the manual (Library Reference -> codecs module ->
> standard encodings) has to say about mbcs.
The page at http://docs.python.org/lib/standard-encodings.html gives
the "purpose" of mbcs as:
Windows only: Encode operand according to the ANSI codepage (CP_ACP)
I do not know what the implications of encoding according to the "ANSI
codepage (CP_ACP)" are. "Windows only" seems clear, but why does it only
complain when decoding a non-empty string (or when encoding the empty
unicode string)?
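For what it's worth, here is a minimal sketch of checking whether a codec exists at all, independent of the string being decoded. It relies on codecs.lookup() from the standard library, which raises LookupError for an unknown encoding. The helper name encoding_is_available is my own, purely for illustration. The underlying guess, that the empty-string case never gets as far as looking up the codec, is an assumption on my part and not something I have confirmed.

```python
import codecs


def encoding_is_available(name):
    """Return True if a codec called `name` is registered on this platform.

    Hypothetical helper, for illustration only.
    """
    try:
        codecs.lookup(name)  # raises LookupError for unknown encodings
    except LookupError:
        return False
    return True


# 'mbcs' is documented as Windows only, so its result depends on the platform.
for name in ('latin1', 'mbcs', 'raboof'):
    print('%s: %s' % (name, encoding_is_available(name)))
```

Checking this way gives the same answer whether or not the input string is empty.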
mario