unicode(s, enc).encode(enc) == s ?

"Martin v. Löwis" martin at v.loewis.de
Thu Dec 27 13:37:17 EST 2007


> Given no UnicodeErrors, are there any cases for the following not to
> be True?
> 
>     unicode(s, enc).encode(enc) == s

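Certainly. ISO-2022 is famous for being ambiguous: several different
byte strings can decode to the same text, so at most one of them can
come back unchanged from the round trip. Try these: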

unicode("Hallo","iso-2022-jp")
unicode("\x1b(BHallo","iso-2022-jp")
unicode("\x1b(JHallo","iso-2022-jp")
unicode("\x1b(BHal\x1b(Jlo","iso-2022-jp")
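
To make the failure visible, here is a minimal sketch of the same check
in Python 3 spelling, where unicode(s, enc) becomes s.decode(enc) on a
bytes object. It assumes CPython's bundled iso-2022-jp codec; the helper
name roundtrip is made up for illustration and is not part of any library.

```python
ENC = "iso-2022-jp"

def roundtrip(raw, enc=ENC):
    """Decode raw bytes, then re-encode the text with the same codec."""
    return raw.decode(enc).encode(enc)

# The four byte strings from above, written as Python 3 bytes literals.
samples = [
    b"Hallo",
    b"\x1b(BHallo",          # explicit switch to ASCII first
    b"\x1b(JHallo",          # switch to JIS X 0201 Roman first
    b"\x1b(BHal\x1b(Jlo",    # switch character sets mid-word
]

# Collect the distinct decoded texts and the inputs that survive the trip.
decoded = {raw.decode(ENC) for raw in samples}
survivors = [raw for raw in samples if roundtrip(raw) == raw]

print(decoded)
print(survivors)
```

Under that assumption, all four inputs decode to one and the same text,
and only the input without escape sequences compares equal after
re-encoding.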

or likewise, with the two escape sequences that select the 1978 and
1983 editions of the JIS kanji set:

unicode("\x1b$@BB","iso-2022-jp")
unicode("\x1b$BBB","iso-2022-jp")
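
The same kind of check works for the two-byte case. This is again only a
sketch in Python 3 spelling, assuming CPython's iso-2022-jp codec; the
variable names are illustrative.

```python
ENC = "iso-2022-jp"

old_jis = b"\x1b$@BB"   # ESC $ @ designates the 1978 kanji set
new_jis = b"\x1b$BBB"   # ESC $ B designates the 1983 kanji set

# Both byte strings should decode to the same single character.
same_text = old_jis.decode(ENC) == new_jis.decode(ENC)

# Re-encoding yields one canonical byte string, so it cannot match both inputs.
reencoded = old_jis.decode(ENC).encode(ENC)

print(same_text)
print(reencoded == old_jis)
```

Since the encoder has to pick one escape sequence, at least one of the
two inputs necessarily fails the s == roundtrip(s) test.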

In iso-2022-jp-3, there are even more ways to encode the same string.

Regards,
Martin