japanese encoding iso-2022-jp in python vs. perl
kettle
Josef.Robert.Novak at gmail.com
Tue Oct 23 06:37:30 EDT 2007
Hi,
I am rather new to python, and am currently struggling with some
encoding issues. I have some utf-8-encoded text which I need to
encode as iso-2022-jp before sending it out to the world. I am using
python's encode functions:
--
var = var.encode("iso-2022-jp", "replace")
print var
--
I am using the 'replace' argument because there seem to be a couple
of utf-8 japanese characters which python can't correctly convert to
iso-2022-jp. The output looks like this:
↓東京???日比谷線?北千住行
However, if I use perl's Encode module to re-encode the exact same bit
of text:
--
use Encode;
$var = encode("iso-2022-jp", decode("utf8", $var));
print $var;
--
I get proper output (no unsightly question-marks):
↓東京メトロ日比谷線・北千住行
So, what's the deal? Why can't python properly encode some of these
characters? I know there are a host of different iso-2022-jp
variants; could python be using a different one than the default I
have in mind? I'm quite liking python at the moment for a variety of
reasons (I suspect perl will forever win when it comes to regular
expressions, but everything else is pretty darn nice), but this is a
bit worrying.
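
One way to narrow this down is to list exactly which characters the
codec rejects. The sketch below is written for Python 3 and rests on an
assumption that the post cannot confirm: that the four question marks
stand for half-width katakana and a half-width middle dot in the raw
input (the sample string is made up to match that guess). The helper
name unencodable is hypothetical. If the guess holds, NFKC
normalization folds those characters to their full-width forms, which
plain iso-2022-jp can represent.

```python
import unicodedata


def unencodable(text, codec="iso2022_jp"):
    """Return the characters of text that the given codec cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Hypothetical input: the same headline with half-width katakana and a
# half-width middle dot (an assumption, not taken from the original data).
sample = "↓東京ﾒﾄﾛ日比谷線･北千住行"

# Characters that would each become "?" under errors="replace".
print(unencodable(sample))

# NFKC folds half-width forms into their full-width equivalents.
folded = unicodedata.normalize("NFKC", sample)
print(unencodable(folded))

# The folded text now survives a round trip through iso-2022-jp.
encoded = folded.encode("iso2022_jp")
print(encoded.decode("iso2022_jp"))
```

Note that NFKC also rewrites other compatibility characters (full-width
Latin letters, circled digits and so on), so it may change more than
just the katakana. The same helper can be pointed at the variant codecs
python ships, such as iso2022_jp_ext, by passing a different codec name.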
-Joe