problem with cjkcodecs on Mandrake linux +++
Anthony Liu
antonyliu2002 at yahoo.com
Wed Mar 17 12:03:11 EST 2004
Dear Skip,
Thank you so much. That was exactly the problem, and
it works once I issue print s.encode('gbk').
I got a good lecture from you about unicode and
encoding. The other people probably assumed that I
knew. :)
I appreciate it.
--- Skip Montanaro <skip at pobox.com> wrote:
>
> Anthony> s = 'abc'
> Anthony> unicode(s, 'gbk')
> Anthony> print s # prints 'abc'
>
> [fails]
>
> Anthony,
>
> The above is a bit nonsensical, since you didn't actually modify s. I
> assume you really meant:
>
> s = 'abc'
> s = unicode(s, 'gbk')
> print s
>
> Remember the basic rule of Unicode? If you don't know the encoding, you
> don't know nuthin'. Unicode objects themselves are encoding-neutral. The
> print statement has to encode s somehow (Unicode objects aren't displayed
> directly), so it uses the system's default encoding, which from your
> earlier messages appears to be "latin-1".
>
> Perhaps you're confused by
>
> s = unicode(s, 'gbk')
>
> This says, "Convert the string s to a Unicode object assuming the string
> is encoded using the 'gbk' charset, then bind the resulting object to s."
> Note the 'gbk' doesn't become an attribute of the Unicode object, so later
> on when you try to print it
>
> print s
>
> it needs to decide how to encode the object and for that it used the
> current default encoding, typically "ascii". In the case of "abc" that's
> no problem. For other code points in other character sets (I'm not sure
> I'm using the terminology quite right there) you need to be explicit:
>
> print s.encode('gbk')
>
> or use an appropriate system-wide default encoding.
>
> Skip
>
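
For anyone reading this later, here is a minimal sketch of the
decode/encode round trip described above. It is written for Python 3,
which is an assumption on my part: the thread itself uses Python 2, where
unicode(s, 'gbk') plays roughly the role that bytes.decode('gbk') plays
here. The names raw, text and try_encode are made up for illustration.

```python
# Python 3 sketch (assumption: the thread's Python 2 call
# unicode(s, 'gbk') corresponds to bytes.decode('gbk') here).

# GBK-encoded bytes for the two characters U+4E2D U+6587.
raw = b'\xd6\xd0\xce\xc4'

# Decoding turns bytes into an encoding-neutral text object.
text = raw.decode('gbk')

# The text object keeps no record of 'gbk', so every later conversion
# back to bytes has to name an encoding explicitly.
round_trip = text.encode('gbk')    # the same bytes we started with
utf8 = text.encode('utf-8')        # a different byte sequence


def try_encode(value, encoding):
    """Hypothetical helper: return the encoded bytes, or None if the
    chosen encoding cannot represent the text."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# An ASCII default is fine for 'abc' but cannot represent CJK text.
ascii_abc = try_encode('abc', 'ascii')
ascii_cjk = try_encode(text, 'ascii')
```

The last two lines mirror the point about the default encoding: 'abc'
survives an ASCII encode, while the CJK text does not, which is why the
explicit s.encode('gbk') was needed.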