problem with cjkcodecs on Mandrake linux +++

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 17 12:03:11 EST 2004


Dear Skip,

Thank you so much.  That was exactly the problem, and it
works when I issue print s.encode('gbk').

I got a good lecture from you about unicode and
encoding.  The other people probably assumed that I
knew.  :)

I appreciate it.

--- Skip Montanaro <skip at pobox.com> wrote:
> 
>     Anthony> s = 'abc'
>     Anthony> unicode(s, 'gbk')
>     Anthony> print s # prints 'abc'
> 
>     [fails]
> 
> Anthony,
> 
> The above is a bit nonsensical, since you didn't actually modify s.  I
> assume you really meant:
> 
>     s = 'abc'
>     s = unicode(s, 'gbk')
>     print s
> 
> Remember the basic rule of Unicode?  If you don't know the encoding, you
> don't know nuthin'.  Unicode objects themselves are encoding-neutral.  The
> print statement has to encode s somehow (Unicode objects aren't displayed
> directly), so it uses the system's default encoding, which from your earlier
> messages appears to be "latin-1".
> 
> Perhaps you're confused by
> 
>     s = unicode(s, 'gbk')
> 
> This says, "Convert the string s to a Unicode object assuming the string is
> encoded using the 'gbk' charset, then bind the resulting object to s." Note
> the 'gbk' doesn't become an attribute of the Unicode object, so later on
> when you try to print it
> 
>     print s
> 
> it needs to decide how to encode the object, and for that it uses the current
> default encoding, typically "ascii".  In the case of "abc" that's no
> problem.  For other code points in other character sets (I'm not sure I'm
> using the terminology quite right there) you need to be explicit:
> 
>     print s.encode('gbk')
> 
> or use an appropriate system-wide default encoding.
> 
> Skip
> 
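
The decode-then-encode round trip Skip describes can be sketched as
follows.  This is only an illustration, written for Python 3 rather than
the Python 2 of the thread: there, bytes.decode('gbk') plays the role of
unicode(s, 'gbk').  The sample bytes and the helper name encode_or_none
are assumptions made up for the example; they are not from the original
messages.

```python
# Python 3 sketch; in the thread's Python 2, unicode(s, 'gbk') did the decoding.
raw = b'\xd6\xd0\xce\xc4'      # assumed sample: two Chinese characters as GBK bytes

# Decode: bytes -> text.  The result does not remember that it came from GBK.
text = raw.decode('gbk')


def encode_or_none(value, codec):
    """Hypothetical helper: encode value with codec, or return None on failure."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError:
        return None


# An ASCII default cannot represent these code points, so encoding fails.
ascii_bytes = encode_or_none(text, 'ascii')

# Naming the codec explicitly, as in print s.encode('gbk'), round-trips cleanly.
gbk_bytes = encode_or_none(text, 'gbk')

print(ascii_bytes is None)     # the ASCII attempt failed
print(gbk_bytes == raw)        # the GBK attempt reproduces the original bytes
```

A plain ASCII string such as 'abc' encodes identically under both codecs,
which is why the original three-line example gave no hint of the problem.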






More information about the Python-list mailing list