[Tutor] Re: Convert CJK text into HEX value string

Derrick 'dman' Hudson dman@dman.ddts.net
Sun, 14 Jul 2002 23:42:15 -0500


On Mon, Jul 15, 2002 at 02:30:34AM +0000, Hy Python wrote:
| Does anyone know how to Convert CJK(Chinese/Japanese/Korean) text into HEX
| value string?

I know the process, but I have no experience with those particular
charsets.  Which CJK charset are you referring to?

| I need to pass CJK Text via POST/GET cgi method. I need to convert the text
| into a string of HEX values like "%BE%DF".
|
| Thanks a lot for your help.

The general method is to use the unicode() constructor to create a
unicode string object from whatever your input source is, and then use
the .encode() method to encode that in whichever encoding is
appropriate.  Then use the quote() function in the urllib module to
URL-encode the result.  So, for example, using latin1 and utf-8 :

>>> l_str = '\xbe\xdf'   # raw data
>>> u_str = unicode( l_str , 'latin1' )  # raw data is treated as latin1
>>> print repr( u_str )
u'\xbe\xdf'
# convert the unicode characters to a utf-8 encoded stream
>>> utf_str = u_str.encode( 'utf-8' )
>>> print repr( utf_str )
'\xc2\xbe\xc3\x9f'
>>> import urllib
>>> print urllib.quote( utf_str )
%C2%BE%C3%9F

I think this shows the basic concepts involved, and also that the
exact steps to take depend on where you get your data from and what
form it is in when you get it.
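
To make those two cases concrete, here is a rough sketch of how it
might look in Python 3 (an assumption on my part; the session above is
Python 2), where str is already unicode and quote() lives in
urllib.parse.  The helper name percent_encode and the sample text are
made up for illustration, and euc_jp is only one possible choice of
CJK charset:

```python
from urllib.parse import quote, unquote_to_bytes


def percent_encode(data, encoding="utf-8"):
    """Return data as a %XX string usable in a GET/POST parameter.

    Hypothetical helper: bytes are assumed to already be in the charset
    the receiving side expects; text is encoded with `encoding` first.
    """
    if isinstance(data, str):
        data = data.encode(encoding)
    # safe="" so that reserved characters such as "/" are escaped too
    return quote(data, safe="")


# Case 1: raw bytes already in the target charset, so just quote them
print(percent_encode(b"\xbe\xdf"))  # %BE%DF

# Case 2: text, where a charset has to be picked before quoting
text = "\u65e5\u672c\u8a9e"  # sample Japanese text (assumed input)
print(percent_encode(text, "utf-8"))  # %E6%97%A5%E6%9C%AC%E8%AA%9E
quoted = percent_encode(text, "euc_jp")

# The receiving side reverses the steps using the same charset
assert unquote_to_bytes(quoted).decode("euc_jp") == text
```

Whichever charset you pick, the receiving CGI script has to decode
with the same one, since the %XX string itself carries no charset
information.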

The file encodings/aliases.py has this note in it :

    # CJK
    #
    # The codecs for these encodings are not distributed with the
    # Python core, but are included here for reference, since the
    # locale module relies on having these aliases available.
    #
    'jis_7': 'jis_7',
    'iso_2022_jp': 'jis_7',
    'ujis': 'euc_jp',
    'ajec': 'euc_jp',
    'eucjp': 'euc_jp',
    'tis260': 'tactis',
    'sjis': 'shift_jis',

I presume you'll be using one of these encodings, and you'll need to
find a codec for it.
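
One way to check whether a given installation already has a codec is
to ask the codecs module and catch the LookupError it raises for
unknown names.  This is only a sketch, written with Python 3 print
syntax; has_codec is a name I made up, and the list of names to probe
is just an example:

```python
import codecs


def has_codec(name):
    """Return True if this Python can find a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# A few names from the alias table, plus utf-8 as a control
for name in ("euc_jp", "shift_jis", "iso_2022_jp", "utf-8"):
    print(name, has_codec(name))
```

Any name that prints False needs a third-party codec installed before
.encode() will accept it.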

-D

-- 
If Microsoft would build a car...
... Occasionally your car would die on the freeway for no reason. You
would have to pull over to the side of the road, close all of the car
windows, shut it off, restart it, and reopen the windows before you
could continue. For some reason you would simply accept this.

http://dman.ddts.net/~dman/
