[Tutor] Re: Convert CJK text into HEX value string

Derrick 'dman' Hudson dman@dman.ddts.net
Mon, 15 Jul 2002 19:31:49 -0500


On Mon, Jul 15, 2002 at 07:47:02AM -0400, Erik Price wrote:
|
| On Monday, July 15, 2002, at 12:42  AM, Derrick 'dman' Hudson wrote:
|
| >The general method is to use the unicode() constructor to create a
| >unicode string object from whatever your input source is, and then use
| >the .encode() method to encode that in whichever encoding is
| >appropriate.  Then use the quote() method in the urllib module to
| >url encode it.  So, for example, using latin1 and utf-8 :
|
| Pardon me for jumping in on this thread, but I'm curious -- does the
| string have to be converted to unicode because that's the encoding that
| URL encodings must be taken from?  Or is there some other reason it's
| being converted to unicode first?

It's converted first because unicode string objects have an
'.encode()' method, which lets you choose the output encoding.

URLs merely need to have any "odd" byte encoded as %XX where XX is the
hex value of that byte, so a multi-byte character becomes several %XX
groups.  Whether the encoded bytes are treated as ascii, latin1, utf-8,
or euc-jp is up to the sender and receiver.
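
As a rough illustration of that last point, here is a minimal sketch.
It assumes a current Python 3 interpreter, where str is already a
unicode string and quote() lives in urllib.parse rather than urllib.
The helper name url_quote_as is made up for this example.

```python
from urllib.parse import quote, unquote_to_bytes


def url_quote_as(text, encoding):
    """Encode text to bytes in the given encoding, then %XX-escape the bytes."""
    return quote(text.encode(encoding))


e_acute = "\u00e9"          # LATIN SMALL LETTER E WITH ACUTE
nichi = "\u65e5"            # CJK ideograph U+65E5

# The same character yields different %XX sequences per encoding.
print(url_quote_as(e_acute, "latin-1"))   # %E9
print(url_quote_as(e_acute, "utf-8"))     # %C3%A9
print(url_quote_as(nichi, "utf-8"))       # %E6%97%A5

# The receiver must decode with the encoding the sender picked.
raw = unquote_to_bytes(url_quote_as(nichi, "euc-jp"))
print(raw.decode("euc-jp") == nichi)      # True
```

Nothing in the quoted string records which encoding was used, so the
two ends have to agree on it some other way.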

| And, sorta off-topic... why does it need to be converted to "utf-8"...
| that's something further than unicode?

Unicode is the definition of a table of characters, each identified by
a number (its code point).  Most of those numbers don't fit in a single
byte.  When you send data through a stream (e.g. in a file or through a
socket) you send bytes.  Thus you need some method for encoding the
code points into a stream that is based on single bytes.  UTF-8 does
just that, using one to four bytes per character, and has the nice
bonus of encoding the ascii subset of unicode as plain ascii.
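
To see what UTF-8 does with different characters, something like the
following can be tried.  It is only a sketch for Python 3 (3.5 or
later, for bytes.hex()), and the function name utf8_summary is
hypothetical.

```python
def utf8_summary(ch):
    """Return the code point and the UTF-8 bytes (as hex) of one character."""
    return "U+%04X -> %s" % (ord(ch), ch.encode("utf-8").hex())


for ch in ("A", "\u00e9", "\u65e5"):
    print(utf8_summary(ch))
# U+0041 -> 41
# U+00E9 -> c3a9
# U+65E5 -> e697a5

# ASCII-only text is byte-for-byte identical once encoded as UTF-8.
print("plain ascii".encode("utf-8") == "plain ascii".encode("ascii"))  # True
```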

You can, of course, encode the string in any encoding, as long as the
data you have is representable in that encoding.
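
A quick way to check whether some text is representable in a given
encoding is simply to try it.  This is a sketch assuming Python 3,
where a failed .encode() raises UnicodeEncodeError; can_encode is a
name invented for the example.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode("\u00e9", "latin-1"))  # True:  e-acute is in latin1
print(can_encode("\u65e5", "latin-1"))  # False: a CJK ideograph is not
print(can_encode("\u65e5", "euc-jp"))   # True
print(can_encode("\u65e5", "utf-8"))    # True
```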

-D

-- 
The heart is deceitful above all things
    and beyond cure.
    Who can understand it?

I the Lord search the heart
    and examine the mind,
to reward a man according to his conduct,
    according to what his deeds deserve.

        Jeremiah 17:9-10

http://dman.ddts.net/~dman/
