Strings and Unicode

John Machin sjmachin at
Mon Jul 21 02:51:10 CEST 2003

madsurfer2000 at (-) wrote in message news:<fef0a228.0307200828.6c171de7 at>...
> I wrote:
> > I have a function that takes a string as an input parameter. This
> > function then urlencodes the string and sends it to a server with
> > telnetlib.Telnet
> > 
> > The problem is that the string gets converted into what seems to be
> > Unicode. How can I check to see if the input-string is Unicode and
> > convert it to a different character set (in this case ISO-Latin1).
> I see that my question may have been asked in the wrong way, so here's
> more details:
> I'm using Python 2.2.2. under Windows XP. The function I was talking
> about actually takes two strings, and I send both parameters to the
> urlencode function. The urlencode-function I use is imported from
> urllib.
> My function:
> def addPOSTParam(self,name,value):

insert here: print 'value:', type(value), repr(value)

>     param = urlencode({name: value})

insert here: print 'param:', type(param), repr(param)

>     self.POSTParameters.append(param)
> Example:
> client.addPOSTParam("param","abc æ")
> POSTParameters then looks like:
> ['param=abc+%C3%A6']
> Here, the 'æ' character is converted into what seems to be Unicode

Unlikely. U+C3A6 is a Hangul (Korean) syllable. The two bytes C3 A6
look much more like the UTF-8 encoding of U+00E6, which is 'æ'.
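
To make the distinction concrete, here is a small sketch. It assumes a
current Python 3 interpreter, not the 2.2 discussed in this thread, and the
variable names are made up for illustration. It compares the byte sequences
that 'æ' (U+00E6) encodes to with the name of the unrelated code point
U+C3A6.

```python
import unicodedata

# Python 3 sketch; the thread itself concerns Python 2.2.
ae = "\u00e6"  # LATIN SMALL LETTER AE

# The same character gives different bytes under different encodings.
utf8_bytes = ae.encode("utf-8")      # two bytes: C3 A6
latin1_bytes = ae.encode("latin-1")  # one byte: E6

# U+C3A6 is a single, unrelated code point in the Hangul Syllables block.
hangul_name = unicodedata.name(chr(0xC3A6))

print(utf8_bytes, latin1_bytes, hangul_name)
```

So '%C3%A6' is consistent with the UTF-8 bytes of 'æ' having been
percent-escaped, while '%E6' is what the Latin-1 byte would give.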

> I would have expected the following:
> ['param=abc+%E6']

So would I. See below. However, although the last character in your
'value' shows up as "small ae ligature" in MSIE, we would really like
to see some code of yours that *minimally* and *unambiguously* shows
the problem, and that can be executed in a minimal Python environment
(i.e. no GUI, at the command prompt; again, see below).

Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on
>>> import urllib
>>> urllib.urlencode({'name':'abc \xe6'})
>>> urllib.quote_plus('abc \xe6')
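
For readers on a current interpreter, a rough equivalent of the check above
might look like the following. This is a sketch that assumes Python 3, where
the functions live in urllib.parse and urlencode accepts an encoding
argument; it is not the Python 2.2 urllib API used in the session. The
variable names are hypothetical.

```python
from urllib.parse import quote_plus, urlencode

# Python 3 sketch; "value" stands in for the poster's input string.
value = "abc \u00e6"

# With no encoding argument, str values are encoded as UTF-8 before escaping.
default_param = urlencode({"param": value})

# Asking for Latin-1 produces the single-byte escape the poster expected.
latin1_param = urlencode({"param": value}, encoding="latin-1")

# Encoding explicitly first and quoting the bytes gives the same escape.
latin1_quoted = quote_plus(value.encode("latin-1"))

print(default_param)
print(latin1_param)
print(latin1_quoted)
```

Either way, the escapes you get are decided by which bytes the string is
turned into before quoting, so that is the thing to pin down first.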
