Unicode Hell

Paul Boddie paul at boddie.net
Fri Nov 7 10:57:45 EST 2003


"Stuart Forsyth" <stuartf at the-i-junction.com> wrote in message news:<mailman.514.1068195725.702.python-list at python.org>...
>
> The script then moaned about it being non-ascii and crashed.  The exact
> error is:
> 
> Error Type:
> Python ActiveX Scripting Engine (0x80020009)
> Traceback (most recent call last):
>   File "<Script Block >", line 80, in ?
>     FileContents = FileContents.replace('Repl learner',str(Request("learner")))
>   File "C:\Python23\lib\site-packages\win32com\client\dynamic.py", line 169, in __str__
>     return str(self.__call__())
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-9: ordinal not in range(128)

I suppose Request("learner") is a Unicode object, and what you're
trying to do here is a brute-force conversion to a normal Python
string (i.e. a sequence of bytes in a particular encoding, which should
represent the contents of that Unicode object). However, str() assumes
that you want to encode the contents of that Unicode object as ASCII,
Python's default encoding. Clearly, you have characters in that Unicode
object which aren't representable in ASCII, so the conversion fails
and you get that error.
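
In Python 3 terms (where str holds Unicode text and bytes holds encoded
data, the roles played by unicode and str in the Python 2.3 used here),
the failure can be reproduced with a plain string standing in for the
ASP request value. This is a sketch only; the name and its contents are
made up for illustration:

```python
# Python 3 sketch; "learner" is a made-up stand-in for Request("learner")
learner = "José Müller"

def encode_or_none(text, encoding):
    """Return text encoded with the given codec, or None if it can't be."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # For the ascii codec, exc.reason is "ordinal not in range(128)"
        print(f"{encoding} failed: {exc.reason}")
        return None

ascii_bytes = encode_or_none(learner, "ascii")  # fails, as str() did above
utf8_bytes = encode_or_none(learner, "utf-8")   # UTF-8 covers every character
```

The point is that the conversion itself is fine; it is the choice of
ASCII as the target encoding that cannot cope with "é" and "ü".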

> Error Type:
> Python ActiveX Scripting Engine (0x80020009)
> Traceback (most recent call last):
>   File "<Script Block >", line 80, in ?
>     FileContents = FileContents.replace('Repl learner',Request("learner"))
> TypeError: expected a character buffer object
> /certificate/pycreate.asp, line 104

Here, the replacement fails because FileContents.replace expects Python
string arguments, and Request("learner") doesn't give you one.
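
Python 3 shows the same kind of type mismatch when text is substituted
into a bytes template. This is an analogy rather than the exact win32com
situation, and the helper and template below are hypothetical:

```python
def mixed_replace_fails(template, text):
    """Hypothetical helper: True if substituting text into a bytes
    template raises TypeError (bytes.replace only accepts bytes-like
    arguments, so passing a str fails)."""
    try:
        template.replace(b"Repl learner", text)
    except TypeError:
        return True
    return False

print(mixed_replace_fails(b"<p>Repl learner</p>", "José"))   # str: fails
print(mixed_replace_fails(b"<p>Repl learner</p>", b"Jose"))  # bytes: works
```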

What you could do is convert the Unicode object to a Python string,
but do so using the same encoding as the FileContents string, so that
the bytes you insert are compatible with the ones already there. If I
were to make a guess, I'd try UTF-8:

  Request("learner").encode("UTF-8")

However, you should then make sure that the FileContents string really
is encoded in UTF-8, and that the browser is told which encoding the
page uses (for example, through the charset parameter of the
Content-Type header).
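
A rough Python 3 equivalent of this first approach, assuming the
template is already held as UTF-8 bytes; the template text and learner
name are invented for illustration, and this is not the original ASP
code:

```python
# Hypothetical template, held as UTF-8 encoded bytes
file_contents = "<p>Certificate for Repl learner</p>".encode("utf-8")
learner = "José Müller"  # stand-in for the value of Request("learner")

# Encode the replacement with the *same* encoding as the template,
# so both arguments to bytes.replace are bytes in one encoding
page = file_contents.replace(b"Repl learner", learner.encode("utf-8"))

# Header value that tells the browser how to decode the page
content_type = "text/html; charset=utf-8"
```

If the template turned out to be in some other encoding, the replacement
would have to be encoded with that encoding instead, and the charset in
the header changed to match.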

An alternative might be to convert the FileContents string to a
Unicode object (again, you'd need to find out which encoding was used
to produce that string)...

  unicode(FileContents, encoding="UTF-8")

...do the replacement using the Request("learner") object directly,
then encode the whole result as UTF-8 (or whichever encoding you
choose), and then make sure that the browser knows what it's getting.
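
The decode, replace, encode sequence can be sketched in Python 3 as a
small function. The function name, its parameters and the Latin-1
example template are assumptions for illustration only:

```python
def fill_template(raw_template, learner,
                  source_encoding="utf-8", output_encoding="utf-8"):
    """Hypothetical helper: substitute learner into a bytes template."""
    # 1. Decode the template using the encoding it was actually saved in
    text = raw_template.decode(source_encoding)
    # 2. Substitute while everything is text, so bytes and text never mix
    text = text.replace("Repl learner", learner)
    # 3. Encode the finished page once, on the way out to the browser
    return text.encode(output_encoding)

# Example: a template saved as Latin-1, sent to the browser as UTF-8
latin1_template = "<p>Zertifikat für Repl learner</p>".encode("latin-1")
page = fill_template(latin1_template, "José Müller",
                     source_encoding="latin-1")
```

Keeping the input and output encodings as separate parameters reflects
the two questions raised above: what the template was saved as, and
what the browser should be told it is receiving.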

Paul



