Unicode -> String problem

Jay Parlar jparlar at home.com
Sun Jul 8 16:42:09 EDT 2001


I'm having a problem converting unicode text to string type with str(). 

The code snippet causing the problem is

if type(pageText) == UnicodeType:
    newText = str(pageText)

and the error message I receive is

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "D:\MyData\HOME\PWA\Scripts\filter.py", line 60, in parser
    newText = str(pageText)
UnicodeError: ASCII encoding error: ordinal not in range(128)

Now, I know there is a lot of precedent for these "...ordinal not in range(128)"
questions, but I've looked around and haven't found anything that does exactly
what I want: completely remove any unconvertible Unicode characters. I have to be
able to parse this text afterwards using a lot of Python's string functions, so I
need 'newText' to be a string, but I'd really prefer not to have escape sequences
for non-ASCII characters (e.g. \xa0) showing up. Is there a simple way to convert
the Unicode text to StringType, dropping any characters that can't be represented?

Thanks in advance to anyone who's looking at this,
Jay P.
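
One possible approach to the question above, offered as a minimal sketch rather than a definitive answer: call the Unicode object's encode() method with the "ignore" error handler, which drops unencodable characters instead of raising UnicodeError. The helper name strip_non_ascii and the sample text are hypothetical, made up for illustration. On the Python 2 interpreters of this post's era the result is a plain string (StringType); on Python 3 it is a bytes object, and calling .decode("ascii") on it gives a str.

```python
def strip_non_ascii(text):
    """Encode text as ASCII, dropping any character outside range(128).

    Hypothetical helper, not part of the original filter.py.
    """
    # errors="ignore" discards unencodable characters instead of raising
    return text.encode("ascii", "ignore")


# Sample input (an assumption): contains a no-break space (\xa0) and e-acute (\xe9)
page_text = u"price:\xa0100 caf\xe9"

# Both non-ASCII characters are removed outright
new_text = strip_non_ascii(page_text)

# Optional variant: turn the no-break space into a normal space first,
# so that neighbouring words are not run together
spaced = strip_non_ascii(page_text.replace(u"\xa0", u" "))
```

Passing "replace" instead of "ignore" would substitute a "?" for each unencodable character, which may be preferable if the later parsing depends on character positions.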





