UnicodeError with OCR Text

Fri May 23 14:38:31 EDT 2003

Paradox wrote:

> I am extracting OCR data from SQL Server Text field through ADO and
> putting it into a string called fileContent. For some reason it thinks
> that every record is a UNICODE string which it is not. 

You mean, the data that you get from SQL Server are Unicode strings
and shouldn't be? Why not? Unicode is a completely appropriate data
type to represent text, much better suited than the standard Python
byte string.

> For most
> records the following line of code will work to get it back to normal
> thinking 

Maybe you should change your thinking.

> fullText = fullText + fileContent.encode('ascii') + '\n'
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> I think I isolated it to the degree character "º" HEX is BA, ASCII is
> 186.

Most likely. ASCII does not support the degree character,
so you cannot convert that character to ASCII. If you absolutely need
byte strings, you should use an encoding that has that character,
such as iso-8859-1, cp1252, or UTF-8.
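
As a rough illustration of the point above, here is a minimal sketch.
It is written for Python 3, where str plays the role of the Unicode
string type discussed in this thread. The sample text and the names
file_content, ascii_failed and results are made up for the example;
they do not come from the original code.

```python
# Hypothetical sample standing in for one OCR record; U+00BA is the
# character (hex BA, decimal 186) reported in the question.
file_content = "Bake at 350\u00ba"

# ASCII only covers code points 0 through 127, so this encode fails.
try:
    file_content.encode("ascii")
    ascii_failed = False
except UnicodeError as exc:
    ascii_failed = True
    print("ascii failed:", exc)

# Encodings that do contain the character succeed.
results = {}
for codec in ("iso-8859-1", "cp1252", "utf-8"):
    results[codec] = file_content.encode(codec)
    print(codec, results[codec])
```

The two single-byte encodings store the character as the one byte BA,
while UTF-8 uses two bytes for it, so whatever reads the byte strings
later has to know which encoding was chosen.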

Regards,
Martin