Problems with unicode

David Opstad opstad at batnet.com
Sat Mar 13 17:39:33 EST 2004


In article <a091da2f.0403131345.5e82b07e at posting.google.com>,
 jamesl at appliedminds.com (James Laamnna) wrote:

> Apparently in the batch that I'm encoding there is one string with 
> non-ascii characters in it. Is there any way to just have it encode 
> everything as unicode and not ascii?

A better question to ask is this: where did the supposed ASCII data come 
from in the first place? If, for instance, it came from a Windows 
machine, then there's a good chance it's actually in Windows-1252 (often 
loosely labelled ISO-8859-1), in which case you can preserve the 0x92 by 
decoding with the 'cp1252' codec instead of the 'ascii' one; the byte 
then becomes U+2019, the right single quotation mark. (Strict 
ISO-8859-1 would map 0x92 to a C1 control character instead.) Similarly, 
if the original text came from a Mac, then it's likely in Mac Roman, so 
if you decode with the 'mac-roman' codec you'll get the correct 
character in your resulting Unicode.
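
Here is a rough sketch of what I mean, assuming Python 3 syntax and a 
made-up sample byte string (the name `raw` and its contents are 
illustrative, not taken from your batch). It shows how the same 0x92 
byte decodes under each candidate codec, so you can see which result 
matches what the text was supposed to say:

```python
# Hypothetical sample: the byte 0x92 embedded in otherwise-ASCII text.
raw = b"It\x92s"

# Strict ASCII decoding rejects any byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# The same byte decoded under three candidate source encodings.
as_cp1252 = raw.decode("cp1252")        # 0x92 -> U+2019 (right single quote)
as_latin1 = raw.decode("iso-8859-1")    # 0x92 -> U+0092 (C1 control character)
as_macroman = raw.decode("mac-roman")   # 0x92 -> U+00ED (i with acute accent)

for name, text in [("cp1252", as_cp1252),
                   ("iso-8859-1", as_latin1),
                   ("mac-roman", as_macroman)]:
    # repr() makes control characters visible instead of printing them raw.
    print(name, repr(text))
```

Whichever line reads sensibly for your data tells you which codec the 
text most likely came from.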

Dave


