unicode compare errors

Nobody nobody at nowhere.com
Fri Dec 10 16:09:05 EST 2010


On Fri, 10 Dec 2010 11:51:44 -0800, Ross wrote:

> Since I can't control the encoding of the input file that users
> submit, how do I get past this?  How do I make such comparisons be
> True?

On Fri, 10 Dec 2010 12:07:19 -0800, Ross wrote:

> I found I could import codecs that allow me to read the file with my
> desired encoding. Huzzah!

> If I'm off-base and kludgey here and should be doing something

Er, do you know the file's encoding or don't you? Using:

    aFile = codecs.open(thisFile, encoding='utf-8')

is telling Python that the file /is/ in utf-8. If it isn't in utf-8,
you'll get decoding errors.
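
To make that concrete, here is a minimal sketch of what happens when the
declared encoding is wrong. The helper name read_as_utf8 and the sample
text are made up for illustration; only codecs.open comes from the code
above.

```python
import codecs
import os
import tempfile

def read_as_utf8(path):
    """Return the file's text, or None if the bytes are not valid UTF-8."""
    try:
        with codecs.open(path, encoding='utf-8') as f:
            return f.read()
    except UnicodeDecodeError:
        return None

def write_temp(raw):
    """Write raw bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(raw)
    return path

sample = u'caf\xe9 au lait'

# Same characters, two different byte sequences on disk.
utf8_path = write_temp(sample.encode('utf-8'))
latin1_path = write_temp(sample.encode('latin-1'))

good = read_as_utf8(utf8_path)    # decodes cleanly
bad = read_as_utf8(latin1_path)   # 0xE9 followed by a space is invalid UTF-8

os.remove(utf8_path)
os.remove(latin1_path)
```

Here good equals the original text, while bad is None because the
Latin-1 file fails to decode as utf-8.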

If you are given a file with no known encoding, then you can't reliably
determine what /characters/ it contains, and thus can't reliably compare
the contents of the file against strings of characters, only against
strings of bytes.
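
A small sketch of the difference, using a made-up sample word: the same
characters give different bytes under different encodings, so a byte
comparison fails even though the text is identical, and decoding with
the wrong encoding silently yields different characters.

```python
text = u'caf\xe9'

utf8_bytes = text.encode('utf-8')      # e-acute becomes two bytes
latin1_bytes = text.encode('latin-1')  # e-acute becomes one byte

# Comparing raw bytes: the two files "differ".
same_bytes = utf8_bytes == latin1_bytes

# Comparing characters, once each is decoded with its true encoding.
same_text = utf8_bytes.decode('utf-8') == latin1_bytes.decode('latin-1')

# Decoding with the wrong encoding raises no error here, but the
# resulting characters are not the ones that were written.
misdecoded = utf8_bytes.decode('latin-1')
```

same_bytes is False and same_text is True; misdecoded is not equal to
text, which is why a guess at the encoding has to be right before
character comparisons mean anything.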

About the best you can do is to use an autodetection library such as:

	http://chardet.feedparser.org/
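
A rough sketch of how that might look, assuming the third-party chardet
package is installed. The function name decode_guess and the latin-1
fallback are my own choices for illustration, not part of chardet.
Keep in mind that detection is only a statistical guess and can be
wrong, especially on short inputs.

```python
try:
    import chardet  # third-party package; assumed to be installed
except ImportError:
    chardet = None  # without it, the sketch just uses the fallback

def decode_guess(raw, fallback='latin-1'):
    """Decode bytes using chardet's best guess, else the fallback.

    latin-1 is used as the fallback only because it can decode any
    byte sequence; that does not make the result correct.
    """
    guess = chardet.detect(raw) if chardet is not None else {}
    encoding = guess.get('encoding') or fallback
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # The guess was wrong, or names a codec Python does not know.
        return raw.decode(fallback)

decoded = decode_guess(b'plain ascii text')
```

Once the bytes are decoded, the result can be compared against ordinary
unicode strings, with the caveat that a wrong guess gives wrong
characters rather than an error.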




