ASCII to Unicode.

Wed Jul 28 14:32:44 EDT 2010

Hi,

I've got a file I'd been treating as ASCII, but it contains some Latin
characters, specifically \xe1 and \xfc, so it isn't strictly ASCII. I'm trying
to import it into a PostgreSQL database that's running in Unicode mode, and
the Unicode converter chokes on those two characters.

I could just manually replace those two characters with something valid, but
if any other invalid characters show up in later versions of the file, I'd
like them to be handled correctly.

I've been experimenting with Python's Unicode support and found that I could
convert both characters correctly using the latin1 codec, like this:

	import unicodedata

	s = '\xe1\xfc'
	print unicode(s,'latin1')

The above works. When I try to convert my file, however, I still get an
error:

	import unicodedata

	input = file('ascii.csv', 'r')
	output = file('unicode.csv','w')

	for line in input.xreadlines():
		output.write(unicode(line,'latin1'))

	input.close()
	output.close()

Traceback (most recent call last):
  File "C:\Users\jgold\CloudmartFiles\UnicodeTest.py", line 10, in __main__
    output.write(unicode(line,'latin1'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 295: ordinal not in range(128)
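
The traceback names the 'ascii' codec even though the code only asks for
'latin1'. A minimal way to see where that comes from is to reproduce the two
steps without any file. Decoding the bytes as Latin-1 succeeds; the failure
happens when the resulting Unicode text is turned back into bytes using ASCII.
In Python 2, writing a unicode object to a plain file object does that
conversion implicitly with the default encoding, normally 'ascii'. The sketch
below assumes Python 3 syntax, where bytes and text are separate types, so it
will not run unchanged on 2.4; there the equivalent calls are str.decode and
unicode.encode.

```python
# Python 3 sketch: the raw bytes as they would come out of the file
raw = b'\xe1\xfc'

# Step 1: decode with Latin-1; each byte maps to the code point of the same value
text = raw.decode('latin-1')  # U+00E1 U+00FC

# Step 2a: encoding that text as ASCII fails, because both code points are >= 128.
# This is the same conversion a Python 2 file.write() does implicitly.
ascii_failed = False
try:
    text.encode('ascii')
except UnicodeEncodeError:
    ascii_failed = True

# Step 2b: encoding as UTF-8 succeeds; each character becomes two bytes
utf8 = text.encode('utf-8')
print(repr(utf8))  # b'\xc3\xa1\xc3\xbc'
```

So the decode step is fine; the problem is the encode step that has to happen
before the text can be written to disk.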

I'm stuck using Python 2.4.4, which may be handling the strings differently
depending on whether they're in the program or coming from the file. I just
haven't been able to figure out how to get the Unicode conversion working on
the file data.

Can anyone explain what is going on?
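
A minimal sketch of one common remedy, assuming the file really is Latin-1
(the post only confirms that \xe1 and \xfc decode correctly that way) and that
the database expects UTF-8: let codecs.open decode each line on the way in and
encode it on the way out, so nothing falls back to the 'ascii' codec.
codecs.open is available in 2.4, and the function avoids the with statement
for the same reason. The function name latin1_to_utf8 and the sample data are
hypothetical, made up for illustration.

```python
import codecs
import os
import tempfile


def latin1_to_utf8(src_path, dst_path):
    """Re-encode a Latin-1 text file as UTF-8, line by line (hypothetical helper)."""
    # Reading through codecs.open yields Unicode lines decoded from Latin-1
    src = codecs.open(src_path, 'r', 'latin-1')
    try:
        # Writing through codecs.open encodes each line explicitly as UTF-8
        dst = codecs.open(dst_path, 'w', 'utf-8')
        try:
            for line in src:
                dst.write(line)
        finally:
            dst.close()
    finally:
        src.close()


# Hypothetical sample standing in for ascii.csv, written as Latin-1 bytes
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'ascii.csv')
dst_path = os.path.join(workdir, 'unicode.csv')

sample = codecs.open(src_path, 'w', 'latin-1')
sample.write(u'id,name\n1,J\xe1n M\xfcller\n')
sample.close()

latin1_to_utf8(src_path, dst_path)
```

Within the original loop, the equivalent 2.4 fix is to encode explicitly
before writing, e.g. output.write(unicode(line, 'latin1').encode('utf-8')).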