Problem reading file with umlauts
Claus Hausberger
CHausberger at gmx.de
Tue Jul 7 09:59:49 EDT 2009
Hello
I have a text file which is encoded in Latin-1 (ISO-8859-1). I can't change that, as I do not create those files myself.
I have to read those files and convert umlauts like ö into HTML entities like &ouml;, as the text files are to become HTML files.
I have this code:
#!/usr/bin/python
# -*- coding: latin1 -*-
import codecs
f = codecs.open('abc.txt', encoding='latin1')
for line in f:
    print line
    for c in line:
        if c == "ö":
            print "oe"
        else:
            print c
and I get this error message:
$ ./read.py
Abc
./read.py:11: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if c == "ö":
A
b
c
Traceback (most recent call last):
  File "./read.py", line 9, in <module>
    print line
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
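The two messages above seem to point at two separate problems. The warning appears because codecs.open() yields unicode strings, while "ö" in a Python 2 source file is a byte string, so the comparison mixes the two types. The traceback presumably appears because print has to encode the unicode line for output and falls back to ASCII. A minimal sketch of one way around both, assuming the goal is an ASCII-only HTML file, might look like the following. The names UMLAUT_ENTITIES, umlauts_to_entities and convert_file are hypothetical, chosen only for illustration, and the mapping covers just the German umlauts and ß. The code sticks to syntax that should work on Python 2.5 as well as Python 3.

```python
import codecs

# Hypothetical mapping for illustration; extend it as needed.
# Unicode escapes keep this source file pure ASCII.
UMLAUT_ENTITIES = {
    u"\xe4": u"&auml;",   # a-umlaut
    u"\xf6": u"&ouml;",   # o-umlaut
    u"\xfc": u"&uuml;",   # u-umlaut
    u"\xc4": u"&Auml;",   # A-umlaut
    u"\xd6": u"&Ouml;",   # O-umlaut
    u"\xdc": u"&Uuml;",   # U-umlaut
    u"\xdf": u"&szlig;",  # sharp s
}


def umlauts_to_entities(text):
    """Replace each mapped character in a unicode string with its entity."""
    return u"".join([UMLAUT_ENTITIES.get(c, c) for c in text])


def convert_file(src_path, dst_path):
    """Read a Latin-1 text file and write an ASCII-only copy."""
    # Decode Latin-1 bytes into unicode strings while reading.
    src = codecs.open(src_path, "r", encoding="latin1")
    # Encode explicitly on output. Any non-ASCII character that is not
    # in the mapping becomes a numeric reference such as &#233;.
    dst = codecs.open(dst_path, "w", encoding="ascii",
                      errors="xmlcharrefreplace")
    try:
        for line in src:
            dst.write(umlauts_to_entities(line))
    finally:
        src.close()
        dst.close()
```

A call such as convert_file('abc.txt', 'abc.html') would then write the converted text to a new file (the output name is an assumption). For a direct comparison in the original loop, a unicode literal such as u"\xf6" compares cleanly against the decoded characters, and encoding explicitly before printing, for example line.encode('utf-8') on a UTF-8 terminal, avoids the implicit ASCII fallback.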
I searched the web and tried several approaches, but I still get strange encoding errors.
Has anyone ever done this before?
I am currently using Python 2.5 and may be able to use 2.6 but I cannot yet move to 3.1 as many libs we use don't yet work with Python 3.
Any help is more than welcome. This has been driving me crazy for two days now.
best wishes
Claus