tim.arnold at sas.com
Thu Jul 26 21:16:50 CEST 2007
Hi, I'm beginning to understand the encode/decode string methods, but I'd
like confirmation that I'm still thinking in the right direction:
I have a file of latin1 encoded text. Let's say I put one line of that file
into a string variable 'tocline', as follows:
import codecs
tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'
tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
tocline = tocline.decode('latin1','replace')
tocFile.write(tocline)
What I think is that tocFile is wrapped to ensure that anything written to
it is encoded as utf8.
I decode the latin1 string into Python's internal unicode representation,
and that gets written out as utf8.
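
A minimal sketch of the decode-then-encode step described above, kept in memory so the bytes can be inspected. It assumes an explicit b'' literal so that it also runs on Python 3 (in Python 2 the plain string literal above already behaves as a byte string); the names raw, text and utf8_bytes are made up for illustration.

```python
# Latin-1 bytes, as they would come out of the source file.
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decode: Latin-1 bytes -> unicode text (abstract code points).
text = raw.decode('latin1', 'replace')

# Encode: unicode text -> UTF-8 bytes. A file opened with
# codecs.open(..., encoding='utf8') performs this step on every write.
utf8_bytes = text.encode('utf8')

# The two accented characters each grow from one byte to two:
# 0xE9 becomes 0xC3 0xA9 and 0xF3 becomes 0xC3 0xB3.
print(repr(utf8_bytes))
```
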
What exactly is tocline when it's read in with that \xe9 and \xf3 in the
string? A latin1 encoded string?
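
One way to probe that question is to decode the same bytes under two different encodings. This is only a sketch, again assuming a Python 3 style b'' literal; the variable names are hypothetical. It suggests that the value is just a sequence of bytes with no encoding attached, and that "latin1" is something the reader has to know and supply.

```python
raw = b'p\xe9rdida'

# Read as Latin-1, byte 0xE9 is the single character U+00E9.
as_latin1 = raw.decode('latin1')

# Read as UTF-8, the same bytes are invalid: 0xE9 announces a
# multi-byte sequence that the following ASCII byte does not continue.
try:
    raw.decode('utf8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# With errors='replace' the bad byte becomes U+FFFD instead of raising.
replaced = raw.decode('utf8', 'replace')
```
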
Is my method the right way to write such a line out to a file with utf8
encoding?
If I read in the latin1 file using
codecs.open(filename,encoding='latin1') and write out the utf8 file by
codecs.open(othername,encoding='utf8'), would I no longer have a problem?
Could I just read in latin1 and write out utf8 with no more worries about
encoding?
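
A rough sketch of that read-latin1, write-utf8 round trip, under some assumptions: the file names and the temporary directory are hypothetical, and a small input file is created first only so the example is self-contained.

```python
import codecs
import os
import tempfile

# Hypothetical file names in a scratch directory, for illustration only.
workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, 'toc_latin1.htm')
othername = os.path.join(workdir, 'toc_utf8.htm')

# Create a small Latin-1 input file so the sketch can run on its own.
with open(filename, 'wb') as f:
    f.write(b'Ficha Datos de p\xe9rdida AND acci\xf3n\n')

# The reader decodes Latin-1 bytes to unicode on the way in;
# the writer encodes unicode to UTF-8 on the way out.
src = codecs.open(filename, 'r', encoding='latin1')
dst = codecs.open(othername, 'w', encoding='utf8')
try:
    for line in src:
        dst.write(line)
finally:
    src.close()
    dst.close()

# Read the output back as raw bytes to check what was written.
with open(othername, 'rb') as f:
    result = f.read()
```

With both files wrapped this way, the loop body only ever handles unicode strings, so no explicit decode or encode calls appear in it.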
More information about the Python-list