encode/decode misunderstanding

Tim Arnold tim.arnold at sas.com
Thu Jul 26 21:16:50 CEST 2007

Hi, I'm beginning to understand the encode/decode string methods, but I'd 
like confirmation that I'm still thinking in the right direction:

I have a file of latin1 encoded text. Let's say I put one line of that file 
into a string variable 'tocline', as follows:
tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'

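import codecs
# Wrap the output file so that unicode text written to it is encoded as utf8.
tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
# Decode the latin1 byte string into a unicode object, then write it out.
tocline = tocline.decode('latin1','replace')
tocFile.write(tocline)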

What I think is that tocFile is wrapped to ensure that anything written to 
it is encoded as utf8. I decode the latin1 string into Python's internal 
unicode representation, and that gets written out as utf8.
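
As a minimal sketch of the decode-then-write flow described above, assuming 
Python 3 (where the latin1 data is a bytes literal rather than a plain 
string): the names raw_line, text_line and out_path are illustrative, and 
the output goes to a temporary directory instead of the real mytoc.htm.

```python
import codecs
import os
import tempfile

# Assumption: in Python 3 the latin1-encoded line is a bytes object.
raw_line = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decode the latin1 bytes into a text (Unicode) string.
text_line = raw_line.decode('latin1', 'replace')

# Hypothetical output path inside a throwaway temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'mytoc.htm')

# The wrapped file encodes every text string written to it as utf8.
with codecs.open(out_path, 'wb', encoding='utf8', errors='replace') as toc_file:
    toc_file.write(text_line + '\n')

# Read the raw bytes back to see what actually landed on disk.
with open(out_path, 'rb') as f:
    written = f.read()
```

Here written holds the utf8 bytes, in which each accented character 
occupies two bytes instead of the single byte it used in latin1.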

What exactly is tocline when it's read in with that \xe9 and \xf3 in the 
string? A latin1 encoded string?
Is my method the right way to write such a line out to a file with utf8 
encoding?
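
One way to look at the question is to inspect the value at each stage. The 
following is only a sketch, again assuming Python 3 and using illustrative 
variable names: the raw line is a sequence of bytes, decoding it as latin1 
gives text, and encoding that text as utf8 gives a different, longer byte 
sequence.

```python
# Assumption: the line as read from the latin1 file, held as raw bytes.
raw_line = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decoding interprets each byte as one latin1 character.
text_line = raw_line.decode('latin1')

# Encoding the text as utf8 uses two bytes for each accented character.
utf8_line = text_line.encode('utf8')

print(type(raw_line).__name__, len(raw_line))
print(type(text_line).__name__, len(text_line))
print(type(utf8_line).__name__, len(utf8_line))

# The latin1 bytes are not valid utf8, so a strict utf8 decode fails.
try:
    raw_line.decode('utf8')
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```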

If I read in the latin1 file using
codecs.open(filename,encoding='latin1') and write out the utf8 file by 
opening with
codecs.open(othername,encoding='utf8'), would I no longer have a problem --  
I could just read in latin1 and write out utf8 with no more worries about 
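
The read-latin1, write-utf8 approach asked about above might look like the 
sketch below. It assumes Python 3; the file names and the temporary 
directory are hypothetical, and a one-line sample input file is created 
first so that the example is self-contained.

```python
import codecs
import os
import tempfile

# Hypothetical paths inside a throwaway temporary directory.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, 'toc_latin1.htm')
dst_path = os.path.join(work_dir, 'toc_utf8.htm')

# Create a small latin1-encoded sample input file.
with open(src_path, 'wb') as f:
    f.write(b'Ficha Datos de p\xe9rdida AND acci\xf3n\n')

# The reader decodes latin1 into text; the writer encodes text as utf8.
with codecs.open(src_path, 'r', encoding='latin1') as src, \
        codecs.open(dst_path, 'w', encoding='utf8') as dst:
    for line in src:
        dst.write(line)

# Read the raw bytes of the converted file for inspection.
with open(dst_path, 'rb') as f:
    converted = f.read()
```

With both files wrapped this way, the loop only ever handles text strings, 
and the conversion between encodings happens at the file boundaries.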

