encoding misunderstanding

Tim Arnold a_jtim at bellsouth.net
Fri Jul 27 18:04:35 CEST 2007

Hi,  I'm beginning to understand the encode/decode string methods, but
I'd like confirmation that I'm still thinking in the right direction:

I have a file of latin1 encoded text. Let's say I put one line of that
into a string variable 'tocline', as follows:
tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'

import codecs
tocFile =
tocline = tocline.decode('latin1','replace')

What I think is that tocFile is wrapped to ensure that anything
written to it is in utf8. I decode the latin1 string into Python's
internal unicode representation, and that gets written out as utf8.
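Here is a minimal sketch of that decode-then-write step. It assumes Python 3, where the raw latin1 line is a `bytes` object rather than a plain `str`. It also uses a hypothetical output file `toc_utf8.txt` in a scratch directory. `codecs.getwriter('utf8')` is one way to build the kind of wrapper described above; it is not necessarily how the original tocFile was created.

```python
import codecs
import os
import tempfile

# Raw bytes as they would come out of a latin1-encoded file.
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decode latin1 bytes into a unicode text string. ('replace' never actually
# fires for latin1, since every byte value 0-255 maps to a code point.)
tocline = raw.decode('latin1', 'replace')

# Hypothetical output path, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), 'toc_utf8.txt')

# Wrap a binary file so every text string written to it is encoded as UTF-8.
with open(path, 'wb') as f:
    toc_file = codecs.getwriter('utf8')(f)
    toc_file.write(tocline)

# Read the bytes back to see what actually landed on disk.
with open(path, 'rb') as f:
    written = f.read()

print(written)  # b'Ficha Datos de p\xc3\xa9rdida AND acci\xc3\xb3n'
```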

What exactly is tocline when it's read in with that \xe9 and \xf3
in the string? A latin1-encoded string?
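A small sketch can show what the undecoded line holds. It assumes Python 3, where the line read in binary mode is a `bytes` object. Each `\xNN` escape is a single byte. The bytes themselves carry no record of being latin1; that is known only from where the file came from.

```python
# The line as raw bytes (Python 3 bytes literal; in Python 2 it was a plain str).
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Each \xNN escape is one byte: 0xE9 (233) sits at index 16.
print(len(raw), raw[16])             # 33 233

# Decoding as latin1 maps each byte to the code point with the same value.
text = raw.decode('latin1')
print(len(text), hex(ord(text[16]))) # 33 0xe9

# Re-encoding as UTF-8 turns each accented letter into two bytes.
utf8 = text.encode('utf8')
print(len(utf8), utf8[16:18])        # 35 b'\xc3\xa9'
```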
Is my method the right way to write such a line out to a file with

If I read in the latin1 file using
codecs.open(filename, encoding='latin1') and write out the utf8 file by
opening it with codecs.open(othername, 'w', encoding='utf8'), would I
no longer have a problem? That is, could I just read in latin1 and
write out utf8 with no more worries about
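Here is a minimal sketch of that read-latin1/write-utf8 pattern. It assumes Python 3, whose built-in `open()` takes the same `encoding` argument as `codecs.open()`. The file names are hypothetical, and the input file is created on the spot. Note that the output file has to be opened in write mode (`'w'`).

```python
import os
import tempfile

# Hypothetical file names in a scratch directory, for illustration only.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'toc_latin1.txt')
dst = os.path.join(workdir, 'toc_utf8.txt')

# Create a small latin1 input file to work on.
with open(src, 'wb') as f:
    f.write(b'Ficha Datos de p\xe9rdida AND acci\xf3n\n')

# Read as latin1 (bytes -> text) and write as UTF-8 (text -> bytes).
# newline='' leaves line endings untranslated in both directions.
with open(src, encoding='latin1', newline='') as fin, \
        open(dst, 'w', encoding='utf8', newline='') as fout:
    for line in fin:
        fout.write(line)

# Inspect the raw bytes of the converted file.
with open(dst, 'rb') as f:
    result = f.read()

print(result)  # b'Ficha Datos de p\xc3\xa9rdida AND acci\xc3\xb3n\n'
```

Both conversions happen at the file boundaries, so the loop body only ever handles decoded text.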

P.S. Sorry if you see this twice; my newsreader is flaky right now.
