Re: string u'hyv\xe4' to file as 'hyvä'
Alex Willmer
alex at moreati.org.uk
Mon Dec 27 04:55:47 EST 2010
On Dec 27, 6:47 am, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
> "gintare" <g.statk... at gmail.com> wrote in message
> > In file i find 'hyv\xe4' instead of hyvä.
>
> When you open a file with codecs.open(), it expects Unicode strings to be
> written to the file. Don't encode them again. Also, .writelines() expects
> a list of strings. Use .write():
>
> import codecs
> item=u'hyv\xe4'
> F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
> F.write(item)
> F.close()
Gintare, Mark's code is correct. When you read the file back, make
sure you understand what you are seeing:
>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'
That might look as though item2 is 7 characters long and contains a
backslash followed by x, e, 4. However, item2 is identical to item:
both contain 4 characters, the final one being a-umlaut. The
interactive prompt shows the repr() of the string, which uses a
backslash escape because printing a non-ascii character might fail.
You can see the character itself if your Python session is running in
a terminal (or GUI) that can handle non-ascii characters:
>>> print item2
hyvä
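If your terminal cannot display the character, you can still check
that the string is what you expect without printing it. The following
is only a sketch: the variable names are my own, and item2 is assigned
directly here as a stand-in for the value read back from the file.

```python
import unicodedata

# Stand-in for the value read back from the file (an assumption for
# this sketch; in the session above it came from F2.read()).
item2 = u'hyv\xe4'

# The escape \xe4 is one character, so the string has 4, not 7.
length = len(item2)

# Inspect the last character without printing it.
last_code_point = ord(item2[-1])
last_name = unicodedata.name(item2[-1])

# Encoded as UTF-8, the a-umlaut takes two bytes, so the file
# holds 5 bytes for these 4 characters.
utf8_bytes = item2.encode('utf8')

print(length)                # 4
print(hex(last_code_point))  # 0xe4
print(last_name)             # LATIN SMALL LETTER A WITH DIAERESIS
print(len(utf8_bytes))       # 5
```

None of these print calls emit a non-ascii character, so they should
work whatever the terminal encoding is.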