unicode text file
Mark Tolonen
metolone+gmane at gmail.com
Sun Sep 27 10:39:26 EDT 2009
"Junaid" <junu.pv at gmail.com> wrote in message
news:0267bef9-9548-4c43-bcdf-b624350c8f15 at p23g2000vbl.googlegroups.com...
>I want to do replacements in a utf-8 text file. example
>
> f=open("test.txt","r") #this file is uft-8 encoded
> raw = f.read()
> txt = raw.decode("utf-8")
You can use the codecs module to open and decode the file in one step.
>
> txt.replace{'English', ur'ഇംഗ്ലീഷ്') #replacing raw unicode string,
> but not working
The replace method returns the altered string; it does not modify the
original in place, so you have to assign the result. You should also use
Unicode strings for both arguments (although it doesn't matter in this
case). A raw Unicode string is unnecessary here as well.
txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
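To see the difference, here is a minimal sketch. The sample string is made up for illustration, and the file has to be saved as UTF-8 for the literal to work:

```python
# coding: utf-8
# Strings are immutable: replace() builds a new string and returns it.
txt = u'Learn English'                      # made-up sample text

txt.replace(u'English', u'ഇംഗ്ലീഷ്')          # result is thrown away
unchanged = txt                             # still the original text

txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')    # rebind the name to keep the result
```

After the first call txt is untouched; only the assignment in the last line changes what txt refers to.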
> f.write(txt)
You opened the file for reading. You'll need to close the file and reopen
it for writing.
> f.close()
> f.flush()
Flush isn't required. close() will flush.
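A small sketch of that point, using io.open and a scratch file in a temporary directory (both are my choices for the illustration, not part of the original code). The order in the quoted code is also a problem: with these file objects, calling flush() after close() raises ValueError.

```python
import io
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'flush_demo.txt')

f = io.open(path, 'w', encoding='utf-8')
f.write(u'buffered text')
f.close()                       # close() flushes the buffer to disk

# flush() on an already closed file is an error, not a no-op.
try:
    f.flush()
    flush_error = None
except ValueError as exc:
    flush_error = exc

# The data reached the file without an explicit flush().
with io.open(path, 'r', encoding='utf-8') as g:
    on_disk = g.read()
```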
Also, to have text like ഇംഗ്ലീഷ് in a source file, you'll need to declare the
encoding of the file at the top and be sure to actually save the file in
that encoding.
In summary:
# coding: utf-8
import codecs
f = codecs.open('test.txt','r','utf-8')
txt = f.read()
txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
f.close()
f = codecs.open('test.txt','w','utf-8')
f.write(txt)
f.close()
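If you want this in reusable form, one possible variant is sketched below. It uses io.open, which is available in Python 2.6+ and Python 3, and with blocks so the files are closed even if an error occurs. The helper name replace_in_file and the sample file contents are made up for the example.

```python
# coding: utf-8
import io
import os
import tempfile

def replace_in_file(path, old, new, encoding='utf-8'):
    """Read path, replace old with new, and write the result back."""
    # Read and decode in one step.
    with io.open(path, 'r', encoding=encoding) as f:
        txt = f.read()
    txt = txt.replace(old, new)
    # Reopen for writing; the text is encoded as it is written.
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(txt)
    return txt

# Usage with a throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'I speak English.\n')

result = replace_in_file(path, u'English', u'ഇംഗ്ലീഷ്')

with io.open(path, 'r', encoding='utf-8') as f:
    reread = f.read()
```

The whole file is read into memory, which is fine for small text files but may not suit very large ones.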
-Mark