unicode text file

Mark Tolonen metolone+gmane at gmail.com
Sun Sep 27 10:39:26 EDT 2009


"Junaid" <junu.pv at gmail.com> wrote in message 
news:0267bef9-9548-4c43-bcdf-b624350c8f15 at p23g2000vbl.googlegroups.com...
>I want to do replacements in a utf-8 text file. example
>
> f=open("test.txt","r") #this file is uft-8 encoded
> raw = f.read()
> txt = raw.decode("utf-8")

You can use the codecs module to open and decode the file in one step.
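
As a minimal sketch of that one-step open and decode (the temporary file and the names path and txt are my own stand-ins for illustration, not part of your script):

```python
import codecs
import os
import tempfile

# Create a small UTF-8 file to read back (a stand-in for test.txt).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "wb") as f:
    f.write(u"caf\u00e9 English".encode("utf-8"))

# codecs.open decodes while reading, so read() returns a Unicode string
# and no separate .decode("utf-8") call is needed.
f = codecs.open(path, "r", "utf-8")
txt = f.read()
f.close()

os.remove(path)
```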

>
> txt.replace{'English', ur'ഇംഗ്ലീഷ്') #replacing raw unicode string,
> but not working

The replace method returns the altered string.  It does not modify it in 
place.  You also should use Unicode strings for both the arguments (although 
it doesn't matter in this case).  Using a raw Unicode string is also 
unnecessary in this case.

    txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
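
To see why the original call appeared to do nothing, here is a small sketch with a made-up sample string. replace leaves the string it is called on untouched and hands back a new one:

```python
# -*- coding: utf-8 -*-
# Sample text invented for this illustration.
txt = u'English is the medium'

# Calling replace without using the result changes nothing.
txt.replace(u'English', u'ഇംഗ്ലീഷ്')
unchanged = txt

# Assigning the result back keeps the replacement.
txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
```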

> f.write(txt)

You opened the file for reading.  You'll need to close the file and reopen 
it for writing.

> f.close()
> f.flush()

flush() isn't required; close() will flush.  Calling flush() after close(), 
as written here, also fails because the file is no longer open.
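
A quick way to check this, as a sketch only (io.open and the temporary file are my choices for illustration): data written without an explicit flush() is on disk after close(), and flush() on an already closed file raises ValueError.

```python
import io
import os
import tempfile

# Throwaway file standing in for test.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = io.open(path, "w", encoding="utf-8")
f.write(u"some text")
f.close()  # flushes any buffered data before closing

# The data is already on disk without any explicit flush().
with io.open(path, "r", encoding="utf-8") as check:
    on_disk = check.read()

# flush() after close() fails because the file is no longer open.
try:
    f.flush()
    flush_error = None
except ValueError as exc:
    flush_error = exc

os.remove(path)
```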

Also, to have text like ഇംഗ്ലീഷ് in a Python source file, you'll need to 
declare the encoding of the source file at the top and be sure to actually 
save the file in that encoding.

In summary:

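    # coding: utf-8
    import codecs
    # Open for reading; codecs.open decodes UTF-8 as it reads.
    f = codecs.open('test.txt','r','utf-8')
    txt = f.read()
    # replace returns a new string, so assign the result.
    txt = txt.replace(u'English', u'ഇംഗ്ലീഷ്')
    f.close()
    # Reopen for writing; the text is encoded to UTF-8 on write.
    f = codecs.open('test.txt','w','utf-8')
    f.write(txt)
    f.close()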
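
The same read, replace, write sequence could also be wrapped in a function that uses with blocks, so each file is closed even if an error occurs along the way. This is only a sketch: replace_in_file is a hypothetical helper name, and the temporary file in the usage lines stands in for test.txt.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Replace every occurrence of old with new in the file at path."""
    # Read and decode the whole file.
    with codecs.open(path, 'r', encoding) as f:
        txt = f.read()
    # replace returns a new string, so keep the result.
    txt = txt.replace(old, new)
    # Reopen for writing; mode 'w' truncates the file first.
    with codecs.open(path, 'w', encoding) as f:
        f.write(txt)


# Usage on a throwaway file.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with codecs.open(path, 'w', 'utf-8') as f:
    f.write(u'English is spoken here')

replace_in_file(path, u'English', u'ഇംഗ്ലീഷ്')

with codecs.open(path, 'r', 'utf-8') as f:
    result = f.read()
os.remove(path)
```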

-Mark




