[Tutor] appending to a utf-16 encoded text file

Mark Tolonen metolone+gmane at gmail.com
Wed Oct 22 04:05:44 CEST 2008


"Tim Golden" <mail at timgolden.me.uk> wrote in message 
news:48FDF742.5080907 at timgolden.me.uk...
> Tim Brown wrote:
>> Hi,
>> I'm trying to create and append unicode strings to a utf-16 text file.
>> The best I could come up with was to use codecs.open() with an encoding 
>> of 'utf-16' but when I do an append I get another UTF16 BOM put into the 
>> file which other programs do not expect to see :-(
>> Is there some way to stop codecs from doing this or is there a better
>> way to create and add data to a utf-16 text file?
>
>
> Well, there's nothing to stop you opening it "raw", as it were,
> and just appending unicode encoded as utf16.
>
> <code>
> s = u"The cat sat on the mat"
> f = open ("utf16.txt", "wb")
> for word in s.split ():
>  f.write (word.encode ("utf16") + " ")
>
> f.close ()
>
> </code>
>
> TJG

Result: The@揾愀琀 sat@濾渀 the@淾愀琀 

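word.encode('utf16') prepends a BOM on every call, and the space was written as a single raw byte instead of being encoded as UTF-16. That stray byte shifts the byte pairing of everything after it, so every other word (BOM included) is read one byte out of step and decodes as the CJK-looking characters above.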
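A small Python 3 sketch that rebuilds the bytes from Tim's loop, to show where the garbage comes from. It assumes a little-endian machine (what the 'utf16' codec produces on x86) and uses b" " for the unencoded space, since Python 3 will not concatenate bytes and str. The helper name `tim_style_bytes` is made up for illustration.

```python
import codecs

def tim_style_bytes(text):
    """Hypothetical helper: mimic word.encode('utf16') + ' ' for each word,
    assuming a little-endian machine (BOM + UTF-16-LE), plus a raw 1-byte space."""
    out = b""
    for word in text.split():
        out += codecs.BOM_UTF16_LE + word.encode("utf-16-le") + b" "
    return out

s = u"The cat sat on the mat"

# The generic codec puts a native-order BOM in front of every encode() call.
print(u"cat".encode("utf-16").startswith(codecs.BOM_UTF16))  # True

# Each raw space byte flips the byte pairing, so every other word
# ("cat", "on", "mat") fuses with its BOM into CJK-looking characters.
garbled = tim_style_bytes(s).decode("utf-16")
print(repr(garbled))

# Encoding the space too, with no per-word BOM, round-trips cleanly.
clean = b"".join((w + u" ").encode("utf-16-le") for w in s.split())
print(repr((codecs.BOM_UTF16_LE + clean).decode("utf-16")))  # 'The cat sat on the mat '
```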

The utf-16-le and utf-16-be codecs don't add a BOM, so encode with one of those and encode the space as well. This works:

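import codecs
s = u"The cat sat on the mat"
# utf-16-le writes no BOM by itself, so add one by hand only when
# creating the file; when appending later, open with "ab" and skip it.
f = codecs.open("utf16.txt", "wb", "utf-16-le")
f.write(u'\ufeff') # if you want the BOM
for word in s.split():
    f.write(word + u' ')  # the space is encoded too
f.close()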
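The loop above opens the file with "wb", which truncates it, while the original question is about appending later. One way to handle that, sketched for Python 3 along the lines of Tim's "open it raw" suggestion: write the BOM only when the file is new or empty, and otherwise append bytes encoded with the endian-specific codec that matches the existing BOM. `append_utf16` is a hypothetical name, and the sketch assumes nothing else writes to the file between the check and the append.

```python
import codecs
import os
import tempfile

def append_utf16(path, text):
    """Hypothetical helper: append text to a UTF-16 file without adding a
    second BOM. A new or empty file gets a little-endian BOM; an existing
    file keeps the byte order of its BOM (no BOM is treated as little-endian)."""
    head = b""
    if os.path.exists(path):
        with open(path, "rb") as f:
            head = f.read(2)
    if head == codecs.BOM_UTF16_BE:
        encoding, bom = "utf-16-be", b""
    elif head:
        encoding, bom = "utf-16-le", b""
    else:
        encoding, bom = "utf-16-le", codecs.BOM_UTF16_LE
    # Binary append mode: no codec is involved, so nothing adds a BOM behind our back.
    with open(path, "ab") as f:
        f.write(bom + text.encode(encoding))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "utf16.txt")
    append_utf16(path, u"The cat ")
    append_utf16(path, u"sat on the mat")
    with open(path, "rb") as f:
        data = f.read()

print(data.count(codecs.BOM_UTF16_LE))  # 1
print(data.decode("utf-16"))            # The cat sat on the mat
```

Checking the existing BOM instead of always assuming little-endian keeps the helper from mixing byte orders if another program created the file as UTF-16-BE.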

-Mark
