[Tutor] appending to a utf-16 encoded text file
Tim Golden
mail at timgolden.me.uk
Wed Oct 22 09:52:54 CEST 2008
Mark Tolonen wrote:
>
> "Tim Golden" <mail at timgolden.me.uk> wrote in message
> news:48FDF742.5080907 at timgolden.me.uk...
>> Tim Brown wrote:
>>> Hi,
>>> I'm trying to create and append unicode strings to a utf-16 text file.
>>> The best I could come up with was to use codecs.open() with an
>>> encoding of 'utf-16' but when I do an append I get another UTF16 BOM
>>> put into the file which other programs do not expect to see :-(
>>> Is there some way to stop codecs from doing this or is there a better
>>> way to create and add data to a utf-16 text file?
>>
>>
>> Well, there's nothing to stop you opening it "raw", as it were,
>> and just appending unicode encoded as utf16.
>>
>> <code>
>> s = u"The cat sat on the mat"
>> f = open ("utf16.txt", "wb")
>> for word in s.split ():
>>     f.write (word.encode ("utf16") + " ")
>>
>> f.close ()
>>
>> </code>
>>
>> TJG
>
> Result: The@揾愀琀 sat@濾渀 the@淾愀琀
>
> word.encode('utf16') adds a BOM every time, and the space wasn't encoded.
>
> utf-16-le and utf-16-be don't add the BOM. This works:
>
> import codecs
> s = u"The cat sat on the mat"
> f = codecs.open("utf16.txt","wb","utf-16-le")
> f.write(u'\ufeff') # if you want the BOM
> for word in s.split ():
>     f.write (word + u' ')
> f.close()
My apologies. I did run the code before posting, but I did no
more than glance at the result in Notepad. Sorry. I should have
used utf-16-le / utf-16-be as you've done.
TJG
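
For the original question (appending without getting a second BOM), the
same idea can be sketched roughly as below. This is only an illustration,
not code from the thread: it assumes Python 3 and the built-in open(), and
the helper name append_utf16 is made up for the example. The BOM is written
by hand, and only when the file is missing or empty; "utf-16-le" itself
never adds one.

```python
import os

BOM = "\ufeff"


def append_utf16(path, text):
    """Append text to a UTF-16 (little-endian) file, writing the BOM once.

    Hypothetical helper for illustration; assumes Python 3.
    """
    # Only a brand-new or empty file needs a BOM at the front.
    needs_bom = not os.path.exists(path) or os.path.getsize(path) == 0
    # "utf-16-le" does not emit a BOM on its own, unlike plain "utf-16".
    # newline="" switches off newline translation, so text is written as given.
    with open(path, "a", encoding="utf-16-le", newline="") as f:
        if needs_bom:
            f.write(BOM)
        f.write(text)
```

Calling append_utf16("utf16.txt", "The cat sat ") and then
append_utf16("utf16.txt", "on the mat") should leave a single BOM at the
start of the file, and reading it back with the "utf-16" codec should give
the joined text. Note that this fixes the byte order to little-endian, so
it should only be used on files that were created that way.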