writing \feff at the begining of a file
Nobody
nobody at nowhere.com
Fri Aug 13 14:04:10 EDT 2010
On Fri, 13 Aug 2010 11:45:28 +0200, Jean-Michel Pichavant wrote:
> I'm trying to update the content of a $Microsoft$ VC2005 project files
> using a python application.
> Since those files are XML data, I assumed I could easily do that.
>
> My problem is that VC somehow thinks that the file is corrupted and
> update the file like the following:
>
> -<?xml version='1.0' encoding='UTF-8'?>
> +?<feff><?xml version="1.0" encoding="UTF-8"?>
>
>
> Actually, <feff> is displayed in a different color by vim, telling me
> that this is some kind of special caracter code (I'm no familiar with
> such thing).
U+FEFF is a "byte order mark" or BOM. Each Unicode-based encoding (UTF-8,
UTF-16, UTF-16-LE, etc) will encode it differently, so it enables a
program reading the file to determine the encoding before reading any
actual data.
> My problem is however simplier : how do I add such character at the
> begining of the file ?
> I tried
Either:
1. Open the file as binary and write '\xef\xbb\xbf' to the file:
f = open('foo.txt', 'wb')
f.write('\xef\xbb\xbf')
[You can also use the constant BOM_UTF8 from the codecs module.]
2. Open the file as utf-8 and write u'\ufeff' to the file:
import codecs
f = codecs.open('foo.txt', 'w', 'utf-8')
f.write(u'\ufeff')
3. Open the file as utf-8-sig and don't write anything (or write an empty
string):
import codecs
f = codecs.open('foo.txt', 'w', 'utf-8-sig')
f.write('')
The utf-8-sig codec automatically writes a BOM at the beginning of the
file. It is present in Python 2.5 and later.
More information about the Python-list
mailing list