writing \feff at the begining of a file

Nobody nobody at nowhere.com
Fri Aug 13 14:04:10 EDT 2010


On Fri, 13 Aug 2010 11:45:28 +0200, Jean-Michel Pichavant wrote:

> I'm trying to update the content of a $Microsoft$ VC2005 project files 
> using a python application.
> Since those files are XML data, I assumed I could easily do that.
> 
> My problem is that VC somehow thinks that the file is corrupted and 
> update the file like the following:
> 
> -<?xml version='1.0' encoding='UTF-8'?>
> +?<feff><?xml version="1.0" encoding="UTF-8"?>
> 
> 
> Actually, <feff> is displayed in a different color by vim, telling me 
> that this is some kind of special caracter code (I'm no familiar with 
> such thing).

U+FEFF is a "byte order mark" or BOM. Each Unicode-based encoding (UTF-8,
UTF-16, UTF-16-LE, etc) will encode it differently, so it enables a
program reading the file to determine the encoding before reading any
actual data.

> My problem is however simplier : how do I add such character at the 
> begining of the file ?
> I tried

Either:

1. Open the file as binary and write '\xef\xbb\xbf' to the file:

	f = open('foo.txt', 'wb')
	f.write('\xef\xbb\xbf')

[You can also use the constant BOM_UTF8 from the codecs module.]

2. Open the file as utf-8 and write u'\ufeff' to the file:

	import codecs
	f = codecs.open('foo.txt', 'w', 'utf-8')
	f.write(u'\ufeff')

3. Open the file as utf-8-sig and don't write anything (or write an empty
string):

	import codecs
	f = codecs.open('foo.txt', 'w', 'utf-8-sig')
	f.write('')

The utf-8-sig codec automatically writes a BOM at the beginning of the
file. It is present in Python 2.5 and later.




More information about the Python-list mailing list