remove BOM from string read from utf-8 file

Piet van Oostrum piet at
Fri Feb 27 15:51:35 CET 2004

>>>>> "Achim Domma" <domma at> (AD) wrote:

AD> Hi,
AD> I read some text from a utf-8 encoded text file like this:

AD> text = codecs.open('example.txt','r','utf8').read()

AD> If I pass this text to a COM object, I can see that there is still the BOM
AD> in the file, which marks the file as utf-8. Simply removing the first
AD> character in the string is not ok, because the BOM is optional. So I tried
AD> something like this:

The BOM is in the file, but not in the string 'text'.
text is a unicode string, which consists of Unicode characters, and the BOM
is not a Unicode character.

Check text[0] and len(text) to verify.
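
A minimal sketch of that check, written for present-day Python 3 (an
assumption, since this thread predates it). The temporary file and its
contents are made up for illustration. The outcome depends on the codec: in
current Python the plain 'utf-8' codec keeps a leading BOM in the decoded
string as the character U+FEFF, while the 'utf-8-sig' codec drops it.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: UTF-8 text preceded by a BOM.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))

# Plain 'utf-8' decoding, as in the question.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# 'utf-8-sig' decoding, which removes a leading BOM if one is present.
with open(path, "r", encoding="utf-8-sig") as f:
    text_sig = f.read()

# Inspect the first character and the length of each result.
print(repr(text[0]), len(text))
print(repr(text_sig[0]), len(text_sig))
```

If text[0] turns out to be U+FEFF and len(text) is one more than expected,
the BOM did survive decoding and has to be removed explicitly.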

Moreover, BOM_UTF8 is a (non-ASCII) byte string, not a Unicode string; that
is the reason for the complaint.
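
The byte-string point can be sketched as follows, again assuming Python 3.
The helper name strip_bom is hypothetical and is not taken from the original
post. codecs.BOM_UTF8 is decoded to a one-character string before it is
compared with decoded text, so both sides of the comparison have the same
type.

```python
import codecs

def strip_bom(text):
    """Return text without a leading U+FEFF, if there is one (hypothetical helper)."""
    bom = codecs.BOM_UTF8.decode("utf-8")  # the one-character str '\ufeff'
    if text.startswith(bom):
        return text[len(bom):]
    return text

# BOM_UTF8 itself is a byte string, not text.
print(type(codecs.BOM_UTF8).__name__)

# The helper handles both cases, since the BOM is optional.
print(strip_bom("\ufeffhello"))
print(strip_bom("hello"))
```

Because the BOM is optional, the helper leaves strings without one untouched.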
Piet van Oostrum <piet at>
Private email: P.van.Oostrum at
