remove BOM from string read from utf-8 file
domma at procoders.net
Fri Feb 27 14:07:44 CET 2004
I read some text from a utf-8 encoded text file like this:
text = codecs.open('example.txt','r','utf8').read()
If I pass this text to a COM object, I can see that the BOM, which marks
the file as utf-8, is still at the start of the string. Simply removing the
first character of the string is not ok, because the BOM is optional. So I
tried something like this:
print "found BOM"
but then I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0:
ordinal not in range(128)
What's the right way to remove the BOM from the string?
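
One possible approach, offered as a sketch rather than a definitive answer:
the text returned by codecs.open(...).read() is already decoded, so the BOM
shows up in it as the single character U+FEFF, not as the bytes EF BB BF.
Assuming the test that failed compared the decoded text against a byte
string such as codecs.BOM_UTF8, the UnicodeDecodeError would come from
Python trying to decode those bytes as ascii. Comparing against the decoded
character avoids that. The names BOM, strip_bom and read_utf8 below are
hypothetical helpers, not part of the original post.

```python
import codecs

# The BOM as a decoded character, not the utf-8 byte sequence EF BB BF.
BOM = u'\ufeff'

def strip_bom(text):
    """Return text without a leading BOM character, if one is present."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text

def read_utf8(path):
    """Hypothetical helper: decode a utf-8 file and drop an optional BOM."""
    f = codecs.open(path, 'r', 'utf8')
    try:
        return strip_bom(f.read())
    finally:
        f.close()
```

With this, text = read_utf8('example.txt') should give the same string
whether or not the file starts with a BOM. On Python 2.5 and later, the
'utf-8-sig' codec can be passed to codecs.open instead, which skips an
optional BOM while decoding.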