codec for UTF-8 with BOM
Chris Rebert
clp2 at rebertia.com
Mon May 2 05:47:45 EDT 2011
On Mon, May 2, 2011 at 1:34 AM, Ulrich Eckhardt
<ulrich.eckhardt at dominolaser.com> wrote:
> Hi!
>
> I want to write a file starting with the BOM and using UTF-8, and stumbled
> across some problems:
>
> 1. I would have expected one of the codecs to be 'UTF-8 with BOM' or
> something like that, but I can't find the correct name. Also, I can't find a
> way to get a list of the supported codecs at all, which strikes me as odd.
If nothing else, there's
http://docs.python.org/library/codecs.html#standard-encodings
The correct name, as you found below and as is corroborated by the
webpage, seems to be "utf_8_sig":
>>> u"FOøbar".encode('utf_8_sig')
'\xef\xbb\xbfFO\xc3\xb8bar'
This could definitely be documented more straightforwardly.
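Since the original goal was to write a file that starts with the BOM, here is a minimal sketch of doing that with the same codec. It is written so it should also run on Python 3; the helper names (write_with_bom, read_raw) and the temporary file name are made up for illustration and are not from any library:

```python
import codecs
import io
import os
import tempfile


def write_with_bom(path, text):
    # Hypothetical helper: opening the file with the 'utf-8-sig' codec
    # makes the encoder emit the BOM once, before the first character.
    with io.open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


def read_raw(path):
    # Hypothetical helper: read the file back as raw bytes so the BOM
    # is visible instead of being decoded away.
    with open(path, "rb") as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")
write_with_bom(path, u"FO\u00f8bar")
raw = read_raw(path)

# The file should begin with the three BOM bytes from codecs.BOM_UTF8.
starts_with_bom = raw.startswith(codecs.BOM_UTF8)
```

Reading the file back with encoding="utf-8-sig" strips the BOM again, whereas plain "utf-8" leaves it in the text as a leading U+FEFF.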
<snip>
> 3. The docs mention encodings.utf_8_sig, available since 2.5, but I can't
> locate that thing there either. What's going on here?
Works for me™:
Python 2.6.6 (r266:84292, Jan 12 2011, 13:35:00)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from encodings import utf_8_sig
>>>
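If the import seems not to work on your side, a rough way to check what your installation actually has is to ask the codec registry directly and to list the modules shipped in the encodings package. This is only a sketch: listing the package contents is not an official "list of codecs" API (as far as I know there isn't one), and the result also contains helper modules such as aliases. The variable names are just for illustration:

```python
import codecs
import encodings
import pkgutil

# Ask the codec registry for the codec; this raises LookupError
# if no codec is registered under that name.
info = codecs.lookup("utf_8_sig")
canonical_name = info.name

# Assumption: the standard library is installed as regular files, so
# the encodings package can be enumerated via its __path__.
# Each entry yielded by iter_modules has the module name at index 1.
codec_modules = sorted(m[1] for m in pkgutil.iter_modules(encodings.__path__))
has_sig_module = "utf_8_sig" in codec_modules
```

The lookup accepts both spellings, "utf_8_sig" and "utf-8-sig", so either can be passed to encode() or open().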
Cheers,
Chris
--
http://rebertia.com