Guessing the encoding from a BOM

Tim Chase python.list at tim.thechases.com
Thu Jan 16 13:50:12 EST 2014


On 2014-01-17 05:06, Chris Angelico wrote:
> > You might want to add the utf8 bom too: '\xEF\xBB\xBF'.  
> 
> I'd actually rather not. It would tempt people to pollute UTF-8
> files with a BOM, which is not necessary unless you are MS Notepad.

If the intent is just to sniff the BOM and parse the file accordingly,
I'd include it.  I get enough of these junk UTF-8 BOMs at $DAY_JOB
that I've had to create utility openers much like the one Steven is
writing here.  They're particularly problematic for me in combination
with csv.DictReader: I go looking for $COLUMN_NAME and get a KeyError
because the reader wants me to ask for $UTF_BOM+$COLUMN_NAME for the
first column.
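
A minimal sketch of the DictReader problem and one possible way around
it, assuming Python 3 and its standard "utf-8-sig" codec.  The sample
bytes, the column names and the read_rows helper are made up for
illustration; this is not the utility-opener mentioned above.

```python
import codecs
import csv
import io

# Made-up sample standing in for a CSV file saved with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b"name,qty\r\nwidget,3\r\n"


def read_rows(data, encoding):
    """Hypothetical helper: decode bytes and parse them with DictReader."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Plain "utf-8" keeps the BOM as U+FEFF glued onto the first header.
naive = read_rows(raw, "utf-8")

# "utf-8-sig" drops a leading BOM if there is one, and otherwise
# behaves like plain UTF-8 when decoding.
fixed = read_rows(raw, "utf-8-sig")

print(list(naive[0]))  # first key carries the BOM, so row["name"] fails
print(list(fixed[0]))  # keys come out as written in the header row
```

Because "utf-8-sig" also decodes BOM-less input, the same call should
work whether or not the sender's tool added the BOM.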

-tkc





