[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Walter Dörwald walter at livinglogic.de
Mon Jan 11 11:37:56 CET 2010


On 09.01.10 14:38, Victor Stinner wrote:

> On Saturday, 09 January 2010 12:18:33, Walter Dörwald wrote:
>>> Good idea, I chose open(filename, encoding="BOM").
>>
>> On the surface this looks like there's an encoding named "BOM", but
>> looking at your patch I found that the check is still done in
>> TextIOWrapper. IMHO the best approach would be to implement a *real*
>> codec named "BOM" (or "sniff"). This doesn't require *any* changes to
>> the IO library. It could even be developed as a standalone project and
>> published in the Cheeseshop.
> 
> Why not? This is another solution to point (2) (check for a BOM while
> reading, or detect it before?). Which encoding would be used if there is no
> BOM? UTF-8 sounds like a good choice.

UTF-8 might be a good choice, or the fallback could be specified in the
encoding name, i.e.

   open("file.txt", encoding="BOM-UTF-8")

falls back to UTF-8 if there's no BOM at the start.

This could be implemented via a custom codec search function (see
http://docs.python.org/library/codecs.html#codecs.register for more info).

Servus,
   Walter


