[Python-Dev] Improve open() to support reading file starting with an unicode BOM

"Martin v. Löwis" martin at v.loewis.de
Sat Jan 9 21:09:27 CET 2010


Antoine Pitrou wrote:
> Walter Dörwald <walter <at> livinglogic.de> writes:
>> On the surface this looks like there's an encoding named "BOM", but 
>> looking at your patch I found that the check is still done in 
>> TextIOWrapper. IMHO the best approach would be to implement a *real*
>> codec named "BOM" (or "sniff"). This doesn't require *any* changes to 
>> the IO library. It could even be developed as a standalone project and 
>> published in the Cheeseshop.
> 
> Sorry but this is missing the point. The point here is to improve the open()
> function. I'm sure people who know about encodings are able to install the
> chardet library or even whip up their own BOM detection routine...

How does the requirement that it be implemented as a codec miss the
point?

FWIW, I agree with Walter that if it is provided through the encoding=
argument, it should be a codec. If it is built into the open function
(for whatever reason), it must be provided by some other parameter.
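
To make the "outside of open()" option concrete, here is a minimal sketch
of BOM detection done before a regular codec is chosen. It is not taken
from Antoine's patch or from Walter's proposal; the names sniff_bom and
open_sniffed are hypothetical, and falling back to UTF-8 when no BOM is
found is an assumption. It relies only on the BOM constants in the
standard codecs module and on the existing "utf-8-sig", "utf-16" and
"utf-32" codecs, which consume the BOM themselves when decoding.

```python
import codecs

# Hypothetical lookup table. The UTF-32 entries must come first because
# the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head, default="utf-8"):
    """Return a codec name based on the BOM at the start of `head` (bytes).

    Falls back to `default` (an assumption of this sketch) if no BOM matches.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def open_sniffed(path, default="utf-8"):
    """Open `path` for reading text, choosing the codec from its BOM."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM checked here is 4 bytes
    # The returned codecs strip the BOM while decoding.
    return open(path, "r", encoding=sniff_bom(head, default))
```

A helper like this needs no changes to the IO library, but it also
does nothing for callers who simply write open(path), which is the case
Antoine is concerned with.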

I do see the point that it becomes available to end users only when
released as part of Python. However, this *also* means that applications
won't be using it for another three years or so, since they'll have to
support older Python versions as well (unless it is integrated in the
case where no encoding is specified).

Regards,
Martin


