[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Walter Dörwald walter at livinglogic.de
Mon Jan 11 13:29:04 CET 2010

On 10.01.10 00:40, "Martin v. Löwis" wrote:
>>> How does the requirement that it be implemented as a codec miss the
>>> point?
>> If we want it to be the default, it must be able to fallback on the current
>> locale-based algorithm if no BOM is found. I don't think it would be easy for a
>> codec to do that.
> Yes - however, Victor currently apparently *doesn't* want it to be the
> default, but wants the user to specify encoding="BOM". If so, it isn't
> the default, and it is easy to implement as a codec.
>>> FWIW, I agree with Walter that if it is provided through the encoding=
>>> argument, it should be a codec. If it is built into the open function
>>> (for whatever reason), it must be provided by some other parameter.
>> Why not simply encoding=None?
> I don't mind. Please re-read Walter's message - it only said that
> *if* this is activated through encoding="BOM", *then* it must be
> a codec, and could be on PyPI. I don't think Walter was talking about
> the case "it is not activated through encoding='BOM'" *at all*.

However, if this autodetection feature is useful in other cases (no
matter how it's activated), it should be a codec, because as part of the
open() function it isn't reusable.
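
To illustrate the reusability point, here is a minimal sketch of BOM
detection kept outside of open(), with the locale-based fallback that was
mentioned above. It is not a full codec (there is no incremental decoder
and nothing is registered with the codecs module). The names sniff_bom()
and decode_with_bom() are hypothetical and chosen only for this example;
the BOM constants and locale.getpreferredencoding() are from the standard
library.

```python
import codecs
import locale
from typing import Optional, Tuple

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, fallback: Optional[str] = None) -> str:
    """Decode data using its BOM; without a BOM, use fallback or the locale."""
    name, skip = sniff_bom(data)
    if name is None:
        # Assumption for this sketch: mimic the locale-based default of open().
        name = fallback or locale.getpreferredencoding(False)
    # Strip the BOM so it does not show up as U+FEFF in the result.
    return data[skip:].decode(name)
```

Because the detection lives in a plain function, it can be applied to bytes
from a socket or a database just as well as to a file, which is the kind of
reuse that is lost if the logic exists only inside open().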

>> The default value should provide the most useful
>> behaviour possible. Forcing users to choose between two different autodetection
>> strategies (encoding=None and another one) is a little insane IMO.

And encoding="mbcs" is a third option on Windows.

> That wouldn't disturb me much. There are a lot of things in that area
> that are a little insane, starting with Microsoft Windows :-)



