[Python-Dev] Improve open() to support reading file starting with an unicode BOM
fuzzyman at voidspace.org.uk
Sun Jan 10 00:25:18 CET 2010
On 09/01/2010 22:14, Lennart Regebro wrote:
> On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou<solipsis at pitrou.net> wrote:
>> If we want it to be the default, it must be able to fallback on the current
>> locale-based algorithm if no BOM is found. I don't think it would be easy for a
>> codec to do that.
> Right. It seems like encoding=None is the right way to go there.
> encoding='BOM' would probably only work if 'BOM' isn't an encoding but
> a special tag, which is ugly.
I would rather see it as the default behavior for open without an
explicit encoding.
I know Guido has expressed a preference against this so I won't continue
to flog it.
The current behavior however is that we have a 'guessing' algorithm
based on the platform default. Currently if you open a text file in read
mode that has a UTF-8 signature, but the platform default is something
other than UTF-8, then we open the file using what is likely to be the
incorrect encoding. Looking for the signature seems to be better
behaviour in that case.
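
As a rough illustration of "looking for the signature", here is a minimal
sketch of a helper that checks for a BOM and otherwise falls back to the
locale's preferred encoding. The names sniff_encoding and open_with_bom are
hypothetical and are not part of the standard library or of any concrete
proposal in this thread; only the codecs BOM constants, the "utf-8-sig",
"utf-16" and "utf-32" codecs, and locale.getpreferredencoding() are existing
APIs.

```python
import codecs
import locale

# Hypothetical table: BOM bytes mapped to a codec that strips the BOM.
# The UTF-32 entries come first because the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(path, default=None):
    """Return a codec name based on a leading BOM, else a fallback."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No signature found: fall back to the caller's default or to the
    # platform (locale-based) guess.
    return default or locale.getpreferredencoding(False)


def open_with_bom(path, default=None):
    """Open a text file for reading, honouring a BOM if one is present."""
    return open(path, "r", encoding=sniff_encoding(path, default))
```

With something like this, a file that starts with a UTF-8 signature would be
decoded as UTF-8 regardless of the platform default, and a file without a
signature would be opened as it is today.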
All the best,