[Python-Dev] Improve open() to support reading file starting with an unicode BOM

M.-A. Lemburg mal at egenix.com
Fri Jan 8 17:25:22 CET 2010

Guido van Rossum wrote:
> On Fri, Jan 8, 2010 at 6:34 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Victor Stinner <victor.stinner <at> haypocalc.com> writes:
>>> I wrote a new version of my patch (version 3):
>>>  * don't change the default behaviour: use open(filename, encoding="BOM") to
>>> check the BOM if there is any
>> Well, I think if we implement this the default behaviour *should* be changed.
>> It looks a bit senseless to have two different "auto-choose" options, one with
>> encoding=None and one with encoding="BOM".
> Well there *are* two different auto options: use the environment
> variables (LANG etc.) or inspect the contents of the file. I think it
> would be useful to have ways to specify both.

Shouldn't this encoding guessing be a separate function that you call
on either a file or a seekable stream?

After all, detecting encodings is just as useful to have for non-file
streams. You'd then avoid having to stuff everything into
a single function call and also open up the door for more complex
application-specific guesswork or defaults.

The whole process would then have two steps:

 1. guess encoding

  import codecs
  encoding = codecs.guess_file_encoding(filename)

 2. open the file with the found encoding

  f = open(filename, encoding=encoding)
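To make the two steps concrete, here is a minimal sketch of what such a helper might look like. Note that codecs.guess_file_encoding() is only a proposal and does not exist in the standard library, so the module-level function below is a hypothetical stand-in. It assumes that "guessing" means BOM detection only, and the `default` parameter is likewise an assumption.

```python
import codecs

# BOM signatures mapped to codecs that consume the BOM while decoding.
# UTF-32 comes first because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_file_encoding(filename, default=None):
    """Hypothetical helper: return a codec name based on the file's BOM.

    Returns `default` when no BOM is found. With default=None, passing the
    result to open() falls back to the locale's preferred encoding.
    """
    with open(filename, "rb") as f:
        head = f.read(4)  # the longest BOM (UTF-32) is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default
```

With this in place, step 2 stays exactly as written above: open(filename, encoding=guess_file_encoding(filename)). One known limitation of a pure BOM check: a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32.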

For seekable streams f, you'd have:

 1. guess encoding

  import codecs
  encoding = codecs.guess_stream_encoding(f)

 2. wrap the stream with a reader for the found encoding

  reader_class = codecs.getreader(encoding)
  g = reader_class(f)
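The stream variant could be sketched along the same lines. Again, codecs.guess_stream_encoding() is a proposed name, not an existing API, so the function below is a hypothetical illustration that assumes BOM detection only and a binary, seekable stream. The important detail is that it restores the stream position, so the reader created in step 2 still sees the BOM and can consume it.

```python
import codecs

# BOM signatures mapped to codecs whose stream readers consume the BOM.
# UTF-32 comes first because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_stream_encoding(stream, default="utf-8"):
    """Hypothetical helper: guess the encoding of a seekable binary stream.

    Peeks at up to 4 bytes and then seeks back, leaving the stream
    position unchanged. Returns `default` (an assumed fallback) when no
    BOM is present.
    """
    pos = stream.tell()
    try:
        head = stream.read(4)
    finally:
        stream.seek(pos)  # undo the peek
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default
```

Used together with the existing codecs.getreader(), this gives g = codecs.getreader(guess_stream_encoding(f))(f), and g.read() then returns decoded text without the BOM.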

Marc-Andre Lemburg

