[Python-Dev] Improve open() to support reading file starting with an unicode BOM
M.-A. Lemburg
mal at egenix.com
Fri Jan 8 17:25:22 CET 2010
Guido van Rossum wrote:
> On Fri, Jan 8, 2010 at 6:34 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Victor Stinner <victor.stinner <at> haypocalc.com> writes:
>>>
>>> I wrote a new version of my patch (version 3):
>>>
>>> * don't change the default behaviour: use open(filename, encoding="BOM") to
>>> check the BOM if there is any
>>
>> Well, I think if we implement this the default behaviour *should* be changed.
>> It looks a bit senseless to have two different "auto-choose" options, one with
>> encoding=None and one with encoding="BOM".
>
> Well there *are* two different auto options: use the environment
> variables (LANG etc.) or inspect the contents of the file. I think it
> would be useful to have ways to specify both.
Shouldn't this encoding guessing be a separate function that you call
on either a file or a seekable stream?

After all, detecting encodings is just as useful to have for non-file
streams. You'd then avoid having to stuff everything into
a single function call and also open the door to more complex
application-specific guesswork or defaults.
The whole process would then have two steps:

 1. guess encoding

    import codecs
    encoding = codecs.guess_file_encoding(filename)

 2. open the file with the found encoding

    f = open(filename, encoding=encoding)
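To make the first step concrete, here is a rough sketch of what such a helper might look like if it only looked at the BOM. Note that codecs.guess_file_encoding() does not exist in the standard library; the function below, its `default` parameter and the `_BOMS` table are assumptions made purely for illustration, and real guessing could go well beyond BOM sniffing.

```python
import codecs

# Hypothetical BOM table (an assumption for this sketch). Order matters:
# the UTF-32 LE BOM begins with the same two bytes as the UTF-16 LE BOM,
# so the longer marks have to be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_file_encoding(filename, default=None):
    """Return a codec name based on the file's BOM, or `default`.

    Stand-in for the proposed codecs.guess_file_encoding(); it is not
    an existing stdlib function.
    """
    # Four bytes are enough to cover the longest BOM (UTF-32).
    with open(filename, "rb") as raw:
        head = raw.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            # "utf-8-sig", "utf-16" and "utf-32" all consume the BOM
            # themselves when decoding.
            return name
    return default
```

With the default of None, open(filename, encoding=guess_file_encoding(filename)) falls back to the locale-dependent encoding when no BOM is found, which keeps the two "auto" options discussed above separate.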
For seekable streams f, you'd have:

 1. guess encoding

    import codecs
    encoding = codecs.guess_stream_encoding(f)

 2. wrap the stream with a reader for the found encoding

    reader_class = codecs.getreader(encoding)
    g = reader_class(f)
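The stream variant could be sketched along the same lines. Again, codecs.guess_stream_encoding() is only a proposal, so the function below is a hypothetical stand-in that checks for a BOM and nothing else. It assumes a seekable binary stream and restores the stream position after peeking. The "utf-8" default is an assumption too; it is needed because codecs.getreader() requires a real codec name.

```python
import codecs
import io

# Hypothetical BOM table (an assumption for this sketch); the longer
# UTF-32 marks come first because the UTF-32 LE BOM starts with the
# UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_stream_encoding(stream, default="utf-8"):
    """Peek at a seekable binary stream and return a codec name.

    Stand-in for the proposed codecs.guess_stream_encoding(); it is not
    an existing stdlib function.
    """
    pos = stream.tell()
    head = stream.read(4)
    stream.seek(pos)  # leave the stream where we found it
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Step 1: guess the encoding of an in-memory stream.
f = io.BytesIO(codecs.BOM_UTF8 + "hello".encode("utf-8"))
encoding = guess_stream_encoding(f)

# Step 2: wrap the stream with a reader for the found encoding.
reader_class = codecs.getreader(encoding)
g = reader_class(f)
text = g.read()  # the BOM is consumed by the "utf-8-sig" reader
```

Because the helper rewinds the stream, the reader still sees the BOM and can handle it itself, so no bytes need to be skipped by hand.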
--
Marc-Andre Lemburg
eGenix.com