[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Victor Stinner victor.stinner at haypocalc.com
Fri Jan 8 23:10:32 CET 2010


On Friday 08 January 2010 22:40:47, Eric Smith wrote:
> >> Shouldn't this encoding guessing be a separate function that you call
> >> on either a file or a seekable stream ?
> >>
> >> After all, detecting encodings is just as useful to have for non-file
> >> streams.
> >
> > Other stream sources typically have out-of-band ways to signal the
> > encoding:  only when reading from the filesystem do we pretty much
> > *have* to guess, and in that case the BOM / signature is the best
> > heuristic we have.  Also, some non-file streams are not seekable, and so
> > can't be guessed via a pre-pass.
> 
> But what if the file were in (for example) a zip file? I think you
> definitely want to have access to this functionality outside of open().

FYI my patch (encoding="BOM") is implemented in TextIOWrapper, and 
TextIOWrapper takes a binary stream as input, not a filename.
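
To illustrate the point about binary streams, here is a minimal sketch of
BOM sniffing done outside open(), on any binary stream (for example a member
of a zip file). It does not use the proposed encoding="BOM" value and is not
the patch itself: the helper name open_text_with_bom, the _BOMS table and the
"utf-8" fallback are assumptions made for this example. It relies on peek()
rather than seek(), so it should also work on buffered streams that are not
seekable.

```python
import codecs
import io

# Longer signatures come first: the UTF-32 LE BOM begins with the
# UTF-16 LE BOM, so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def open_text_with_bom(binary_stream, default="utf-8"):
    """Hypothetical helper: wrap a binary stream in a TextIOWrapper,
    choosing the codec from a leading BOM if one is present."""
    # peek() looks at the first bytes without consuming them.
    # Streams that lack peek() are wrapped in a BufferedReader first.
    if not hasattr(binary_stream, "peek"):
        binary_stream = io.BufferedReader(binary_stream)
    head = binary_stream.peek(4)[:4]

    encoding = default
    for bom, name in _BOMS:
        if head.startswith(bom):
            # These codecs consume the BOM themselves while decoding.
            encoding = name
            break
    return io.TextIOWrapper(binary_stream, encoding=encoding)


if __name__ == "__main__":
    raw = io.BytesIO(codecs.BOM_UTF16_LE + "hello".encode("utf-16-le"))
    text = open_text_with_bom(raw)
    print(text.encoding, text.read())
```

One known limitation of this kind of guess: UTF-16 LE data whose first
character after the BOM is U+0000 is indistinguishable from UTF-32 LE.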

-- 
Victor Stinner
http://www.haypocalc.com/


