[Python-Dev] Improve open() to support reading a file starting with a Unicode BOM

Tres Seaver tseaver at palladion.com
Fri Jan 8 22:19:10 CET 2010


M.-A. Lemburg wrote:

> Shouldn't this encoding guessing be a separate function that you call
> on either a file or a seekable stream?
> After all, detecting encodings is just as useful to have for non-file
> streams.

Other stream sources typically have out-of-band ways to signal the
encoding: only when reading from the filesystem do we pretty much
*have* to guess, and in that case the BOM / signature is the best
heuristic we have.  Also, some non-file streams are not seekable, and so
their encoding can't be guessed via a pre-pass.
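As a rough illustration of the BOM / signature heuristic, here is a minimal Python sketch that maps the leading bytes of some data to a codec name. The names `encoding_from_bom` and `BOM_TABLE` are hypothetical, not an existing API; only the `codecs.BOM_*` constants and the codec names come from the standard library. Because it inspects a prefix, using it on a stream means reading ahead and rewinding, which is exactly the pre-pass a non-seekable stream cannot support.

```python
import codecs

# Hypothetical lookup table, checked in order. BOM_UTF32_LE (FF FE 00 00)
# begins with BOM_UTF16_LE (FF FE), so the UTF-32 entries must come first.
BOM_TABLE = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

def encoding_from_bom(prefix):
    """Return a codec name for the BOM at the start of `prefix`, or None.

    The returned codecs ("utf-32", "utf-16", "utf-8-sig") consume the BOM
    themselves when decoding, so the caller does not need to strip it.
    """
    for bom, name in BOM_TABLE:
        if prefix.startswith(bom):
            return name
    return None  # no signature: the caller must fall back to a default

data = "caf\u00e9".encode("utf-16")   # BOM + UTF-16 in native byte order
enc = encoding_from_bom(data[:4])
print(enc, data.decode(enc))           # utf-16 café
```

One known ambiguity of any BOM sniffer: BOM-prefixed UTF-16-LE text whose first character is U+0000 starts with FF FE 00 00 and is reported as UTF-32 here.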

> You'd then avoid having to stuff everything into
> a single function call and also open up the door for more complex
> application specific guess work or defaults.
> The whole process would then have two steps:
>  1. guess encoding
>   import codecs
>   encoding = codecs.guess_file_encoding(filename)

A filename is not enough information; or do you mean for that API to
actually open the stream?

>  2. open the file with the found encoding
>   f = open(filename, encoding=encoding)
> For seekable streams f, you'd have:
>  1. guess encoding
>   import codecs
>   encoding = codecs.guess_stream_encoding(f)
>  2. wrap the stream with a reader for the found encoding
>   reader_class = codecs.getreader(encoding)
>   g = reader_class(f)
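The two-step process quoted above can be sketched roughly as follows. `codecs.guess_file_encoding()` and `codecs.guess_stream_encoding()` are only proposals in this thread and do not exist in the `codecs` module, so this sketch defines hypothetical module-level functions with those names. BOM-only detection, the `"utf-8"` fallback default, and the `read_file` / `read_stream` wrappers are assumptions for illustration; `open(..., encoding=...)` and `codecs.getreader()` are real standard-library calls.

```python
import codecs
import io
import os
import tempfile

def guess_stream_encoding(f, default="utf-8"):
    """Hypothetical helper (not in codecs): guess the encoding of a seekable
    binary stream from its BOM, leaving the stream position unchanged."""
    if not f.seekable():
        raise ValueError("a BOM pre-pass needs a seekable stream")
    pos = f.tell()
    head = f.read(4)
    f.seek(pos)  # rewind so the reader still sees the BOM and all data
    # UTF-32 before UTF-16: BOM_UTF32_LE starts with BOM_UTF16_LE.
    for bom, name in ((codecs.BOM_UTF32_LE, "utf-32"),
                      (codecs.BOM_UTF32_BE, "utf-32"),
                      (codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF16_LE, "utf-16"),
                      (codecs.BOM_UTF16_BE, "utf-16")):
        if head.startswith(bom):
            return name
    return default  # no signature: fall back to the caller's default

def guess_file_encoding(filename, default="utf-8"):
    """Hypothetical helper: a filename alone is not enough, so this has to
    open the file in binary mode to inspect its first bytes."""
    with open(filename, "rb") as f:
        return guess_stream_encoding(f, default)

def read_file(filename):
    encoding = guess_file_encoding(filename)       # 1. guess encoding
    with open(filename, encoding=encoding) as f:   # 2. open with it
        return f.read()

def read_stream(f):
    encoding = guess_stream_encoding(f)            # 1. guess encoding
    reader_class = codecs.getreader(encoding)      # 2. wrap the stream
    return reader_class(f).read()

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
        tmp.write("caf\u00e9".encode("utf-16"))
    try:
        print(read_file(tmp.name))                                   # café
    finally:
        os.remove(tmp.name)
    print(read_stream(io.BytesIO("caf\u00e9".encode("utf-8-sig"))))  # café
```

Note that `read_file` opens the file twice, once in binary mode to sniff the BOM and once in text mode to read it, which is the cost of passing only a filename to the guessing step.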

--
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com