[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Tres Seaver tseaver at palladion.com
Fri Jan 8 22:59:04 CET 2010

Eric Smith wrote:
>>> Shouldn't this encoding guessing be a separate function that you call
>>> on either a file or a seekable stream ?
>>> After all, detecting encodings is just as useful to have for non-file
>>> streams.
>> Other stream sources typically have out-of-band ways to signal the
>> encoding:  only when reading from the filesystem do we pretty much
>> *have* to guess, and in that case the BOM / signature is the best
>> heuristic we have.  Also, some non-file streams are not seekable, and so
>> can't be guessed via a pre-pass.
> But what if the file were in (for example) a zip file? I think you
> definitely want to have access to this functionality outside of open().

If the application expects a possibly-BOM-signature-marked file, but you
pass it mismatched garbage:

  >>> f = open('some.zip', encoding='BOM')

the error handling should be the same as if you passed any other
mismatched encoding:

  >>> f = open('some.zip', encoding='UTF8')

i.e., you discover the error when you try to read from the (non)encoded
stream, not when you open it.
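
A minimal sketch of that behaviour with an encoding that exists today.
The 'BOM' name above is only the spelling proposed in this thread, not a
registered codec, so the sketch uses 'utf-8' instead. The helper name
open_then_read and the sample bytes are made up for illustration.

```python
import os
import tempfile


def open_then_read(data, encoding):
    """Write `data` to a temp file, then report (open_ok, read_ok)
    when it is opened in text mode with `encoding`."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as raw:
            raw.write(data)
        try:
            # Opening only sets up the decoder; no bytes are decoded yet.
            f = open(path, encoding=encoding)
        except (LookupError, UnicodeError):
            # An unknown codec name would be rejected here.
            return (False, False)
        with f:
            try:
                f.read()
            except UnicodeDecodeError:
                # A mismatched encoding only shows up on read.
                return (True, False)
        return (True, True)
    finally:
        os.remove(path)


# Zip-like garbage: the "PK" magic followed by bytes that are invalid UTF-8.
ZIP_LIKE = b"PK\x03\x04\xff\xfe\x80\x81"

print(open_then_read(ZIP_LIKE, "utf-8"))            # (True, False)
print(open_then_read(b"plain ascii\n", "utf-8"))    # (True, True)
```

Under the proposal, a BOM-sniffing encoding would presumably fail at
the same point, on the first read.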

--
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
