[Python-Dev] Improve open() to support reading file starting with an unicode BOM
tseaver at palladion.com
Fri Jan 8 22:59:04 CET 2010
Eric Smith wrote:
>>> Shouldn't this encoding guessing be a separate function that you call
>>> on either a file or a seekable stream ?
>>> After all, detecting encodings is just as useful to have for non-file
>>> streams.
>> Other stream sources typically have out-of-band ways to signal the
>> encoding: only when reading from the filesystem do we pretty much
>> *have* to guess, and in that case the BOM / signature is the best
>> heuristic we have. Also, some non-file streams are not seekable, and so
>> can't be guessed via a pre-pass.
> But what if the file were in (for example) a zip file? I think you
> definitely want to have access to this functionality outside of open().
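
For illustration, here is a minimal sketch of what such a standalone helper might look like when applied to bytes read out of a zip member. The name detect_bom, its default parameter, and the member name are hypothetical and not part of any proposal in this thread; only the BOM constants and codec names from the stdlib codecs module are real.

```python
import codecs
import io
import zipfile

# Check the longest signatures first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(head, default=None):
    """Hypothetical helper: guess a codec name from a leading BOM in `head`.

    `head` is the first few bytes of the data. Returns `default` when no
    BOM is present.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


# Build a small in-memory zip whose member starts with a UTF-8 signature.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("member.txt", codecs.BOM_UTF8 + "héllo".encode("utf-8"))

# Read the member back as bytes and guess its encoding outside of open().
with zipfile.ZipFile(buf) as zf:
    raw = zf.read("member.txt")

encoding = detect_bom(raw[:4], default="utf-8")
text = raw.decode(encoding)  # the chosen codecs strip the BOM while decoding
```

Because the helper only inspects a bytes prefix, it does not need a seekable stream: the caller can read the first four bytes once and pass them in.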
If the application expects a possibly-BOM-signature-marked file, but you
pass it mismatched garbage:
>>> f = open('some.zip', encoding='BOM')
the error handling should be the same as if you passed any other
mismatched encoding:
>>> f = open('some.zip', encoding='UTF8')
i.e., you discover the error when you try to read from the (non)encoded
stream, not when you open it.
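
A small sketch of that behaviour with an existing codec, assuming a scratch file that contains bytes which are not valid UTF-8. The 'BOM' value above is the thread's proposed spelling, so this example uses 'utf-8' instead; the temporary file and the flag names are illustrative only.

```python
import os
import tempfile

# Create a scratch file holding binary data. The byte 0xFF can never occur
# in well-formed UTF-8, so decoding this content must fail.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as raw:
    raw.write(b"PK\x03\x04\xff\xfe\xfd not text")

opened_ok = False
read_failed = False

# Opening succeeds: no bytes have been decoded yet.
f = open(path, encoding="utf-8")
opened_ok = True
try:
    f.read()  # decoding happens here, and so does the error
except UnicodeDecodeError:
    read_failed = True
finally:
    f.close()
    os.remove(path)
```

Both flags end up true: the mismatch is reported by read(), not by open().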
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com