[Python-Dev] Improve open() to support reading file starting with an unicode BOM
tseaver at palladion.com
Fri Jan 8 22:09:54 CET 2010
Guido van Rossum wrote:
> On Thu, Jan 7, 2010 at 10:12 PM, Tres Seaver <tseaver at palladion.com> wrote:
>> The BOM should not be seekable if the file is opened with the proposed
>> "guess encoding from BOM" mode: it isn't properly part of the stream at
>> all in that case.
> This feels about right to me. There are still questions though:
> immediately after opening a file with a BOM, what should .tell()
> return? And regardless of that, .seek(0) should put the file in that
> same initial state.
I think the behavior should be something like:
  >>> f = open('/path/to/maybe-BOM-encoded-file', 'r', encoding='BOM')
  >>> f.tell() # count of unicode chars in decoded stream
  >>> f.read(1) # read first unicode char decoded from stream.
In other words, the BOM is not readable / seekable at all: it is
invisible to the consumer of the decoded stream.
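
A rough sketch of that behavior using codecs that already exist. The
encoding='BOM' spelling above is only a proposal, so open_guessing_bom()
and its table of BOMs are hypothetical names made up for illustration.
The sketch leans on the stock "utf-8-sig", "utf-16" and "utf-32" codecs,
which consume a leading BOM while decoding:

```python
import codecs
import os
import tempfile

# UTF-32 is checked first because the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def open_guessing_bom(path, default="utf-8"):
    """Hypothetical helper: choose a codec from the leading BOM, if any."""
    with open(path, "rb") as raw:
        head = raw.read(4)
    encoding = default  # assumption: fall back to UTF-8 when no BOM is found
    for bom, name in _BOMS:
        if head.startswith(bom):
            encoding = name
            break
    # The chosen codecs swallow the BOM, so the reader never sees it.
    return open(path, "r", encoding=encoding)


# Small demonstration with a temporary UTF-8 file that starts with a BOM.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as raw:
        raw.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))
    with open_guessing_bom(path) as f:
        first = f.read(1)   # first decoded character, not the BOM
        f.seek(0)           # back to the initial state
        again = f.read(1)   # same character again
finally:
    os.remove(path)
```

Here first and again are both 'h': the BOM never shows up in the decoded
stream, and .seek(0) restores the state the file had right after opening.
One difference from the session above: the current io module documents
.tell() on text files as returning an opaque number rather than a count of
decoded characters, so the sketch makes no claim about its value.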
Tres Seaver +1 540-429-0999 tseaver at palladion.com
Palladion Software "Excellence by Design" http://palladion.com