Python 3.0 automatic decoding of UTF16
sjmachin at lexicon.net
Sat Dec 6 01:26:36 CET 2008
On Dec 6, 10:35 am, Steven D'Aprano <st... at REMOVE-THIS-
> On Fri, 05 Dec 2008 12:00:59 -0700, Joe Strout wrote:
> >> So UTF-16 has an explicit EOF marker within the text?
> > No, it does not. I don't know what Terry's thinking of there, but text
> > files do not have any EOF marker. They start at the beginning
> > (sometimes including a byte-order mark), and go till the end of the
> > file, period.
> Windows text files still interpret ctrl-Z as EOF, or at least Windows XP
> does. Vista, who knows?
This applies only to files being read in an 8-bit text mode. The
convention is inherited from MS-DOS, which took it from CP/M. CP/M
needed it because its file system recorded only the physical file
length in 128-byte sectors, not the logical length in bytes, so the
logical end of a text file had to be marked inside the data. It is
likely to continue in perpetuity, just as standard railway gauge is
(allegedly) based on the axle-length of Roman chariots.
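To make the convention above concrete, here is a rough sketch in Python.
It only simulates the idea on byte strings; it is not how any particular
C runtime or Python's own open() is implemented. The helper names
pad_to_sector and logical_content are made up for illustration, and
padding the whole last sector with Ctrl-Z is a simplification. The last
few lines show why the rule only makes sense for 8-bit text: in UTF-16
the byte 0x1A can be half of an ordinary character.

```python
SECTOR = 128        # CP/M record size in bytes
CTRL_Z = b"\x1a"    # the Ctrl-Z byte used as the end-of-text marker

def pad_to_sector(data: bytes) -> bytes:
    """Pad data with Ctrl-Z up to the next 128-byte boundary (simplified)."""
    remainder = len(data) % SECTOR
    if remainder == 0:
        return data  # already fills whole sectors, no marker needed
    return data + CTRL_Z * (SECTOR - remainder)

def logical_content(raw: bytes) -> bytes:
    """Return everything before the first Ctrl-Z, as an 8-bit
    text-mode reader following the convention would."""
    return raw.split(CTRL_Z, 1)[0]

# 8-bit text: the physical length is a whole sector, the logical
# length is recovered by stopping at the first Ctrl-Z.
stored = pad_to_sector(b"hello\r\n")
print(len(stored), logical_content(stored))

# UTF-16-LE: U+011A encodes as the bytes 1A 01, so a byte-level
# Ctrl-Z rule would cut the data short in the middle of the text.
utf16 = "ab\u011acd".encode("utf-16-le")
print(logical_content(utf16).decode("utf-16-le"))
```

Applying a byte-oriented Ctrl-Z rule to data in a 16-bit encoding loses
everything from the first 0x1A byte onward, which is why the rule is
tied to 8-bit text mode.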
None of this is relevant to the OP's problem; his file appears to have
been truncated rather than having spurious data appended to it.