[Python-3000] Pre-PEP: Easy Text File Decoding
paul at prescod.net
Mon Sep 11 06:31:00 CEST 2006
On 9/10/06, David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
> Josiah Carlson wrote:
> ... if you think that guessing based on content is a good idea -- I don't.
> In any case, such guessing necessarily depends on the expected file format,
> so it should be done by the application itself, or by a library that knows
> more about the format.
I disagree. If a non-trivial file can be decoded as a UTF-* encoding,
it probably is in that encoding. I don't see how it matters whether the
file contains LaTeX or is an .htaccess file. XML is a special case
because it is specifically designed to make encoding detection (not
guessing, but detection) easy.
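
A minimal sketch of that kind of check, assuming a current Python 3. The
helper name guess_utf_encoding and the order of the checks are
illustrative assumptions, not something taken from the pre-PEP. It looks
for a byte order mark first and otherwise falls back to a strict UTF-8
trial decode:

```python
import codecs

# BOMs are checked with UTF-32 first, because the UTF-32-LE BOM
# (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def guess_utf_encoding(data):
    """Return a codec name if `data` decodes as a UTF-* encoding, else None.

    Hypothetical helper for illustration only.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            # A BOM is a strong hint, but still confirm the rest decodes.
            try:
                data.decode(name)
            except UnicodeDecodeError:
                return None
            return name
    # No BOM: a strict UTF-8 decode rarely succeeds by accident on
    # non-trivial data in a legacy 8-bit encoding.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

Nothing here depends on the file format: the same check would apply to
LaTeX source and to an .htaccess file. Pure ASCII data is reported as
"utf-8", since ASCII is a subset of UTF-8.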
> If the encoding of a text stream were settable after it had been opened,
> then it would be easy for anyone to implement whatever guessing algorithm
> they needed, without having to write an encoding implementation or include
> any other support for guessing in the I/O library itself.
But this defeats the whole purpose of the PEP, which is to accelerate
the writing of quick and dirty text processing scripts.
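
For comparison, here is roughly what the do-it-yourself route from the
quoted paragraph might look like in a current Python 3. This is only a
sketch: open_sniffed, its sniff callback, and the latin-1 fallback are
all hypothetical names and choices made for illustration. It reads the
start of a binary stream, asks application code for an encoding, and
only then wraps the stream as text:

```python
import io


def open_sniffed(raw, sniff, default="latin-1"):
    """Wrap a seekable binary stream as text using a caller-supplied sniffer.

    `sniff` takes the leading bytes and returns a codec name or None.
    Hypothetical helper; the fallback encoding is an arbitrary choice.
    """
    # Note: a fixed-size read can cut a multibyte character in half on
    # larger files, so a real sniffer has to tolerate a truncated tail.
    head = raw.read(4096)
    raw.seek(0)  # rewind so the text layer sees the whole stream
    encoding = sniff(head) or default
    return io.TextIOWrapper(raw, encoding=encoding)
```

Every script that wants guessing then has to carry a helper like this
plus its own sniffing function, which is the overhead a quick and dirty
script is trying to avoid.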