[Python-3000] Pre-PEP: Easy Text File Decoding

Jim Jewett jimjjewett at gmail.com
Mon Sep 11 23:53:40 CEST 2006


On 9/10/06, Paul Prescod <paul at prescod.net> wrote:

> encodingdetection.setdefaultfileencoding
> encodingdetection. registerencodingdetector
> encodingdetection.guessfileencoding(filename)
> encodingdetection.guessfileencoding(bytestream)

This demonstrates two of problems with requiring an explicit decision.

(1)  You still won't actually get one; you'll just lose the
information that it wasn't even considered.

(2)  You'll add so much boilerplate that you invite other bugs.


Suddenly,

    >>> f=open("runlist.txt")

turns into something more like

    >>> import encodingdetection
    ...
    >>> f=open("runlist.txt",
encoding=encodingdetection.guessfileencoding("runlast.txt"))

I certainly wouldn't read a line like that without a good reason; I
wouldn't even notice that the encoding guess was based on a different
file.

It will be an annoying amount of typing though, during which time I'll
be thinking:

"It doesn't really matter what encoding is used; if there is anything
outside of ASCII, it is because the user put it there, and all I have
to do is copy it around unchanged."

For situations like that, if there were *ever* a reason to specify a
particular encoding, I *still* wouldn't get it right, because it is
something that hasn't occurred to me.  I guess the explicitness means
that the error is now my fault instead of python's, but the error is
still there, and someone else is more reluctant to fix it.  (Well,
this *was* an explicit choice -- maybe I had a reason?)

But since the encoding is mandatory, I do still have to deal with it,
by making my code longer and uglier.  In the end, packages will end up
distributing their own non-standard convenience wrappers, so that the
equivalent of

>>> f=open("runlist.txt")

can still be used -- but you'll have to read the whole module and the
imports (and the star-imports) to figure out what it means/whether it
is shadowed.  You can't even scan for "open" because someone may have
named their convenience wrapper get_file.  If packages A and B
disagree about the default encoding, it will be even harder to find
and fix than it is today.

-jJ


More information about the Python-3000 mailing list