[Python-3000] Pre-PEP: Easy Text File Decoding

Paul Prescod paul at prescod.net
Tue Sep 12 00:09:02 CEST 2006


I think that the basis of your concern is a misunderstanding of the
proposal (at least as documented in the PEP).

On 9/11/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 9/10/06, Paul Prescod <paul at prescod.net> wrote:
>
> > encodingdetection.setdefaultfileencoding
> > encodingdetection.registerencodingdetector
> > encodingdetection.guessfileencoding(filename)
> > encodingdetection.guessfileencoding(bytestream)

Those last two are helper functions exposing the functionality of the
"guess" keyword through a different means.
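
To make the shape of those helpers concrete, here is a rough sketch of how a detector registry and a guessing function might fit together. It is an illustration only, not the PEP's implementation: the names register_encoding_detector, detect_bom, detect_utf8 and guess_encoding, the BOM-then-UTF-8 order, and the latin-1 fallback are all assumptions made for this example.

```python
import codecs

# Hypothetical registry: each detector takes the raw bytes and returns
# an encoding name, or None if it has no opinion.
_detectors = []

def register_encoding_detector(func):
    """Add a detector to the end of the list (assumed registration order)."""
    _detectors.append(func)
    return func

@register_encoding_detector
def detect_bom(data):
    # Check UTF-32 before UTF-16: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ):
        if data.startswith(bom):
            return name
    return None

@register_encoding_detector
def detect_utf8(data):
    # Bytes that decode cleanly as UTF-8 are very likely UTF-8 (or plain ASCII).
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"

def guess_encoding(source, default="latin-1"):
    """Guess from raw bytes, a binary stream, or a filename.

    Reads everything for simplicity; a real implementation would
    probably sample only the start of the file.
    """
    if isinstance(source, (bytes, bytearray)):
        data = bytes(source)
    elif hasattr(source, "read"):
        data = source.read()
    else:
        with open(source, "rb") as f:
            data = f.read()
    for detector in _detectors:
        name = detector(data)
        if name is not None:
            return name
    return default  # assumed fallback when no detector matches
```

The fallback matters as much as the detectors: latin-1 never fails to decode, so the guess always produces text, whether or not it is the right text.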

> This demonstrates two of the problems with requiring an explicit decision.
>
> (1)  You still won't actually get one; you'll just lose the
> information that it wasn't even considered.

Frankly, I don't think that makes any sense. If there is a default,
how can I know whether someone thought about it and decided to use
the default, or did not think it through and simply ended up with
the default?

> (2)  You'll add so much boilerplate that you invite other bugs.
>
>
> Suddenly,
>
>     >>> f=open("runlist.txt")
>
> turns into something more like
>
>     >>> import encodingdetection
>     ...
>     >>> f=open("runlist.txt",
>     ...        encoding=encodingdetection.guessfileencoding("runlast.txt"))

No, that was never the proposal. The proposal is:

f = open("runlist.txt", "guess")
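
For readers who want to see the call shape in action, here is a minimal sketch of a wrapper that treats "guess" as a special encoding value. It is not the proposed builtin: open_text is a hypothetical stand-in, the second parameter being the encoding is an assumption, and the "try UTF-8, else fall back to latin-1" rule is a deliberately crude placeholder for whatever detection the real feature would use.

```python
import os
import tempfile

def open_text(filename, encoding):
    """Hypothetical stand-in for an open() that accepts "guess"."""
    if encoding == "guess":
        with open(filename, "rb") as raw:
            data = raw.read()
        try:
            data.decode("utf-8")
            encoding = "utf-8"
        except UnicodeDecodeError:
            # Assumed fallback: latin-1 accepts any byte sequence.
            encoding = "latin-1"
    return open(filename, "r", encoding=encoding)

# Demo: a file written as latin-1 is still readable in "guess" mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "runlist.txt")
    with open(path, "wb") as out:
        out.write("caf\u00e9\n".encode("latin-1"))
    with open_text(path, "guess") as f:
        guessed_encoding = f.encoding
        text = f.read()
```

The caller's side stays a one-liner, and the word "guess" remains visible in the source as a marker that nobody pinned the encoding down.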

> "It doesn't really matter what encoding is used; if there is anything
> outside of ASCII, it is because the user put it there, and all I have
> to do is copy it around unchanged."

Yes, if you are doing something utterly trivial with the text, as
opposed to the normal case where you are comparing it with some other
input, combining it with some other input, putting it in a database,
serving it up over the Web, etc. Even Unix "cat" would need to be
encoding-aware if it were created today and designed to be
i18n-friendly.
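
A small illustration of why comparing or combining inputs needs the encoding, using only standard str/bytes methods (the sample word is arbitrary):

```python
word = "caf\u00e9"  # "café"

as_utf8 = word.encode("utf-8")      # b'caf\xc3\xa9'
as_latin1 = word.encode("latin-1")  # b'caf\xe9'

# The same text from two sources does not compare equal as raw bytes...
bytes_equal = as_utf8 == as_latin1

# ...but does once each input is decoded with its own encoding.
text_equal = as_utf8.decode("utf-8") == as_latin1.decode("latin-1")

# Decoding with the wrong encoding silently yields different text.
mojibake = as_utf8.decode("latin-1")
```

Copying bytes around unchanged avoids all of this, but the moment two inputs meet, each one has to be decoded with the right codec.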

> For situations like that, if there were *ever* a reason to specify a
> particular encoding, I *still* wouldn't get it right, because it is
> something that hasn't occurred to me. I guess the explicitness means
> that the error is now my fault instead of python's, but the error is
> still there, and someone else is more reluctant to fix it.  (Well,
> this *was* an explicit choice -- maybe I had a reason?)

The documentation for the "guess" keyword will be clear that it is
NEVER the correct choice for production-quality software. That's one
of the virtues of having an explicit keyword for the quick and dirty
mode (as opposed to making it the default as you seem to wish).

> But since the encoding is mandatory, I do still have to deal with it,
> by making my code longer and uglier.  In the end, packages will end up
> distributing their own non-standard convenience wrappers, so that the
> equivalent of
>
> >>> f=open("runlist.txt")

No, I don't think they'll do that to avoid typing 7 extra characters.

 Paul Prescod

