[Python-3000] Pre-PEP: Easy Text File Decoding

Antoine Pitrou solipsis at pitrou.net
Sun Sep 10 12:31:24 CEST 2006

On Saturday 09 September 2006 at 20:29 -0700, Paul Prescod wrote:
> The type could be a true encoding or one of a small set of additional
> symbolic values. The two main symbolic values are:

Actually your proposal has three ;)

> For example, a Japanese school teacher using Windows might default
> "site" to Shift-JIS.

I think a Japanese school teacher using Windows shouldn't have to
configure anything specifically in Python, encoding-wise. 
I've never seen a tool (e.g. a text editor) refuse to work until you had
explicitly configured an encoding *for the tool*. Those tools either
choose the system-wide default, aka the "locale" (if they want to play
fair with other apps), or their own (if they think utf-8 is the future).

I see two cases where refusing to use a default is even more unhelpful:
- on the growing number of systems which have utf-8 as default
- when the programmer simply wants to open a pure-ascii text file (e.g.
configuration file), and opening it as text allows him to read it
line-by-line, or use whatever other facilities text files provide that
binary files don't

So, here is an alternative proposal:
Make it so that textfile() doesn't recognize system-wide defaults (as in
your proposal), but also provide autotextfile() which would recognize
those defaults (with a by_content=False optional argument to enable
content-based guessing).

textfile() would be clearly marked for use by large, well thought-out
applications, and autotextfile() for small scripts and the like.
Different names make it clear that they are for different uses, and
make them easy to spot when looking at source code (whether by a human
reader or by a quality measurement tool).
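
To make the split concrete, here is a rough sketch of how the two
functions might look. Everything beyond the names textfile(),
autotextfile() and by_content is my assumption: the explicit `encoding`
argument, the use of locale.getpreferredencoding() as the "system-wide
default", and BOM sniffing as a stand-in for content-based guessing.

```python
import codecs
import locale

# Assumption: "content-based guessing" is reduced to BOM detection here.
# UTF-32 must be checked before UTF-16, since the UTF-32-LE BOM starts
# with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def textfile(path, mode="r", encoding=None):
    """Strict variant: never falls back to a system-wide default."""
    if encoding is None:
        raise ValueError("textfile() requires an explicit encoding")
    return open(path, mode, encoding=encoding)


def autotextfile(path, mode="r", by_content=False):
    """Lenient variant: uses the locale's preferred encoding by default."""
    encoding = locale.getpreferredencoding(False)
    if by_content and "r" in mode:
        # Peek at the first bytes and override the default if a BOM is found.
        with open(path, "rb") as raw:
            head = raw.read(4)
        for bom, name in _BOMS:
            if head.startswith(bom):
                encoding = name
                break
    return open(path, mode, encoding=encoding)
```

A small script would call autotextfile("settings.cfg") and iterate over
lines, while an application would have to spell out
textfile("data.txt", encoding="shift_jis"), which is exactly what a
reviewer or a lint tool could grep for.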


