[Python-3000] Pre-PEP: Easy Text File Decoding

Paul Prescod paul at prescod.net
Mon Sep 11 15:58:42 CEST 2006


On 9/11/06, Oleg Broytmann <phd at phd.pp.ru> wrote:
>
> On Sun, Sep 10, 2006 at 12:02:44PM -0700, Paul Prescod wrote:
> > * Eastern Unix/Linux users using UTF-8 apps like gedit or apps "saving as"
> > UTF-8
>
>    Finally I've got the definitive answer for "is Russia Europe or Asia?"
> It is an Eastern country! At last! ;)


For these purposes, Russia is European, isn't it? Russian text can be
subsumed by UTF-8 with relatively minor expansion, right? If so, then I
would guess that UTF-8 would replace KOI8-R and iso8859-? for Russian
eventually.
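
As a rough illustration of the expansion in question (the sample string is
arbitrary, chosen only for this sketch): each Cyrillic letter takes one byte
in KOI8-R and two bytes in UTF-8, while ASCII characters take one byte in
both.

```python
# Arbitrary sample: 9 Cyrillic letters plus an ASCII comma and space.
sample = "Привет, мир"

koi8_size = len(sample.encode("koi8-r"))  # one byte per character
utf8_size = len(sample.encode("utf-8"))   # two bytes per Cyrillic letter

print(koi8_size, utf8_size)  # 11 20
```

So pure Russian text roughly doubles in size at worst, and text mixed with
ASCII (markup, digits, spaces) grows less.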

> > Maybe the guessing algorithm should read the WHOLE FILE.
>
>    Zen: "In the face of ambiguity, refuse the temptation to guess."
>
>    Unfortunately this contradicts not only the idea of how much to read
> but the whole idea of guessing the encoding. So maybe we are going in the
> wrong direction. IMHO the right direction is to include a guessing script
> in the Tools directory.


That was the position I started with. Guido wanted a guessing mode. So I
designed what seemed to me to be the least dangerous guessing mode possible:

 1. Off by default.
 2. Turned on by the keyword "guess".
 3. Decodes the full text to check for encoding correctness.

Given these safeguards, I think that the feature is not only safe enough but
also helpful.
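
A minimal sketch of how the three safeguards fit together. Everything named
here is an assumption made for illustration: decode_text, the candidates
parameter and the two-entry candidate list are hypothetical and are not the
API or the encoding list of the pre-PEP.

```python
# Hypothetical candidate list, tried in order. Illustration only.
CANDIDATES = ("utf-8", "koi8-r")


def decode_text(data, encoding="utf-8", candidates=CANDIDATES):
    """Decode bytes to str, guessing only when encoding == "guess"."""
    if encoding != "guess":
        # Safeguard 1: guessing is off by default. An explicit encoding
        # is used as given, and decoding errors propagate to the caller.
        return data.decode(encoding)

    # Safeguard 2: this branch is reached only via the keyword "guess".
    for name in candidates:
        try:
            # Safeguard 3: strict decoding of the full text, so a
            # candidate is accepted only if every byte is valid in it.
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    raise UnicodeError("no candidate encoding decodes the full text")
```

With this sketch, decode_text(raw) on KOI8-R bytes fails loudly, while
decode_text(raw, "guess") falls through to the second candidate. Note that a
single-byte encoding such as KOI8-R accepts any byte sequence, so a full
decode can rule out UTF-8 but cannot confirm that KOI8-R is the right answer.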

Moving it to a script would not meet the central goal that it be easily
usable by people who do not know much about encodings or Python.

 Paul Prescod