[Python-3000] Pre-PEP: Easy Text File Decoding
Paul Prescod
paul at prescod.net
Mon Sep 11 15:58:42 CEST 2006
On 9/11/06, Oleg Broytmann <phd at phd.pp.ru> wrote:
>
> On Sun, Sep 10, 2006 at 12:02:44PM -0700, Paul Prescod wrote:
> > * Eastern Unix/Linux users using UTF-8 apps like gedit or apps "saving as" UTF-8
>
> Finally I've got the definitive answer for "is Russia Europe or Asia?"
> It is an Eastern country! At last! ;)
For these purposes, Russia is European, isn't it? Russian text can be
subsumed by UTF-8 with relatively minor expansion, right? If so, then I
would guess that UTF-8 would replace KOI8-R and iso8859-? for Russian
eventually.
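As a rough check of the "minor expansion" point, here is a small sketch. Cyrillic letters take one byte in KOI8-R or ISO-8859-5 (the Cyrillic member of the iso8859 family) but two bytes in UTF-8. ASCII spaces and punctuation stay at one byte, so ordinary Russian text grows by somewhat less than 2x. The helper name `utf8_expansion` is purely illustrative.

```python
def utf8_expansion(text: str, legacy: str) -> float:
    """Ratio of the UTF-8 size to the size in a single-byte legacy encoding."""
    return len(text.encode("utf-8")) / len(text.encode(legacy))


# 9 Cyrillic letters (2 bytes each in UTF-8) plus 3 ASCII characters.
sample = "Привет, мир!"
for legacy in ("koi8-r", "iso8859-5"):
    print(legacy, round(utf8_expansion(sample, legacy), 2))
# prints:
# koi8-r 1.75
# iso8859-5 1.75
```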
> Maybe the guessing algorithm should read the WHOLE FILE.
>
> Zen: "In the face of ambiguity, refuse the temptation to guess."
>
> Unfortunately this contradicts not only the idea of how much to read
> but the whole idea of guessing the encoding. So maybe we are going in the
> wrong direction. IMHO the right direction is to include a guessing script
> in the Tools directory.
That was the position I started with. Guido wanted a guessing mode. So I
designed what seemed to me to be the least dangerous guessing mode possible:
1. Off by default.
2. Turned on by the keyword "guess".
3. Decodes the full text to check for encoding correctness.
Given these safeguards, I think that the feature is not only safe enough but
also helpful.
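The three safeguards can be sketched as follows. This uses a hypothetical `read_text(path, encoding=None)` helper, not the real built-in `open()`. Guessing happens only when the caller passes `encoding="guess"`. It then checks for a byte-order mark and otherwise decodes the whole file strictly as UTF-8, raising an error instead of falling back to a looser guess. The candidate list (BOMs, then UTF-8) is an assumption for illustration, not necessarily the pre-PEP's exact algorithm.

```python
import codecs
from typing import Optional, Tuple

# Assumed candidate list: byte-order marks first. UTF-32 BOMs are tested
# before UTF-16 ones because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_decode(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding), or raise ValueError rather than guess loosely."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # The "utf-32", "utf-16" and "utf-8-sig" codecs consume the BOM.
            return data.decode(encoding), encoding
    try:
        # Safeguard 3: decode the *whole* byte string strictly, so an invalid
        # sequence anywhere in the file (not just near the start) is caught.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        raise ValueError(
            "could not guess the encoding; pass an explicit encoding"
        ) from None


def read_text(path: str, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: guessing is off unless encoding == "guess"."""
    if encoding == "guess":
        # Safeguard 2: guessing is opted into with the keyword "guess".
        with open(path, "rb") as f:
            text, _ = guess_decode(f.read())
        # Mimic text mode's universal-newline translation.
        return text.replace("\r\n", "\n").replace("\r", "\n")
    # Safeguard 1: the default is the ordinary, non-guessing open().
    with open(path, encoding=encoding) as f:
        return f.read()
```

A call would then look like `read_text("notes.txt", encoding="guess")`, where `notes.txt` is a placeholder path. Decoding the full byte string rather than a prefix addresses the question quoted above about how much of the file to read. A file that is valid UTF-8 for its first few kilobytes but invalid later is rejected instead of being misread.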
Moving it to a script would not meet the central goal that it be easily
usable by people who do not know much about encodings or Python.
Paul Prescod