On 9/11/06, <b class="gmail_sendername">Oleg Broytmann</b> &lt;<a href="mailto:firstname.lastname@example.org">email@example.com</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Sun, Sep 10, 2006 at 12:02:44PM -0700, Paul Prescod wrote:<br>> * Eastern Unix/Linux users using UTF-8 apps like gedit or apps "saving as"<br>> UTF-8<br><br> Finally I've got the definitive answer to "is Russia Europe or Asia?"
<br>It is an Eastern country! At last! ;)</blockquote><div><br>For these purposes, Russia is European, isn't it? Russian text can be subsumed by UTF-8 with relatively minor expansion, right? If so, then I would guess that UTF-8 would replace KOI8-R and iso8859-? for Russian eventually.
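<br><br>To put a rough number on the expansion, here is a minimal Python sketch that encodes one short Russian phrase with Python's koi8_r, iso8859_5 and utf-8 codecs and compares the byte counts. The sample phrase is an arbitrary choice; real documents will give somewhat different ratios.

```python
# Compare how many bytes the same short Russian phrase needs in two legacy
# single-byte Cyrillic encodings versus UTF-8. The phrase is an arbitrary sample.
text = "Привет, мир"  # "Hello, world": 9 Cyrillic letters plus ", "

sizes = {name: len(text.encode(name)) for name in ("koi8_r", "iso8859_5", "utf-8")}

for name, size in sizes.items():
    print(f"{name:10} {size:3} bytes")

# Each Cyrillic letter takes 1 byte in KOI8-R / ISO 8859-5 but 2 bytes in
# UTF-8, while ASCII punctuation and spaces stay at 1 byte in all three.
print(f"UTF-8 / KOI8-R ratio: {sizes['utf-8'] / sizes['koi8_r']:.2f}")
```

This prints 11 bytes for both single-byte encodings and 20 bytes for UTF-8, a ratio of about 1.82. Mostly-Cyrillic prose therefore approaches twice its single-byte size in UTF-8, while markup, digits and ASCII punctuation do not grow at all.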
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">> Maybe the guessing algorithm should read the WHOLE FILE.<br><br> Zen: "In the face of ambiguity, refuse the temptation to guess."
<br><br> Unfortunately this contradicts not only the idea of how much to read<br>but the whole idea of guessing the encoding. So maybe we are going in the<br>wrong direction. IMHO the right direction is to include a guessing script
<br>in the Tools directory.</blockquote><div><br>That was the position I started with. Guido wanted a guessing mode. So I designed what seemed to me to be the least dangerous guessing mode possible:<br><br> 1. Off by default.
<br> 2. Turned on by the keyword "guess".<br> 3. Decodes the full text to check for encoding correctness.<br><br>Given these safeguards, I think that the feature is not only safe enough but also helpful.<br><br>
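A minimal Python sketch of an opt-in guessing mode with these safeguards follows. Everything in it is an assumption for illustration: the helper names open_text and guess_encoding, the "guess" value passed as encoding, and the single candidate codec (strict UTF-8 with an optional BOM, via Python's utf-8-sig codec) are hypothetical, not the proposal's actual API. It shows ordinary open() behaviour unless guessing is requested, and a strict decode of the whole file before any encoding is accepted, raising an error rather than guessing when nothing fits.

```python
# Hypothetical sketch: open_text, guess_encoding and the "guess" keyword value
# are illustrative names, not an existing Python API.

# Assumed candidate list: strict UTF-8, tolerating an optional leading BOM.
DEFAULT_CANDIDATES = ("utf-8-sig",)


def guess_encoding(data, candidates=DEFAULT_CANDIDATES):
    """Return the first candidate codec that decodes ALL of data cleanly."""
    for name in candidates:
        try:
            data.decode(name)  # strict decode of the full byte string
        except UnicodeDecodeError:
            continue
        return name
    # In the face of ambiguity, refuse to guess.
    raise ValueError("cannot determine encoding; pass encoding= explicitly")


def open_text(path, encoding=None):
    """Open path for reading text; guess only when encoding == "guess"."""
    if encoding != "guess":
        # Off by default: identical to the built-in open().
        return open(path, encoding=encoding)
    with open(path, "rb") as f:
        data = f.read()  # read the WHOLE file, not just a prefix
    # Reopen with the verified encoding so callers get a normal text file.
    return open(path, encoding=guess_encoding(data))
```

With this sketch, open_text("notes.txt", encoding="guess") reads a UTF-8 file normally, whereas a Russian file saved as KOI8-R fails the strict UTF-8 check and raises ValueError instead of being silently misread. Limiting the candidates to UTF-8 is a deliberate choice here: a single-byte codec such as Latin-1 accepts every byte sequence, so a successful decode would say nothing about whether it was the right encoding.<br><br>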
Moving it to a script would not meet the central goal that it be easily usable by people who do not know much about encodings or Python.<br><br> Paul Prescod<br><br></div></div>