Try this

Steve Holden steve at holdenweb.com
Mon Sep 17 01:25:05 CEST 2007


mensanator at aol.com wrote:
> On Sep 16, 5:28 pm, John Machin <sjmac... at lexicon.net> wrote:
>> On Sep 17, 7:54 am, "mensana... at aol.com" <mensana... at aol.com> wrote:
>>
>>> On Sep 16, 2:22 pm, Steve Holden <st... at holdenweb.com> wrote:
>>>> mensana... at aol.com wrote:
>>>>> On Sep 16, 1:10 pm, Dennis Lee Bieber <wlfr... at ix.netcom.com> wrote:
>>>>>> On Sun, 16 Sep 2007 01:46:34 -0700, GeorgeRXZ <george... at gmail.com>
>>>>>> declaimed the following in comp.lang.python:
>>>>>>> Then open Notepad, type the following sentence, save the file,
>>>>>>> and close Notepad. Now reopen the file and you will find that
>>>>>>> Notepad was not able to save the following line of text.
>>>>>>> Well you are speed
>>>>>>> This occurs not only with above sentence but any sentence that has
>>>>>>> 4 3 3 5 (sequence of characters: Well=4 you=3 are=3 speed=5)
>>>>>>         I tried. I also opened the saved file in SciTE...
>>>>>> And the text WAS there...
>>>>>>         It is Notepad that can not properly render what it,
>>>>>> itself, saved.
>>>>> C:\Documents and Settings\mensanator\My Documents>type huh.txt
>>>>> Well you are speed
>>>>> Yes, file was saved correctly.
>>>>> But reopening it shows 9 unprintable characters.
>>>>> If I copy those to a new file (huh1.txt):
>>>>> C:\Documents and Settings\mensanator\My Documents>type huh1.txt
>>>>> ?????????
>>>>> But wait...the new file is 20 bytes, not 9 characters.
>>>>> 09/16/2007  01:44 PM                18 huh.txt
>>>>> 09/16/2007  01:54 PM                20 huh1.txt
>>>>> C:\Documents and Settings\mensanator\My Documents>dump huh.txt
>>>>> huh.txt:
>>>>> 00000000  5765 6c6c 2079 6f75 2061 7265 2073 7065 Well you are spe
>>>>> 00000010  6564                                    ed
>>>>> Here's what it's actually doing:
>>>>> C:\Documents and Settings\mensanator\My Documents>dump huh1.txt
>>>>> huh1.txt:
>>>>> 00000000  fffe 5765 6c6c 2079 6f75 2061 7265 2073 .~Well you are s
>>>>> 00000010  7065 6564                               peed
>>>> One word: Unicode.
>>>> The "open" and "save" dialogs allow you to specify an encoding.
>>> And the encoding specified was ANSI.
>>>> If you
>>>> specify Unicode then you will get what you see above.
>>> And if you specify ANSI _before_ you click the file name,
>>> the specification switches to Unicode and has to then
>>> be manually switched back to ANSI.
>>>> If you specify ANSI
>>>> you will get the text you entered.
>>> It's still a bug in the "open" dialog.
>> It's more like a bug/feature in its encoding detector.
> 
> It is NOT a feature. If I save something as ANSI,
> there is no excuse for it not to re-open in ANSI.
> 
>> I can get it to
>> switch to Unicode only if there's an even number of characters AND the
>> line is NOT terminated by CRLF -- add/remove one alpha character, or
>> hit the enter key at the end of the line, and it won't detect it as
>> Unicode when you open it again.
>>
>> You only get the BOM (the bytes FF FE) if you are silly enough to save
>> it while it's open in Unicode mode.
> 
> That was a test. I wasn't so stupid as to save
> to the original file, but to make a copy.
> 
>>
>>>> By the way, this has precisely what to do with Python?
>>> I've been known to use Notepad to create Python
>>> source code.
>> Your source code would have to be trivially short to trigger the
>> strange behaviour.
> 
> Makes you wonder what other edge cases aren't
> handled properly.
> 
> Makes you wonder why Microsoft doesn't employ
> professional programmers.
> 
Makes *me* wonder why you haven't got better things to do with your time.

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline
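
The byte counts quoted in the thread (18 bytes for huh.txt, 9 characters on
reopening, 20 bytes for huh1.txt) can be checked with a few lines of Python.
This is only a sketch of the arithmetic, assuming the misread is a plain
UTF-16 little-endian interpretation of the ANSI bytes; it does not reproduce
Notepad's encoding detector, and the variable names are made up for
illustration.

```python
# The 18 ANSI bytes shown in the dump of huh.txt.
ansi_bytes = b"Well you are speed"

# Reading those bytes as UTF-16 little-endian pairs them up,
# so 18 bytes turn into 9 two-byte characters.
misread = ansi_bytes.decode("utf-16-le")
print(len(ansi_bytes), len(misread))

# Saving the misread text as "Unicode" writes the 2-byte BOM (FF FE)
# followed by the same 18 bytes, which matches the dump of huh1.txt.
resaved = b"\xff\xfe" + misread.encode("utf-16-le")
print(len(resaved), resaved[:2].hex())

# An odd number of bytes cannot be split into whole UTF-16 code units,
# which fits the observation that adding one character avoids the problem.
odd_bytes = b"Well you are speedy"
try:
    odd_bytes.decode("utf-16-le")
    odd_decodes = True
except UnicodeDecodeError:
    odd_decodes = False
print(odd_decodes)
```

The even-length condition is necessary for the misread but not sufficient;
the CRLF observation above is a property of the detector itself and is not
modelled here.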



