bad data from urllib when run from MS .bat file

John J. Lee jjl at pobox.com
Sun Sep 19 22:25:33 CEST 2004


"Stuart McGraw" <smcg4191 at frii.RimoovThisToReply.com> writes:
[...]
> 2. Create a batch file that will run test.py:
> test.bat:
> ----------------
> python test.py http://etext.lib.virginia.edu/cgi-local/breen/wwwjdic?1W%BF%A9%A4%D9%A4%EB_v1
> ----------------
> 
> 3. In a cmd.exe window run the following two commands:
>   python test.py http://etext.lib.virginia.edu/cgi-local/breen/wwwjdic?1W%BF%A9%A4%D9%A4%EB_v1 >out1.txt
>   test.bat >out2.txt
> 
> 4. out1.txt and out2.txt should be identical.  But they are not.
[...]
> Running with a debugger shows that the corruption is in the text 
> received from urllib; it is not a result of the euc-jp decoding,
> UTF-8 encoding, or writing to the output file.

Hmm...


> So it looks like some bad mojo between urllib and the Windows
> batch environment.

Just a guess, without actually bothering to think about the numerology
in detail:

test.bat:
----------------
python -u test.py http://etext.lib.virginia.edu/cgi-local/breen/wwwjdic?1W%BF%A9%A4%D9%A4%EB_v1
----------------

Note the -u switch (for 'unbuffered', but on Windows it also means
'um, binary mode' for stdin, stdout and stderr <wink>).
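
For illustration, the same binary-mode idea can be applied inside the script instead of on the command line. This is only a minimal sketch in present-day Python 3, not the Python this thread was written against. The helper name write_utf8 is hypothetical and is not taken from test.py. It encodes the text itself and writes the raw bytes to the binary layer of stdout, so text-mode newline translation should not touch the output.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the raw bytes to a binary stream.

    write_utf8 is a hypothetical helper for illustration only.
    Returns the number of bytes written.
    """
    if stream is None:
        # In Python 3, sys.stdout.buffer is the binary stream underneath
        # the text-mode sys.stdout wrapper.
        stream = sys.stdout.buffer
    data = text.encode("utf-8")
    stream.write(data)
    # Flush right away, roughly mimicking the 'unbuffered' half of -u.
    stream.flush()
    return len(data)


if __name__ == "__main__":
    # Any binary stream works; BytesIO stands in for stdout here.
    demo = io.BytesIO()
    write_utf8("line one\nline two\n", demo)
    print(repr(demo.getvalue()))
```

Whether this changes anything for the problem above is as much a guess as the -u suggestion; it only removes stdout buffering and text-mode translation from the list of suspects.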


John


