[Tutor] string.count in Windows vs UNIX

UNIX Guru unixguru@mac.com
Tue, 23 Jul 2002 15:47:25 -0700


On Tuesday, July 23, 2002, at 03:12 , Tim Peters wrote:

> [UNIX Guru]
>> I've been dabbling with Python for a bit, and use the following
>> script-excerpt to go through a large file checking for specific text.
>> On UNIX it finds the correct number of occurrences (6665 - double-checked
>> with grep -e "Subject: Results:" mail.file | wc -l) but when run on
>> Windows (2K/XP) it consistently stops finding them after 4195 occurrences.
>> ...
>
>> mailfile = open('mail.file', 'r')
>
> Use 'rb' instead.  Python makes the same distinction between text-mode and
> binary-mode files as C makes, since Python file objects are just a thin
> wrapper around C stdio streams (FILE*).  As a UNIX Guru <wink>, you're used
> to systems where text- and binary-mode files act identically.  They don't on
> Windows, and some non-printable characters in Windows text-mode files have
> meta-meanings (chiefly, the first occurrence of chr(26) acts as an EOF
> marker in files opened in text mode on Windows).

Yep, that appears to have done the trick. Thanks! It would never have
dawned on me that Windows, which was generating the file to be parsed,
would insert non-printable characters even though the source was
plain text.
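
A minimal sketch of the fix, not the original script: it opens the file
with 'rb' and counts the marker text, using a tiny made-up sample that
contains a stray chr(26) byte. The names count_marker, MARKER and
sample_path are hypothetical, and the b'...' literals assume a more
recent Python than the one discussed in this thread.

```python
import os
import tempfile

MARKER = b"Subject: Results:"  # the text searched for in this thread


def count_marker(path, marker=MARKER):
    """Count occurrences of marker in a file read in binary mode."""
    # 'rb' means no newline translation, and chr(26) is treated as plain data.
    with open(path, "rb") as mailfile:
        # Iterating line by line avoids loading a large file all at once;
        # the marker contains no newline, so no match can span two lines.
        return sum(line.count(marker) for line in mailfile)


# Build a small made-up sample with a chr(26) byte between two matches.
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "mail.file")
with open(sample_path, "wb") as out:
    out.write(b"Subject: Results: first\r\n")
    out.write(b"body with a stray \x1a byte\r\n")
    out.write(b"Subject: Results: second\r\n")

found = count_marker(sample_path)
print(found)  # prints 2: the match after the chr(26) byte is still counted
```

Counting per line rather than calling count() on the whole file is only a
design choice for large mail files; either way the 'rb' mode is what
matters.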

That'll teach me to develop scripts on UNIX and deploy on Windows. :-/