[Tutor] bogus characters in a windows file

Dave Angel d at davea.name
Thu Feb 9 03:17:34 CET 2012


On 02/08/2012 08:46 PM, Garry Willgoose wrote:
> I'm reading a file output by the system utility WMIC in Windows (so I can track CPU usage by process ID), and the text file WMIC outputs seems to have extra characters in it that I've not seen before.
>
> I use os.system('WMIC /OUTPUT:c:\cpu.txt PROCESS GET ProcessId') to output the file and parse file c:\cpu.txt

First mistake.  If you use a backslash inside a Python string literal, 
you need to do one of two things:
        1) use a raw string
        2) double the backslash
It so happens that \c is not a Python escape sequence, so you escaped 
this particular bug.
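
A minimal sketch of the difference, assuming nothing beyond the path from 
the post. The name 'c:\temp.txt' is a hypothetical path, chosen only 
because \t happens to be a real escape sequence:

```python
# Two spellings that produce the identical string.
raw_path = r'c:\cpu.txt'       # raw string: backslashes are taken literally
doubled_path = 'c:\\cpu.txt'   # doubled backslash: '\\' is one backslash

# A different string, but Windows file APIs accept forward slashes too.
slash_path = 'c:/cpu.txt'

# Hypothetical path where the unescaped form silently breaks:
# '\t' is the escape sequence for a tab character.
broken_path = 'c:\temp.txt'

print(raw_path == doubled_path)   # True
print('\t' in broken_path)        # True: the "path" now contains a tab
```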

> The first few lines of the file look like this in notepad
>
> ProcessId
> 0
> 4
> 568
> 624
> 648
>
>
> I input the data with the lines
>
> infile = open('c:\cpu.txt','r')
Same thing.  You should either make it r'c:\cpu.txt' or 'c:\\cpu.txt' 
or even 'c:/cpu.txt'
> infile.readline()
> infile.readline()
> infile.readline()
>
OK, so you throw away the first 3 lines of the file.

> the readline()s yield the following output
>
> '\xff\xfeP\x00r\x00o\x00c\x00e\x00s\x00s\x00I\x00d\x00 \x00 \x00\r\x00\n'
> '\x000\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
> '\x004\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
>
Now, how did you get those bytes displayed?  They've already been thrown 
out.
> Now for the first line the title 'ProcessId' is in this string, but the individual characters are separated by '\x00', and at least for the first line of the file there is an extra '\xff\xfe'. For subsequent lines it's just '\x00'. Now I can just replace the '\x**' with '', but that seems a bit inelegant. I've tried various options on the open ('rU' and 'rb') but with no effect.
>
> Does anybody know what the rubbish characters are and what has caused them? I'm using the latest Enthought Python, if that matters.
It matters, but it'd save each of us lots of trouble if you told us what 
version that was, especially which version of Python.  The latest 
Enthought I see is called EPD 7.2.  But after 10 minutes on the site, I 
can't see whether there actually is a Python on there or not.  It seems 
to be just a bunch of libraries for Python.  But whether they're for 
CPython, IronPython, or something else, who knows?


I don't see any rubbish characters.  What I see is Unicode text encoded 
as UTF-16, displayed as though it were a plain byte string.  The first 
two bytes (\xff\xfe) are the byte order mark (BOM), commonly put at the 
beginning of a file encoded in UTF-16; this one indicates little-endian 
byte order.  The remaining pairs of bytes are the UTF-16 encodings of 
ordinary characters.  Notepad would recognize the UTF-16 encoding, and 
display the characters correctly.  Perhaps you need to do the same.
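
As a rough sketch of "doing the same" in Python: the bytes below are 
modeled on the readline() output quoted above, and the temporary file is 
only a stand-in for c:\cpu.txt so the example runs anywhere.  This is 
written for Python 3; on Python 2, io.open() accepts the same encoding 
argument.

```python
import os
import tempfile

# UTF-16 little-endian sample: BOM (FF FE), then two bytes per character.
raw = (b'\xff\xfeP\x00r\x00o\x00c\x00e\x00s\x00s\x00I\x00d\x00 \x00 \x00\r\x00\n\x00'
       b'0\x00 \x00 \x00\r\x00\n\x00'
       b'4\x00 \x00 \x00\r\x00\n\x00')

# In memory: the 'utf-16' codec consumes the BOM and uses it to pick the byte order.
text = raw.decode('utf-16')

# On disk: write the sample to a temporary file, then open it in text mode
# with an explicit encoding so each line arrives already decoded.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmp.write(raw)
    sample_path = tmp.name

with open(sample_path, encoding='utf-16') as infile:
    lines = [line.strip() for line in infile]
os.remove(sample_path)

print(lines)   # ['ProcessId', '0', '4']
```

Decoding the whole stream first matters: splitting the raw bytes on '\n' 
cuts each two-byte newline in half, which is why every line after the 
first appears to start with a stray '\x00'.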

You showed us a fragment of code which would throw away the first 3 
lines of the file.  You don't show us any code indicating what you mean 
by "yield the following output."
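
For illustration only, here is one way such output could have been 
produced.  The io.BytesIO object is a hypothetical in-memory stand-in for 
the file, holding a shortened version of the bytes from the post; this is 
a guess at the missing code, not what was actually run.

```python
import io

# Hypothetical stand-in for the file: BOM, 'P', newline, '0', newline in UTF-16-LE.
fake_file = io.BytesIO(b'\xff\xfeP\x00\r\x00\n\x000\x00\r\x00\n\x00')

fake_file.readline()            # return value discarded: in a script, nothing is shown
second = fake_file.readline()   # return value kept, so it can be inspected
print(repr(second))             # repr() shows the non-printable bytes as \x escapes
```

At the interactive prompt a bare readline() call echoes its result, which 
is probably where the quoted strings came from; in a script the same call 
just throws the line away.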

So you want us to read your mind, and tell you what's there?



-- 

DaveA


