Tim Golden mail at timgolden.me.uk
Thu Jan 24 12:20:31 CET 2008

[Tim Golden]
>> wxPython is trying to interpret your byte stream as a Unicode
>> text stream encoded as cp1252. But it's not, so it gives up
>> in a heap. One solution is to pass the repr of file_content.
>> Another solution is for you to prefilter the text, replacing
>> non-printables by their hex value or by some marker. Not much
>> in it, really.
>> <code>
>> import random
>> file_content = "".join (
>>    chr (random.randint (0, 255)) for i in range (1000)
>> )
>> munged_text = "".join (
>>    c if 32 <= ord (c) <= 126 else hex (ord (c)) for c in file_content
>> )
>> print repr (file_content)
>> print munged_text
>> </code>
>> TJG

[joe jacob]
> If I open an exe file in notepad, I can see some junk characters. I'm
> trying to develop a program like an editor which can encrypt a file
> opened by it. I need to open files like exe or jpeg etc in the editor
> and after encryption the encrypted data should be displayed in the
> editor. I developed the editor but when I tried to open a Windows exe
> file in it I am getting the above mentioned error as the characters
> contained in it are non-Unicode.

Hi, Joe. I've copied this back to the list (and I encourage you to
do the same when replying) since the more people see the issue,
the more chances you've got of a useful answer!

To try to address what you're saying here: notepad.exe makes some
choice or other when confronted by the byte stream in a file which
you open. I don't know what that choice is, or how it tries to cope
with encoded Unicode text, but whatever it does, it does by the choice
of its developers. The "some junk characters" are one of several
possible representations of the not-ordinary-characters which your
file contains.
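
To make the "several possible representations" point concrete, here is
a small sketch. It assumes Python 3 syntax; the sample bytes and the
function name representations are made up for illustration, and none
of it claims to reproduce what Notepad actually does. It simply renders
the same bytes three different ways.

```python
def representations(data):
    """Return several text renderings of the same raw bytes."""
    return {
        # latin-1 maps every byte 0-255 to a code point, so it never fails
        "latin-1": data.decode("latin-1"),
        # cp1252 leaves a few bytes undefined; 'replace' substitutes U+FFFD
        "cp1252 (replace)": data.decode("cp1252", errors="replace"),
        # a plain hex dump avoids any character interpretation at all
        "hex": data.hex(),
    }


# Made-up sample bytes (Windows executables typically start with "MZ")
sample = b"MZ\x90\x00\x03"
for name, text in representations(sample).items():
    print(name, "->", repr(text))
```

None of the three is the "right" one: each is a decision about how to
show bytes that were never text in the first place.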

If you're writing your own editor or file display, you have to make
similar choices, which includes understanding what possibilities are
offered by the toolset you're employing -- in this case, wxPython.

<slight guesswork - I haven't checked the source>
The wxPython wx.TextCtrl expects to be fed with Unicode text. If
you pass it a string instead, it tries to decode it according to
the system's default encoding, here cp1252. If it can't, it doesn't
display "junk characters": it just gives an error.
</slight guesswork>
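
The decoding failure itself can be shown without wxPython at all. This
is only a sketch of the codec behaviour, assuming Python 3 syntax; the
helper name try_decode is hypothetical, and it says nothing about how
wx.TextCtrl is implemented internally.

```python
def try_decode(raw, encoding="cp1252"):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Byte 0x81 has no character assigned in Python's cp1252 codec,
# so a strict decode of these bytes fails instead of producing "junk".
data = bytes([0x41, 0x81, 0x42])
text, error = try_decode(data)
print("text:", text)
print("error:", error)
```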

If you want junk characters, then you'll either have to explicitly
pass in the characters you want displayed or do the kind of thing
I suggested above to indicate their hex value, or find another
control (or an option of the wx.TextCtrl) which will attempt to do
the work for you when displaying raw bytes.
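
As a rough sketch of the "indicate their hex value" option for raw
bytes, something like the following could be used to build a string
that is safe to hand to a text control. It assumes Python 3 syntax;
the function name munge and the <xx> marker format are my own choices
for illustration, not anything wxPython provides.

```python
def munge(data, marker="<{:02x}>"):
    """Replace every byte outside printable ASCII with a hex marker."""
    parts = []
    for byte in data:  # iterating over bytes yields ints in Python 3
        if 32 <= byte <= 126:
            parts.append(chr(byte))
        else:
            parts.append(marker.format(byte))
    return "".join(parts)


print(munge(b"MZ\x90\x00ok\n"))  # prints MZ<90><00>ok<0a>
```

Note that this treats newlines and tabs as non-printable too, so the
result is one long line; whether to let those through is another of
the choices you have to make.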

I'm afraid I'm not a wxPython expert, so maybe someone else has
suggestions. But the most important thing is that you understand
what's happening here and why you can't just say "I want it to
do what Notepad does".

