Convert raw binary file to ascii

Peter Otten __peter__ at web.de
Mon Jul 27 20:07:37 CEST 2009


r2 wrote:

> On Jul 27, 9:06 am, Peter Otten <__pete... at web.de> wrote:
>> r2 wrote:
>> > I have a memory dump from a machine I am trying to analyze. I can view
>> > the file in a hex editor to see text strings in the binary code. I
>> > don't see a way to save these ascii representations of the binary, so
>> > I went digging into Python to see if there were any modules to help.
>>
>> > I found one I think might do what I want it to do - the binascii
>> > module. Can anyone describe to me how to convert a raw binary file to
>> > an ascii file using this module? I've tried. Boy, I've tried.
>>
>> That won't work because a text editor doesn't need any help to convert
>> the bytes into characters. If it expects ascii, it will just be puzzled by
>> bytes that are not valid ascii. Also, it will happily display byte
>> sequences that are valid ascii, but that you as a user will see as
>> gibberish because they were meant to be binary data by the program that
>> wrote them.
>>
>> > Am I correct in assuming I can get the converted binary to ascii text
>> > I see in a hex editor using this module? I'm new to this forensics
>> > thing and it's quite possible I am mixing technical terms. I am not
>> > new to Python, however. Thanks for your help.
>>
>> Unix has the "strings" command-line tool to extract text from a binary.
>> Get hold of a copy of the MinGW tools if you are on Windows.
>>
>> Peter
> 
> Okay. Thanks for the guidance. I have a machine with Linux, so I
> should be able to do what you describe above. Could Python extract the
> strings from the binary as well? Just wondering.

As a special service for you, here is a naive implementation to build upon:

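#!/usr/bin/env python
# Python 2 script: reads binary data on stdin, prints runs of printable text.
import sys

# Translation table: printable ASCII and tab map to themselves,
# every other byte maps to NUL.
wanted_chars = ["\0"]*256
for i in range(32, 127):
    wanted_chars[i] = chr(i)
wanted_chars[ord("\t")] = "\t"
wanted_chars = "".join(wanted_chars)

# Minimum length of a run to be reported.
THRESHOLD = 4

for s in sys.stdin.read().translate(wanted_chars).split("\0"):
    if len(s) >= THRESHOLD:
        print s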

Peter
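
The script above is written for Python 2 (print statement, str.translate
on a byte string). For readers on Python 3, a minimal sketch of the same
translate-and-split approach might look like the following. The function
name extract_strings and the file-path command-line argument are
assumptions made for illustration; they are not part of the original post.

```python
#!/usr/bin/env python3
import sys

THRESHOLD = 4  # same minimum run length as the script above

# 256-entry table: printable ASCII (32..126) and tab map to themselves,
# every other byte maps to NUL.
TABLE = bytes(i if 32 <= i < 127 or i == 9 else 0 for i in range(256))


def extract_strings(data, threshold=THRESHOLD):
    """Return runs of printable ASCII in data that are at least threshold long."""
    runs = data.translate(TABLE).split(b"\0")
    return [run.decode("ascii") for run in runs if len(run) >= threshold]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of the dump as the first argument.
    with open(sys.argv[1], "rb") as f:
        for s in extract_strings(f.read()):
            print(s)
```

Like the original, this reads the whole input into memory, so a very large
dump may need to be processed in chunks instead.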



