[Tutor] string to binary and back... Python 3

Dave Angel d at davea.name
Fri Jul 20 08:00:18 CEST 2012


You forgot to include the list.  So I'm forwarding it now, with new remarks.

On 07/20/2012 01:47 AM, wolfrage8765 at gmail.com wrote:
> On Fri, Jul 20, 2012 at 12:33 AM, Dave Angel <d at davea.name> wrote:
> 
>> On 07/19/2012 05:55 PM, Prasad, Ramit wrote:
>>>> I am not sure how to answer that question because all files are binary,
>>>> but the files that I will parse have an encoding that allows them to be
>>>> read in a non-binary output. But my program will not use them in a
>>>> non-binary way, that is why I plan to open them with the 'b' mode to
>>>> open them as binary with no encoding assumed by python. I just have not
>>>> tested this new technique that you gave me on a binary file yet as I was
>>>> still implementing it for strings.
>>> As far as I know, even in binary mode, python will convert the
>>> binary data to read and write strings. So there is no reason
>>> this technique would not work for binary. Note, I was able to use
>>> the string representation of a PDF file to write another PDF file.
>>> So you do not need to worry about the conversion of binary to strings.
>>> All you need to do is convert the string to int, encrypt, decrypt,
>>> convert back to string, and write out again.
>>>
>>> Note Python3 being Unicode might change things a bit. Not sure if
>>> you will need to convert to bytes or some_string.decode('ascii').
>>
>> In Python 3, if you open the file  with "b"  (as Jordan has said),
>> reading it returns a bytes object.  No use of strings needed or wanted.  And no
>> assumptions of ascii, except for the output of the % operator on a hex
>> conversion.
>>
>>
>> myfile = open(filename, "rb")
>> data = myfile.read(size)
>>
>> At that point, convert it to hex with:
>>    hexdata = binascii.hexlify(data)
>> then convert that to an integer:
>>    numdata = int(hexdata, 16)
>>
>> At that point, it's ready to xor with the one-time key, which had better
>> be the appropriate size to match the data length.
>>
>> newhexdata = bytes("%x" % numdata, "ascii")
>> newdata = binascii.unhexlify(newhexdata)
>>
>> If the file is bigger than the key, you have to get a new key. If the
>> keys are chosen with a range of 2**200, then you'd read and convert the
>> file 25 bytes at a time.
>>
> Thanks I will give this a try. Can you explain a little further for me what
> exactly this:
> newhexdata = bytes("%x" % numdata, "ascii")
> line is doing? I don't quite understand the use of the "%x" % on numdata.
> 

When you see an expression you don't understand, then try to decompose it.

"%x" % numdata is just one of many ways of converting an integer into a
hex string.  The %x is a format specifier, and says to use hex.
Actually, with Python 3, the format method is supposed to be preferred,
but I've been doing it the old way too long to learn format quickly.
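
As a small illustration of the point above, here are three spellings that
should all give the same hex string for an integer.  The value 255 and the
variable names are made up for the example.

```python
numdata = 255                        # example value, not from the original problem

old_style = "%x" % numdata           # the % operator with the x format specifier
new_style = "{:x}".format(numdata)   # the str.format method
builtin = format(numdata, "x")       # the format() built-in function

print(old_style, new_style, builtin)
```

All three produce lowercase hex digits with no "0x" prefix.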

bytes() is how we encode a string into bytes.  Since I know all the
characters created by my %x expression will be valid ASCII, I can safely
tell it to encode it using ASCII.
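
Decomposed step by step, the round trip might look like the sketch below.
The sample bytes are invented for the example.  One assumption added here
that is not in the one-liner: a plain "%x" drops leading zeros, so data that
starts with a small or zero byte can come back shorter, or make unhexlify
complain about an odd-length string.  The sketch pads the hex string to two
digits per byte to avoid that.

```python
import binascii

data = b"\x01\x02\xff"                 # sample input, chosen to start with a small byte

hexdata = binascii.hexlify(data)       # b'0102ff'
numdata = int(hexdata, 16)             # int() accepts ASCII bytes as well as str

# (the xor with the one-time key would happen here, on numdata)

plain_hex = "%x" % numdata             # '102ff': the leading zero is gone
padded_hex = "%0*x" % (2 * len(data), numdata)   # '0102ff': two digits per byte

newhexdata = bytes(padded_hex, "ascii")  # same result as padded_hex.encode("ascii")
newdata = binascii.unhexlify(newhexdata)

print(newdata == data)
```

The padding width assumes the result should be exactly as long as the
original data, which holds for an xor with a key of the same size.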

This isn't the only way to solve the original problem by a long shot,
but you did say the one-time pad was a numeric int.  If it had been a
byte string, the problem would have been much simpler, though probably
not as quick.
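
For comparison, a minimal sketch of the byte-string version, assuming the
pad is a bytes object at least as long as the data.  The function name
xor_bytes and the sample values are hypothetical, not from the original
program.

```python
def xor_bytes(data, pad):
    """XOR each byte of data with the corresponding byte of pad."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    # Iterating over bytes yields ints in Python 3, so ^ works directly.
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"hello"
pad = bytes([0x13, 0x37, 0xC0, 0xDE, 0x42])   # made-up pad for the example

cipher = xor_bytes(message, pad)
restored = xor_bytes(cipher, pad)             # xor with the same pad undoes it

print(restored == message)
```

No hex conversion is needed in this form, since the xor is applied one byte
at a time.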

-- 

DaveA



