[Tutor] Help understanding base64 decoding

Ryan Smith ryan at allwegot.net
Thu Sep 13 08:23:10 EDT 2018


Hi Peter,

Thank you for the explanation! I have been banging my head against this
for almost two days. I'm still getting familiar with all of the
different encodings at play. For example, the way I currently
understand things is that Python supports Unicode, which by default
gets encoded as UTF-8. I'm guessing that is the reason for converting
strings to a bytes object in the first place. Again, thank you for the
assistance!
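
A minimal sketch of the str/bytes distinction as I currently understand
it. The variable names are just for illustration, and "utf-16-le" is my
assumption about which UTF-16 variant produced the NUL bytes:

```python
text = "System.dll"                      # a str holds Unicode code points

utf8_bytes = text.encode()               # str.encode() defaults to UTF-8
utf16_bytes = text.encode("utf-16-le")   # two bytes per character, no BOM

print(utf8_bytes)
print(utf16_bytes)

# Both byte strings decode back to the same text, provided the matching
# encoding is named when decoding.
print(utf8_bytes.decode() == utf16_bytes.decode("utf-16-le"))
```

So the same text can have different byte representations, and base64
only ever sees the bytes.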

Ryan

On Thu, Sep 13, 2018 at 2:57 AM, Peter Otten <__peter__ at web.de> wrote:
> Ryan Smith wrote:
>
>> Hello All,
>>
>> I am currently working on a small utility that finds any base64
>> encoded strings in files and decodes them. I am having trouble
>> understanding how the base64 module actually works. The regular
>> expression that I am using correctly matches the encoded strings. I
>> simply want to be able to convert the matched base64-encoded ASCII
>> string to its decoded ASCII equivalent. For example, the base64
>> encoded ASCII string 'UwB5AHMAdABlAG0ALgBkAGwAbAA=' will decode to
>> 'System.dll' if I use an online base64 decoder. However, I get a
>> completely different output when trying to do this in Python
>> 3.6.5:
>>
>> >>> import base64
>> >>> import binascii
>>
>> >>> test_str = 'UwB5AHMAdABlAG0ALgBkAGwAbAA='
>> >>> base64.b64decode(test_str)
>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
>>
>> >>> temp = base64.b64decode(test_str)
>> >>> binascii.b2a_base64(temp)
>> b'UwB5AHMAdABlAG0ALgBkAGwAbAA=\n'
>>
>> I understand that when decoding and encoding you have to use bytes
>> objects but what I don't understand is why I can't get the proper
>> conversion of the original ascii string. Can someone please point me
>> in the right direction?
>
> Look closely at the odd bytes in
>
>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
>
> or just do
>
> >>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[::2]
> b'System.dll'
>
> The even bytes are all NUL:
>
> >>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[1::2]
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>
> This means that your byte string already *is* the original string, encoded
> as UTF-16. You can convert it into a string with
>
> >>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'.decode("utf-16")
> 'System.dll'
>
> which will handle non-ascii characters correctly, too.
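
Putting the two steps together, a minimal sketch might look like the
following. The function name decode_utf16_base64 is hypothetical, and
the sketch assumes the payload is little-endian UTF-16 without a BOM, as
in the example string. It names "utf-16-le" explicitly because, as far
as I know, plain "utf-16" falls back to the platform's native byte order
when no BOM is present:

```python
import base64


def decode_utf16_base64(encoded: str) -> str:
    """Decode a base64 string whose payload is UTF-16-LE encoded text."""
    raw = base64.b64decode(encoded)   # base64 text -> raw bytes
    return raw.decode("utf-16-le")    # raw bytes -> str (assumed encoding)


print(decode_utf16_base64("UwB5AHMAdABlAG0ALgBkAGwAbAA="))
```

For the string from the original question this prints System.dll. A
payload in some other encoding would need a different codec name, so the
encoding cannot be inferred from the base64 text alone.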