[Tutor] Help understanding base64 decoding

Peter Otten __peter__ at web.de
Thu Sep 13 02:57:13 EDT 2018


Ryan Smith wrote:

> Hello All,
> 
> I am currently working on a small utility that finds any base64
> encoded strings in files and decodes them. I am having issues
> understanding how the Base64 module actually works. The regular
> expression that I am using correctly matches on the encoded strings. I
> simply want to be able to convert the match of the encoded ascii
> string to its decoded ascii equivalent. For example the base64
> encoded ascii string 'UwB5AHMAdABlAG0ALgBkAGwAbAA=' will decode to
> 'System.dll' if I use an online base64 decoder. However I get a
> completely different output when trying to codify this using python
> 3.6.5:
> 
> >>> import base64
> >>> import binascii
> 
> >>> test_str = 'UwB5AHMAdABlAG0ALgBkAGwAbAA='
> >>> base64.b64decode(test_str)
> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
> 
> >>> temp = base64.b64decode(test_str)
> >>> binascii.b2a_base64(temp)
> b'UwB5AHMAdABlAG0ALgBkAGwAbAA=\n'
> 
> I understand that when decoding and encoding you have to use bytes
> objects but what I don't understand is why I can't get the proper
> conversion of the original ascii string. Can someone please point me
> in the right direction?

Look closely at the odd-numbered bytes (the 1st, 3rd, 5th, ...) in

> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'

or just do

>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[::2]
b'System.dll'

The even-numbered bytes (the 2nd, 4th, 6th, ...) are all NUL:

>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[1::2]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

This means that your byte string already *is* the original string, encoded
as UTF-16 (little-endian, with no byte order mark). You can convert it into
a string with

>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'.decode("utf-16")
'System.dll'

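which will handle non-ASCII characters correctly, too. When the data has no
byte order mark, "utf-16" falls back to the machine's native byte order, so
"utf-16-le" is the safer choice if you know the text is little-endian.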
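
Putting the two steps together, here is a minimal sketch of a helper for
your utility. The function name decode_b64_utf16 is made up for
illustration, and it assumes every match is base64-wrapped little-endian
UTF-16 text without a byte order mark, which may not hold for all the
strings in your files:

```python
import base64


def decode_b64_utf16(encoded):
    """Decode a base64 string whose payload is assumed to be UTF-16-LE text."""
    # Step 1: base64 -> raw bytes, e.g. b'S\x00y\x00s\x00...'
    raw = base64.b64decode(encoded)
    # Step 2: raw bytes -> str. The explicit "utf-16-le" does not depend
    # on a byte order mark or on the machine's native byte order.
    return raw.decode("utf-16-le")


print(decode_b64_utf16('UwB5AHMAdABlAG0ALgBkAGwAbAA='))
```

If a match holds something other than UTF-16 text, the second step may
raise UnicodeDecodeError or produce gibberish, so you will probably want to
catch the exception and check the result before trusting it.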


