[Tutor] Help understanding base64 decoding
Peter Otten
__peter__ at web.de
Thu Sep 13 02:57:13 EDT 2018
Ryan Smith wrote:
> Hello All,
>
> I am currently working on a small utility that finds any base64
> encoded strings in files and decodes them. I am having issues
> understanding how the Base64 module actually works. The regular
> expression that I am using correctly matches on the encoded strings. I
> simply want to be able to convert the match of the encoded ascii
> string to its decoded ascii equivalent. For example the base64
> encoded ascii string 'UwB5AHMAdABlAG0ALgBkAGwAbAA=' will decode to
> 'System.dll' if I use an online base64 decoder. However I get a
> completely different output when trying to codify this using python
> 3.6.5:
>
> >>> import base64
> >>> import binascii
>
> >>> test_str = 'UwB5AHMAdABlAG0ALgBkAGwAbAA='
> >>> base64.b64decode(test_str)
> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
>
> >>> temp = base64.b64decode(test_str)
> >>> binascii.b2a_base64(temp)
> b'UwB5AHMAdABlAG0ALgBkAGwAbAA=\n'
>
> I understand that when decoding and encoding you have to use bytes
> objects but what I don't understand is why I can't get the proper
> conversion of the original ascii string. Can someone please point me
> in the right direction?
Look closely at every other byte (the first, third, fifth, ...) in
> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'
or just do
>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[::2]
b'System.dll'
The bytes in between (the second, fourth, ...) are all NUL:
>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'[1::2]
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
This means that your byte string already *is* the original string, encoded
as UTF-16 (little-endian, without a byte order mark). You can convert it
into a string with
>>> b'S\x00y\x00s\x00t\x00e\x00m\x00.\x00d\x00l\x00l\x00'.decode("utf-16")
'System.dll'
which will handle non-ASCII characters correctly, too.
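
Putting the two steps together, here is a minimal sketch of a helper that
base64-decodes a match and then decodes the resulting bytes as text. The
name decode_b64_text is made up for illustration, and the default encoding
is an assumption about your data: it names "utf-16-le" explicitly because
plain "utf-16" falls back to the machine's native byte order when the data
carries no byte order mark.

```python
import base64


def decode_b64_text(encoded, encoding="utf-16-le"):
    """Base64-decode `encoded`, then decode the resulting bytes as text.

    Hypothetical helper; the default encoding is an assumption that fits
    the example string, not something base64 itself can tell you.
    """
    raw = base64.b64decode(encoded)   # str or bytes in, bytes out
    return raw.decode(encoding)       # bytes in, str out


test_str = 'UwB5AHMAdABlAG0ALgBkAGwAbAA='
print(decode_b64_text(test_str))

# Round trip: text -> UTF-16-LE bytes -> base64 text
round_trip = base64.b64encode("System.dll".encode("utf-16-le")).decode("ascii")
print(round_trip == test_str)
```

Base64 only transports bytes and says nothing about the text encoding
behind them, so other matches in your files may need a different encoding
(UTF-8, for example). Passing the encoding in as a parameter keeps that
decision visible.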