need help understanding: converting text to binary
Eli the Bearded
* at eli.users.panix.com
Mon Apr 22 20:54:24 EDT 2019
Here's some code I wrote today:
------ cut here 8< ------
HEXCHARS = (b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
            b'A', b'B', b'C', b'D', b'E', b'F',
            b'a', b'b', b'c', b'd', b'e', b'f')

# decode a single hex digit
def hord(c):
    c = ord(c)
    if c >= ord(b'a'):
        # lowercase a-f
        return c - ord(b'a') + 10
    elif c >= ord(b'A'):
        # uppercase A-F
        return c - ord(b'A') + 10
    else:
        # digits 0-9
        return c - ord(b'0')

# decode quoted printable, specifically the MIME-encoded words
# variant which is slightly different than the body text variant
def decodeqp(v):
    out = b''
    state = ''  # used for =XY decoding
    for c in list(bytes(v, 'ascii')):
        c = bytes((c,))
        if c == b'=':
            if state == '':
                state = '='
            else:
                raise ValueError
            continue
        if c == b'_':  # underscore is space only for MIME words
            if state == '':
                out += b' '
            else:
                raise ValueError
            continue
        if c in HEXCHARS:
            if state == '':
                out += c
            elif state == '=':
                state = hord(c)
            else:
                state *= 16
                state += hord(c)
                out += bytes((state,))
                state = ''
            continue
        if state == '':
            out += c
        else:
            raise ValueError
        continue
    if state != '':
        raise ValueError
    return out
------ >8 cut here ------
It works, in the sense that
print(decodeqp("=21_yes"))
will output
b'! yes'
But the bytes() thing is really confusing me. Most of this is translated
from C code I wrote some time ago. I'm new to Python and did spend some
time reading:
https://docs.python.org/3/library/stdtypes.html#bytes-objects
Why does "bytes((integertype,))" work? I'll freely admit to stealing
that trick from /usr/lib/python3.5/quopri.py on my system. (Why am I not
using quopri? Well, (a) I want to learn, (b) it decodes to a file
not a variable, (c) I want different error handling.)
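
A minimal sketch of what the bytes() constructor appears to do with these
arguments, going by the stdtypes page linked above: an iterable of integers
in range(256) yields one byte per integer, while a bare integer is taken as
a length. The variable names here are only for illustration.

```python
n = 33  # code point of '!'

# A one-element tuple is an iterable of ints, so this builds a single byte.
one_byte = bytes((n,))

# A bare int is treated as a length: 33 zero bytes, not b'!'.
zero_filled = bytes(n)

# Iterating over a bytes object hands back ints, which is why the loop in
# decodeqp() has to wrap each value in bytes((c,)) again.
as_ints = list(b'=21')

print(one_byte, len(zero_filled), as_ints)
```

So the trailing comma matters: bytes((n,)) and bytes(n) are different calls.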
Is there a more Pythonic way to convert what should be plain ASCII
into a binary "bytes" object? In the use case I'm working towards, the
charset will not be ASCII or UTF-8 all of the time, and the charset
isn't the responsibility of the Python code. Think "decode this if
charset matches user-specified value, then output in that same charset;
otherwise do nothing."
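
One possible shape for a more idiomatic version, offered as a sketch rather
than a definitive answer. It assumes the input is a str holding only ASCII,
and the function name decodeqp_sketch is hypothetical. It leans on
str.encode(), slicing (which keeps bytes as bytes), a bytearray for the
output, and int(..., 16) for the hex pair.

```python
import string


def decodeqp_sketch(v):
    """Hypothetical alternative to decodeqp() for MIME encoded-word text."""
    data = v.encode('ascii')   # str -> bytes; raises UnicodeEncodeError on non-ASCII
    out = bytearray()          # mutable buffer, converted to bytes at the end
    i = 0
    while i < len(data):
        c = data[i:i + 1]      # a slice stays bytes; data[i] would be an int
        if c == b'=':
            pair = data[i + 1:i + 3]
            # Check the digits explicitly: int() alone would also accept
            # signs and surrounding whitespace.
            if len(pair) != 2 or not all(chr(b) in string.hexdigits for b in pair):
                raise ValueError('bad =XY escape')
            out.append(int(pair, 16))
            i += 3
            continue
        if c == b'_':          # underscore is space only for MIME words
            out += b' '
        else:
            out += c
        i += 1
    return bytes(out)


print(decodeqp_sketch("=21_yes"))
```

The result is still raw bytes, so nothing here needs to know the charset;
the caller can pass the bytes through untouched.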
Elijah
------
has yet to warm up to this language