need help understanding: converting text to binary

Chris Angelico rosuav at
Mon Apr 22 21:17:49 EDT 2019

On Tue, Apr 23, 2019 at 10:58 AM Eli the Bearded <*> wrote:
> Here's some code I wrote today:
> ------ cut here 8< ------
> HEXCHARS = (b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
>             b'A', b'B', b'C', b'D', b'E', b'F',
>             b'a', b'b', b'c', b'd', b'e', b'f')
> # decode a single hex digit
> def hord(c):
>     c = ord(c)
>     if c >= ord(b'a'):
>         return c - ord(b'a') + 10
>     elif c >= ord(b'A'):
>         return c - ord(b'A') + 10
>     else:
>         return c - ord(b'0')
> # decode quoted printable, specifically the MIME-encoded words
> # variant which is slightly different than the body text variant
> def decodeqp(v):

Have you checked to see if Python can already do this? You mention
quopri from the stdlib (the quoted-printable module, for those following
along at home), so I'm curious in which ways your code differs from that;
it might be that the easiest way is to use that module, and then add
some extra framing around the outside of it.
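
As a rough illustration of what the stdlib offers here, the sketch below
uses quopri.decodestring(), which returns a bytes value rather than writing
to a file. The sample input is made up for the example; whether the
header=True behaviour matches the error handling you want is something to
check against your own data.

```python
import quopri

# Body-text variant: "=XX" escapes are decoded, underscores are left alone.
body = quopri.decodestring(b"caf=C3=A9_au_lait")

# Header ("encoded word") variant: underscores additionally decode to spaces.
header = quopri.decodestring(b"caf=C3=A9_au_lait", header=True)

print(body)
print(header)
```

Both results are still bytes; turning them into text needs a separate
.decode() call with whatever charset the message declares.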

> But the bytes() thing is really confusing me. Most of this is translated
> from C code I wrote some time ago. I'm new to python and did spend some
> time reading:
> Why does "bytes((integertype,))" work? I'll freely admit to stealing
> that trick from /usr/lib/python3.5/ on my system. (Why am I not
> using quopri? Well, (a) I want to learn, (b) it decodes to a file
> not a variable, (c) I want different error handling.)

The bytes constructor will take an iterable of integers, each in
range(256), and return a byte string with those values. For instance,
bytes([1, 2, 3, 4, 5]) is the same as bytes(range(1, 6)) and is the same
as b"\1\2\3\4\5". In this case, the iterable is a tuple of one byte value.
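
A short sketch of the equivalences described above, plus the one-element
tuple case. The variable names are just for illustration. The last line
shows why the tuple wrapper matters: a bare integer argument is treated as
a length, not as a byte value.

```python
# An iterable of integers in range(256) becomes a bytes object with those values.
from_list = bytes([1, 2, 3, 4, 5])
from_range = bytes(range(1, 6))
literal = b"\1\2\3\4\5"
print(from_list == from_range == literal)

# A one-element tuple gives a one-byte string; 0x41 is ASCII "A".
single = bytes((0x41,))
print(single)

# Caution: a bare integer means "that many zero bytes", not "this byte value".
zeros = bytes(3)
print(zeros)
```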

> Is there a more python-esque way to convert what should be plain ascii

What does "plain ASCII" actually mean, though?

> into a binary "bytes" object? In the use case I'm working towards the
> charset will not be ascii or UTF-8 all of the time, and the charset
> isn't the responsibility of the python code. Think "decode this if
> charset matches user-specified value, then output in that same charset;
> otherwise do nothing."

I'm not sure what this means, but I would strongly recommend just
encoding and decoding regardless. Use text internally and bytes at the

