need help understanding: converting text to binary

Mon Apr 22 21:17:49 EDT 2019

On Tue, Apr 23, 2019 at 10:58 AM Eli the Bearded <*@eli.users.panix.com> wrote:
>
> Here's some code I wrote today:
>
> ------ cut here 8< ------
> HEXCHARS = (b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
>             b'A', b'B', b'C', b'D', b'E', b'F',
>             b'a', b'b', b'c', b'd', b'e', b'f')
>
>
> # decode a single hex digit
> def hord(c):
>     c = ord(c)
>     if c >= ord(b'a'):
>         return c - ord(b'a') + 10
>     elif c >= ord(b'A'):
>         return c - ord(b'A') + 10
>     else:
>         return c - ord(b'0')
>
>
> # decode quoted printable, specifically the MIME-encoded words
> # variant which is slightly different than the body text variant
> def decodeqp(v):

Have you checked whether Python can already do this? You mention
quopri from the stdlib (that's
https://docs.python.org/3/library/quopri.html for those following
along at home), so I'm curious in which ways your code differs from it.
The easiest approach might be to use that module and add some extra
framing around the outside of it.
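
As a rough sketch of what "extra framing" could look like: the helper
below (decode_q_word is a made-up name, not part of the stdlib) strips
the =?charset?Q?...?= wrapper by hand and passes the payload to
quopri.decodestring(), which returns bytes rather than writing to a
file. It assumes a single, well-formed Q-encoded word and does no
other validation.

```python
import quopri


def decode_q_word(word: bytes):
    """Hypothetical helper: split '=?charset?Q?payload?=' and decode the payload.

    Returns (charset_name, raw_bytes). The bytes are left undecoded so the
    caller can decide what to do with the charset.
    """
    if not (word.startswith(b"=?") and word.endswith(b"?=")):
        raise ValueError("not an encoded word")
    # Drop the leading '=?' and trailing '?=', then split into three fields.
    charset, encoding, payload = word[2:-2].split(b"?", 2)
    if encoding.upper() != b"Q":
        raise ValueError("only the Q encoding is handled in this sketch")
    # header=True makes quopri treat '_' as a space, as encoded words require.
    return charset.decode("ascii"), quopri.decodestring(payload, header=True)


charset, raw = decode_q_word(b"=?utf-8?Q?caf=C3=A9_au_lait?=")
print(charset, raw)
```

Different error handling would go where the ValueError is raised.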

> But the bytes() thing is really confusing me. Most of this is translated
> from C code I wrote some time ago. I'm new to python and did spend some
> time reading:
>
> https://docs.python.org/3/library/stdtypes.html#bytes-objects
>
> Why does "bytes((integertype,))" work? I'll freely admit to stealing
> that trick from /usr/lib/python3.5/quopri.py on my system. (Why am I not
> using quopri? Well, (a) I want to learn, (b) it decodes to a file
> not a variable, (c) I want different error handling.)

The bytes constructor accepts an iterable of integers, each in
range(256), and returns a byte string with those values. For instance,
bytes([1, 2, 3, 4, 5]) is the same as bytes(range(1, 6)) and the same
as b"\1\2\3\4\5". In your case, the iterable is a one-element tuple
holding a single byte value.
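
A few lines to try at the interactive prompt, as an illustration only
(the variable names are arbitrary):

```python
# An iterable of ints (each 0..255) becomes one byte per int.
a = bytes([1, 2, 3, 4, 5])
b = bytes(range(1, 6))
c = b"\1\2\3\4\5"

# A one-element tuple gives a one-byte string; the trailing comma matters.
single = bytes((0x41,))

# A bare int is a length, not a value: this is three zero bytes.
zeros = bytes(3)

# Values outside 0..255 are rejected with ValueError.
try:
    bytes((256,))
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

# For a pair of hex digits, int() with base 16 does the digit decoding.
from_hex = bytes((int(b"C3", 16),))

print(a == b == c, single, zeros, out_of_range_rejected, from_hex)
```

The bytes(3) case is the reason the tuple is needed: bytes(n) with a
plain integer means "n zero bytes", not "the byte with value n".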

> Is there a more python-esque way to convert what should be plain ascii

What does "plain ASCII" actually mean, though?

> into a binary "bytes" object? In the use case I'm working towards the
> charset will not be ascii or UTF-8 all of the time, and the charset
> isn't the responsibility of the python code. Think "decode this if
> charset matches user-specified value, then output in that same charset;
> otherwise do nothing."

I'm not sure what this means, but I would strongly recommend decoding
on input and encoding on output regardless. Use text (str) internally
and bytes only at the boundaries, where data enters or leaves the
program.
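
A minimal sketch of that pattern, assuming the charset name is supplied
by the caller. The function name recode and the upper() call are
placeholders for whatever processing is actually needed.

```python
def recode(raw: bytes, charset: str) -> bytes:
    """Hypothetical example: decode at the input edge, encode at the output edge."""
    text = raw.decode(charset)    # bytes -> str as the data comes in
    text = text.upper()           # stand-in for the real work, done on text
    return text.encode(charset)   # str -> bytes as the data goes out


result = recode(b"caf\xe9", "latin-1")
print(result)
```

A wrong or unsupported charset then shows up as UnicodeDecodeError or
LookupError at the boundary, rather than as mangled bytes further in.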

ChrisA