need help understanding: converting text to binary
Chris Angelico
rosuav at gmail.com
Mon Apr 22 21:17:49 EDT 2019
On Tue, Apr 23, 2019 at 10:58 AM Eli the Bearded <*@eli.users.panix.com> wrote:
>
> Here's some code I wrote today:
>
> ------ cut here 8< ------
> HEXCHARS = (b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9',
>             b'A', b'B', b'C', b'D', b'E', b'F',
>             b'a', b'b', b'c', b'd', b'e', b'f')
>
>
> # decode a single hex digit
> def hord(c):
>     c = ord(c)
>     if c >= ord(b'a'):
>         return c - ord(b'a') + 10
>     elif c >= ord(b'A'):
>         return c - ord(b'A') + 10
>     else:
>         return c - ord(b'0')
>
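Side note on hord(): here is a minimal sketch of a shorter version. It assumes c is always a length-1 bytes object, and the name HEX_DIGITS is made up here. int() accepts bytes as well as str when you pass an explicit base, so the character arithmetic isn't needed:

```python
# Hypothetical replacement for hord(); assumes c is a length-1 bytes object.
HEX_DIGITS = b"0123456789ABCDEFabcdef"

def hord(c: bytes) -> int:
    """Decode one hex digit such as b'7', b'c' or b'C' to its value 0-15."""
    # The length check matters: b"" in HEX_DIGITS is True (empty substring).
    if len(c) != 1 or c not in HEX_DIGITS:
        raise ValueError(f"not a hex digit: {c!r}")
    # int() accepts bytes as well as str when an explicit base is given.
    return int(c, 16)

# Two digits at once also work, e.g. the "C3" from "=C3" in quoted-printable:
print(int(b"C3", 16))   # 195
```

Raising ValueError on bad input is just one choice; since you want your own error handling, that is the spot to change.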
>
> # decode quoted printable, specifically the MIME-encoded words
> # variant which is slightly different than the body text variant
> def decodeqp(v):
Have you checked to see if Python can already do this? You mention
quopri from the stdlib (that's
https://docs.python.org/3/library/quopri.html for those following
along at home), so I'm curious in what ways your code differs from it.
It might be that the easiest way is to use that module and then add
some extra framing around the outside of it.
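For what it's worth, quopri also has decodestring(), which works on a bytes object in memory rather than on a file, and its header=True option decodes "_" as a space, as MIME encoded words require. Here is a rough sketch of the "extra framing" idea. The ENCODED_WORD regex and the decode_encoded_word() helper are hypothetical. They handle a single Q-encoded word only, with no B (base64) form and no custom error handling:

```python
import quopri
import re

# Hypothetical pattern for one RFC 2047 Q-encoded word, e.g.
# =?utf-8?Q?caf=C3=A9_au_lait?=  -> captures charset and payload.
ENCODED_WORD = re.compile(rb"=\?([^?]+)\?[Qq]\?([^?]*)\?=")

def decode_encoded_word(word: bytes):
    """Return (charset, raw_bytes) for a Q-encoded word, or None if it doesn't match."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    charset, payload = m.groups()
    # header=True turns "_" into a space, the encoded-words variant of QP.
    return charset.decode("ascii"), quopri.decodestring(payload, header=True)

print(decode_encoded_word(b"=?utf-8?Q?caf=C3=A9_au_lait?="))
# ('utf-8', b'caf\xc3\xa9 au lait')
```

The result stays as bytes in the original charset, which seems to fit the "output in that same charset" requirement.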
> But the bytes() thing is really confusing me. Most of this is translated
> from C code I wrote some time ago. I'm new to python and did spend some
> time reading:
>
> https://docs.python.org/3/library/stdtypes.html#bytes-objects
>
> Why does "bytes((integertype,))" work? I'll freely admit to stealing
> that trick from /usr/lib/python3.5/quopri.py on my system. (Why am I not
> using quopri? Well, (a) I want to learn, (b) it decodes to a file
> not a variable, (c) I want different error handling.)
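The bytes constructor takes an iterable of integers (each in the range
0-255) and returns a byte string with those values. For instance,
bytes([1, 2, 3, 4, 5]) is the same as bytes(range(1, 6)) and is the
same as b"\1\2\3\4\5". In this case, the iterable is a tuple of one
byte value. The trailing comma is what makes it a tuple, and it
matters: given a bare integer n, bytes(n) instead returns n zero bytes.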
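A quick check of those claims. The value 0x41 is arbitrary and stands in for the integer from your code:

```python
value = 0x41

# A one-element tuple (the trailing comma matters) gives a single byte.
single = bytes((value,))
print(single)                   # b'A'

# Any iterable of ints in range(256) works: tuple, list, range, generator.
assert bytes([1, 2, 3, 4, 5]) == bytes(range(1, 6)) == b"\1\2\3\4\5"

# Pitfall: a bare int is taken as a length, giving that many zero bytes.
print(bytes(3))                 # b'\x00\x00\x00'

# Two equivalent spellings for a single byte.
assert bytes([value]) == value.to_bytes(1, "big") == single

# Values outside 0..255 are rejected.
try:
    bytes([256])
except ValueError as exc:
    print("ValueError:", exc)
```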
> Is there a more python-esque way to convert what should be plain ascii
What does "plain ASCII" actually mean, though?
> into a binary "bytes" object? In the use case I'm working towards the
> charset will not be ascii or UTF-8 all of the time, and the charset
> isn't the responsibility of the python code. Think "decode this if
> charset matches user-specified value, then output in that same charset;
> otherwise do nothing."
I'm not sure what this means, but I would strongly recommend just
decoding and encoding regardless. Use text (str) internally and bytes
only at the edges, where data comes in and goes out.
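Here is a minimal sketch of that shape applied to the use case described above. The function names, the tab-to-space replacement standing in for real processing, and comparing charset labels through codecs.lookup() are all assumptions for illustration. None of them come from quopri or the email package:

```python
import codecs

def same_charset(a: str, b: str) -> bool:
    """True if two charset labels name the same codec, e.g. "UTF-8" and "utf8"."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return False  # unknown label: treat as "no match"

def transform_if_charset(raw: bytes, charset: str, wanted: str) -> bytes:
    """Hypothetical helper: decode, work on str, re-encode in the same charset."""
    if not same_charset(charset, wanted):
        return raw                        # otherwise do nothing
    text = raw.decode(charset)            # bytes -> str at the input edge
    text = text.replace("\t", " ")        # placeholder for the real text processing
    return text.encode(charset)           # str -> bytes at the output edge

latin = "caf\u00e9\tau lait".encode("latin-1")
print(transform_if_charset(latin, "ISO-8859-1", "latin1"))   # b'caf\xe9 au lait'
print(transform_if_charset(latin, "ISO-8859-1", "utf-8"))    # returned unchanged
```

If you want different error handling, bytes.decode() and str.encode() both take an errors argument (e.g. errors="replace"). The same shape covers "plain ASCII" too. Pass "ascii" as the charset, and encoding will raise UnicodeEncodeError on anything outside it rather than guessing.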
ChrisA
More information about the Python-list mailing list