Michael Torrie torriem at
Mon Jun 13 11:15:05 EDT 2016

On 06/12/2016 11:16 PM, Steven D'Aprano wrote:
> "Safe to transmit in text protocols" surely should mean "any Unicode code
> point", since all of Unicode is text. What's so special about the base64
> ones?
> Well, that depends on your context. For somebody who cares about sending
> bits over a physical wire, their idea of "text" is not Unicode, but a
> subset of ASCII *bytes*.

Not necessarily.  The encoding of the text containing the results of the
base64 encoding does not matter provided the letters and numbers used in
base64 can be represented.  I could take the text and paste it in an
email and send it via UTF-8, or UTF-16.  Won't make a difference
provided the decoder can decode that specific Unicode encoding.
The other end could even cut and paste the base64 letters and numbers
out of his email body and paste it into a decoder. How the letters and
numbers got to him is immaterial and irrelevant.
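
A minimal sketch of that point in Python 3. The helper name `roundtrip` and the sample payload are made up for illustration. The base64 letters and numbers are carried as text in whichever Unicode encoding the transport uses, and they decode back to the same bytes either way:

```python
import base64

def roundtrip(payload: bytes, transport_encoding: str) -> bytes:
    """Base64-encode payload, carry it as text in the given encoding, decode it back."""
    # b64encode returns ASCII-only bytes; decode them to get a str of letters and digits.
    text = base64.b64encode(payload).decode("ascii")
    # Simulate the transport: the text is serialized with some Unicode encoding...
    wire_bytes = text.encode(transport_encoding)
    # ...and the receiver decodes with the same encoding to recover the text.
    received_text = wire_bytes.decode(transport_encoding)
    # b64decode accepts a str as long as it contains only ASCII characters.
    return base64.b64decode(received_text)

payload = bytes(range(256))
assert roundtrip(payload, "utf-8") == payload
assert roundtrip(payload, "utf-16") == payload
```

The bytes on the wire differ between the two runs (UTF-16 uses two bytes per character plus a byte order mark), but the recovered data is identical.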

Sure, in the context of email, base64 data is usually sent using UTF-8
encoding these days.  But there's no requirement that base64 data always
has to be encoded in ASCII, UTF-8, or LATIN1.

> The end result is that after you've base64ed your "binary" data, to
> get "text" data, what are you going to do with it? Treat it as Unicode code
> points? Probably not. 

Sure. Why not?  Write it to a text file.  Put it in an email.  Place it
in a word doc.  Print it.  Whatever.

> Squirt it down a wire as bytes? Almost certainly.

Sometimes yes.  But not always.

> Looking at this from the high-level perspective of Python, that makes it
> conceptually bytes not text.

I don't see how this is always the case.  From a high-level Python
perspective it's definitely text.  That's the whole point of base64!
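
For what it's worth, here is a small sketch of how this looks in Python 3 (the sample bytes are arbitrary). The standard library's base64.b64encode hands back a bytes object, but every byte in it is an ASCII letter, digit, '+', '/' or '=', so turning it into a str never fails and loses nothing:

```python
import base64

encoded = base64.b64encode(b"\x00\x01\xfe\xff")
# In Python 3, b64encode returns a bytes object...
print(type(encoded).__name__, encoded)

# ...but it only ever contains ASCII characters from the base64 alphabet,
# so decoding it as ASCII always succeeds and yields ordinary text.
as_text = encoded.decode("ascii")
print(type(as_text).__name__, as_text)

# The text form decodes straight back to the original binary data.
assert base64.b64decode(as_text) == b"\x00\x01\xfe\xff"
```

Whether you keep the result as bytes or as str is then a matter of what you plan to do with it next.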
