Cult-like behaviour [was Re: Kindness]

Terry Reedy tjreedy at udel.edu
Sun Jul 15 16:06:35 EDT 2018


On 7/15/2018 7:37 AM, Marko Rauhamaa wrote:

> One of the classic Unix and Internet tenets is that text is bytes is
> text.

Tenets of a faith may be wrong ;-).  An informatic paradigm from more 
than 45 years ago may be outdated and in need of revision.

On byte storage and on the Internet, **everything** is (encoded) bytes, 
so saying 'text is bytes' says nothing because it is trivially true.  On 
the other hand, 'bytes is text' is wrong unless one uses a character 
encoding that assigns a visible character (including <space>) to every 
byte.  I believe both PCs and Macs had 1 or more such encodings.  (I am 
only uncertain as to whether b'\x00' was mapped.)
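
A minimal Python sketch of that test, assuming "visible" means that
str.isprintable() is true for the decoded character.  The helper names
are mine, for illustration only.  It checks, for a single-byte codec,
whether every byte decodes at all and how many bytes decode to a
visible character.

```python
ALL_BYTES = bytes(range(256))


def decodes_every_byte(encoding):
    """Return True if each of the 256 byte values decodes on its own."""
    for value in ALL_BYTES:
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            return False
    return True


def visible_bytes(encoding):
    """Return the byte values that decode to one printable character.

    'Printable' here is str.isprintable(), an assumption standing in
    for 'visible character (including <space>)'.
    """
    visible = []
    for value in ALL_BYTES:
        try:
            char = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue
        if len(char) == 1 and char.isprintable():
            visible.append(value)
    return visible


if __name__ == "__main__":
    for name in ("ascii", "latin-1"):
        print(name, decodes_every_byte(name), len(visible_bytes(name)))
```

Python's latin-1 codec decodes every byte, but a number of the results
are control characters, so by this measure it is not an encoding for
which 'bytes is text' holds.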

Images are bytes as much as text is.  I suggest that 'bytes is image' is 
more true than 'bytes is text'.  Every byte can be mapped, for instance, 
into an 8 x 1 or 1 x 8 pixel image after deciding which end gets the 
high and low bits.  Bit mapping is likely older than Unix.  Bar codes 
and QR codes are commonplace as international machine-readable images of 
bytes.
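
As a rough sketch of the 1 x 8 mapping (the function names and the
'#'/'.' rendering are my own choices, not any standard), each bit
becomes one pixel and a flag decides which end gets the high bit:

```python
def byte_to_pixels(value, msb_first=True):
    """Map one byte (0..255) to a row of eight 0/1 pixels."""
    if not 0 <= value <= 255:
        raise ValueError("expected a byte value in range(256)")
    # Extract bits from the high bit (shift 7) down to the low bit.
    bits = [(value >> shift) & 1 for shift in range(7, -1, -1)]
    return bits if msb_first else bits[::-1]


def render(pixels, on="#", off="."):
    """Draw a pixel row as text, one character per pixel."""
    return "".join(on if pixel else off for pixel in pixels)


if __name__ == "__main__":
    for value in b"Hi":
        print(render(byte_to_pixels(value)))
```

Every one of the 256 byte values gets a distinct image this way, which
no text encoding needs to guarantee.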

In a context where 'everything is bytes', 'bytes is everything' or 
'bytes can be anything' is the proper reverse.

> Of course, much of it was naïve, but UTF-8 has miraculously given
> it a new life.  

UTF-8 makes 'bytes is text' even less true.  Not only are some leading 
bytes not text, but some byte sequences are illegal.  Bytes are not 
UTF-8 text.  As n increases, the probability that a string of n random 
bytes is valid UTF-8 text approaches 0, and does so faster than the 
probability that the same bytes are Latin-1 text.
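
A small sketch of this, using only bytes.decode() and brute-force 
enumeration (the helper names are mine).  It counts exactly what 
fraction of all n-byte strings Python's strict UTF-8 decoder accepts. 
It is only practical for very small n, since there are 256**n strings.

```python
from itertools import product


def is_valid_utf8(data):
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def valid_fraction(n):
    """Exact fraction of all n-byte strings that are valid UTF-8."""
    valid = sum(
        is_valid_utf8(bytes(combo))
        for combo in product(range(256), repeat=n)
    )
    return valid / 256 ** n


if __name__ == "__main__":
    print(is_valid_utf8(b"\xc3\xa9"))  # a well-formed two-byte sequence
    print(is_valid_utf8(b"\xc0\x80"))  # overlong form, rejected
    for n in (1, 2):
        print(n, valid_fraction(n))
```

For comparison, Python's latin-1 codec never raises on any byte 
string, although some of what it returns are control characters 
rather than visible text.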

-- 
Terry Jan Reedy



