PEP 263 comments

Mon Mar 4 22:42:30 EST 2002

>>>>> "Huaiyu" == Huaiyu Zhu <huaiyu at gauss.almadan.ibm.com> writes:

    Huaiyu> To produce a kind of tar file, perhaps?

Those are tarbytes.  Explicit is better than implicit.

    Huaiyu> To send something through socket?

Verbatim low-level I/O would be permitted.  I would want
character-oriented high-level functions like print to refuse to
display them literally, though.  If you want to use print on them,
convert them to something with an appropriate (trivial) __repr__
operation defined.
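
As a rough sketch of that last suggestion: the class name OctetDump is
hypothetical, and the syntax is modern Python 3 rather than anything
proposed in this thread.  The only point is that displaying octets
becomes a deliberate conversion instead of something print does for you.

```python
class OctetDump:
    """Hypothetical wrapper: makes a block of octets printable on purpose."""

    def __init__(self, octets):
        self.octets = bytes(octets)

    def __repr__(self):
        # Trivial representation: space-separated hex, never characters.
        return "<octets " + " ".join("%02x" % b for b in self.octets) + ">"


payload = b"\x1f\x8b\x08\x00"
# print() falls back to __repr__ because no __str__ is defined.
print(OctetDump(payload))
```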

    Huaiyu> But if it in the end restricts the kind of operations that
    Huaiyu> can be performed on raw strings, it's simply untenable.

No.  Perhaps I haven't presented this well, but I don't care what you
do with blocks of raw memory, as long as they are a different type
from character strings and cannot be used as though they were
character strings.  _It's the use of character strings that I want to
restrict._  Confounding octets with characters is a horrible disease---
at XEmacs, we call it "Ebola."  It crashes editors and MUAs, it
corrupts text and destroys data, it may even cause your daughter to
decide to become a dentist and marry a marketing VP.

In the transition we have to decide what to do with the current
undifferentiated notation.  I suspect that people who use raw strings
to access data representation are far more aware of the issues than
people who think of them as "human readable text" that "just works."
Thus by the principle of "least surprise" I advocate that raw string
users use a new notation, while the (much larger and much more naive)
group of people who expect

    print 'And now for something completely different'

to do something useful would notice nothing new.

    print o'she turned me into a newt ... it got bettah'

would either error (my preference) or print something like a tuple of
integers.
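
Both outcomes can be sketched with an ordinary class.  Octets here is a
hypothetical stand-in for whatever o'...' would produce, written in
modern Python 3 syntax; nothing like it exists as a builtin, and the
method names are assumptions made for illustration only.

```python
class Octets:
    """Hypothetical stand-in for the value of an o'...' literal."""

    def __init__(self, data):
        self._data = bytes(data)

    def __str__(self):
        # print() calls str(), so printing octets directly is an error.
        raise TypeError("octets are not character data; convert explicitly")

    def __repr__(self):
        return "Octets(%r)" % (tuple(self._data),)

    def as_integers(self):
        # The alternative outcome: show the values, not characters.
        return tuple(self._data)


newt = Octets(b"newt")

try:
    print(newt)
    printed = True
except TypeError:
    printed = False

print(newt.as_integers())
```

Raising from __str__ corresponds to the "error" preference; as_integers()
corresponds to the "tuple of integers" fallback.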

    print Code86(o'spam, spam, eggs, and spam').disassemble()

would do what you think, if class Code86 were defined appropriately.
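
A toy version of such a class might look like the following.  Everything
about it is an assumption: the constructor takes a Python 3 bytes object
in place of an o'' literal, and the opcode table covers only three
single-byte x86 instructions, so it is a sketch of the idea and not a
disassembler.

```python
class Code86:
    """Hypothetical wrapper giving a block of octets a machine-code meaning."""

    # Toy table: three single-byte x86 opcodes, nothing more.
    OPCODES = {0x90: "nop", 0xC3: "ret", 0xCC: "int3"}

    def __init__(self, octets):
        self.octets = bytes(octets)

    def disassemble(self):
        # Unknown bytes are emitted as data rather than guessed at.
        lines = []
        for offset, byte in enumerate(self.octets):
            text = self.OPCODES.get(byte, "db 0x%02x" % byte)
            lines.append("%04x  %s" % (offset, text))
        return "\n".join(lines)


listing = Code86(b"\x90\x90\xc3").disassemble()
print(listing)
```

The interpretation lives in the class, not in the octets themselves,
which is the separation being argued for.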

Yes, for people who use raw strings for access to representation
extensively, this would be a one-time PITA for the conversion to the
o'' notation (or whatever it turns out to be).

And it's probable that the representation in source would not be
one-to-one, i.e., the bytes would be encoded in uninterpreted UTF-8.
A raw string would be turned into an array of 16-bit integers by
the lexer, and the parser would then convert it to an array of bytes
internally based on the o'' syntax.
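
One possible reading of that two-stage pipeline, sketched in modern
Python 3.  The function names are hypothetical, and the rule that each
16-bit value must fit in one byte is my assumption about how the
narrowing step would behave, not something settled above.

```python
def lex_octet_literal(source_bytes):
    """Lexer stage (assumed): decode UTF-8 source into 16-bit integers."""
    text = source_bytes.decode("utf-8")
    units = [ord(ch) for ch in text]
    if any(u > 0xFFFF for u in units):
        raise ValueError("character does not fit in 16 bits")
    return units


def parse_octet_literal(units):
    """Parser stage (assumed): narrow each integer to a single byte."""
    if any(u > 0xFF for u in units):
        raise ValueError("o'' literal contains a value above 255")
    return bytes(units)


source = "caf\u00e9".encode("utf-8")   # 5 bytes in the source file
units = lex_octet_literal(source)      # 4 sixteen-bit integers
octets = parse_octet_literal(units)    # 4 bytes in memory
print(len(source), units, octets)
```

The source file holds five bytes but the resulting object holds four,
which is the sense in which the source representation is not one-to-one.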

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
              Don't ask how you can "do" free software business;
              ask what your business can "do for" free software.