[Python-Dev] PEP 393 Summer of Code Project

Thu Sep 1 09:59:19 CEST 2011

Glenn Linderman writes:

 > We can either artificially constrain ourselves to minor tweaks of
 > the legal conforming bytestreams,

It's not artificial.  Having the internal representation be the same
as a standard encoding is very useful for a large number of minor
usages (urgently saving buffers in a text editor that knows its
internal state is inconsistent, viewing strings in the debugger, PEP
393-style space optimization is simpler if text properties are
out-of-band, etc).

 > or we can invent a representation (whether called str or something
 > else) that is useful and efficient in practice.

Bring on the practice, then.  You say that a bit to identify lone
surrogates might be useful or efficient.  In what application?  How
much time or space does it save?  You say that a bit to cache a
property might be useful or efficient.  In what application?  Which
properties?  Are those properties a set fixed by the language, or
would some bits be available for application-specific property
caching?  How much time or space does that save?

What are the costs to applications that don't want the cache?  How is
the bit-cache affected by PEP 393?

I know of no answers (none!) to those questions that favor
introduction of a bit-cache representation now.  And those bits aren't
going anywhere; it will always be possible to use a "wide" build and
change the representation later, if the optimization is valuable
enough.  Now, I'm aware that my experience is limited to the
implementations of one general-purpose language (Emacs Lisp) of
retricted applicability.  But its primary use *is* in text processing,
so I'm moderately expert.

*Moderately*.  Always interested in learning more, though.  If you
know of relevant use cases, I'm listening!  Even if Guido doesn't find
them convincing for Python, we might find them interesting at XEmacs.