[Python-ideas] Smoothing transition to Python 3

Fri Jun 3 20:41:14 EDT 2016

On Fri, Jun 3, 2016, at 20:13, Giampaolo Rodola' wrote:
> On Fri, Jun 3, 2016 at 3:17 PM, Neil Schemenauer <
> nas-pythonideas at arctrix.com> wrote:
> > - mixing of unicode strings with byte strings: decode/encode
> >   using latin-1
> >
> 
> To me this sounds more like going back to Python 2 rather than
> facilitating the transition, especially #3.

What about moving forward to unify the types? For example, we could go
the Emacs way: a single abstract string type*, which is a sequence
whose elements can be Unicode characters or raw non-ASCII bytes, all of
which are distinct from each other (Emacs' representation assigns the
high bytes "code points" from "U+3FFF80" to "U+3FFFFF", above the
Unicode ceiling of U+10FFFF).

*Emacs has two concrete types: "byte strings", which can contain no
non-ASCII characters (only ASCII and raw bytes), and "unicode strings",
which use a UTF-8 underlying representation (plus those extra code
points) [various indexing operations are O(N)]. Python would keep the
FSR (PEP 393's flexible string representation) as it is now, perhaps
along with a "byte string" type that likewise can contain only ASCII
characters and high bytes. It might be worthwhile to have a 16-bit mode
for "BMP + high bytes" which omits, say, surrogates.
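A minimal sketch of the representation described above, assuming the Emacs convention that a raw byte b is stored as 0x3FFF00 + b (so bytes 0x80-0xFF land at 0x3FFF80-0x3FFFFF). Because Python's str cannot hold values above U+10FFFF, the hypothetical helpers `decode_unified` and `encode_unified` work on plain lists of integers; they borrow the existing `surrogateescape` error handler to locate undecodable bytes. This illustrates the idea only; it is not a proposed API.

```python
# Assumption (Emacs convention): raw byte b maps to element 0x3FFF00 + b.
RAW_BYTE_BASE = 0x3FFF00


def decode_unified(data: bytes) -> list[int]:
    """Hypothetical: decode UTF-8, keeping undecodable bytes as raw-byte elements."""
    # surrogateescape turns each undecodable byte b into U+DC00 + b
    # (always U+DC80-U+DCFF). Python's strict UTF-8 decoder never yields
    # lone surrogates otherwise, so the remapping below is unambiguous.
    text = data.decode("utf-8", errors="surrogateescape")
    result = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            cp = RAW_BYTE_BASE + (cp - 0xDC00)
        result.append(cp)
    return result


def encode_unified(cps: list[int]) -> bytes:
    """Hypothetical inverse: raw-byte elements are written back verbatim."""
    out = bytearray()
    for cp in cps:
        if RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF:
            out.append(cp - RAW_BYTE_BASE)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)
```

Any byte sequence survives the round trip: valid UTF-8 comes out as ordinary code points, and everything else comes out as raw-byte elements that can never be confused with characters.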

With only one type, mixing them becomes the easiest thing in the world.
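To make that concrete, here is a toy sketch of a single type that accepts both `str` and `bytes` operands. `UString`, `coerce` and `is_byte_string` are hypothetical names. Bytes are coerced element by element under the same assumed mapping (ASCII unchanged, high byte b to 0x3FFF00 + b), and `is_byte_string` checks the narrower ASCII-plus-high-bytes subset from the footnote.

```python
from typing import Iterable, Union

# Assumption (Emacs convention): raw byte b in 0x80-0xFF is stored as
# the element 0x3FFF00 + b, i.e. somewhere in 0x3FFF80-0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00


class UString:
    """Hypothetical unified string: a tuple of integer elements.

    Each element is either a Unicode code point (0-0x10FFFF) or a raw
    high byte (0x3FFF80-0x3FFFFF); the two ranges never overlap.
    """

    def __init__(self, cps: Iterable[int] = ()) -> None:
        self.cps = tuple(cps)

    @classmethod
    def coerce(cls, value: Union["UString", str, bytes]) -> "UString":
        if isinstance(value, UString):
            return value
        if isinstance(value, str):
            # Text keeps its Unicode code points.
            return cls(ord(ch) for ch in value)
        if isinstance(value, bytes):
            # ASCII bytes stay ASCII; high bytes become raw-byte elements.
            return cls(b if b < 0x80 else RAW_BYTE_BASE + b for b in value)
        raise TypeError(f"unsupported operand type: {type(value).__name__}")

    def __add__(self, other):
        if not isinstance(other, (UString, str, bytes)):
            return NotImplemented
        return UString(self.cps + UString.coerce(other).cps)

    def __radd__(self, other):
        # Called for "text" + UString and b"bytes" + UString.
        if not isinstance(other, (str, bytes)):
            return NotImplemented
        return UString(UString.coerce(other).cps + self.cps)

    def is_byte_string(self) -> bool:
        # The narrower "byte string" subset: only ASCII and raw high bytes.
        return all(
            cp < 0x80 or RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF
            for cp in self.cps
        )


# Mixing text and bytes needs no explicit encode/decode step.
mixed = "caf\u00e9 " + UString.coerce(b"\xff") + b"ok"
```

Both concatenations succeed: the `str` on the left is handled by `__radd__`, the trailing `bytes` by `__add__`, and the 0xFF byte stays distinguishable from any character.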