On Fri, Jun 3, 2016, at 20:13, Giampaolo Rodola' wrote:
On Fri, Jun 3, 2016 at 3:17 PM, Neil Schemenauer <nas-pythonideas@arctrix.com> wrote:
- mixing of unicode strings with byte strings: decode/encode using latin-1
To me this sounds more like going back to Python 2 rather than facilitating the transition, especially #3.
What about moving forward to unify the types?

For example, we could go the Emacs way: a single abstract string type*, which is a sequence whose elements can be either Unicode characters or raw non-ASCII bytes, all of which are distinct from each other. (Emacs represents the high bytes by assigning them "code points" between "U+3FFF80" and "U+3FFFFF".)

*Emacs has two concrete types: "byte strings", which can contain no non-ASCII characters (only ASCII and high bytes), and "unicode strings", which use UTF-8 (plus those extra code points) as the underlying representation [various indexing operations are O(N)].

Python would use the FSR (PEP 393) as it is now, perhaps along with a "byte string" type which likewise can contain only ASCII characters and high bytes. It might be worthwhile to have a 16-bit mode for "BMP + high bytes" which omits, say, surrogates.

With only one type, mixing them becomes the easiest thing in the world.
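
To make the "extra code points" idea a bit more concrete, here is a rough sketch in today's Python. It is only an illustration under some assumptions: the names to_code_points, from_code_points and RAW_BYTE_BASE are made up for this mail, and since chr() stops at U+10FFFF the unified string is modelled as a plain list of integers rather than a real str. It borrows the existing "surrogateescape" error handler to find the undecodable bytes and then moves them up to the Emacs-style range.

```python
# Hypothetical sketch: these names are illustrative, not an existing API.
RAW_BYTE_BASE = 0x3FFF00  # byte 0x80 -> 0x3FFF80, byte 0xFF -> 0x3FFFFF


def to_code_points(data):
    """Decode UTF-8 where possible; map undecodable bytes to raw-byte code points."""
    # surrogateescape turns each undecodable byte 0xNN into the lone surrogate U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    result = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Move the escaped byte into the Emacs-style raw-byte range.
            cp = RAW_BYTE_BASE + (cp - 0xDC00)
        result.append(cp)
    return result


def from_code_points(cps):
    """Inverse of to_code_points: raw-byte code points become single bytes again."""
    out = bytearray()
    for cp in cps:
        if 0x3FFF80 <= cp <= 0x3FFFFF:
            out.append(cp - RAW_BYTE_BASE)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)


# "Mixing" text and raw bytes is then just sequence concatenation.
mixed = to_code_points("caf\u00e9 ".encode("utf-8")) + to_code_points(b"\xff\x80")
round_tripped = from_code_points(mixed)
```

Here the text part and the stray bytes live in the same sequence, each raw byte stays distinct from every real character, and the original bytes come back unchanged on output.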