[Python-Dev] Internal representation of strings and Micropython

Daniel Holth dholth at gmail.com
Thu Jun 5 20:41:28 CEST 2014


On Thu, Jun 5, 2014 at 11:59 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 5 June 2014 14:15, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> As I've said before in other contexts, find me Windows, Mac OS X and
>> JVM developers, or educators and scientists that are as concerned by
>> the text model changes as folks that are primarily focused on Linux
>> system (including network) programming, and I'll be more willing to
>> concede the point.
>
> There is once again a strong selection bias in this discussion, by its
> very nature. People who like the new model don't have anything to
> complain about, and so are not heard.
>
> Just to support Nick's point, I for one find the Python 3 text model a
> huge benefit, both in practical terms of making my programs more
> robust, and educationally, as I have a far better understanding of
> encodings and their issues than I ever did under Python 2. Whenever a
> discussion like this occurs, I find it hard not to resent the people
> arguing that the new model should be taken away from me and replaced
> with a form of the old error-prone (for me) approach - as if it was in
> my best interests.
>
> Internal details don't bother me - using UTF8 and having indexing be
> potentially O(N) is of little relevance. But make me work with a
> string type that *doesn't* abstract a string as a sequence of Unicode
> code points and I'll get very upset.

Once you get past whether str + bytes throws an exception which seems
to be the divide most people focus on, you can discover new things
like dance-encoded strings, bytes decoded using an incorrect encoding
intended to be transcoded into the correct encoding later, surrogates
that work perfectly until .encode(), str(bytes), APIs that disagree
with you about whether the result should be str or bytes, APIs that
return either string or bytes depending on their initializers and so
on. Unicode can still be complicated in Python 3 independent of any
judgement about whether it is worse, better, or different than Python
2.


More information about the Python-Dev mailing list