On Thu, Jun 5, 2014 at 11:59 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 5 June 2014 14:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
As I've said before in other contexts, find me Windows, Mac OS X and JVM developers, or educators and scientists that are as concerned by the text model changes as folks that are primarily focused on Linux system (including network) programming, and I'll be more willing to concede the point.
There is once again a strong selection bias in this discussion, by its very nature. People who like the new model don't have anything to complain about, and so are not heard.
Just to support Nick's point, I for one find the Python 3 text model a huge benefit, both in practical terms of making my programs more robust, and educationally, as I have a far better understanding of encodings and their issues than I ever did under Python 2. Whenever a discussion like this occurs, I find it hard not to resent the people arguing that the new model should be taken away from me and replaced with a form of the old error-prone (for me) approach - as if it were in my best interests.
Internal details don't bother me - using UTF-8 internally and having indexing be potentially O(N) is of little relevance. But make me work with a string type that *doesn't* abstract a string as a sequence of Unicode code points and I'll get very upset.
Once you get past whether str + bytes throws an exception, which seems to be the divide most people focus on, you can discover new things like dance-encoded strings, bytes decoded using an incorrect encoding and intended to be transcoded into the correct encoding later, surrogates that work perfectly until .encode(), str(bytes), APIs that disagree with you about whether the result should be str or bytes, APIs that return either str or bytes depending on their initializers, and so on. Unicode can still be complicated in Python 3, independent of any judgement about whether it is worse, better, or different than Python 2.
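To make a few of those concrete, here is a minimal sketch of my own, assuming a default CPython 3 run (no -b flag). The sample values and variable names are made up for illustration; only the built-in str/bytes methods and os.path.basename are real APIs.

```python
import os.path

# 1. The well-known divide: mixing str and bytes raises TypeError.
try:
    "abc" + b"def"
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__

# 2. str(bytes) does not decode; it returns the repr of the bytes object.
repr_text = str(b"caf\xc3\xa9")

# 3. Bytes decoded with the wrong encoding, to be transcoded later.
raw = "caf\u00e9".encode("utf-8")          # b'caf\xc3\xa9'
mojibake = raw.decode("latin-1")           # wrong codec, but never fails
fixed = mojibake.encode("latin-1").decode("utf-8")

# 4. Surrogates that work perfectly until .encode().
undecodable = b"caf\xe9"                   # Latin-1 bytes, not valid UTF-8
smuggled = undecodable.decode("utf-8", errors="surrogateescape")
try:
    smuggled.encode("utf-8")               # strict encoding rejects the surrogate
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = type(exc).__name__
# The original bytes come back only with the same error handler.
roundtrip = smuggled.encode("utf-8", errors="surrogateescape")

# 5. An API whose result type follows its argument type.
name_as_str = os.path.basename("/tmp/data.txt")
name_as_bytes = os.path.basename(b"/tmp/data.txt")
```

None of these pass through the str + bytes check, which is why that exception alone doesn't settle whether a program handles text correctly.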