python for everyday tasks

Michael Torrie torriem at
Mon Nov 25 16:11:22 CET 2013

I only respond here, as unicode in general is an important concept that
the OP will want to make sure his students understand in Python, and I don't
want you to dishonestly sow the seeds of uncertainty and doubt.

On 11/25/2013 03:12 AM, wxjmfauth at wrote:
> Your paragraph is mixing different concepts.

On the contrary, it appears you are the one mixing the concepts, and
confusing a byte-encoding scheme with unicode.

In an ideal world, the programmer should not need to know or care about
what encoding scheme the language is using internally to store strings.
And it does not matter whether the internal encoding scheme is endorsed
by the Unicode Consortium or not, provided it can handle all the valid
unicode constructs.

A string is unicode.  Period.  Hence you must concern yourself with
encoding only when reading or writing a byte stream.
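
As a minimal sketch of that boundary (the sample text and variable names
are only illustrative, and UTF-8 is assumed as the byte encoding), the
encoding shows up only where the string is turned into bytes and back:

```python
# Inside the program, text is a str: a sequence of code points.
# Escapes are used so the exact code points are unambiguous.
text = "na\u00efve caf\u00e9 \u20ac5"

# Encoding matters only when the string leaves the program as bytes...
wire = text.encode("utf-8")

# ...and again when bytes come back in.
round_trip = wire.decode("utf-8")

print(len(text))           # number of code points
print(len(wire))           # number of bytes; larger, since three characters are non-ASCII
print(round_trip == text)  # decoding with the same codec restores the string
```

The same idea applies to files: passing encoding="utf-8" to open() does
the encode/decode step at the edge, and the rest of the code sees only str.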

Inside the language itself, the encoding is irrelevant.  Ideally.  In
python 3.3+ anyway.  Of course reality is different in other languages
which is why programmers are used to worrying about things like exposing
surrogate pairs (as Javascript does), or having to tweak your algorithms
to deal with the fact that UTF-8 indexing is not O(1).  To claim that a
programmer has to concern himself with internal language encoding in
Python 3 is not only untrue, it's disingenuous at best, given the OP's

> When it comes to save memory, utf-8 is the choice. It
> beats largely the FSR on the side of memory and on
> the side of performances.

So you would condemn everyone to use an O(n) encoding for a string when
FSR offers full unicode compliance that optimizes both speed and memory?

No, D'Aprano is correct.  Python 3.3+ indeed does unicode right.  It
offers O(1) slicing, is memory efficient, and never exposes things like
surrogate pairs.
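
A rough sketch of what that looks like in practice, assuming CPython 3.3
or later (the FSR is the Flexible String Representation from PEP 393; the
sys.getsizeof() comparison is CPython-specific and the variable names are
just for illustration):

```python
import sys

# 'a', one character outside the Basic Multilingual Plane, 'b'
s = "a\U0001F600b"

# One code point is one element of the string: no surrogate pair leaks out.
length = len(s)
middle = s[1]

# Encoded as UTF-16, the same character needs two code units (a surrogate
# pair), which is what languages exposing UTF-16 code units show you.
utf16_units = len(s.encode("utf-16-le")) // 2

# CPython picks the narrowest fixed width per string, so pure ASCII text
# stays compact while wide characters still get fixed-width indexing.
ascii_size = sys.getsizeof("a" * 100)
wide_size = sys.getsizeof("\U0001F600" * 100)

print(length, utf16_units)
print(ascii_size < wide_size)
```

Because every string uses a fixed width per code point, s[i] is a direct
index rather than a scan from the start, which is the O(1) behaviour
referred to above.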

> How and why? I suggest, you have a deeper understanding
> of unicode.

Indeed I'd say D'Aprano does have a deeper understanding of unicode.

> May I recall, it is one of the coding scheme endorsed
> by "" and it is intensively used. This is not
> by chance.

Yes, you keep saying this.  Have you encountered a real-world situation
where you are impacted by Python's FSR? You keep posting silly
benchmarks that prove nothing, and continue arguing, yet presumably you
are still using Python.  Why haven't you switched to Google Go or
another language that implements unicode strings in UTF-8?
