[Python-Dev] Internal representation of strings and Micropython

Chris Angelico rosuav at gmail.com
Thu Jun 5 01:05:33 CEST 2014

On Thu, Jun 5, 2014 at 8:52 AM, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> "Well" is subjective (or should be defined formally based on the
> requirements). With my MicroPython hat on, an implementation which
> receives a string, transcodes it, leading to bigger size, just to
> immediately transcode back and send out - is awful, environment
> unfriendly implementation ;-).

Be careful of confusing correctness and performance, though. The
transcoding you describe is inefficient, but (presumably) correct;
something that's fast but wrong is straight-up buggy. You can always
fix inefficiency in a later release, but buggy behaviour sometimes is
relied on (which is why ECMAScript still exposes UTF-16 to scripts,
and why Windows window messages have a WPARAM and an LPARAM, and why
Python's threading module has duplicate names for a lot of functions,
because it's just not worth changing). I'd be much more comfortable
releasing something where "everything works fine, but if you use
astral characters in your strings, memory usage blows out by a factor
of four" (or "... the len() function takes O(N) time") than one where
"everything works fine as long as you use BMP only, but SMP characters
result in tests failing".


More information about the Python-Dev mailing list