[Python-Dev] Internal representation of strings and Micropython

Thu Jun 5 00:52:53 CEST 2014

Hello,

On Wed, 04 Jun 2014 18:04:52 -0400
Terry Reedy <tjreedy at udel.edu> wrote:

> On 6/4/2014 5:14 PM, Paul Sokolovsky wrote:
> 
> > That said, and unlike previous attempts to develop a small Python
> > implementations (which of course existed), we're striving to be
> > exactly a Python language implementation, not a Python-like language
> > implementation. As there's no formal, implementation-independent
> > language spec, what constitutes a compatible language
> > implementation is subject to opinions, and we welcome and
> > appreciate independent review, like this thread did.
> >
> >> Realistically, most Python code that works on Python 3.4 won't work
> >> on Micropython (for various reasons, not just the string behavior)
> >> and neither does it need to.
> >
> > That's true. However, as was said, we're striving to provide a
> > compatible implementation, and compatibility claims must be
> > validated. While we have simple "in-house" testsuite, more serious
> > compatibility validation requires running a testsuite for reference
> > implementation (CPython), and that's gradually being approached.
> 
> I would call what you are doing a 'Python 3.n subset, with

Thanks, that's what we call it ourselves in the docs linked in the
original message, and use n=4. Note that being a subset is not a design
requirement, but there's higher-priority requirement of staying lean,
so realistically uPy will always stay a subset.

> limitations', where n should be a specific number, which I would urge
> should be at least 3, if not 4 ('yield from'). To me, that would mean
> that every Micropython program (that does not use a clearly
> non-Python addon like inline assembly) would run the same* on CPython
> 3.n. Conversely, a Python 3.n program should either run the same* on
> MicroPython as CPython, or raise. What most to avoid is giving
> different* answers.

That's nice aim, to implement which we don't have enough resources, so
would appreciate any help from interested parties.

> *'same' does not include timing differences or normal float
> variations or bug fixes in MicroPython not in CPython.
> 
> As for unicode: I would see ascii-only (very limited codepoints) or
> bare utf-8 (limited speed == expanded time) as possibly fitting the 
> definition above. Just be clear what the limitations are. And accept 
> that there will be people who do not bother to read the limitations
> and then complain when they bang into them.
> 
> PS. You do not seem to be aware of how well the current PEP393 
> implementation works. If you are going to write any more about it, I 
> suggest you run Tools/Stringbench/stringbench.py for timings.

"Well" is subjective (or should be defined formally based on the
requirements). With my MicroPython hat on, an implementation which
receives a string, transcodes it, leading to bigger size, just to
immediately transcode back and send out - is awful, environment
unfriendly implementation ;-).

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com