
Hello,
On Fri, 6 Jun 2014 21:48:41 +1000 Tim Delaney timothy.c.delaney@gmail.com wrote:
On 6 June 2014 21:34, Paul Sokolovsky pmiscml@gmail.com wrote:
On Fri, 06 Jun 2014 20:11:27 +0900 "Stephen J. Turnbull" stephen@xemacs.org wrote:
Paul Sokolovsky writes:
That kinda means "string is atomic", instead of your "characters are atomic".
I would be very surprised if a language that behaved that way was called a "Python subset". No indexing, no slicing, no regexps, no .split(), no .startswith(), no sorted() or .sort(), ...!?
If that's not what you mean by "string is atomic", I think you're using very confusing terminology.
I'm sorry if I didn't mention it, or didn't make it clear enough - it's all about layering.
On level 0, you treat strings verbatim, and can write some subset of apps (my point is that even this level allows writing quite a lot of apps). Let's call this set A0.
On level 1, you accept that there are some sufficiently universal conventions for some chars, like space or newline. And you can write a set of apps A1 > A0.
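To make the two levels concrete, here is a minimal sketch in Python, assuming byte strings stand in for the strings being discussed. The function names level0_greeting and level1_words are hypothetical, made up for illustration only.

```python
# Level 0: the string is an opaque value. We only store, compare,
# and concatenate it, and never look inside.
def level0_greeting(name: bytes) -> bytes:
    return b"Hello, " + name + b"!"


# Level 1: we additionally rely on the convention that the value 0x20
# is a space and 0x0A is a newline. Nothing else is interpreted.
def level1_words(text: bytes) -> list:
    words = []
    for line in text.split(b"\n"):
        # Drop empty pieces produced by runs of spaces.
        words.extend(w for w in line.split(b" ") if w)
    return words


# The name contains a non-ASCII character (UTF-8 encoded here, but the
# code does not need to know that).
name = b"Zo\xc3\xab"
greeting = level0_greeting(name)
words = level1_words(b"caf\xc3\xa9 au lait\nplease")
```

Neither function needs to know which encoding the non-ASCII bytes use; they pass through untouched.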
At heart, this is exactly what the Python 3 "str" type is. The universal convention is "code points".
Yes. Except for one small detail - Python 3 specifies these code points to be Unicode code points. And Unicode is a very bloated thing.
But if we drop that "Unicode" stipulation, then it's also exactly what MicroPython implements. Its "str" type consists of code points; we don't have pet names for them yet, as Unicode does, but their numeric values are 0-255. Note that this in no way limits the encodings, characters, or scripts that can be used with MicroPython, because, just like Unicode, it supports the concept of "surrogate pairs" (though we don't call them that) - specifically, smaller code points may be combined into bigger groupings. But unlike Unicode, we don't stipulate the format, values, or other constraints on how these "surrogate pairs"-alikes are formed, leaving that to users.
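As a rough illustration of that idea, here is a sketch that uses CPython's bytes type as a stand-in for a string whose code points are limited to 0-255. It is not MicroPython's actual implementation. The helper group_utf8 is hypothetical, and UTF-8 is just one convention a user might pick for forming the bigger groupings.

```python
# A "string" whose units are plain numbers in the range 0-255.
s = "\u00e9\u20ac".encode("utf-8")  # e-acute and the euro sign
units = list(s)                     # [0xC3, 0xA9, 0xE2, 0x82, 0xAC]


def group_utf8(data: bytes) -> list:
    """Group units into larger sequences using the UTF-8 convention:
    a unit of the form 0b10xxxxxx continues the previous group."""
    groups = []
    for b in data:
        if groups and (b & 0xC0) == 0x80:
            groups[-1] += bytes([b])
        else:
            groups.append(bytes([b]))
    return groups


groups = group_utf8(s)

# A different user convention: every unit is one Latin-1 character.
latin1_text = bytes([0xE9]).decode("latin-1")
```

At the unit level the string has length 5. Only the chosen convention decides that it holds two larger groupings, and another convention (Latin-1 here) would read the same kind of units differently.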