[Python-Dev] len(chr(i)) = 2?
glyph at twistedmatrix.com
Fri Nov 26 08:21:26 CET 2010
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote:
> You end up proliferating types that all do the same kind of thing. Judicious use of inheritance helps, but getting the fundamental abstraction right is hard. Or least, Emacs hasn't found it in 20 years of trying.
Emacs hasn't even figured out how to do general purpose iteration in 20 years of trying either. The easiest way I've found to loop across an arbitrary pile of 'stuff' is the CL 'loop' macro, which you're not even supposed to use. Even then, you still have to make the arcane and pointless distinction of using 'across' or 'in' or 'on'. Python, on the other hand, has iteration pretty well tied up nicely in a bow.
I don't know how to respond to the rest of your argument. Nothing you've said has in any way indicated to me why having code-point offsets is a good idea, only that people who know C and elisp would rather sling around piles of integers than have good abstract types.
> I think it more likely that markers are very expense to create and use compared to integers.
What? When you do 'for x in str' in python, you are already creating an iterator object, which has to store the exact same amount of state that our proposed 'marker' or 'character pointer' would have to store. The proposed UTF-8 marker would have to do a tiny bit more work when iterating because it would have to combine multibyte characters, but in exchange for that you get to skip a whole ton of copying when encoding and decoding. How is this expensive to create and use? For every application I have ever designed, encountered, or can even conjecture about, this would be cheaper. (Assuming not just a UTF-8 string type, but one for UTF-16 as well, where native data is in that format already.)
For what it's worth, not wanting to use abstract types in Emacs makes sense to me: I've written my share of elisp code, and it is hard to create reasonable abstractions in Emacs, because the facilities for defining types and creating polymorphic logic are so crude. It's a lot easier to just assume your underlying storage is an array, because at the end of the day you're going to need to call some functions on it which care whether it's an array or an alist or a list or a vector anyway, so you might as well just say so up front. But in Python we could just call 'mystring.by_character()' or 'mystring.by_codepoint()' and get an iterator object back and forget about all that junk.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Python-Dev