Oh look, another language (ceylon)
steve at pearwood.info
Tue Nov 19 03:13:17 CET 2013
On Tue, 19 Nov 2013 10:25:00 +1100, Chris Angelico wrote:
> But the problem is also with strings coming back from JS.
Just because you call it a "string" in Ceylon, doesn't mean you have to
which operations potentially add surrogates. Since strings are immutable
in Ceylon, a slice of a BMP-only string is also BMP-only; concatenating
two BMP-only strings gives a BMP-only string. I expect that uppercasing
or lowercasing such strings will also keep the same invariant, but if
not, well, you already have to walk the string to convert it, so walking it
again should be no more expensive.
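
To make that concrete, here is a rough sketch in Python rather than Ceylon. Everything in it is made up for illustration: the class name U16String, its methods, and the choice of UTF-16 code units as storage are my assumptions, not anything Ceylon actually does. It only shows how a single "BMP-only" flag lets indexing and length take a fast path, and how slicing and concatenation preserve the flag without rescanning.

```python
class U16String:
    """Hypothetical immutable string stored as UTF-16 code units.

    Illustration only: not Ceylon's (or anyone's) actual implementation.
    """

    def __init__(self, units, bmp_only=None):
        self._units = tuple(units)
        if bmp_only is None:
            # One walk at construction time decides which path later calls take.
            bmp_only = not any(0xD800 <= u <= 0xDFFF for u in self._units)
        self.bmp_only = bmp_only

    @classmethod
    def from_str(cls, text):
        raw = text.encode("utf-16-le")
        return cls(int.from_bytes(raw[i:i + 2], "little")
                   for i in range(0, len(raw), 2))

    def to_str(self):
        raw = b"".join(u.to_bytes(2, "little") for u in self._units)
        return raw.decode("utf-16-le")

    def __len__(self):
        if self.bmp_only:
            # Fast path: one code unit per character.
            return len(self._units)
        # Slow path: count every unit except low (trailing) surrogates.
        return sum(1 for u in self._units if not 0xDC00 <= u <= 0xDFFF)

    def char_at(self, index):
        if self.bmp_only:
            # Fast path: plain array indexing.
            return chr(self._units[index])
        # Slow path: decode the whole string, O(n).
        return self.to_str()[index]

    def slice(self, start, stop):
        if self.bmp_only:
            # A slice of a BMP-only string is BMP-only; no rescan needed.
            return U16String(self._units[start:stop], bmp_only=True)
        # Otherwise slice by character and rescan the result.
        return U16String.from_str(self.to_str()[start:stop])

    def concat(self, other):
        # BMP-only + BMP-only stays BMP-only; anything else contains surrogates.
        return U16String(self._units + other._units,
                         bmp_only=self.bmp_only and other.bmp_only)


plain = U16String.from_str("hello")
astral = U16String.from_str("a\U0001F600b")   # middle character is outside the BMP
print(plain.bmp_only, len(plain), plain.char_at(1))
print(astral.bmp_only, len(astral), astral.char_at(1) == "\U0001F600")
```

The slow path here is deliberately lazy (it just decodes the whole thing); a real implementation would presumably do something smarter, but the fast path is the point.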
The point is not that my off-the-top-of-my-head pseudo-implementation was
optimal in all details, but that *text strings* should be decent data
structures with smarts, not dumb arrays of variable-width characters. If
that means avoiding dumb-array-of-char naive implementations, and writing
your own, that's part of the compiler writer's job.
Python strings can include null bytes, unlike C, even when built on top
of C. They know their length, unlike C, even when built on top of C. Just
> - as opposed to simply saying "string
> indexing can be slow on large strings", which puts the cost against a
> visible line of code.
For all we know, Ceylon already does something like this, but merely
doesn't advertise the fact that while it *can* be slow, it can *also* be
fast. It's an implementation detail, perhaps, much like string
concatenation in Python officially requires building a new string, but in
CPython sometimes it can append to the original string.
Still, given that Pike and Python have already solved this problem, and
have O(1) string indexing operations and length for any Unicode string,
SMP and BMP, it is a major disappointment that Ceylon doesn't.
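
For the Python side, a quick check, assuming Python 3.3 or later (where PEP 393 gave CPython its flexible string representation; older narrow builds behave differently). The variable names are just for illustration:

```python
# One ASCII character, one SMP character, one Latin-1 character.
s = "a\U0001F600\u00e9"

n = len(s)            # counts code points, not code units
middle = s[1]         # indexing lands on the whole SMP character

print(n, hex(ord(middle)))
```

Length and indexing are both in terms of code points, with no surrogate pairs leaking through.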