Nimrod programming language

Wed May 13 04:44:11 EDT 2009

On Tue, May 12, 2009 at 3:10 PM,  <rumpf_a at web.de> wrote:
>> You can certainly have a string type that uses byte arrays in UTF-8
>> encoding internally, but your string functions should be aware of that
>> and treat it as a unicode string. The len function and index operators
>> should count characters, not bytes. Add a byte array data type for
>> byte arrays instead.
>>
> It's not easy. I think Python3's byte arrays have an "upper" method
> (and a string literal syntax b"abc"), which is quite alarming to me:
> it suggests that they chose the wrong default.

I suppose that is to make it possible to use the 'bytes' data type for
text strings if you really want to (and for backwards compatibility).
Default text strings should use Unicode (as in Python 3), and the
language itself should support that.
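
To illustrate the split in Python 3 (a small sketch, not a proposal for
Nimrod syntax; the variable names are only for this example): 'str'
counts and indexes code points, while 'bytes' counts and indexes raw
bytes, and its text-like methods such as upper() only touch ASCII
letters.

```python
# Python 3: str works on code points, bytes works on encoded bytes.
text = "na\u00efve"            # str: 5 code points, U+00EF is 'i' with diaeresis
data = text.encode("utf-8")    # bytes: the UTF-8 encoding of the same text

print(len(text))   # 5 code points
print(len(data))   # 6 bytes, because U+00EF takes two bytes in UTF-8

# Indexing follows the same split.
print(text[2])     # the single character U+00EF
print(data[2])     # 195 (0xC3), the first byte of the encoded character

# bytes carries text-like methods, but they only affect ASCII letters.
ascii_upper = b"abc".upper()   # b'ABC'
bytes_upper = data.upper()     # non-ASCII bytes are left untouched
text_upper = text.upper()      # full Unicode case mapping
print(ascii_upper, bytes_upper, text_upper)
```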

> Eventually the "rope" data structure (that the compiler uses heavily)
> will become a proper part of the library: By "rope" I mean an
> immutable string implemented as a tree, so concatenation is O(1). For
> immutable strings there is no ``[]=`` operation, so using UTF-8 and
> converting it to a 32-bit char works better.
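
For readers who have not met the data structure, here is a minimal
sketch of such a rope in Python. It is not the compiler's actual
implementation; the class name 'Rope' and its methods are made up for
this example. Leaves hold UTF-8 bytes, concatenation allocates a single
inner node without copying, and iteration decodes to integer code
points (the "32-bit char" view).

```python
class Rope:
    """Immutable string stored as a binary tree of UTF-8 leaves (sketch)."""

    __slots__ = ("leaf", "left", "right", "nbytes")

    def __init__(self, leaf=b"", left=None, right=None):
        self.leaf = leaf      # UTF-8 bytes, only meaningful for leaf nodes
        self.left = left      # left subtree, None for leaf nodes
        self.right = right    # right subtree, None for leaf nodes
        if left is None:
            self.nbytes = len(leaf)
        else:
            self.nbytes = left.nbytes + right.nbytes

    @classmethod
    def from_str(cls, s):
        # Each leaf holds one complete, valid UTF-8 string.
        return cls(leaf=s.encode("utf-8"))

    def __add__(self, other):
        # O(1) concatenation: one new inner node, no bytes are copied.
        return Rope(left=self, right=other)

    def leaves(self):
        # Iterative left-to-right traversal, so deep trees do not hit
        # the recursion limit.
        stack = [self]
        while stack:
            node = stack.pop()
            if node.left is None:
                yield node.leaf
            else:
                stack.append(node.right)
                stack.append(node.left)

    def codepoints(self):
        # Decode leaf by leaf and yield each character as an integer.
        for leaf in self.leaves():
            for ch in leaf.decode("utf-8"):
                yield ord(ch)

    def __str__(self):
        return b"".join(self.leaves()).decode("utf-8")


r = Rope.from_str("ab") + Rope.from_str("\u00e9") + Rope.from_str("c")
print(r.nbytes, list(r.codepoints()))
```

There is no item assignment on purpose: since the rope is immutable,
the variable width of UTF-8 never forces bytes to be shifted around.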

Consider a string class that keeps track of its own encoding and can
change it on the fly as needed.
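
A rough sketch of what I mean, in Python. This is only one possible
reading of the idea; 'AdaptiveString' and all of its methods are
hypothetical names, not an existing library. The object stores raw
bytes plus the name of their encoding, stays compact in UTF-8 by
default, and switches itself to fixed-width UTF-32 the first time
random access by character index is requested.

```python
class AdaptiveString:
    """Text that tracks its own encoding and re-encodes on demand (sketch)."""

    def __init__(self, data, encoding="utf-8"):
        self._data = bytes(data)
        self._encoding = encoding

    @classmethod
    def from_str(cls, text, encoding="utf-8"):
        return cls(text.encode(encoding), encoding)

    @property
    def encoding(self):
        return self._encoding

    @property
    def nbytes(self):
        # Size of the current internal representation in bytes.
        return len(self._data)

    def convert(self, encoding):
        # Change the internal representation; the text itself is unchanged.
        if encoding != self._encoding:
            self._data = self._data.decode(self._encoding).encode(encoding)
            self._encoding = encoding
        return self

    def __len__(self):
        # Length in code points, whatever the current encoding is.
        return len(self._data.decode(self._encoding))

    def __getitem__(self, index):
        # Fixed-width UTF-32 gives O(1) indexing, so switch to it first.
        # Only non-negative integer indices are handled in this sketch.
        self.convert("utf-32-le")
        if not 0 <= index < len(self._data) // 4:
            raise IndexError("string index out of range")
        return self._data[4 * index:4 * index + 4].decode("utf-32-le")

    def __str__(self):
        return self._data.decode(self._encoding)


s = AdaptiveString.from_str("h\u00e9llo")
print(s.encoding, s.nbytes, len(s))   # compact UTF-8 form
print(s[1], s.encoding, s.nbytes)     # indexing switched it to UTF-32
```

The trade-off is memory for speed: the UTF-32 form uses four bytes per
character, so a real implementation would want a policy for converting
back (here that is a manual call to convert("utf-8")).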

-- 
martin at librador.com
http://www.librador.com