RE Module Performance

Ian Kelly ian.g.kelly at gmail.com
Thu Jul 25 22:45:38 CEST 2013


On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
>
>> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>>> On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:
>>>> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers
>>>> that are codepoints of text characters within buffers and strings.
>>>> Rather, Emacs uses a variable-length internal representation of
>>>> characters, that stores each character as a sequence of 1 to 5 8-bit
>>>> bytes, depending on the magnitude of its codepoint[1]. For example,
>>>> any ASCII character takes up only 1 byte, a Latin-1 character takes up
>>>> 2 bytes, etc. We call this representation of text multibyte."
>>>
>>> Well, you've just proven what Vim users have always suspected: Emacs
>>> doesn't really exist.
>>
>> ... lolwut?
>
>
> JMF has explained that it is impossible, impossible I say!, to write an
> editor using a flexible string representation. Since Emacs uses such a
> flexible string representation, Emacs is impossible, and therefore Emacs
> doesn't exist.
>
> QED.

Except that the described representation used by Emacs is a variant of
UTF-8, not an FSR.  It doesn't have three different possible encodings
for the letter 'a' depending on what other characters happen to be in
the string.
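
To make the difference concrete, here is a rough sketch of how one might
measure it. It assumes CPython 3.3 or later (where PEP 393 applies) and
uses sys.getsizeof as an indirect measure of storage, which is an
implementation detail and not a guarantee. The helper names bytes_per_a
and utf8_bytes_for_a are made up for this example.

```python
import sys

def bytes_per_a(prefix, n=1000):
    """Estimate the storage CPython uses for each 'a' that follows prefix.

    Compares two strings that differ only in how many 'a's they hold, so
    the fixed object header and the prefix cancel out of the difference.
    """
    short = prefix + "a" * n
    longer = prefix + "a" * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

def utf8_bytes_for_a(prefix):
    """Bytes that a trailing 'a' adds to the UTF-8 encoding of prefix."""
    return len((prefix + "a").encode("utf-8")) - len(prefix.encode("utf-8"))

# Empty, Latin-1 (U+00E9), BMP (U+20AC) and astral (U+1F600) prefixes.
prefixes = ["", "\u00e9", "\u20ac", "\U0001f600"]

for p in prefixes:
    print(ascii(p), "FSR bytes per 'a':", bytes_per_a(p),
          "| UTF-8 bytes per 'a':", utf8_bytes_for_a(p))
```

On a CPython build with the FSR, the first column should vary with the
widest character in the string, while the UTF-8 column stays at one byte
no matter what else the string contains.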

As I understand it, jmf would be perfectly happy if Python used UTF-8
(or presumably the Emacs variant) as its internal string
representation.


