RE Module Performance

MRAB python at mrabarnett.plus.com
Tue Jul 30 18:13:40 CEST 2013


On 30/07/2013 15:38, Antoon Pardon wrote:
> Op 30-07-13 16:01, wxjmfauth at gmail.com schreef:
>>
>> I am pretty sure that once you have typed your 127504 ASCII
>> characters, you are very happy the buffer of your editor does not
>> waste time in reencoding the buffer as soon as you enter an €, the
>> 127505th char. Sorry, I wanted to say z instead of euro, just to
>> show that backspacing the last char and reentering a new char
>> implies twice a reencoding.
>
> Using a single string as an editor buffer is a bad idea in Python for
> the simple reason that strings are immutable.

Using a single string as an editor buffer is a bad idea in _any_
language because an insertion would require all the following
characters to be moved.
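
A minimal sketch of that cost, assuming the buffer is one flat mutable
array (a Python list stands in for it here; insert_shifting is a
hypothetical helper, not anything from an actual editor). Every
character after the insertion point has to move up one slot:

```python
def insert_shifting(buf, pos, char):
    """Insert char at index pos of the flat buffer buf (a list), in place.

    Returns the number of existing characters that had to be moved.
    """
    buf.append(char)                       # grow the buffer by one slot
    for i in range(len(buf) - 1, pos, -1):
        buf[i] = buf[i - 1]                # shift each following character up
    buf[pos] = char
    return len(buf) - 1 - pos


buf = list("abcd")
moved = insert_shifting(buf, 1, "X")       # buf now spells "aXbcd"
```

Inserting near the start of a large buffer moves almost the whole
buffer, whatever the language.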

> So adding characters would mean continuously copying the string
> buffer into a new string with the next character added. Copying
> 127504 characters into a new string will not make that much of a
> difference whether the octets are just copied to octets or are
> unpacked into 32 bit words.
>
>> Somebody wrote "FSR" is just an optimization. Yes, but in the case
>> of an editor à la FSR, this optimization takes place every time you
>> enter a char. Your poor editor, in fact the FSR, ends up
>> spending its time optimizing and in the end optimizes nothing.
>> (It is even worse).
>
> Even if you would do it this way, it would *not* take place every
> time you enter a char. Once your buffer would contain a wide
> character, it would just need to convert the single character that is
> added after each keystroke. It would not need to convert the whole
> buffer after each keystroke.
>
>> If you correctly type a z instead of an €, it is not necessary to
>> reencode the buffer. The problem: how do you know that you do not
>> have to reencode? Simple, just check it; but that check itself
>> wastes time testing whether you have to optimize or not, and hurts
>> a little bit more what is supposed to be an optimization.
>
> Your scenario is totally unrealistic. First of all because of the
> immutable nature of Python strings, second because you suggest that
> real-time usage would result in frequent conversions, which is highly
> unlikely.
>
What you would have is a list of mutable chunks.

Inserting into a chunk would be fast, and a chunk would be split if
it's already full. Also, small adjacent chunks would be joined together.
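
A rough sketch of such a buffer, under some assumptions of mine: each
chunk is a plain Python list of characters, a full chunk is split in
half, and two neighbours are joined once together they fill at most
half a chunk. The class name, the max_chunk parameter and the join
threshold are all hypothetical choices for illustration.

```python
class ChunkedBuffer:
    """Text buffer stored as a list of small mutable chunks."""

    def __init__(self, text="", max_chunk=64):
        self.max_chunk = max_chunk         # assumed chunk capacity
        self.chunks = [list(text[i:i + max_chunk])
                       for i in range(0, len(text), max_chunk)] or [[]]

    def text(self):
        return "".join("".join(chunk) for chunk in self.chunks)

    def _locate(self, pos):
        """Return (chunk index, offset) at which position pos can be inserted."""
        for i, chunk in enumerate(self.chunks):
            if pos <= len(chunk):
                return i, pos
            pos -= len(chunk)
        raise IndexError("position past end of buffer")

    def insert(self, pos, char):
        i, offset = self._locate(pos)
        chunk = self.chunks[i]
        if len(chunk) >= self.max_chunk:
            # The chunk is full: split it in half before inserting.
            mid = len(chunk) // 2
            self.chunks[i:i + 1] = [chunk[:mid], chunk[mid:]]
            if offset > mid:
                i, offset = i + 1, offset - mid
            chunk = self.chunks[i]
        chunk.insert(offset, char)         # only this chunk's tail moves

    def delete(self, pos):
        for i, chunk in enumerate(self.chunks):
            if pos < len(chunk):
                del chunk[pos]
                self._join_small(i)
                return
            pos -= len(chunk)
        raise IndexError("position past end of buffer")

    def _join_small(self, i):
        """Join chunk i with a neighbour if together they are small enough."""
        limit = self.max_chunk // 2
        chunks = self.chunks
        if i + 1 < len(chunks) and len(chunks[i]) + len(chunks[i + 1]) <= limit:
            chunks[i].extend(chunks.pop(i + 1))
        elif i > 0 and len(chunks[i - 1]) + len(chunks[i]) <= limit:
            chunks[i - 1].extend(chunks.pop(i))


buf = ChunkedBuffer("hello world", max_chunk=4)
buf.insert(5, ",")                         # splits the full middle chunk
buf.insert(12, "!")                        # appends to the last chunk
```

An insertion moves at most max_chunk characters. Finding the chunk is
still a linear scan here; a real editor would presumably keep an index
or a tree over the chunks.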

Finally, a chunk could use FSR to reduce memory usage.
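
To illustrate the memory side only: the sketch below assumes CPython
3.3 or later, where str uses the flexible string representation, and
uses immutable str chunks purely as a measuring device (chunked_size
and the 1024-character chunk length are hypothetical). With one big
string, a single € widens every character; with chunks, only the
chunk holding the € is widened.

```python
import sys


def chunked_size(text, chunk_len):
    """Total size in bytes of text stored as separate str chunks."""
    return sum(sys.getsizeof(text[i:i + chunk_len])
               for i in range(0, len(text), chunk_len))


ascii_text = "a" * 127504
with_euro = ascii_text + "\u20ac"          # the 127505th char is an €

whole_ascii = sys.getsizeof(ascii_text)    # 1 byte per character
whole_euro = sys.getsizeof(with_euro)      # every character widened
chunked_euro = chunked_size(with_euro, 1024)  # only the last chunk widened

print(whole_ascii, whole_euro, chunked_euro)
```

The exact numbers depend on the interpreter build, so none are quoted
here, but the chunked total should stay close to the pure ASCII size.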


