RE Module Performance
wxjmfauth at gmail.com
Fri Jul 26 09:19:42 EDT 2013
On Thursday, 25 July 2013 22:45:38 UTC+2, Ian wrote:
> On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
> > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
> >
> >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
> >> <steve+comp.lang.python at pearwood.info> wrote:
> >>> On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:
> >>>> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers
> >>>> that are codepoints of text characters within buffers and strings.
> >>>> Rather, Emacs uses a variable-length internal representation of
> >>>> characters, that stores each character as a sequence of 1 to 5 8-bit
> >>>> bytes, depending on the magnitude of its codepoint[1]. For example,
> >>>> any ASCII character takes up only 1 byte, a Latin-1 character takes up
> >>>> 2 bytes, etc. We call this representation of text multibyte.
> >>>
> >>> Well, you've just proven what Vim users have always suspected: Emacs
> >>> doesn't really exist.
> >>
> >> ... lolwut?
> >
> > JMF has explained that it is impossible, impossible I say!, to write an
> > editor using a flexible string representation. Since Emacs uses such a
> > flexible string representation, Emacs is impossible, and therefore Emacs
> > doesn't exist.
> >
> > QED.
>
> Except that the described representation used by Emacs is a variant of
> UTF-8, not an FSR. It doesn't have three different possible encodings
> for the letter 'a' depending on what other characters happen to be in
> the string.
>
> As I understand it, jfm would be perfectly happy if Python used UTF-8
> (or presumably the Emacs variant) as its internal string
> representation.
------
And Emacs is probably working smoothly.
Your comment summarized all this stuff very correctly and
very concisely.
utf8/16/32? I do not care. They are all working correctly,
smoothly and efficiently. In fact, these utf's are already
doing correctly what this FSR is doing in a wrong way.
My preference? utf32. Why? It is the simplest and
consequently the best-performing choice. I'm not a narrow-minded
ascii user. (I do not claim to belong to those who
are squaring the circle; I claim to
belong to those who know that squaring the circle
is not possible.)
Note: text processing tools, or tools that have to process
characters (and the tools used to build these tools), are all
moving to utf32, if they have not already done so. There are technical
reasons behind this, which go beyond
pure raw Unicode. They are, however, still 100% Unicode
compliant.
jmf
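
For readers following the thread, here is a minimal sketch of the distinction being argued over. It is not taken from any of the messages above. `encoded_sizes` uses only standard codecs to show how many bytes a single character occupies in UTF-8, UTF-16 and UTF-32. `storage_width` is a hypothetical helper that estimates the per-character width of a str's internal storage; it assumes CPython 3.3 or later (the flexible string representation discussed here) and relies on `sys.getsizeof`, whose results are implementation-specific.

```python
import sys

# Sample characters from four code point ranges (names are illustrative only).
SAMPLES = {
    "ASCII 'a'": "a",
    "Latin-1 e-acute": "\u00e9",
    "BMP euro sign": "\u20ac",
    "non-BMP U+1F600": "\U0001F600",
}


def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32."""
    # The "-le" codec variants are used so that no byte-order mark is counted.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))


def storage_width(text):
    """Estimate bytes per character used internally for text.

    Assumption: CPython 3.3+ only. Comparing two long repetitions of the
    same text cancels out the fixed per-object overhead.
    """
    delta = sys.getsizeof(text * 2000) - sys.getsizeof(text * 1000)
    return delta // (1000 * len(text))


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(label, "utf-8/16/32 bytes:", encoded_sizes(ch))
    # The same letter 'a' alone, next to a BMP character, and next to a
    # non-BMP character: the internal width depends on its neighbours.
    for text in ("a", "a\u20ac", "a\U0001F600"):
        print(ascii(text), "internal bytes per character:", storage_width(text))
```

In each UTF encoding the letter 'a' always takes the same number of bytes whatever surrounds it, which is the property Ian describes; the second loop prints what the running interpreter does with the same letter in different company.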