<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#330033">

    On 8/24/2011 1:18 AM, "Martin v. Löwis" wrote:

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">

      <blockquote type="cite">

        <pre wrap="">So am I correctly reading between the lines when, after reading this

thread so far, and the complete issue discussion so far, that I see a

PEP 393 revision or replacement that has the following characteristics:

1) Narrow builds are dropped.

</pre>

      </blockquote>

      <pre wrap="">

PEP 393 already drops narrow builds.</pre>

    </blockquote>

    <br>

    I'd forgotten that.<br>

    <br>

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite"><br>

      <blockquote type="cite">

        <pre wrap="">2) There are more, or different, internal kinds of strings, which affect

the processing patterns.

</pre>

      </blockquote>

      <pre wrap="">

This is the basic idea of PEP 393.</pre>

    </blockquote>

    <br>

    Agreed.<br>

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite"><br>

      <blockquote type="cite">

        <pre wrap="">a) all ASCII

b) latin-1 (8-bit codepoints, the first 256 Unicode codepoints) This

kind may not be able to support a "mostly" variation, and may be no more

efficient than case b).  But it might also be popular in parts of Europe

</pre>

      </blockquote>

      <pre wrap="">

These two cases are already in PEP 393.</pre>

    </blockquote>

    Sure.  Wanted to enumerate all, rather than just add-ons.<br>

    <br>

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">

      <blockquote type="cite">

        <pre wrap="">c) mostly ASCII (utf8) with clever indexing/caching to be efficient

d) UTF-8 with clever indexing/caching to be efficient

</pre>

      </blockquote>

      <pre wrap="">

I see neither a need nor a means to consider these.</pre>

    </blockquote>

    <br>

    The discussion of "mostly ASCII" strings makes a convincing case that

    such a representation could yield significant space savings.<br>
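    <br>
    A rough sketch of the comparison I have in mind, counting character
    payload only and ignoring any per-object header. The helper names
    fixed_width_bytes and utf8_bytes are made up for illustration and are
    not part of PEP 393; the sketch assumes only that a fixed-width string
    uses 1, 2, or 4 bytes per character, chosen by its widest character.<br>
    <pre wrap="">
```python
def fixed_width_bytes(text):
    """Payload bytes when every character uses the width of the widest one."""
    if not text:
        return 0
    widest = max(map(ord, text))
    if widest in range(0x100):
        width = 1   # ASCII or Latin-1
    elif widest in range(0x10000):
        width = 2   # 16-bit code points
    else:
        width = 4   # 32-bit code points
    return width * len(text)


def utf8_bytes(text):
    """Payload bytes for the same text encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical "mostly ASCII" sample: 999 ASCII letters plus one euro sign.
sample = "a" * 999 + "\u20ac"
print(fixed_width_bytes(sample), utf8_bytes(sample))
```
</pre>
    For that sample the fixed-width payload is 2000 bytes against 1002 for
    UTF-8, since the single euro sign forces two bytes for every character.
    This says nothing about the indexing cost that case c) would have to
    solve.<br>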

    <br>

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">

      <blockquote type="cite">

        <pre wrap="">e) 16-bit codepoints

</pre>

      </blockquote>

      <pre wrap="">

These are in PEP 393.

</pre>

      <blockquote type="cite">

        <pre wrap="">f) UTF-16 with clever indexing/caching to be efficient

</pre>

      </blockquote>

      <pre wrap="">

Again, -1.</pre>

    </blockquote>

    <br>

    This is probably the one I would pick as least likely to be useful

    if the rest were implemented.<br>

    <br>

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">

      <blockquote type="cite">

        <pre wrap="">g) 32-bit codepoints

</pre>

      </blockquote>

      <pre wrap="">

This is in PEP 393.

</pre>

      <blockquote type="cite">

        <pre wrap="">h) UTF-32

</pre>

      </blockquote>

      <pre wrap="">

What's that, as opposed to g)?</pre>

    </blockquote>

    <br>

    g) would permit codes greater than U+10FFFF, as well as the illegal

    codepoints and lone surrogates.  h) would require strict Unicode

    conformance.  Sorry that the 4 paragraphs of explanation that you

    didn't quote didn't make that clear.<br>
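    <br>
    To make the g) versus h) distinction concrete, here is a minimal
    sketch. It assumes "strict" means Unicode scalar values only (nothing
    above U+10FFFF, no surrogates); whether noncharacters should also be
    rejected is left open. Both function names are hypothetical.<br>
    <pre wrap="">
```python
def is_32bit_value(value):
    """Case g): any value that fits in an unsigned 32-bit cell."""
    return value in range(0x100000000)


def is_unicode_scalar(value):
    """Case h), as assumed here: U+0000..U+10FFFF, excluding surrogates."""
    in_unicode_range = value in range(0x110000)
    is_surrogate = value in range(0xD800, 0xE000)
    return in_unicode_range and not is_surrogate


# A lone surrogate and an out-of-range value pass g) but fail h).
for candidate in (0x41, 0xD800, 0x10FFFF, 0x110000):
    print(hex(candidate), is_32bit_value(candidate), is_unicode_scalar(candidate))
```
</pre>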

    <br>

    <blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">

      <pre wrap="">

I'm not open to revising PEP 393 in the direction of adding more

representations.

</pre>

    </blockquote>

    It's your PEP.<br>

  </body>

</html>