<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#330033">
On 8/24/2011 1:18 AM, "Martin v. Löwis" wrote:
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">
<blockquote type="cite">
<pre wrap="">So am I reading correctly between the lines if, after reading this
thread so far and the complete issue discussion so far, I see a
PEP 393 revision or replacement that has the following characteristics:
1) Narrow builds are dropped.
</pre>
</blockquote>
<pre wrap="">
PEP 393 already drops narrow builds.</pre>
</blockquote>
<br>
I'd forgotten that.<br>
<br>
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite"><br>
<blockquote type="cite">
<pre wrap="">2) There are more, or different, internal kinds of strings, which affect
the processing patterns.
</pre>
</blockquote>
<pre wrap="">
This is the basic idea of PEP 393.</pre>
</blockquote>
<br>
Agreed.<br>
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite"><br>
<blockquote type="cite">
<pre wrap="">a) all ASCII
b) latin-1 (8-bit codepoints, the first 256 Unicode codepoints) This
kind may not be able to support a "mostly" variation, and may be no more
efficient than case b). But it might also be popular in parts of Europe
</pre>
</blockquote>
<pre wrap="">
These two cases are already in PEP 393.</pre>
</blockquote>
Sure. I wanted to enumerate all of them, rather than just the add-ons.<br>
<br>
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">
<blockquote type="cite">
<pre wrap="">c) mostly ASCII (utf8) with clever indexing/caching to be efficient
d) UTF-8 with clever indexing/caching to be efficient
</pre>
</blockquote>
<pre wrap="">
I see neither a need nor a means to consider these.</pre>
</blockquote>
<br>
The discussion of "mostly ASCII" strings makes a convincing case that
there could be significant space savings if such strings were implemented.<br>
<br>
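To put rough numbers on that, here is a minimal sketch, not a proposed
implementation. It assumes the fixed-width storage uses 1, 2, or 4 bytes
per code point depending on the widest character, and it ignores object
headers and whatever index a "clever" variant would need. The helper
names are made up for illustration.<br>
<pre wrap="">
```python
def fixed_width_bytes(text):
    """Bytes needed when every code point uses the width of the widest one."""
    if not text:
        return 0
    widest = max(map(ord, text))
    if widest in range(0x100):
        width = 1  # fits in 8 bits
    elif widest in range(0x10000):
        width = 2  # fits in 16 bits
    else:
        width = 4  # needs 32 bits
    return width * len(text)


def utf8_bytes(text):
    """Bytes needed when the same text is stored as UTF-8."""
    return len(text.encode("utf-8"))


# 1000 ASCII letters plus one euro sign (U+20AC)
sample = "a" * 1000 + "\u20ac"
print(fixed_width_bytes(sample), utf8_bytes(sample))  # 2002 1003
```
</pre>
A single non-Latin-1 character doubles the fixed-width payload, while
the UTF-8 form grows by only three bytes.<br>
<br>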
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">
<blockquote type="cite">
<pre wrap="">e) 16-bit codepoints
</pre>
</blockquote>
<pre wrap="">
These are in PEP 393.
</pre>
<blockquote type="cite">
<pre wrap="">f) UTF-16 with clever indexing/caching to be efficient
</pre>
</blockquote>
<pre wrap="">
Again, -1.</pre>
</blockquote>
<br>
This is probably the one I would pick as least likely to be useful
if the rest were implemented.<br>
<br>
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">
<blockquote type="cite">
<pre wrap="">g) 32-bit codepoints
</pre>
</blockquote>
<pre wrap="">
This is in PEP 393.
</pre>
<blockquote type="cite">
<pre wrap="">h) UTF-32
</pre>
</blockquote>
<pre wrap="">
What's that, as opposed to g)?</pre>
</blockquote>
<br>
g) would permit codes greater than U+10FFFF, as well as illegal
codepoints and lone surrogates; h) would be strictly
Unicode-conformant. Sorry that the four paragraphs of explanation that
you didn't quote didn't make that clear.<br>
<br>
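As a rough illustration of the g) versus h) distinction, the sketch
below checks whether a sequence of 32-bit values would pass as strict
UTF-32. The function name is hypothetical, and it tests only the range
limit and the surrogate block, not any other category of codepoint one
might want to reject.<br>
<pre wrap="">
```python
MAX_CODE_POINT = 0x10FFFF
SURROGATES = range(0xD800, 0xE000)  # U+D800 through U+DFFF


def is_strict_utf32(code_units):
    """True if every value is in range and none is a surrogate."""
    return all(
        unit in range(MAX_CODE_POINT + 1) and unit not in SURROGATES
        for unit in code_units
    )


print(is_strict_utf32([0x48, 0x20AC, 0x1F600]))  # True
print(is_strict_utf32([0x48, 0xD800]))           # False: lone surrogate
print(is_strict_utf32([0x110000]))               # False: beyond U+10FFFF
```
</pre>
Kind g) would store all three sequences as they are; kind h) would
accept only the first.<br>
<br>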
<blockquote cite="mid:4E54B3CC.9040900@v.loewis.de" type="cite">
<pre wrap="">
I'm not open to revising PEP 393 in the direction of adding more
representations.
</pre>
</blockquote>
It's your PEP.<br>
</body>
</html>