On 8/31/06, <b class="gmail_sendername">Guido van Rossum</b> &lt;<a href="mailto:guido@python.org">guido@python.org</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On 8/31/06, Paul Prescod &lt;<a href="mailto:paul@prescod.net">paul@prescod.net</a>&gt; wrote:<br>&gt; On 8/31/06, Guido van Rossum &lt;<a href="mailto:guido@python.org">guido@python.org</a>&gt; wrote:<br>&gt; &gt; (Adding back py3k list assuming you just forgot it)

<br>&gt;<br>&gt; Yes, thanks. Gmail's UI really optimizes the &quot;Reply To&quot; operation of &quot;Reply<br>&gt; To All.&quot;<br>&gt;<br>&gt; &gt; &gt; Plus, it sounds like you're proposing that the encodings of the<br>

&gt; underlying<br>&gt; &gt; &gt; data would leak through to the application. As I understood Fredrick's<br>&gt; &gt; &gt; model, the intention was to treat the encoding as an implementation<br>&gt; detail.<br>&gt; &gt; &gt; If it works well, this could be an important differentiator for Python

<br>&gt; &gt; &gt; (versus Java) as Unicode already is (versus Ruby).<br>&gt; &gt;<br>&gt; &gt; *Only* for UTF-16, which I consider a necessary evil since we can't<br>&gt; &gt; rewrite the Java and .NET standards.<br>&gt;

<br>&gt; I see what you're getting at.<br>&gt;<br>&gt; I'd say that decoding UTF-16 data in CPython and PyPy should (by default)<br>&gt; create true Unicode characters. Jython and IronPython could create<br>&gt; surrogates and characters when necessary. When you run the program in

<br>&gt; CPython you'll get better behaviour than in Jython/IronPython. Maybe there<br>&gt; could be a way to make CPython run like Jython and IronPython if you wanted<br>&gt; 100% absolute compatibility between the environments. I think that we agree

<br>&gt; that it would be unfortunate if CPython copied Java and .NET to its own<br>&gt; detriment. It's also not inconceivable that Java and .NET might evolve a<br>&gt; 4-byte mode in the long term.<br><br>I think it would be best to do this as a CPython configuration option

<br>just like it's done today. You can choose 4-byte or 2-byte Unicode<br>(essentially UCS-4 or UTF-16) in order to be compatible with other<br>packages on the platform. Yes, 4-byte gives better Unicode support.<br>But 2-bytes may be more compatible with other stuff on the platform.

<br>Too bad .NET and Java don't have this option. :-)</blockquote><div><br>The current model is a hack (and I wrote the PEP!).<br><br>If you decide to go to all of the effort and expense of polymorphic

strings, I cannot understand why a user should be forced to choose

between 16-bit and 32-bit strings AT BUILD TIME. PEP 261 says that the reason

for the build-time solution is: <br>

<pre>&quot;[The alternate solutions] ... would require a much more <br>complex implementation than the accepted solution. ... <br>Guido is not willing to undertake the implementation right <br>now. ...This PEP represents least-effort solution.&quot;

</pre>

</div>Fair enough. A world of finite resources. But I would be very annoyed if my ISP had installed a Python version that could magically handle 8-bit and 16-bit strings efficiently but I had to ask them to install a special version to handle 32-bit strings at all. Obviously, build-time configuration is the least flexible of all available options.

<br><br>&nbsp;Paul Prescod<br><br></div>