On 8/31/06, <b class="gmail_sendername">Guido van Rossum</b> <<a href="mailto:guido@python.org">guido@python.org</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On 8/31/06, Paul Prescod <<a href="mailto:paul@prescod.net">paul@prescod.net</a>> wrote:<br>> On 8/31/06, Guido van Rossum <<a href="mailto:guido@python.org">guido@python.org</a>> wrote:<br>> > (Adding back py3k list assuming you just forgot it)
<br>><br>> Yes, thanks. Gmail's UI really optimizes the "Reply To" operation over "Reply<br>> To All."<br>><br>> > > Plus, it sounds like you're proposing that the encodings of the<br>
> underlying<br>> > > data would leak through to the application. As I understood Fredrik's<br>> > > model, the intention was to treat the encoding as an implementation<br>> detail.<br>> > > If it works well, this could be an important differentiator for Python
<br>> > > (versus Java) as Unicode already is (versus Ruby).<br>> ><br>> > *Only* for UTF-16, which I consider a necessary evil since we can't<br>> > rewrite the Java and .NET standards.<br>>
<br>> I see what you're getting at.<br>><br>> I'd say that decoding UTF-16 data in CPython and PyPy should (by default)<br>> create true Unicode characters. Jython and IronPython could create<br>> surrogates and characters when necessary. When you run the program in
<br>> CPython you'll get better behaviour than in Jython/IronPython. Maybe there<br>> could be a way to make CPython run like Jython and IronPython if you wanted<br>> 100% absolute compatibility between the environments. I think that we agree
<br>> that it would be unfortunate if CPython copied Java and .NET to its own<br>> detriment. It's also not inconceivable that Java and .NET might evolve a<br>> 4-byte mode in the long term.<br><br>I think it would be best to do this as a CPython configuration option
<br>just like it's done today. You can choose 4-byte or 2-byte Unicode<br>(essentially UCS-4 or UTF-16) in order to be compatible with other<br>packages on the platform. Yes, 4-byte gives better Unicode support.<br>But 2-bytes may be more compatible with other stuff on the platform.
<br>Too bad .NET and Java don't have this option. :-)</blockquote><div><br>The current model is a hack (and I wrote the PEP!).<br><br>If you decide to go to all of the effort and expense of polymorphic
strings, I cannot understand why a user should be forced to choose
between 16- and 32-bit strings AT BUILD TIME. PEP 261 says that the reason
for the build-time solution is: <br>
<pre>"[The alternate solutions] ... would require a much more <br>complex implementation than the accepted solution. ... <br>Guido is not willing to undertake the implementation right <br>now. ...This PEP represents least-effort solution."
</pre>
</div>Fair enough. A world of finite resources. But I would be very annoyed if my ISP had installed a Python version that could magically handle 8-bit and 16-bit strings efficiently, but I had to ask them to install a special version to handle 32-bit strings at all. Obviously, build-time configuration is the least flexible of all available options.
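<br><br>To make the per-string alternative concrete, here is a rough sketch, assuming a hypothetical polymorphic-string runtime that picks the narrowest storage width (1, 2, or 4 bytes per character) from the highest code point in each individual string, instead of fixing one width for the whole interpreter at build time. The names narrowest_width and utf16_code_units are illustrative only, not an existing API. The second function counts how many 16-bit units the same text needs in UTF-16, where characters outside the Basic Multilingual Plane become surrogate pairs. It assumes a modern Python 3 where ord() sees whole code points.

```python
def narrowest_width(s: str) -> int:
    """Bytes per character a hypothetical polymorphic string would use for s.

    Assumes ord() sees whole code points (a wide build, or Python 3.3+),
    not the surrogate halves a narrow UTF-16 build would expose.
    """
    highest = max(map(ord, s), default=0)  # empty string -> narrowest form
    if highest <= 0xFF:
        return 1  # Latin-1 range: one byte per character
    if highest <= 0xFFFF:
        return 2  # Basic Multilingual Plane: two bytes, no surrogates needed
    return 4      # supplementary planes: four bytes per character


def utf16_code_units(s: str) -> int:
    """Number of 16-bit units s occupies in UTF-16 (a surrogate pair counts as 2)."""
    return len(s.encode("utf-16-le")) // 2  # "-le" avoids a byte-order mark


if __name__ == "__main__":
    for text in ["abc", "caf\u00e9", "\u20ac100", "\U0001D11E"]:
        print(repr(text), narrowest_width(text), utf16_code_units(text))
```

Under this sketch, a pure-ASCII string stays at one byte per character and only a string that actually contains a character such as U+1D11E pays for four bytes, while the same character still costs two code units in UTF-16. That is the kind of choice a runtime could make per string rather than a build option.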
<br><br> Paul Prescod<br><br></div>