Re: [XML-SIG] Re: [I18n-sig] Re: [Python-Dev] Unicode debate
Guido van Rossum <guido@python.org> writes: <snip/>
My ASCII proposal is a compromise that tries to be fair to both uses for strings. Introducing byte arrays as a more fundamental type has been on the wish list for a long time -- I see no way to introduce this into Python 1.6 without totally botching the release schedule (June 1st is very close already!). I'd like to be able to move on; there are other important things still to be added to 1.6 (Vladimir's malloc patches, Neil's GC, Fredrik's completed sre...).
For 1.7 (which should happen later this year) I promise I'll reopen the discussion on byte arrays.
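For concreteness, here is a minimal sketch of the coercion behaviour the ASCII proposal implies, assuming the rules as they eventually shipped in 1.6/2.0 (the exact exception type varied between releases, and the string values are invented):

    narrow = "hello"            # classic byte string
    wide = u" world"            # wide (Unicode) string

    # Mixing the two coerces the byte string through the default ASCII codec,
    # so pure 7-bit data just works:
    print narrow + wide         # u'hello world'

    # ... but a byte outside the 7-bit range has no implicit interpretation:
    try:
        print "caf\xe9" + wide
    except UnicodeError:
        print "non-ASCII bytes cannot be silently mixed with unicode"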
I think I hear a moderate consensus developing that the 'ASCII proposal' is a reasonable compromise given the time constraints. But let's not fail to come back to this ASAP -- it _really_ narks me that every time I load XML into my Python-based editor I'm going to convert large amounts of wide-string data into UTF-8 just so Tk can convert it back to wide-strings in order to display it!
ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
W3C Fellow 1999--2001, part-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
I think I hear a moderate consensus developing that the 'ASCII proposal' is a reasonable compromise given the time constraints. But let's not fail to come back to this ASAP -- it _really_ narks me that every time I load XML into my Python-based editor I'm going to convert large amounts of wide-string data into UTF-8 just so Tk can convert it back to wide-strings in order to display it!
Thanks -- but that's really Tcl's fault, since the only way to get character data *into* Tcl (or out of it) is through the UTF-8 encoding. And is your XML really stored on disk in its 16-bit format?
--Guido van Rossum (home page: http://www.python.org/~guido/)
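A minimal sketch of the round-trip Henry is complaining about, assuming the character data is already wide in Python and that the Python/Tcl boundary serialises it as UTF-8 (the sample data is made up; the two conversions are the point):

    wide = u"\u0141\u00f3d\u017a" * 10000   # a large amount of wide-string data from the parsed XML

    utf8 = wide.encode("utf-8")             # Python -> Tcl: flattened to UTF-8 bytes

    back = unicode(utf8, "utf-8")           # Tcl's first step: decode straight back to 16-bit chars

    assert back == wide                     # same data, two extra copies and two conversions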
Guido van Rossum <guido@python.org> wrote:
Thanks -- but that's really Tcl's fault, since the only way to get character data *into* Tcl (or out of it) is through the UTF-8 encoding.
from http://dev.scriptics.com/man/tcl8.3/TclLib/StringObj.htm

    Tcl_NewUnicodeObj(Tcl_UniChar* unicode, int numChars)

    Tcl_NewUnicodeObj and Tcl_SetUnicodeObj create a new object or modify an existing object to hold a copy of the Unicode string given by unicode and numChars.

(Tcl_UniChar* is currently the same thing as Py_UNICODE*)

</F>
Henry S. Thompson <ht@cogsci.ed.ac.uk> wrote:
I think I hear a moderate consensus developing that the 'ASCII proposal' is a reasonable compromise given the time constraints.
agreed. (but even if we settle for "7-bit unicode" in 1.6, there are still a few issues left to sort out before 1.6 final. but it might be best to get back to that after we've added SRE and GC to 1.6a3. we might all need a short break...)
But let's not fail to come back to this ASAP
first week in june, promise ;-) </F>
At 11:02 PM +0200 04-05-2000, Fredrik Lundh wrote:
Henry S. Thompson <ht@cogsci.ed.ac.uk> wrote:
I think I hear a moderate consensus developing that the 'ASCII proposal' is a reasonable compromise given the time constraints.
agreed.
This makes no sense: implementing the 7-bit proposal takes more or less the same time as implementing 8-bit downcasting. Or is it just the bickering that's too time consuming? ;-)

I worry that if the current implementation goes into 1.6 more or less as it is now there's no way we can ever go back (before P3K). Or will Unicode support be marked "experimental" in 1.6?

This is not so much about the 7-bit/8-bit proposal but about the dubious unicode() and unichr() functions and the u"" notation:

- unicode() only takes strings, so is effectively a method of the string type.
- if narrow and wide strings are meant to be as similar as possible, chr(256) should just return a wide char
- similarly, why is the u"" notation at all needed?

The current design is more complex than needed, and still offers plenty of surprises. Making it simpler (without integrating the two string types) is not a huge effort. Seeing the wide string type as independent of Unicode takes no physical effort at all, as it's just in our heads. Fixing str() so it can return wide strings might be harder, and can wait until later. Would be too bad, though.

Just
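For readers who want to see the asymmetries Just is pointing at, a short sketch against the API as it shipped in 1.6/2.0 (behaviour in the alphas under discussion may have differed slightly):

    print unicode("abc")              # u'abc' -- a byte string is accepted ...
    # unicode(123)                    # ... but (in 1.6/2.0) a non-string is rejected,
                                      # so unicode() acts like a method of the string type

    print repr(chr(255))              # '\xff'     -- narrow char
    print repr(unichr(256))           # u'\u0100'  -- a wide char needs a separate builtin
    # chr(256)                        # raises ValueError instead of returning a wide char

    print "abc" == u"abc"             # 1 -- equal as values ...
    print type("abc") == type(u"abc") # 0 -- ... yet u"" literals build a distinct type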