
I've uploaded a new version of the proposal which incorporates
a lot of what has been discussed on the list. Thanks to everybody
who helped so far. Note that I have extended the list of references
for those who want to join in, but are in need of more background
information.

The latest version of the proposal is available at:

        http://starship.skyport.net/~lemburg/unicode-proposal.txt

Older versions are available as:

        http://starship.skyport.net/~lemburg/unicode-proposal-X.X.txt

Some POD (points of discussion) that are still open:

· support for line breaks (see
  http://www.unicode.org/unicode/reports/tr13/ )

· support for case conversion:

  Problems: string lengths can change due to multiple
  characters being mapped to a single new one, capital letters
  starting a word can be different than ones occurring in the
  middle, there are locale dependent deviations from the standard
  mappings.

· support for numbers, digits, whitespace, etc.

· support (or no support) for private code point areas

· should Unicode objects support %-formatting ?

  One possibility would be to emulate this via strings and <default
  encoding>:

    s = '%s %i abcäöü' # a Latin-1 encoded string
    t = (u,3)

    # Convert Latin-1 s to a <default encoding> string
    s1 = unicode(s,'latin-1').encode()

    # The '%s' will now add u in <default encoding>
    s2 = s1 % t

    # Finally, convert the <default encoding> encoded string to Unicode
    u1 = unicode(s2)

· specifying file wrappers:

  Open issues: what to do with Python strings
  fed to the .write() method (may need to know the encoding of the
  strings) and when/if to return Python strings through the .read()
  method.

  Perhaps we need more than one type of wrapper here.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    49 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/
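
For readers following the case conversion item above, here is a
minimal sketch of two of the listed problems. It assumes the str
methods of a modern Python 3 interpreter, not the API under
discussion in the proposal, and the helper name case_report is
made up for illustration only.

```python
# Illustration only: relies on Python 3 str.upper() / str.title(),
# which is an assumption and not the API described in the proposal.

def case_report(text):
    """Return (uppercased, titlecased, length_changed) for text.

    case_report is a hypothetical helper used only for this example.
    """
    upper = text.upper()
    title = text.title()
    return upper, title, len(upper) != len(text)


# German sharp s (U+00DF): one character uppercases to two ("SS"),
# so the uppercased string is longer than the input.
upper, title, changed = case_report("stra\u00dfe")
print(ascii(upper), ascii(title), changed)

# U+01C6 has a separate uppercase form (U+01C4) and titlecase form
# (U+01C5): the capital that starts a word differs from the one used
# when the whole word is uppercased.
upper, title, changed = case_report("\u01c6")
print(ascii(upper), ascii(title), changed)

# These methods are locale-independent, so locale-specific rules
# (for example the Turkish dotted/dotless i) are not applied here.
print("i".upper())
```

The third problem, locale dependent deviations, is only noted in a
comment: the sketch does not attempt to model it.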