Mailman 3 November 2010 - Python-Dev

[RELEASED] Python 2.7.1
by Benjamin Peterson Nov. 27, 2010

On behalf of the Python development team, I'm happy as a clam to announce the immediate availability of Python 2.7.1.

2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittest improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution.

To download Python 2.7.1 visit:

    http://www.python.org/download/releases/2.7.1/

The 2.7.1 changelog is at:

    http://svn.python.org/projects/python/tags/r271/Misc/NEWS

2.7 documentation can be found at:

    http://docs.python.org/2.7/

This is a production release. Please report any bugs you find to the bug tracker:

    http://bugs.python.org/

Enjoy!

--
Benjamin Peterson
Release Manager
benjamin at python.org
(on behalf of the entire python-dev team and 2.7.1's contributors)
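As a rough illustration of a few of the features listed in the announcement, here is a minimal sketch that should run unchanged on 2.7 and on 3.x. The variable names and sample values are invented for the example. Dictionary views are left out because the method names differ between the two lines (viewkeys() in 2.7, keys() in 3.x).

```python
from collections import OrderedDict
import io

# Set literal: braces instead of set([...]).
small_primes = {2, 3, 5, 7}

# Ordered dictionary: iteration follows insertion order.
order = OrderedDict()
order["b"] = 1
order["a"] = 2
ordered_keys = list(order.keys())

# Auto-numbered format fields: "{}" instead of "{0}", "{1}", ...
label = "{} has {} items".format("order", len(order))

# Nested with statement syntax: two context managers in one statement.
with io.StringIO(u"first") as left, io.StringIO(u"second") as right:
    combined = left.read() + u" " + right.read()

print(sorted(small_primes), ordered_keys, label, combined)
```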

Re: [Python-Dev] [Python-checkins] r86745 - in python/branches/py3k: Doc/library/difflib.rst Lib/difflib.py Lib/test/test_difflib.py Misc/NEWS
by Nick Coghlan Nov. 27, 2010

On Thu, Nov 25, 2010 at 4:12 PM, terry.reedy <python-checkins(a)python.org> wrote:
> The :class:`SequenceMatcher` class has this constructor:
>
>
> -.. class:: SequenceMatcher(isjunk=None, a='', b='')
> +.. class:: SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
>
> Optional argument *isjunk* must be ``None`` (the default) or a one-argument
> function that takes a sequence element and returns true if and only if the
> @@ -340,6 +349,9 @@
> The optional arguments *a* and *b* are sequences to be compared; both default to
> empty strings. The elements of both sequences must be :term:`hashable`.
>
> + The optional argument *autojunk* can be used to disable the automatic junk
> + heuristic.
> +

Catching up on checkins traffic, so a later checkin may already fix this, but there should be a versionchanged tag in the docs to note when the autojunk parameter was added.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
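To make the effect of the new parameter concrete, here is a small sketch modelled on the kind of input the heuristic is known to mishandle: a second sequence of 200 or more items dominated by a single element. The sequences are made up for illustration, and the snippet assumes a difflib that accepts autojunk (2.7.1 or 3.2 and later).

```python
import difflib

# seq2 has 200 items, all identical, so the automatic heuristic treats
# "b" as a "popular" element and ignores it when looking for matches.
seq1 = "a" + "b" * 200
seq2 = "b" * 200

with_heuristic = difflib.SequenceMatcher(None, seq1, seq2).ratio()
without_heuristic = difflib.SequenceMatcher(
    None, seq1, seq2, autojunk=False
).ratio()

print(with_heuristic, without_heuristic)
```

With the heuristic left on, the two nearly identical sequences are reported as having almost nothing in common; with autojunk=False the ratio comes out close to 1.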

Python make fails with error "Fatal Python error: Interpreter not initialized (version mismatch?)"
by Anurag Chourasia Nov. 27, 2010

Hi All,

During the make step of Python, I am encountering a weird error. This is on AIX 5.3 using gcc as the compiler.

My configuration options are as follows:

./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses" LDFLAGS="-L. -L/usr/local/lib"

Below is the transcript from the make step.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
running build
running build_ext
ldd: /lib/libreadline.a: File is an archive.
INFO: Can't locate Tcl/Tk libs and/or headers
building '_struct' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp -L. -L/usr/local/lib build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/_struct.o -L. -L/usr/local/lib -lpython2.6 -o build/lib.aix-5.3-2.6/_struct.so
*Fatal Python error: Interpreter not initialized (version mismatch?)*
*make: 1254-059 The signal code from the last command is 6.*
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

The last command that I see above (ld_so_aix) seems to have completed, as the file _struct.so exists after this command, and hence I am not sure which step is failing.

There is no other Python version on my machine.

Please guide.

Re: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
by Éric Araujo Nov. 27, 2010

Hello,

> Author: senthil.kumaran
> Log:
> Mouse support and colour to Demo/curses/life.py by Dafydd Crosby
>
> Modified:
> python/branches/py3k/Demo/curses/life.py

Okay, this time I’m reacting to the right branch <wink>

> Modified: python/branches/py3k/Demo/curses/life.py
> ==============================================================================
> --- python/branches/py3k/Demo/curses/life.py (original)
> +++ python/branches/py3k/Demo/curses/life.py Thu Nov 25 15:56:44 2010
> @@ -1,6 +1,7 @@
> #!/usr/bin/env python3
> # life.py -- A curses-based version of Conway's Game of Life.
> # Contributed by AMK
> +# Mouse support and colour by Dafydd Crosby

Shouldn’t his name rather be in Misc/ACKS too? Modules typically (warning: non-scientific data) include the name of the author or first contributors but not the name of every contributor.

I think these cool features deserve a note in Misc/NEWS too :)

Re: “colour”: the rest of the file uses US English, as do the function names (see for example curses.has_colors). It’s good to use one dialect consistently in one file.

going-back-to-stare-at-shiny-colors-ly yours,

Éric

len(chr(i)) = 2?
by Alexander Belopolsky Nov. 27, 2010

I was recently surprised to learn that chr(i) can produce a string of length 2 in Python 3.x. I suspect that I am not alone in finding this behavior non-obvious, given that a mistake in the Python manual stating the contrary survived several releases. [1]

Note that I am not arguing that the change was bad. In Python 2.x, \U escapes have been producing surrogate pairs on narrow builds for a long time, if not since the introduction of unicode. I do believe, however, that a change like this [2] and its consequences should be better publicized. I have not found any discussion of this change in PEPs or "What's new" documents. The closest find was a mention of a related issue #3280 in the 3.0 NEWS file. [3]

Since this feature will be first documented in the Library Reference in 3.2, I wonder if it will be appropriate to mention it in "What's new in 3.2"?

[1] http://bugs.python.org/issue7828
[2] http://svn.python.org/view?view=rev&revision=56395
[3] http://www.python.org/download/releases/3.0.1/NEWS.txt
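For readers who want to see where the second element comes from, the sketch below encodes a non-BMP character to UTF-16 and lists its code units. The helper name utf16_code_units is invented for this example. On a narrow build of the era, len(chr(0x10000)) would match the number of code units shown here; on interpreters from 3.3 onwards, strings are no longer stored this way, so len() reports 1 and only the encoded form exposes the pair.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints (Python 3)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]

bmp_char = chr(0xFFFD)       # inside the Basic Multilingual Plane
non_bmp_char = chr(0x10000)  # first code point beyond the BMP

bmp_units = utf16_code_units(bmp_char)
non_bmp_units = utf16_code_units(non_bmp_char)

print([hex(u) for u in bmp_units])      # a single code unit
print([hex(u) for u in non_bmp_units])  # a high and a low surrogate
```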

Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS
by Terry Reedy Nov. 27, 2010

On 11/23/2010 5:43 PM, Éric Araujo wrote:
>> Modified: python/branches/py3k/Misc/ACKS
>> ==============================================================================
>> --- python/branches/py3k/Misc/ACKS (original)
>> +++ python/branches/py3k/Misc/ACKS Tue Nov 23 21:32:47 2010
>> @@ -1,4 +1,4 @@
>> -Acknowledgements
>> +Acknowledgements
>
> This change introduced a so-called UTF-8 BOM in the file. Is
> TortoiseSvn the culprit or a text editor?

I used Notepad to edit the file, TortoiseSvn to commit, the same as I did for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday. If the latter is OK, perhaps *.py gets filtered better than misc. text files.

I believe I have the config as specified in dev/faq.

[miscellany]
enable-auto-props = yes

[auto-props]
* = svn:eol-style=native
*.c = svn:keywords=Id
*.h = svn:keywords=Id
*.py = svn:keywords=Id
*.txt = svn:keywords=Author Date Id Revision

Terry
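A quick way to check whether an editor has slipped a UTF-8 BOM into a file before committing is to look at its first three bytes. The following is only a sketch: the helper names has_utf8_bom and strip_utf8_bom are invented here, and the demonstration uses a throwaway temporary file rather than the real Misc/ACKS.

```python
import codecs
import os
import tempfile

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

def strip_utf8_bom(path):
    """Rewrite the file without a leading BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True

# Demonstration on a temporary file that mimics the affected first line.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"Acknowledgements\n")

bom_before = has_utf8_bom(demo_path)
removed = strip_utf8_bom(demo_path)
bom_after = has_utf8_bom(demo_path)
os.remove(demo_path)

print(bom_before, removed, bom_after)
```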

Removal of Win32 ANSI API
by Hirokazu Yamamoto Nov. 26, 2010

Hello. Is it possible to remove the Win32 ANSI API (e.g. GetFileAttributesA) and use only the Win32 wide API (e.g. GetFileAttributesW)? Mainly in posixmodule.c. I think we can simplify the code hugely. (This means dropping bytes support for os.stat etc. on Windows.)

# I recently did it for winsound.PlaySound with MvL's approval

Thank you.
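To show what "bytes support for os.stat" refers to, the sketch below stats the same temporary file through a str path and through a bytes path. It assumes Python 3.2 or later for os.fsencode and says nothing about which Win32 entry point a given interpreter uses underneath; it only illustrates the user-visible behaviour that the proposal would have removed on Windows.

```python
import os
import tempfile

# Create a small scratch file to stat.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    text_path = tmp.name              # str path

bytes_path = os.fsencode(text_path)   # same path as bytes

size_from_str = os.stat(text_path).st_size
size_from_bytes = os.stat(bytes_path).st_size
os.remove(text_path)

print(size_from_str, size_from_bytes)
```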

PyPy 1.4 released
by Maciej Fijalkowski Nov. 26, 2010

===============================
PyPy 1.4: Ouroboros in practice
===============================

We're pleased to announce the 1.4 release of PyPy. This is a major breakthrough in our long journey, as PyPy 1.4 is the first PyPy release that can translate itself faster than CPython. Starting today, we are using PyPy more for our every-day development. So may you :) You can download it here:

    http://pypy.org/download.html

What is PyPy
============

PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython. It's fast (`pypy 1.4 and cpython 2.6`_ comparison).

Among its new features, this release includes numerous performance improvements (which made fast self-hosting possible), a 64-bit JIT backend, as well as serious stabilization. As of now, we can consider the 32-bit and 64-bit linux versions of PyPy stable enough to run `in production`_.

Numerous speed achievements are described on `our blog`_. Normalized speed charts comparing `pypy 1.4 and pypy 1.3`_ as well as `pypy 1.4 and cpython 2.6`_ are available on the benchmark website. For the impatient: yes, we got a lot faster!

More highlights
===============

* PyPy's built-in Just-in-Time compiler is fully transparent and automatically generated; it now also has very reasonable memory requirements. The total memory used by a very complex and long-running process (translating PyPy itself) is within 1.5x to at most 2x the memory needed by CPython, for a speed-up of 2x.

* More compact instances. All instances are as compact as if they had ``__slots__``. This can give programs a big gain in memory. (In the example of translation above, we already have carefully placed ``__slots__``, so there is no extra win.)

* `Virtualenv support`_: now PyPy is fully compatible with virtualenv_: note that to use it, you need a recent version of virtualenv (>= 1.5).

* Faster (and JITted) regular expressions - huge boost in speeding up the `re` module.

* Other speed improvements, like JITted calls to functions like map().

.. _virtualenv: http://pypi.python.org/pypi/virtualenv
.. _`Virtualenv support`: http://morepypy.blogspot.com/2010/08/using-virtualenv-with-pypy.html
.. _`in production`: http://morepypy.blogspot.com/2010/11/running-large-radio-telescope-software…
.. _`our blog`: http://morepypy.blogspot.com
.. _`pypy 1.4 and pypy 1.3`: http://speed.pypy.org/comparison/?exe=1%2B41,1%2B172&ben=1,2,3,4,5,6,7,8,9,…
.. _`pypy 1.4 and cpython 2.6`: http://speed.pypy.org/comparison/?exe=2%2B35,1%2B172&ben=1,2,3,4,5,6,7,8,9,…

Cheers,

Carl Friedrich Bolz, Antonio Cuni, Maciej Fijalkowski, Amaury Forgeot d'Arc, Armin Rigo and the PyPy team
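For readers unfamiliar with the ``__slots__`` comparison in the highlights, here is a minimal sketch of what the declaration changes on an ordinary class. The class names are invented for the example, and the snippet makes no claim about how PyPy lays out instances internally; it only shows the per-instance dictionary that a slotted class does without.

```python
class PlainPoint(object):
    """Ordinary class: each instance carries its own attribute dictionary."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint(object):
    """Attributes live in fixed slots; there is no per-instance __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

plain_has_dict = hasattr(plain, "__dict__")
slotted_has_dict = hasattr(slotted, "__dict__")

# A slotted instance rejects attributes that were not declared.
try:
    slotted.z = 3
    extra_attribute_allowed = True
except AttributeError:
    extra_attribute_allowed = False

print(plain_has_dict, slotted_has_dict, extra_attribute_allowed)
```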

Summary of Python tracker Issues
by Python tracker Nov. 26, 2010

ACTIVITY SUMMARY (2010-11-19 - 2010-11-26) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 2533 (-16) closed 19792 (+98) total 22325 (+82) Open issues with patches: 1083 Issues opened (66) ================== #1178: IDLE - add "paste code" functionality http://bugs.python.org/issue1178 reopened by ned.deily #3709: BaseHTTPRequestHandler innefficient when sending HTTP header http://bugs.python.org/issue3709 reopened by r.david.murray #5150: IDLE to support reindent.py http://bugs.python.org/issue5150 reopened by rhettinger #8879: Implement os.link on Windows http://bugs.python.org/issue8879 reopened by amaury.forgeotdarc #9769: PyUnicode_FromFormatV() doesn't handle non-ascii text correctl http://bugs.python.org/issue9769 reopened by belopolsky #10220: Make generator state easier to introspect http://bugs.python.org/issue10220 reopened by ncoghlan #10268: Add --enable-loadable-sqlite-extensions option to `configure` http://bugs.python.org/issue10268 reopened by ned.deily #10441: some stdlib modules need to be updated to handle SSL certifica http://bugs.python.org/issue10441 reopened by pitrou #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 reopened by eric.araujo #10464: netrc module not parsing passwords containing #s. 
http://bugs.python.org/issue10464 opened by the_isz #10466: locale.py resetlocale throws exception on Windows (getdefaultl http://bugs.python.org/issue10466 opened by skoczian #10469: test_socket fails using Visual Studio 2010 http://bugs.python.org/issue10469 opened by Kotan #10475: hardcoded compilers for LDSHARED/LDCXXSHARED on NetBSD http://bugs.python.org/issue10475 opened by njoly #10478: Ctrl-C locks up the interpreter http://bugs.python.org/issue10478 opened by isandler #10479: cgitb.py should assume a binary stream for output http://bugs.python.org/issue10479 opened by v+python #10480: cgi.py should document the need for binary stdin/stdout http://bugs.python.org/issue10480 opened by v+python #10481: subprocess PIPEs are byte streams http://bugs.python.org/issue10481 opened by v+python #10482: subprocess and deadlock avoidance http://bugs.python.org/issue10482 opened by v+python #10483: http.server - what is executable on Windows http://bugs.python.org/issue10483 opened by v+python #10484: http.server.is_cgi fails to handle CGI URLs containing PATH_IN http://bugs.python.org/issue10484 opened by v+python #10485: http.server fails when query string contains addition '?' char http://bugs.python.org/issue10485 opened by v+python #10486: http.server doesn't set all CGI environment variables http://bugs.python.org/issue10486 opened by v+python #10487: http.server - doesn't process Status: header from CGI scripts http://bugs.python.org/issue10487 opened by v+python #10492: test_doctest fails with iso-8859-15 locale http://bugs.python.org/issue10492 opened by pitrou #10494: Demo/comparisons/regextest.py needs some usage information. http://bugs.python.org/issue10494 opened by ramiroluz #10495: Demo/comparisons/sortingtest.py needs some usage information. 
http://bugs.python.org/issue10495 opened by ramiroluz #10496: "import site failed" when Python can't find home directory http://bugs.python.org/issue10496 opened by bbi5291 #10497: Incorrect use of gettext in argparse http://bugs.python.org/issue10497 opened by eric.araujo #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 opened by r.david.murray #10499: Modular interpolation in configparser http://bugs.python.org/issue10499 opened by lukasz.langa #10500: Palevo.DZ worm msix86 installer 3.x installer http://bugs.python.org/issue10500 opened by VilIgnoble #10502: Add unittestguirunner to Tools/ http://bugs.python.org/issue10502 opened by michael.foord #10503: os.getuid() documentation should be clear on what kind of uid http://bugs.python.org/issue10503 opened by giampaolo.rodola #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 opened by jonny #10507: Check well-formedness of reST markup within "make patchcheck" http://bugs.python.org/issue10507 opened by dmalcolm #10509: PyTokenizer_FindEncoding can lead to a segfault if bad charact http://bugs.python.org/issue10509 opened by Trundle #10510: distutils upload/register should use CRLF in HTTP requests http://bugs.python.org/issue10510 opened by Brian.Jones #10512: regrtest ResourceWarning - unclosed sockets and files http://bugs.python.org/issue10512 opened by nvawda #10513: sqlite3.InterfaceError after commit http://bugs.python.org/issue10513 opened by anders.blomdell(a)control.lth.se #10514: configure does not create accurate Makefile http://bugs.python.org/issue10514 opened by daelious #10515: csv sniffer does not recognize quotes at the end of line http://bugs.python.org/issue10515 opened by Martin.Budaj #10516: Add list.clear() and list.copy() http://bugs.python.org/issue10516 opened by terry.reedy #10517: test_concurrent_futures crashes with "Fatal Python error: Inva http://bugs.python.org/issue10517 opened by lukasz.langa #10518: Bring 
back callable() http://bugs.python.org/issue10518 opened by pitrou #10519: setobject.c no-op typo http://bugs.python.org/issue10519 opened by arigo #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 opened by belopolsky #10522: test_telnet exception http://bugs.python.org/issue10522 opened by pitrou #10523: argparse has problem parsing option files containing empty row http://bugs.python.org/issue10523 opened by Michal.Pomorski #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 opened by zaburt #10527: multiprocessing.Pipe problem: "handle out of range in select() http://bugs.python.org/issue10527 opened by synapse #10528: argparse uses %s in gettext calls http://bugs.python.org/issue10528 opened by eric.araujo #10529: Write argparse i18n howto http://bugs.python.org/issue10529 opened by eric.araujo #10530: distutils2 should allow the installing of python files with in http://bugs.python.org/issue10530 opened by michael.foord #10531: write tilted text in turtle http://bugs.python.org/issue10531 opened by lanyjie #10532: A bug related to matching the empty string http://bugs.python.org/issue10532 opened by lanyjie #10533: Need example of using __missing__ http://bugs.python.org/issue10533 opened by lukasz.langa #10534: difflib.SequenceMatcher: expose junk sets, deprecate undocumen http://bugs.python.org/issue10534 opened by terry.reedy #10535: Enable warnings by default in unittest http://bugs.python.org/issue10535 opened by ezio.melotti #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 opened by eric.araujo #10537: IDLE crashes when you paste something. 
http://bugs.python.org/issue10537 opened by 5ragar5 #10538: PyArg_ParseTuple("s*") does not always incref object http://bugs.python.org/issue10538 opened by krisvale #10539: Regular expression not checking 'range' element on 1st char in http://bugs.python.org/issue10539 opened by TxRxFx #10540: test_shutil fails on Windows after r86733 http://bugs.python.org/issue10540 opened by brian.curtin #10541: regrtest.py -T broken http://bugs.python.org/issue10541 opened by doerwalter #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 opened by belopolsky #10543: Test discovery (unittest) does not work with jython http://bugs.python.org/issue10543 opened by michael.foord Most recent 15 issues with no replies (15) ========================================== #10543: Test discovery (unittest) does not work with jython http://bugs.python.org/issue10543 #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 #10541: regrtest.py -T broken http://bugs.python.org/issue10541 #10539: Regular expression not checking 'range' element on 1st char in http://bugs.python.org/issue10539 #10538: PyArg_ParseTuple("s*") does not always incref object http://bugs.python.org/issue10538 #10537: IDLE crashes when you paste something. 
http://bugs.python.org/issue10537 #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 #10534: difflib.SequenceMatcher: expose junk sets, deprecate undocumen http://bugs.python.org/issue10534 #10531: write tilted text in turtle http://bugs.python.org/issue10531 #10530: distutils2 should allow the installing of python files with in http://bugs.python.org/issue10530 #10523: argparse has problem parsing option files containing empty row http://bugs.python.org/issue10523 #10522: test_telnet exception http://bugs.python.org/issue10522 #10514: configure does not create accurate Makefile http://bugs.python.org/issue10514 #10507: Check well-formedness of reST markup within "make patchcheck" http://bugs.python.org/issue10507 #10499: Modular interpolation in configparser http://bugs.python.org/issue10499 Most recent 15 issues waiting for review (15) ============================================= #10542: Py_UNICODE_NEXT and other macros for surrogates http://bugs.python.org/issue10542 #10540: test_shutil fails on Windows after r86733 http://bugs.python.org/issue10540 #10536: Enhancements to gettext docs http://bugs.python.org/issue10536 #10535: Enable warnings by default in unittest http://bugs.python.org/issue10535 #10527: multiprocessing.Pipe problem: "handle out of range in select() http://bugs.python.org/issue10527 #10524: Patch to add Pardus to supported dists in platform http://bugs.python.org/issue10524 #10521: str methods don't accept non-BMP fillchar on a narrow Unicode http://bugs.python.org/issue10521 #10518: Bring back callable() http://bugs.python.org/issue10518 #10515: csv sniffer does not recognize quotes at the end of line http://bugs.python.org/issue10515 #10512: regrtest ResourceWarning - unclosed sockets and files http://bugs.python.org/issue10512 #10509: PyTokenizer_FindEncoding can lead to a segfault if bad charact http://bugs.python.org/issue10509 #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 #10499: Modular 
interpolation in configparser http://bugs.python.org/issue10499 #10498: calendar.LocaleHTMLCalendar.formatyearpage() results in traceb http://bugs.python.org/issue10498 #10497: Incorrect use of gettext in argparse http://bugs.python.org/issue10497 Top 10 most discussed issues (10) ================================= #10461: Use with statement throughout the docs http://bugs.python.org/issue10461 27 msgs #7995: On Mac / BSD sockets returned by accept inherit the parent's F http://bugs.python.org/issue7995 24 msgs #10453: Add -h/--help option to compileall http://bugs.python.org/issue10453 24 msgs #9915: speeding up sorting with a key http://bugs.python.org/issue9915 14 msgs #9742: Python 2.7: math module fails to build on Solaris 9 http://bugs.python.org/issue9742 13 msgs #10533: Need example of using __missing__ http://bugs.python.org/issue10533 13 msgs #9509: argparse FileType raises ugly exception for missing file http://bugs.python.org/issue9509 12 msgs #10469: test_socket fails using Visual Studio 2010 http://bugs.python.org/issue10469 12 msgs #10504: Trivial mingw compile fixes http://bugs.python.org/issue10504 12 msgs #10518: Bring back callable() http://bugs.python.org/issue10518 12 msgs Issues closed (92) ================== #2244: urllib and urllib2 decode userinfo multiple times http://bugs.python.org/issue2244 closed by orsenthil #2986: difflib.SequenceMatcher not matching long sequences http://bugs.python.org/issue2986 closed by terry.reedy #3292: Position index limit; s.insert(i,x) not same as s[i:i]=[x] http://bugs.python.org/issue3292 closed by rhettinger #4493: urllib2 doesn't always supply / where URI path component is em http://bugs.python.org/issue4493 closed by orsenthil #4925: Improve error message of subprocess when cannot open http://bugs.python.org/issue4925 closed by benjamin.peterson #5353: Improve IndexError messages with actual values http://bugs.python.org/issue5353 closed by rhettinger #5412: extend configparser to support mapping 
access(__*item__) http://bugs.python.org/issue5412 closed by lukasz.langa #5616: Distutils 2to3 support doesn't have the doctest_only flag. http://bugs.python.org/issue5616 closed by eric.araujo #6166: encoding error for 'setup.py --author' when read via subproces http://bugs.python.org/issue6166 closed by eric.araujo #6378: Patch to make 'idle.bat' run idle.pyw using appropriate Python http://bugs.python.org/issue6378 closed by brian.curtin #6466: duplicate get_version() code between cygwinccompiler and emxcc http://bugs.python.org/issue6466 closed by eric.araujo #6722: collections.namedtuple: confusing example http://bugs.python.org/issue6722 closed by rhettinger #6799: mimetypes does not give canonical extension for guess_extensio http://bugs.python.org/issue6799 closed by eric.araujo #6878: changed return type from tkinter.Canvas.coords http://bugs.python.org/issue6878 closed by belopolsky #7212: Retrieve an arbitrary element from a set without removing it http://bugs.python.org/issue7212 closed by rhettinger #7226: IDLE right-clicks don't work on Mac OS 10.5 http://bugs.python.org/issue7226 closed by ned.deily #7257: Improve documentation of list.sort and sorted() http://bugs.python.org/issue7257 closed by rhettinger #7645: test_distutils fails on Windows XP http://bugs.python.org/issue7645 closed by brian.curtin #7770: sin/cos function in decimal-docs http://bugs.python.org/issue7770 closed by rhettinger #7804: test_readline failure http://bugs.python.org/issue7804 closed by pitrou #8078: add more baud constants to termios http://bugs.python.org/issue8078 closed by pitrou #8340: bytearray undocumented on trunk http://bugs.python.org/issue8340 closed by pitrou #8381: IDLE 2.6 freezes on OS X 10.6 http://bugs.python.org/issue8381 closed by ned.deily #8569: Upgrade OpenSSL in Windows builds http://bugs.python.org/issue8569 closed by brian.curtin #8590: test_httpservers.CGIHTTPServerTestCase failure on 3.1-maint Ma http://bugs.python.org/issue8590 closed by 
michael.foord #8631: subprocess.Popen.communicate(...) hangs on Windows http://bugs.python.org/issue8631 closed by brian.curtin #8645: PyUnicode_AsEncodedObject is undocumented http://bugs.python.org/issue8645 closed by belopolsky #8646: PyUnicode_EncodeDecimal is undocumented http://bugs.python.org/issue8646 closed by belopolsky #8647: PyUnicode_GetMax is undocumented http://bugs.python.org/issue8647 closed by eric.araujo #8705: shutil.rmtree with empty filepath http://bugs.python.org/issue8705 closed by brian.curtin #8938: Mac OS dialogs(Save As..., Load) translation http://bugs.python.org/issue8938 closed by ned.deily #9222: IDLE: Fix open/saveas 'Files of type' choices http://bugs.python.org/issue9222 closed by terry.reedy #9500: urllib2: Content-Encoding http://bugs.python.org/issue9500 closed by r.david.murray #9732: Addition of getattr_static for inspect module http://bugs.python.org/issue9732 closed by michael.foord #9746: All sequence types support .index and .count http://bugs.python.org/issue9746 closed by eric.araujo #9802: Document 'stability' of builtin min() and max() http://bugs.python.org/issue9802 closed by rhettinger #9807: deriving configuration information for different builds with t http://bugs.python.org/issue9807 closed by barry #9846: ZipExtFile provides no mechanism for closing the underlying fi http://bugs.python.org/issue9846 closed by lukasz.langa #9852: test_ctypes fail with clang http://bugs.python.org/issue9852 closed by ned.deily #9876: ConfigParser can't interpolate values from other sections http://bugs.python.org/issue9876 closed by lukasz.langa #9965: Loading malicious pickle may cause excessive memory usage http://bugs.python.org/issue9965 closed by georg.brandl #10134: test_email failures on Windows: end of line issue? 
http://bugs.python.org/issue10134 closed by r.david.murray #10138: calendar module does not support years outside [1, 9999] range http://bugs.python.org/issue10138 closed by belopolsky #10164: Add an assertBytesEqual to unittest and use it for bytes asser http://bugs.python.org/issue10164 closed by rhettinger #10172: code block has no syntax coloring http://bugs.python.org/issue10172 closed by georg.brandl #10183: test_concurrent_futures failure on Windows http://bugs.python.org/issue10183 closed by bquinlan #10255: refleak in initstdio http://bugs.python.org/issue10255 closed by pitrou #10299: Add index with links section for built-in functions http://bugs.python.org/issue10299 closed by ezio.melotti #10319: SocketServer.TCPServer truncates responses on close (in some s http://bugs.python.org/issue10319 closed by orsenthil #10325: PY_LLONG_MAX & co - preprocessor constants or not? http://bugs.python.org/issue10325 closed by mark.dickinson #10366: Remove unneeded '(object)' from 3.x class examples http://bugs.python.org/issue10366 closed by eric.araujo #10371: Deprecate trace module undocumented API http://bugs.python.org/issue10371 closed by belopolsky #10377: cProfile incorrectly labels its output http://bugs.python.org/issue10377 closed by orsenthil #10391: obj2ast's error handling can lead to python crashing with a C- http://bugs.python.org/issue10391 closed by benjamin.peterson #10420: Document of Bdb.effective is wrong. http://bugs.python.org/issue10420 closed by georg.brandl #10430: _sha.sha().digest() method is endian-sensitive. 
and hexdigest( http://bugs.python.org/issue10430 closed by krisvale

#10437: ThreadPoolExecutor should accept max_workers=None
http://bugs.python.org/issue10437 closed by stutzbach

#10439: PyCodec C API is not documented in reST
http://bugs.python.org/issue10439 closed by georg.brandl

#10448: Add Mako template benchmark to Python Benchmark Suite
http://bugs.python.org/issue10448 closed by pitrou

#10450: Fix markup in Misc/NEWS
http://bugs.python.org/issue10450 closed by eric.araujo

#10458: 2.7 += re.ASCII
http://bugs.python.org/issue10458 closed by terry.reedy

#10459: missing character names in unicodedata (CJK...)
http://bugs.python.org/issue10459 closed by loewis

#10460: Misc/indent.pro does not reflect PEP 7
http://bugs.python.org/issue10460 closed by georg.brandl

#10462: Handler.close is not called in subclass while Logger.removeHan
http://bugs.python.org/issue10462 closed by vinay.sajip

#10463: Wrong return type for xml.etree.ElementTree.parse()
http://bugs.python.org/issue10463 closed by tiwoc

#10465: gzip module calls getattr incorrectly
http://bugs.python.org/issue10465 closed by georg.brandl

#10467: io.BytesIO.readinto() segfaults when used on BytesIO object se
http://bugs.python.org/issue10467 closed by benjamin.peterson

#10468: Document UnicodeError access functions
http://bugs.python.org/issue10468 closed by georg.brandl

#10470: python -m unittest ought to default to discovery
http://bugs.python.org/issue10470 closed by michael.foord

#10471: include documentation in python docs and under python -h for o
http://bugs.python.org/issue10471 closed by georg.brandl

#10472: Strange tab key behaviour in interactive python 2.7 OSX 10.6.2
http://bugs.python.org/issue10472 closed by ned.deily

#10473: Strange behavior for socket.timeout
http://bugs.python.org/issue10473 closed by ned.deily

#10474: range.count returns boolean
http://bugs.python.org/issue10474 closed by benjamin.peterson

#10476: __iter__ on a byte file object using a method to return an ite
http://bugs.python.org/issue10476 closed by benjamin.peterson

#10477: AttributeError: 'NoneType' object has no attribute 'name' (bo
http://bugs.python.org/issue10477 closed by eric.araujo

#10488: Improve documentation for 'float' built-in.
http://bugs.python.org/issue10488 closed by mark.dickinson

#10489: configparser: remove broken `__name__` support
http://bugs.python.org/issue10489 closed by lukasz.langa

#10490: mimetypes read_windows_registry fails for non-ASCII keys
http://bugs.python.org/issue10490 closed by r.david.murray

#10491: Insecure Windows python directory permissions
http://bugs.python.org/issue10491 closed by loewis

#10493: test_strptime failures under OpenIndiana
http://bugs.python.org/issue10493 closed by jcea

#10501: make_buildinfo regression with unquoted path
http://bugs.python.org/issue10501 closed by krisvale

#10505: test_compileall: failure on Windows
http://bugs.python.org/issue10505 closed by eric.araujo

#10506: argparse execute system exit in python prompt
http://bugs.python.org/issue10506 closed by r.david.murray

#10508: compiler warnings about formatting pid_t as an int
http://bugs.python.org/issue10508 closed by georg.brandl

#10511: heapq docs clarification
http://bugs.python.org/issue10511 closed by georg.brandl

#10520: Build with --enable-shared fails
http://bugs.python.org/issue10520 closed by barry

#10525: Added mouse and colour support to Game of Life curses demo
http://bugs.python.org/issue10525 closed by orsenthil

#10526: Minor typo in What's New in Python 2.7
http://bugs.python.org/issue10526 closed by georg.brandl

#10345: fcntl.ioctl always fails claiming an invalid fd
http://bugs.python.org/issue10345 closed by ned.deily

#1059244: distutil bdist hardcodes the python location
http://bugs.python.org/issue1059244 closed by eric.araujo

#1574217: isinstance swallows exceptions
http://bugs.python.org/issue1574217 closed by r.david.murray

#1699853: locale.getlocale() output fails as setlocale() input
http://bugs.python.org/issue1699853 closed by r.david.murray


Re: [Python-Dev] len(chr(i)) = 2?
by Alexander Belopolsky Nov. 25, 2010


On Tue, Nov 23, 2010 at 2:18 PM, Amaury Forgeot d'Arc <amauryfa(a)gmail.com> wrote:
..
>> Given the apparent difficulty of writing even basic text processing
>> algorithms in presence of surrogate pairs, I wonder how wise it is to
>> expose Python users to them.
>
> This was already discussed two years ago:
>
> http://mail.python.org/pipermail/python-dev/2008-July/080900.html
>

Thanks for the link. Let me summarize that discussion as I read it.

The discussion starts with a reference to Guido's 2001 post, which concluded with

"""
... if we had wanted to use a variable-length internal representation, we should have picked UTF-8 way back, like Perl did. Moving to a UTF-16-based internal representation now will give us all the problems of the Perl choice without any of the benefits.
""" [1]

and proposes to move to UCS-4 completely for Python 3.0.

Note that this is not the option that I would like to discuss here. I don't propose to discuss abandoning narrow builds. Instead, I would like to discuss the costs and benefits associated with using a variable-width CES as an internal representation. This is where the 2008 discussion moved.

The OP did not realize that narrow builds supported UTF-16 and, like me, was surprised that application developers should be aware of surrogates if they want to use narrow builds.

It was also suggested that Python itself is likely to have many bugs that can be triggered by non-BMP characters on narrow builds. Guido's response was:

"""
I'd also prefer to receive bug reports about breakages actually encountered in the wild than purely theoretical issues
"""

I don't think this is a good position to take. Programs that expect one code unit where Python may produce two are likely to have security holes. Even when programmers carefully sanitize their input, they are likely to do it at the code point level based on Unicode category, and the 0xFFFF boundary does not mean anything special for their applications.
I think anyone who wants to write a robust application has two choices in practice: (a) use a wide Unicode build; (b) restrict all text to the BMP. Supporting surrogates at the application level is likely to be prohibitively expensive.

It was later suggested that the main benefit of "UTF-16" builds is that they can easily interface with system libraries that are "UTF-16" based. However, how likely are these libraries to be bug-free when it comes to non-BMP characters? History teaches us: not very likely.

Daniel Arbuckle presented arguments against imposing the burden of dealing with surrogates on application writers. [2]

The recurrent theme of the thread was that non-BMP characters are rare and those who need them can afford the extra development cost associated with surrogates. This point was very eloquently articulated by Guido:

"""
Who are the many here? Who are the few? I'd venture that (at least for the foreseeable future, say, until China will finally have taken over the role of the US as the de-facto dominant super power :-) the many are people whose app will never see a Unicode character outside the BMP, or who do such minimal string processing that their code doesn't care whether it's handling UTF-16-encoded data.
""" [3]

This argument can also be used to support the position that narrow builds should not support non-BMP characters.

Later the discussion started resembling this thread when it went into a scholastic dispute over fine points in Unicode Standard terminology. :-)

Then the BDFL vetoed len(u"\U00012345") returning 1 on narrow builds. [4] I would be against that as well. I don't see len("\U00012345") == 2 as a big problem, because application developers can simply avoid using \U literals if they don't want to support non-BMP characters. On the other hand, an option to warn users about non-BMP literals on a narrow build may be useful, but it is easy to implement in lint-like tools.
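For reference, the reason len("\U00012345") comes out as 2 on a narrow build is plain UTF-16 arithmetic: the code point is offset by 0x10000 and its 20 remaining bits are split across two 16-bit code units. A minimal sketch of that arithmetic follows; the helper name to_surrogates is hypothetical and is not a standard library function.

```python
def to_surrogates(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogate pair.

    Hypothetical helper for illustration only; not part of the stdlib.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


high, low = to_surrogates(0x12345)
print(hex(high), hex(low))  # 0xd808 0xdf45

# Counting UTF-16 code units via the encoded bytes gives the same answer
# on any build: two bytes per code unit, so two units for U+12345.
units = len(u"\U00012345".encode("utf-16-le")) // 2
print(units)  # 2
```

A narrow build stores exactly these two code units, which is what len() then counts.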
There were multiple suggestions for standard library additions to help application writers deal with surrogate pairs, but as far as I can tell, nothing has been done in this area in the following two years. I don't think there is a recipe for fixing a legacy character-by-character processing loop such as

    for c in string:
        ...

to make it iterate over code points consistently in wide and narrow builds. (Note that I am not asking for a grapheme iterator here. This is clearly an application level feature.)

> So yes, wrap() and center() should be fixed.

I opened issue 10521 for that. [5] I am fully prepared to see it dismissed as "theoretical" and closed with "won't fix", or left to linger indefinitely. Fixing it would most likely involve writing a second version of the pad() utility function specifically for the narrow build. All the examples I've seen in Python C code of dealing with surrogates came with hand-coded #ifndef Py_UNICODE_WIDE fragments and no user-friendly macros or APIs that would abstract it away.

A quick grep for maxunicode in the standard library revealed only one case of "narrow-build aware" code:

    if sys.maxunicode != 65535:
        # XXX: negation does not work with big charsets
        return charset

See Lib/sre_compile.py. Not exactly a model to follow.

To conclude, I feel that rather than trying to fully support non-BMP characters as surrogate pairs in narrow builds, we should make it easier for application developers to avoid them. If abandoning internal use of UTF-16 is not an option, I think we should at least add an option for decoders that currently produce surrogate pairs to treat non-BMP characters as errors and handle them according to the user's choice.

[1] http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
[2] http://mail.python.org/pipermail/python-dev/2008-July/080912.html
[3] http://mail.python.org/pipermail/python-dev/2008-July/080940.html
[4] http://mail.python.org/pipermail/python-dev/2008-July/080916.html
[5] http://bugs.python.org/issue10521
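As a rough illustration of what a recipe for the character-by-character loop mentioned above might look like, here is a minimal sketch, assuming the caller is content with integer code points rather than one-character strings. The name iter_code_points is hypothetical; nothing like it is claimed to exist in the standard library. It joins a high surrogate with the low surrogate that follows it and passes unpaired surrogates through unchanged, so the same loop body sees the same values whether the string holds surrogate pairs or single non-BMP characters.

```python
def iter_code_points(text):
    """Yield integer code points, joining UTF-16 surrogate pairs.

    Hypothetical helper for illustration only; not part of the stdlib.
    Unpaired surrogates are yielded as-is rather than treated as errors.
    """
    pending = None  # high surrogate still waiting for its low half
    for unit in text:
        value = ord(unit)
        if pending is not None:
            if 0xDC00 <= value <= 0xDFFF:
                # Valid pair: recombine the two 10-bit halves.
                yield 0x10000 + ((pending - 0xD800) << 10) + (value - 0xDC00)
                pending = None
                continue
            yield pending  # lone high surrogate, pass it through
            pending = None
        if 0xD800 <= value <= 0xDBFF:
            pending = value  # wait to see whether a low surrogate follows
        else:
            yield value
    if pending is not None:
        yield pending  # string ended on a lone high surrogate


# The pair D808 DF45 is how U+12345 is stored in UTF-16 code units.
for cp in iter_code_points(u"a\ud808\udf45b"):
    print(hex(cp))
```

Yielding integers sidesteps the question of what a one-character string for U+12345 would be on a build that cannot represent it in a single code unit; a variant yielding substrings would be equally plausible.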
