New Features in Python 1.6
==========================

With the recent release of Python 1.6 alpha 1, a lot of people have
been wondering what's new. This short note aims to explain the major
changes in Python 1.6.

Core Language
-------------

1. Unicode strings

Python strings can now be stored as Unicode strings. To make it easier
to type Unicode strings, the single-quote character defaults to creating
a Unicode string, while the double-quote character defaults to ASCII
strings. If you need to create a Unicode string with double quotes,
just preface it with the letter "u"; likewise, an ASCII string can be
created by prefacing single quotes with the letter "a". For example:

    foo = 'hello'                   # Unicode
    foo = "hello"                   # ASCII
    foo = a'hello'                  # ASCII
    foo = u"hello"                  # Unicode

You can still use the "r" character to quote strings in a manner
convenient for the regular expression engine, but with subtle changes
in semantics: see "New regular expression engine" below for more
information.

Also, for compatibility with most editors and operating systems, Python
source code is still 7-bit ASCII. Thus, for portability it's best to
write Unicode strings using one of two new escapes: \u and \N. \u lets
you specify a Unicode character as a 16-bit hexadecimal number, and \N
lets you specify it by name:

    message = 'Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} ' \
            + 'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!'

    message = 'Bienvenue \u00E0 Python fran\u00E7ais!'

2. string methods

Python strings have grown methods, just like lists and dictionaries.
For instance, to split a string on spaces, you can now say this:

    tokens = "foo bar baz".split(" ")

Or, equivalently, this:

    tokens = " ".split("foo bar baz")

(Python figures out which string is the delimiter and which is the
string to split by examining both strings to see which one occurs more
frequently inside the other.) Be careful not to mix Unicode and ASCII
strings when doing this, though.
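
As a minimal sketch, the first (method-call) spelling can be tried on the
Unicode message from the previous item. This assumes a present-day
Python 3 interpreter, where every string literal is Unicode regardless of
quote style; it does not exercise the delimiter-guessing behaviour
described in the parenthetical, and the variable names are illustrative
only.

```python
# Build the same message with the two escapes: \N (by name) and \u (by code).
by_name = ('Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} '
           + 'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!')
by_code = 'Bienvenue \u00E0 Python fran\u00E7ais!'

# Both spellings denote the same characters.
same = (by_name == by_code)

# Split on single spaces using the string method, delimiter as the argument.
tokens = by_code.split(" ")

print(same)
print(len(tokens))
```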

Other examples:

    foo = "The quick red fox jumped over the lazy brown dog."
    foo.find("dog")
    foo.strip()
    foo.lower()

Note that use of any string method on a particular string renders it
mutable. This is for consistency with lists, which are mutable and have
methods like 'append()' and 'sort()' that modify the list. Thus,
"foo.strip()" modifies the string 'foo' in-place. "strip(foo)" retains
its old behavior of returning a modified copy of 'foo'.

3. extended call syntax

The variable argument list and keyword argument syntax introduced in
Python 1.3 has been extended. Previously, it only worked in
function/method signatures; calling other functions with the same
arguments required the use of 'apply()':

    def spam(arg1,arg2,*more_args,**keyword_args):
        # ...
        apply(foo,(arg1,arg2) + more_args,keyword_args)

Now it works for calling functions too. For consistency with C and C++,
asterisks in the function signature become ampersands in the function
body:

        foo(arg1,arg2,&more_args,&&keyword_args)

4. assignment to None now works

In previous versions of Python, values assigned to None were lost. For
example, this code:

    (username,None,None,None,realname,homedir,None) = getpwuid(uid)

would only preserve the user name, real name, and home directory fields
from a password file entry -- everything else of interest was lost. In
Python 1.6, you can meaningfully assign to None. In the above example,
None would be replaced by a tuple containing the four values of
interest.

You can also use the variable argument list syntax here, for example:

    (username,password,uid,gid,*None) = getpwuid(uid)

would set None to a tuple containing the last three elements of the
tuple returned by getpwuid.

Library
-------

1. Distutils

In the past, lots of people have complained about the lack of a
standard mechanism for distributing and installing Python modules. This
has been fixed by the Distutils, or Distribution Utilities.
We took the approach of leveraging past efforts in this area rather
than reinventing a number of perfectly good wheels. Thus, the Distutils
take advantage of a number of "best-of-breed" tools for distributing,
configuring, building, and installing software.

The core of the system is a set of m4 macros that augment the standard
macros supplied by GNU Autoconf. Where the Autoconf macros generate
shell code that becomes a configure script, the Distutils macros
generate Python code that creates a Makefile. (This is a similar idea
to Perl's MakeMaker system, but of course this Makefile builds Python
modules and extensions!)

Using the Distutils is easy: you write a script called "setup.in" which
contains both Autoconf and Distutils m4 macros; the Autoconf macros are
used to create a "configure" script which examines the target system to
find out how to build your extensions there, and the Distutils macros
create a "setup.py" script, which generates a Makefile that knows how
to build your particular collection of modules. You process "setup.in"
before distributing your modules, and bundle the resulting "configure"
and "setup.py" with your modules. Then, the user just has to run
"configure", "setup.py", and "make" to build everything.

For example, here's a small, simple "setup.in" for a hypothetical
module distribution that uses Autoconf to check for a C library "frob"
and builds a Python extension called "_frob" and a pure Python module
"frob":

    AC_INIT(frobmodule.c)
    AC_CHECK_HEADER(frob.h)
    AC_HAVE_LIBRARY(frob)
    AC_OUTPUT()

    DU_INIT(Frob,1.0)
    DU_EXTENSION(_frob,frobmodule.c,-lfrob)
    DU_MODULE(frob,frob.py)
    DU_OUTPUT(setup.py)

First, you run this setup.in using the "prepare_dist" script; this
creates "configure" and "setup.py":

    % prepare_dist

Next, you configure the package and create a makefile:

    % ./configure
    % ./setup.py

Finally, to create a source distribution, use the "sdist" target of the
generated Makefile:

    % make sdist

This creates Frob-1.0.tar.gz, which you can then share with the world.

A user who wishes to install your extension would download
Frob-1.0.tar.gz and create local, custom versions of the "configure"
and "setup.py" scripts:

    % gunzip -c Frob-1.0.tar.gz | tar xf -
    % cd Frob-1.0
    % ./configure
    % ./setup.py

Then, she can build and install your modules:

    % make
    % make install

Hopefully this will foster even more code sharing in the Python
community, and prevent unneeded duplication of effort by module
developers. Note that the Python installer for Windows now installs
GNU m4, the bash shell, and Autoconf, so that Windows users will be
able to use the Distutils just like on Unix.

2. Imputils

Complementary to the Distutils are the Imputils, or Import Utilities.
Python's import mechanism has been reworked to make it easy for Python
programmers to put "hooks" into the code that finds and loads modules.
The default import mechanism now includes hooks, written in Python, to
load modules via HTTP from a known URL.

This has allowed us to drop most of the standard library from the
distribution. Now, for example, when you import a less-commonly-needed
module from the standard library, Python fetches the code for you.
For example, if you say

    import tokenize

then Python -- via the Imputils -- will fetch
http://modules.python.org/lib/tokenize.py for you and install it on
your system for future use. (This is why the Python interpreter is now
installed as a setuid binary under Unix -- if you turn off this bit,
you will be unable to load modules from the standard library!)

If you try to import a module that's not part of the standard library,
then the Imputils will find out -- again from modules.python.org --
where it can find this module. It then downloads the entire relevant
module distribution, and uses the Distutils to build and install it on
your system. It then loads the module you requested. Simplicity itself!

3. New regular expression engine

Python 1.6 includes a new regular expression engine, accessed through
the "sre" module, to support Unicode strings. Be sure to use the *old*
engine for ASCII strings, though:

    import re, sre
    # ...
    re.match(r"(\d+)", "The number is 42.")                       # ASCII
    sre.match(r'(\d+)', 'The number is \N{SUPERSCRIPT TWO}')      # Unicode

If you're not sure whether a string is ASCII or Unicode, you can always
determine this at runtime:

    from types import *
    # ...
    if type(s) is StringType:
        m = re.match(r"...", s)
    elif type(s) is UnicodeType:
        m = sre.match(r'...', s)
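
The runtime check above can also be sketched for a present-day Python 3
interpreter. This is an illustration under stated assumptions, not the
1.6 interface described in this note: it assumes there is no separate
"sre" module to import, that the single "re" module accepts both text
("str") and byte strings ("bytes") as long as pattern and subject are of
the same type, and the helper name "find_number" is hypothetical. It
uses "search" rather than "match" because "match" only succeeds at the
start of the string.

```python
import re

# One compiled pattern per string type; pattern and subject must agree.
STR_DIGITS = re.compile(r"(\d+)")
BYTES_DIGITS = re.compile(rb"(\d+)")


def find_number(s):
    """Return the first run of digits in s, or None if there is none.

    Hypothetical helper: dispatches on the runtime type of s.
    """
    if isinstance(s, bytes):
        m = BYTES_DIGITS.search(s)
    elif isinstance(s, str):
        m = STR_DIGITS.search(s)
    else:
        raise TypeError("expected str or bytes, got %s" % type(s).__name__)
    return m.group(1) if m else None


print(find_number("The number is 42."))
print(find_number(b"The number is 42."))
```

The result has the same type as the input, so callers that mix the two
kinds of string still need to decide which one they want to work with.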