[Python-Dev] New Features in Python 1.6

Sat, 1 Apr 2000 12:00:00 -0500 EST

New Features in Python 1.6
==========================

With the recent release of Python 1.6 alpha 1, a lot of people have
been wondering what's new.  This short note aims to explain the major
changes in Python 1.6.

Core Language
-------------

1. Unicode strings

Python strings can now be stored as Unicode strings.  To make it easier
to type Unicode strings, the single-quote character defaults to creating
a Unicode string, while the double-quote character defaults to ASCII
strings.  If you need to create a Unicode string with double quotes,
just preface it with the letter "u"; likewise, an ASCII string can be
created by prefacing single quotes with the letter "a".  For example:

   foo = 'hello'                # Unicode
   foo = "hello"                # ASCII
   foo = a'hello'               # ASCII
   foo = u"hello"               # Unicode

You can still use the "r" character to quote strings in a manner
convenient for the regular expression engine, but with subtle changes in
semantics: see "New regular expression engine" below for more
information.

Also, for compatibility with most editors and operating systems, 
Python source code is still 7-bit ASCII.  Thus, for portability it's
best to write Unicode strings using one of two new escapes: \u and \N.
\u lets you specify a Unicode character as a 16-bit hexadecimal number,
and \N lets you specify it by name:

    message = 'Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} ' + \
              'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!'
    message = 'Bienvenue \u00E0 Python fran\u00E7ais!'
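
As a quick sanity check, the two spellings above can be compared
directly.  This is only a sketch: it assumes an interpreter in which
unprefixed string literals accept both escapes, and the names "by_name"
and "by_code" are illustrative rather than part of this note.

```python
# Assumption: plain (unprefixed) literals understand \N{...} and \uXXXX.
# The variable names are hypothetical, chosen only for this sketch.
by_name = ('Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} '
           'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!')
by_code = 'Bienvenue \u00E0 Python fran\u00E7ais!'

# Both spellings should denote the same sequence of characters.
print(by_name == by_code)
```

Writing the characters by name is longer but self-documenting; the
numeric form is terser and easier to generate mechanically.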

2. String methods

Python strings have grown methods, just like lists and dictionaries.
For instance, to split a string on spaces, you can now say this:

    tokens = "foo bar baz".split(" ")

Or, equivalently, this:

    tokens = " ".split("foo bar baz")

(Python figures out which string is the delimiter and which is the
string to split by examining both strings to see which one occurs more
frequently inside the other.)

Be careful not to mix Unicode and ASCII strings when doing this, though.

Other examples:

    foo = "The quick red fox jumped over the lazy brown dog."
    foo.find("dog")
    foo.strip()    
    foo.lower()
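
For readers who want to try two of these calls, here is a minimal
sketch that prints what they evaluate to.  The names "tokens" and
"position" are illustrative only.

```python
# Illustrative only: `tokens` and `position` are hypothetical names.
foo = "The quick red fox jumped over the lazy brown dog."

# Splitting on a single space yields the individual words.
tokens = "foo bar baz".split(" ")
print(tokens)

# find() reports the index where the substring starts, or -1 if absent.
position = foo.find("dog")
print(position, foo[position:position + 3])
```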

Note that use of any string method on a particular string renders it
mutable.  This is for consistency with lists, which are mutable and have
methods like 'append()' and 'sort()' that modify the list.  Thus,
"foo.strip()" modifies the string 'foo' in-place.  "strip(foo)" retains
its old behavior of returning a modified copy of 'foo'.

3. Extended call syntax

The variable argument list and keyword argument syntax introduced in
Python 1.3 has been extended.  Previously, it only worked in
function/method signatures; calling other functions with the same
arguments required the use of 'apply()':

    def spam(arg1,arg2,*more_args,**keyword_args):
        # ...
        apply(foo,(arg1,arg2) + more_args,keyword_args)

Now it works for calling functions too.  For consistency with C and C++,
asterisks in the function signature become ampersands in the function
body:

        foo(arg1,arg2,&more_args,&&keyword_args)

4. Assignment to None now works

In previous versions of Python, values assigned to None were lost.  For
example, this code:

    (username,None,None,None,realname,homedir,None) = getpwuid(uid)

would only preserve the user name, real name, and home directory fields
from a password file entry -- everything else of interest was lost.

In Python 1.6, you can meaningfully assign to None.  In the above
example, None would be replaced by a tuple containing the four values of
interest.

You can also use the variable argument list syntax here, for example:

    (username,password,uid,gid,*None) = getpwuid(uid)

would set None to a tuple containing the last three elements of the
tuple returned by getpwuid.

Library
-------

1. Distutils

In the past, lots of people have complained about the lack of a standard
mechanism for distributing and installing Python modules.  This has been
fixed by the Distutils, or Distribution Utilities.  We took the approach
of leveraging past efforts in this area rather than reinventing a number
of perfectly good wheels.

Thus, the Distutils take advantage of a number of "best-of-breed" tools
for distributing, configuring, building, and installing software.  The
core of the system is a set of m4 macros that augment the standard
macros supplied by GNU Autoconf.  Where the Autoconf macros generate
shell code that becomes a configure script, the Distutils macros
generate Python code that creates a Makefile.  (This is a similar idea
to Perl's MakeMaker system, but of course this Makefile builds Python
modules and extensions!)

Using the Distutils is easy: you write a script called "setup.in" which
contains both Autoconf and Distutils m4 macros; the Autoconf macros are
used to create a "configure" script which examines the target system to
find out how to build your extensions there, and the Distutils macros
create a "setup.py" script, which generates a Makefile that knows how to
build your particular collection of modules.  You process "setup.in"
before distributing your modules, and bundle the resulting "configure"
and "setup.py" with your modules.  Then, the user just has to run
"configure", "setup.py", and "make" to build everything.

For example, here's a small, simple "setup.in" for a hypothetical module
distribution that uses Autoconf to check for a C library "frob" and
builds a Python extension called "_frob" and a pure Python module
"frob":

    AC_INIT(frobmodule.c)
    AC_CHECK_HEADER(frob.h)
    AC_HAVE_LIBRARY(frob)
    AC_OUTPUT()

    DU_INIT(Frob,1.0)
    DU_EXTENSION(_frob,frobmodule.c,-lfrob)
    DU_MODULE(frob,frob.py)
    DU_OUTPUT(setup.py)

First, you process this setup.in with the "prepare_dist" script; this
creates "configure" and "setup.py":

    % prepare_dist

Next, you configure the package and create a Makefile:

    % ./configure
    % ./setup.py

Finally, to create a source distribution, use the "sdist" target of the
generated Makefile:

    % make sdist

This creates Frob-1.0.tar.gz, which you can then share with the world.
A user who wishes to install your extension would download
Frob-1.0.tar.gz and create local, custom versions of the "configure" and
"setup.py" scripts:

    % gunzip -c Frob-1.0.tar.gz | tar xf -
    % cd Frob-1.0
    % ./configure
    % ./setup.py

Then, she can build and install your modules:

    % make
    % make install

Hopefully this will foster even more code sharing in the Python
community, and prevent unneeded duplication of effort by module
developers.

Note that the Python installer for Windows now installs GNU m4, the bash
shell, and Autoconf, so that Windows users will be able to use the
Distutils just like on Unix.

2. Imputils

Complementary to the Distutils are the Imputils, or Import Utilities.
Python's import mechanism has been reworked to make it easy for Python
programmers to put "hooks" into the code that finds and loads modules.
The default import mechanism now includes hooks, written in Python, to
load modules via HTTP from a known URL.

This has allowed us to drop most of the standard library from the
distribution.  Now, for example, when you import a less-commonly-needed
module from the standard library, Python fetches the code for you.  For
example, if you say

    import tokenize

then Python -- via the Imputils -- will fetch
http://modules.python.org/lib/tokenize.py for you and install it on your
system for future use.  (This is why the Python interpreter is now
installed as a setuid binary under Unix -- if you turn off this bit, you 
will be unable to load modules from the standard library!)

If you try to import a module that's not part of the standard library,
then the Imputils will find out -- again from modules.python.org --
where it can find this module.  It then downloads the entire relevant
module distribution, and uses the Distutils to build and install it on
your system.  It then loads the module you requested.  Simplicity
itself!

3. New regular expression engine

Python 1.6 includes a new regular expression engine, accessed through
the "sre" module, to support Unicode strings.  Be sure to use the *old*
engine for ASCII strings, though:

    import re, sre
    # ...
    re.match(r"(\d+)", "The number is 42.")   # ASCII
    sre.match(r'(\d+)', 'The number is \N{SUPERSCRIPT TWO}')  # Unicode

If you're not sure whether a string is ASCII or Unicode, you can always
determine this at runtime:

    from types import *
    # ...
    if type(s) is StringType:
        m = re.match(r"...", s)
    elif type(s) is UnicodeType:
        m = sre.match(r'...', s)