New Features in Python 1.6
==========================

With the recent release of Python 1.6 alpha 1, a lot of people have been wondering what's new. This short note aims to explain the major changes in Python 1.6.

Core Language
-------------

1. Unicode strings

Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example:

  foo = 'hello'                 # Unicode
  foo = "hello"                 # ASCII
  foo = a'hello'                # ASCII
  foo = u"hello"                # Unicode

You can still use the "r" character to quote strings in a manner convenient for the regular expression engine, but with subtle changes in semantics: see "New regular expression engine" below for more information.

Also, for compatibility with most editors and operating systems, Python source code is still 7-bit ASCII. Thus, for portability it's best to write Unicode strings using one of two new escapes: \u and \N. \u lets you specify a Unicode character as a 16-bit hexadecimal number, and \N lets you specify it by name:

  message = 'Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} ' \
            + 'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!'

  message = 'Bienvenue \u00E0 Python fran\u00E7ais!'

2. string methods

Python strings have grown methods, just like lists and dictionaries. For instance, to split a string on spaces, you can now say this:

  tokens = "foo bar baz".split(" ")

Or, equivalently, this:

  tokens = " ".split("foo bar baz")

(Python figures out which string is the delimiter and which is the string to split by examining both strings to see which one occurs more frequently inside the other.) Be careful not to mix Unicode and ASCII strings when doing this, though.
Other examples:

  foo = "The quick red fox jumped over the lazy brown dog."
  foo.find("dog")
  foo.strip()
  foo.lower()

Note that use of any string method on a particular string renders it mutable. This is for consistency with lists, which are mutable and have methods like 'append()' and 'sort()' that modify the list. Thus, "foo.strip()" modifies the string 'foo' in-place. "strip(foo)" retains its old behavior of returning a modified copy of 'foo'.

3. extended call syntax

The variable argument list and keyword argument syntax introduced in Python 1.3 has been extended. Previously, it only worked in function/method signatures; calling other functions with the same arguments required the use of 'apply()':

  def spam(arg1,arg2,*more_args,**keyword_args):
      # ...
      apply(foo,(arg1,arg2) + more_args,keyword_args)

Now it works for calling functions too. For consistency with C and C++, asterisks in the function signature become ampersands in the function body:

      foo(arg1,arg2,&more_args,&&keyword_args)

4. assignment to None now works

In previous versions of Python, values assigned to None were lost. For example, this code:

  (username,None,None,None,realname,homedir,None) = getpwuid(uid)

would only preserve the user name, real name, and home directory fields from a password file entry -- everything else of interest was lost. In Python 1.6, you can meaningfully assign to None. In the above example, None would be replaced by a tuple containing the four values of interest.

You can also use the variable argument list syntax here, for example:

  (username,password,uid,uid,*None) = getpwuid(uid)

would set None to a tuple containing the last three elements of the tuple returned by getpwuid.

Library
-------

1. Distutils

In the past, lots of people have complained about the lack of a standard mechanism for distributing and installing Python modules. This has been fixed by the Distutils, or Distribution Utilities.
We took the approach of leveraging past efforts in this area rather than reinventing a number of perfectly good wheels. Thus, the Distutils take advantage of a number of "best-of-breed" tools for distributing, configuring, building, and installing software. The core of the system is a set of m4 macros that augment the standard macros supplied by GNU Autoconf. Where the Autoconf macros generate shell code that becomes a configure script, the Distutils macros generate Python code that creates a Makefile. (This is a similar idea to Perl's MakeMaker system, but of course this Makefile builds Python modules and extensions!)

Using the Distutils is easy: you write a script called "setup.in" which contains both Autoconf and Distutils m4 macros; the Autoconf macros are used to create a "configure" script which examines the target system to find out how to build your extensions there, and the Distutils macros create a "setup.py" script, which generates a Makefile that knows how to build your particular collection of modules. You process "setup.in" before distributing your modules, and bundle the resulting "configure" and "setup.py" with your modules. Then, the user just has to run "configure", "setup.py", and "make" to build everything.
For example, here's a small, simple "setup.in" for a hypothetical module distribution that uses Autoconf to check for a C library "frob" and builds a Python extension called "_frob" and a pure Python module "frob":

  AC_INIT(frobmodule.c)
  AC_CHECK_HEADER(frob.h)
  AC_HAVE_LIBRARY(frob)
  AC_OUTPUT()

  DU_INIT(Frob,1.0)
  DU_EXTENSION(_frob,frobmodule.c,-lfrob)
  DU_MODULE(frob,frob.py)
  DU_OUTPUT(setup.py)

First, you run this setup.in using the "prepare_dist" script; this creates "configure" and "setup.py":

  % prepare_dist

Next, you configure the package and create a makefile:

  % ./configure
  % ./setup.py

Finally, to create a source distribution, use the "sdist" target of the generated Makefile:

  % make sdist

This creates Frob-1.0.tar.gz, which you can then share with the world. A user who wishes to install your extension would download Frob-1.0.tar.gz and create local, custom versions of the "configure" and "setup.py" scripts:

  % gunzip -c Frob-1.0.tar.gz | tar xf -
  % cd Frob-1.0
  % ./configure
  % ./setup.py

Then, she can build and install your modules:

  % make
  % make install

Hopefully this will foster even more code sharing in the Python community, and prevent unneeded duplication of effort by module developers. Note that the Python installer for Windows now installs GNU m4, the bash shell, and Autoconf, so that Windows users will be able to use the Distutils just like on Unix.

2. Imputils

Complementary to the Distutils are the Imputils, or Import Utilities. Python's import mechanism has been reworked to make it easy for Python programmers to put "hooks" into the code that finds and loads modules. The default import mechanism now includes hooks, written in Python, to load modules via HTTP from a known URL.

This has allowed us to drop most of the standard library from the distribution. Now, for example, when you import a less-commonly-needed module from the standard library, Python fetches the code for you.
For example, if you say

  import tokenize

then Python -- via the Imputils -- will fetch http://modules.python.org/lib/tokenize.py for you and install it on your system for future use. (This is why the Python interpreter is now installed as a setuid binary under Unix -- if you turn off this bit, you will be unable to load modules from the standard library!)

If you try to import a module that's not part of the standard library, then the Imputils will find out -- again from modules.python.org -- where it can find this module. It then downloads the entire relevant module distribution, and uses the Distutils to build and install it on your system. It then loads the module you requested. Simplicity itself!

3. New regular expression engine

Python 1.6 includes a new regular expression engine, accessed through the "sre" module, to support Unicode strings. Be sure to use the *old* engine for ASCII strings, though:

  import re, sre
  # ...
  re.match(r"(\d+)", "The number is 42.")                     # ASCII
  sre.match(r'(\d+)', 'The number is \N{SUPERSCRIPT TWO}')    # Unicode

If you're not sure whether a string is ASCII or Unicode, you can always determine this at runtime:

  from types import *
  # ...
  if type(s) is StringType:
      m = re.match(r"...", s)
  elif type(s) is UnicodeType:
      m = sre.match(r'...', s)
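For anyone pasting the regular-expression snippet above into an interpreter today, here is a minimal sketch, assuming a present-day Python 3 rather than the 1.6 alpha under discussion. In Python 3 a single "re" module handles both str and bytes, and "re.match" only matches at the start of the string, so "re.search" is the call that would find the number. The helper name "first_number" is hypothetical and exists only for this illustration.

```python
import re

text = "The number is 42."

# re.match anchors at the beginning of the string, so it finds nothing here.
anchored = re.match(r"(\d+)", text)

# re.search scans the whole string and finds the digits.
found = re.search(r"(\d+)", text)
number = found.group(1)


def first_number(s):
    """Hypothetical helper: first run of digits in a str or bytes value."""
    # The pattern type must match the subject type (str vs. bytes).
    pattern = rb"(\d+)" if isinstance(s, bytes) else r"(\d+)"
    m = re.search(pattern, s)
    return m.group(1) if m else None
```

The isinstance check plays the role of the "type(s) is StringType" test in the snippet above: it picks the pattern, not the engine.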

On Sat, 1 Apr 2000, Guido van Rossum wrote:
New Features in Python 1.6
==========================
[lots 'n' lots]
  tokens = "foo bar baz".split(" ")
  tokens = " ".split("foo bar baz")

Has anyone started working up a style guide that'll recommend when to use these new methods, when to use the string module's calls, etc.? Ditto for the other changes --- where there are now two or more ways of doing something, how do I (or my students) tell which one is preferred?

Greg

p.s. "There's More Than One Way To Do It" == "No Matter How Much Of This Language You Learn, Other People's Code Will Always Look Strange"

Hi!

Guido van Rossum on April 1st:
[...]
With the recent release of Python 1.6 alpha 1, a lot of people have been wondering what's new. This short note aims to explain the major changes in Python 1.6.
[...]
Python strings can now be stored as Unicode strings. To make it easier
to type Unicode strings, the single-quote character defaults to creating
-----------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
a Unicode string, while the double-quote character defaults to ASCII
--^^^^^^^^^^^^^^
strings.
As I read this my first thoughts were: "Huh? Is that really true? To me this sounds like an April Fools' joke." But to be careful I checked first before I read on:

  pf@artcom0:ttyp4 ~/archiv/freeware/python/CVS_01_04_00/dist/src 41> ./python
  Python 1.6a1 (#2, Apr 1 2000, 19:19:18) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
  Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
  >>> 'a'
  'a'
  >>> 'ä'
  '\344'
  >>> u'ä'
  u'\344'
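The same check can be repeated on a current interpreter. A minimal sketch, assuming Python 3 (where every str is Unicode) rather than the 1.6a1 build in the session above: the quote character never selects the string type, and the \N and \u escapes from the announcement are two spellings of the same characters. The variable names are illustrative only.

```python
import unicodedata

# Single and double quotes produce the same type.
quote_types = (type('hello'), type("hello"))

# \N{...} names a character; \uXXXX gives its 16-bit code point in hex.
by_name = ("Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} "
           "Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!")
by_number = "Bienvenue \u00E0 Python fran\u00E7ais!"
same_message = by_name == by_number

# Octal 344 is code point U+00E4, the character typed in the session.
a_umlaut = '\344'
a_umlaut_name = unicodedata.name(a_umlaut)
```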
Since www.python.org happened to be down at that moment, I was unable to check whether the CVS tarball I downloaded from David's starship account was recent enough, or whether this single-quote-defaults-to-unicode change had been discussed before I subscribed to python-dev. I should have read on first, before starting to wonder...
[...]
  tokens = "foo bar baz".split(" ")
Or, equivalently, this:
  tokens = " ".split("foo bar baz")
(Python figures out which string is the delimiter and which is the string to split by examining both strings to see which one occurs more frequently inside the other.)
Now it becomes clearer that this *must* be an april fools joke! ;-) :
  >>> tokens = "foo bar baz".split(" ")
  >>> print tokens
  ['foo', 'bar', 'baz']
  >>> tokens = " ".split("foo bar baz")
  >>> print tokens
  [' ']
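The session above still behaves the same way today. A minimal sketch, assuming Python 3: the string the method is called on is always the one being split, and the argument is always the delimiter, with no guessing involved.

```python
text = "foo bar baz"

# Split text on single spaces.
tokens = text.split(" ")

# Swapping the roles splits the one-character string " " on a delimiter
# that never occurs in it, so the whole string comes back as one item.
swapped = " ".split(text)

# With no argument, split() breaks on any run of whitespace.
loose = "foo   bar\tbaz".split()
```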
[...]
Note that use of any string method on a particular string renders it mutable.
[...]
For consistency with C and C++, asterisks in the function signature become ampersands in the function body:
[...]
load modules via HTTP from a known URL.
[...]
This has allowed us to drop most of the standard library from the distribution...
[...]

Pheeew... Oh Well. And pigs can fly. Sigh! ;-)
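For the first two quoted claims, a minimal sketch of how the real features behave, assuming Python 3: string methods return new strings and leave the original untouched, and the extended call syntax reuses the same asterisks as the signature. The function names "foo" and "spam" follow the announcement's example; their bodies are invented here for illustration.

```python
def foo(*args, **kwargs):
    # Illustrative body: simply report what was received.
    return args, kwargs


def spam(arg1, arg2, *more_args, **keyword_args):
    # Extended call syntax: same asterisks as in the signature,
    # replacing apply(foo, (arg1, arg2) + more_args, keyword_args).
    return foo(arg1, arg2, *more_args, **keyword_args)


original = "  padded  "
stripped = original.strip()   # returns a new string
forwarded = spam(1, 2, 3, key="value")
```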
That was a well-prepared April Fools' joke!

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

Peter Funk wrote:
Hi!
Guido van Rossum on april 1st:
[turns into a Perli for a moment - well done! ] ...
Since www.python.org happened to be down at that moment, I was unable to check whether the CVS tarball I downloaded from David's starship account was recent enough, or whether this single-quote-defaults-to-unicode change had been discussed before I subscribed to python-dev. I should have read on first, before starting to wonder...
You should not give up when python.org is down. As a fallback, I used to use www.cwi.nl which appears to be quite up-to-date. You can find the files and the *true* change list at

http://www.cwi.nl/www.python.org/1.6/

Note that today is April 2, so you may believe me

at-least-not-less-than-usually - ly y'rs - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

On Sun, 2 Apr 2000, Peter Funk wrote:
As I read this my first thoughts were: "Huh? Is that really true? To me this sounds like an April Fools' joke." But to be careful I checked first before I read on:
My favourite part was the distutils section. The great thing about this announcement is that it would have been almost believable if we were talking about any language other than Python!

-- ?!ng

"To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell

On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote:
Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example:
  foo = 'hello'                 # Unicode
  foo = "hello"                 # ASCII
Is single-quoting for creating unicode clever ? I think there might be a problem with old code when the operations on unicode strings are not 100% compatible to the standard string operations. I don't know if this is a real problem - it's just a point for discussion. Cheers, Andreas

Andreas Jung wrote:
On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote:
The above line has all the answers ;-) ...
Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example:
  foo = 'hello'                 # Unicode
  foo = "hello"                 # ASCII
Is single-quoting for creating unicode clever ? I think there might be a problem with old code when the operations on unicode strings are not 100% compatible to the standard string operations. I don't know if this is a real problem - it's just a point for discussion.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

Mark Hammond wrote:
On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote:
The above line has all the answers ;-) ...
That was pretty sneaky tho! <grin> Had the added twist of being half-true...
... and on time like a CRON-job ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

In comp.lang.python, you wrote:
On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote:
Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example:
  foo = 'hello'                 # Unicode
  foo = "hello"                 # ASCII
Is single-quoting for creating unicode clever ? I think there might be a problem with old code when the operations on unicode strings are not 100% compatible to the standard string operations. I don't know if this is a real problem - it's just a point for discussion.
Cheers, Andreas
Hello Andreas,

have you looked at the date of Guido's post? A really good April Fools' joke, since it mixes the jokes so well with reality.

Best regards, also to the others,
Joachim

Not only was it an April fool's joke, but it wasn't mine! It was forged by an insider. I know by who, but won't tell, because it was so good. It shows that I can trust to delegate way more to the Python community than I think I can! :-)

BTW, the biggest give-away that it wasn't mine was the absence of my standard sign-off line:

--Guido van Rossum (home page: http://www.python.org/~guido/)

On 03 April 2000, Guido van Rossum said:
Not only was it an April fool's joke, but it wasn't mine! It was forged by an insider. I know by who, but won't tell, because it was so good. It shows that I can trust to delegate way more to the Python community than I think I can! :-)
BTW, the biggest give-away that it wasn't mine was the absence of my standard sign-off line:
--Guido van Rossum (home page: http://www.python.org/~guido/)
D'ohhh!!! Hasn't anyone noticed that the largest amount of text in the joke feature list was devoted to the Distutils? I thought *that* would give it away "fer shure". You people are *so* gullible! ;-)

And for my next trick... *poof*!

Greg

In comp.lang.python, you wrote:
You people are *so* gullible! ;-)
Well done. You had me going for a while. You had just enough truth in there. Guido releasing the alpha at that time helped your cause as well.

Neil

--
Tact is the ability to tell a man he has an open mind when he has a hole in his head.

[Greg Ward, fesses up]
Hasn't anyone noticed that the largest amount of text in the joke feature list was devoted to the Distutils? I thought *that* would give it away "fer shure". You people are *so* gullible! ;-)
Me too! My first suspect was me, but for the life of me, me couldn't remember writing that. You were only second on me list (it had to be one of us, as nobody else could have described legitimate Python features as if they had been implemented in Perl <0.9 wink>).
And for my next trick... *poof*!
Nice try. You're not only not invisible, I've posted your credit card info to a hacker list.

crushing-guido's-enemies-cuz-he's-too-much-of-a-wuss-ly y'rs - tim
participants (12)
- Andreas Jung
- Christian Tismer
- Greg Ward
- Guido van Rossum
- gvwilson@nevex.com
- joachim@medien.tecmath.de
- Ka-Ping Yee
- M.-A. Lemburg
- Mark Hammond
- Neil Schemenauer
- pf@artcom-gmbh.de
- Tim Peters