[Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01

Brett Cannon drifty@bigfoot.com
Sun, 1 Sep 2002 15:57:53 -0700 (PDT)


Yes, with Michael's permission, I am attempting to start up the Python-dev
summaries again.  Below is my attempt at summarizing the last half of
August.  It's longer than normal summaries, but that is because I bothered
to include discussions on threads that were not directly related to the
Python core but are interesting nonetheless (e.g., the whole spambayes
thread).

I am posting to Python-dev first before posting to c.l.py, c.l.py.a (also
lwn.net and probably Slashdot) because I want to get the general okay from
the list that I have done a good enough job to send this out; I don't
want to have a summary that represents the goings-on here without the
general populace (or just the BDFL since he can overrule =) being okay
with it.  I am also curious as to whether I should go into more or less
detail, leave out the summaries that do not directly pertain to the Python
core, etc.

So please read the summary and let me know if you are okay with it.  If so
I will try to do semi-monthly summaries from now on.  Oh, and I am on
vacation right now and will be doing a lot of travelling in the next two
months, so I can't guarantee summaries will be this quick to come out for
a while.  I will do them, though, even if they are a week late.  =)

Oh, and if I do get the okay to do this, expect a lot of dumb questions
from me in the future in terms of clarifying things.  Just remember, it is
for the good of the Python community.  =)


=======================================


This is a summary of traffic on the python-dev mailing list between August
16, 2002 and September 1, 2002 (exclusive).  It is intended to inform the
wider Python community of ongoing developments.  To comment, just post to
python-list@python.org or comp.lang.python in the usual way. Give your
posting a meaningful subject line, and if it's about a PEP, include the
PEP number (e.g. Subject: PEP 201 - Lockstep iteration).  All python-dev
members are interested in seeing ideas discussed by the community, so
don't hesitate to take a stance on a PEP if you have an opinion.

This is the first summary written by Brett Cannon.
Summaries are not archived anywhere at the moment.  =)   They will be, though,
so stay tuned for the URL in future summaries.



   Posting distribution (with apologies to mbm, but thanks to mwh for the code)

   Number of articles in summary: 585

    80 |                     [|]
       |                     [|]
       |                     [|]
       |                     [|]
       | [|]                 [|]
    60 | [|]             [|] [|]
       | [|]             [|] [|]
       | [|]             [|] [|]
       | [|]             [|] [|]
       | [|]             [|] [|]                 [|]
    40 | [|]         [|] [|] [|]                 [|]
       | [|]         [|] [|] [|]         [|]     [|]         [|]
       | [|]         [|] [|] [|]         [|]     [|]         [|] [|]
       | [|]         [|] [|] [|] [|]     [|]     [|]     [|] [|] [|]
       | [|]         [|] [|] [|] [|]     [|]     [|] [|] [|] [|] [|]
    20 | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
       | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
       | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
     0 +-071-025-012-042-063-084-030-021-039-009-047-027-033-041-036-005
        Fri 16| Sun 18| Tue 20| Thu 22| Sat 24| Mon 26| Wed 28| Fri 30|
            Sat 17  Mon 19  Wed 21  Fri 23  Sun 25  Tue 27  Thu 29  Sat 31



================
Type Categories
================
This VERY long thread was sparked by Andrew Koenig asking whether there had
ever been a discussion of making type categories more explicit (by
"category" Andrew meant "the set of all types that implement a particular
marker interface").  As Andrew later put it, he was asking about "a way of
making notions such as 'file-like object' more formal and/or automatic".
The discussion quickly settled on using the term interface to mean a way
to declare that an object implements certain methods (think of Java's
'implements' mechanism).  Once that was out of the way, the discussion
took off.  Zope's implementation
(http://cvs.zope.org/Zope3/lib/python/Interface/) was pointed out very
quickly, and PEP 245 (Python Interface Syntax) was also brought to the
attention of the list.

The idea of using inheritance to handle interfaces was brought up.  Guido
said that he hasn't "given up the hope that inheritance and interfaces
could use the same mechanisms.  But Jim Fulton, based on years of
experience in Zope, claims they really should be different" in terms of
how interfaces should be handled in objects.  Jeremy Hylton tried to
channel Jim's opinion by pointing out that "We'd like to use interfaces to
make fairly strong claims.  If a class A implements an interface I, then
we should be able to use an instance of A anywhere that an I is needed."
But "the inheritance mechanism is too general": if a class A implements
interface I and a class B that does not implement I subclasses A, we end
up with a class B that claims an interface it doesn't actually have.
Guido understood the point, but still thought inheritance could be used
"if there was a way to 'shut off' inheritance as far as isinstance() (or
issubclass())" is concerned.  He explained why he keeps arguing for
inheritance: "(a) the need to deny inheritance from an interface, while
essential, is relatively rare IMO, and in *most* cases the inheritance
rules work just fine; (b) having two separate but similar mechanisms makes
the language larger."

Samuele Pedroni asked that any implementation "allow also for refering to
anonymous super-interfaces of an interface in terms of the interface plus
a subset of its signatures, also e.g. FileLike and just 'write'.  [that
means an interface can be thought to correspond to a set of
(tag,signature) tuples, where tag identifies the interface, and one can
also just consider subsets of it]".  The thread seems to have finally
stopped (for now) with Guido saying he is mulling the whole thing over in
the back of his head.  This is a very sticky topic because of the number
of design decisions required and how it might change the way people
program in Python.
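To make Jeremy's objection concrete, here is a minimal sketch in modern Python 3, assuming an interface is modeled as a plain base class.  The names FileLike, A, and B are hypothetical and do not come from any proposal in the thread: B inherits A's claim to be file-like while deliberately disabling write(), yet isinstance() still says yes.

```python
class FileLike:
    """Hypothetical 'interface' expressed as an ordinary base class."""

    def write(self, data):
        raise NotImplementedError


class A(FileLike):
    """Really implements the interface: write() works."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


class B(A):
    """Subclasses A but breaks the interface by disabling write()."""

    write = None


a, b = A(), B()
a.write("hello")
print(isinstance(a, FileLike), callable(a.write))  # True True
print(isinstance(b, FileLike), callable(b.write))  # True False: the claim is inherited, the behavior is not
```

Nothing in this sketch "shuts off" the inherited claim for B; that missing switch is exactly what Guido was asking for.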

There was also a partial sub-thread in this whole discussion about
multimethods; basically a way to overload methods based on their
parameter signatures.  Most of the discussion was over syntax and how to
handle resolution order.  It then fell by the wayside when the main part
of the thread took over again.
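Since resolution order was the sticking point, here is a minimal sketch of one possible (hypothetical) multimethod design: implementations are registered under a tuple of argument types, an exact type match wins, and otherwise the first registered signature that matches via isinstance() is used.  The MultiMethod class and its register() method are illustrative names only, and "first registration wins" is just one naive resolution order, not anything proposed on the list.

```python
class MultiMethod:
    """Hypothetical multimethod: chooses an implementation by argument types."""

    def __init__(self, name):
        self.name = name
        self.registry = {}  # maps a tuple of types to an implementation

    def register(self, *types):
        def decorator(func):
            self.registry[types] = func
            return func
        return decorator

    def __call__(self, *args):
        key = tuple(type(arg) for arg in args)
        func = self.registry.get(key)  # 1) an exact type match wins
        if func is None:
            # 2) naive fallback: first registered signature matching via isinstance()
            for types, candidate in self.registry.items():
                if len(types) == len(args) and all(
                    isinstance(arg, t) for arg, t in zip(args, types)
                ):
                    func = candidate
                    break
        if func is None:
            raise TypeError(f"no {self.name} implementation for {key}")
        return func(*args)


combine = MultiMethod("combine")


@combine.register(int, int)
def _combine_ints(a, b):
    return a + b


@combine.register(str, str)
def _combine_strs(a, b):
    return a + " " + b


print(combine(1, 2))            # 3
print(combine("spam", "eggs"))  # spam eggs
print(combine(True, 2))         # 3 (bool has no exact entry; falls back to (int, int))
```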

==============================
type categories -- an example
==============================
This thread was started when Andrew Koenig explained that he had brought
up his type category question because he wanted an easy way to identify
members of a type.  He now had an example from a program he was writing
in which the type of an argument varied, and what needed to be done to
the data changed accordingly.  Jeremy Hylton suggested the
isinstance(obj, type(re.compile(''))) idiom.  Andrew asked if this was
guaranteed to work, and Jeremy said no.  I asked why this was not
guaranteed, and Fredrik Lundh said that it is because re.compile() is a
factory function, and a future version could return a different kind of
object depending on the pattern.
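A sketch of how Andrew's kind of type-dependent code might use Jeremy's idiom.  The describe() function is hypothetical.  As Fredrik noted, the idiom relies on re.compile() always returning the same type, which was not guaranteed at the time; recent Python versions also expose this type directly as re.Pattern.

```python
import re

# Jeremy's idiom: grab the type of a compiled pattern once, up front.
PatternType = type(re.compile(""))


def describe(arg):
    """Hypothetical function whose behavior depends on its argument's type."""
    if isinstance(arg, PatternType):
        return "pattern: " + arg.pattern
    if isinstance(arg, str):
        return "string: " + arg
    raise TypeError("unsupported argument type: " + type(arg).__name__)


print(describe(re.compile(r"\d+")))  # pattern: \d+
print(describe("spam"))              # string: spam
```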

===============================================
Python build trouble with the new gcc/binutils
===============================================
Andrew Koenig said that he couldn't compile Python using the newest gcc
(this was the day after the latest release hit the servers).  With help
from Zack Weinberg of CodeSourcery (who also recently rewrote the tempfile
module), the problem was tracked down to binutils 2.13 and was not
Python's fault.

===================================
Last call: mortal interned strings
===================================
The patch python.org/sf/576101 removes the default immortality of interned
strings.  I believe it was in early August (possibly spilled over from
late July) when Oren Tirosh proposed the idea and wrote the above
mentioned patch.  There had been some discussion over whether any 3rd
party code relied on interned strings being immortal; none was found
(MacPython relied on it, but since it is under Python core control it
could simply be changed, making the point moot).  The patch has been
checked in.  With it, the way to make an interned string immortal is to
call PyString_InternImmortal(); no code in the core uses this function.
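For readers unfamiliar with interning, here is a short sketch of what it means at the Python level.  In 2002 this was the built-in intern(); Python 3 moved it to sys.intern().  The sketch shows only that interning maps equal strings to one shared object; whether that object is immortal is a C-level detail of the kind the patch changed, and it is not visible here.

```python
import sys

# Build two equal strings at runtime so they start out as separate objects.
a = "".join(["spam", "eggs"])
b = "".join(["spam", "eggs"])
print(a == b)  # True: equal values

# Interning returns one canonical object for each distinct string value.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True: both names now refer to the same object
```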

=====================================
PEP 218 (sets); moving set.py to Lib
=====================================
Thanks to Greg Wilson (for writing the PEP), Alex Martelli (for writing
the module initially), and Guido (for refactoring Alex's code) the stdlib
has now gained a sets module.  It has both the notion of mutable and
immutable sets (the latter used when you have a set of sets).  There was
discussion about how sets should print (sorted or not; unsorted is the
default, but there is an option to print them sorted) and which operators
should be overloaded for working on sets (| and & were chosen).  The
module is a beautiful chunk of code and I highly recommend reading its
source.
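The sets module's Set and ImmutableSet classes were later superseded by the built-in set and frozenset types (the module itself was removed in Python 3), so this sketch uses the built-ins to show the same ideas: the | and & operators, a set of sets built from the immutable variant, and sorted output when order matters.

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b         # {1, 2, 3, 4}
intersection = a & b  # {3}

# Mutable sets are unhashable, so a set of sets needs the immutable variant.
family = {frozenset(a), frozenset(b)}

print(sorted(union))                # [1, 2, 3, 4]
print(sorted(intersection))         # [3]
print(frozenset({4, 3}) in family)  # True
```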

===========================================
A few lessons from the tempfile.py rewrite
===========================================
Zack Weinberg, after rewriting the tempfile module, brought up three
points: 1) the lack of dummy threads, 2) the lack of a pthread_once
equivalent, and 3) the lack of a built-in way to skip tests in
unittest.py.  Guido responded to each in turn: 1) since some code uses
the idiom of trying to import thread and catching the exception if it
fails, Guido said he would be willing to accept a dummy_thread.py that
would allow:

try:
    import thread as _thread
except ImportError:
    import dummy_thread as _thread

to work.  No word on whether this is being written at the moment.  2)
Guido said a pthread_once equivalent was, in his opinion, overkill.  He
said to "be Pythonic, live dangerously, accept the risk that a ^C can
screw you.  It can anyway. :-)".  And as for 3), Guido referred Zack to
the PyUnit list and Steve Purcell, since Python just tracks Steve's code
(pyunit.sf.net).  Guido's suggestion was to put tests that depend on some
other code in a separate test suite that is only run when that code is
available.
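Later versions of unittest did grow a built-in way to skip tests (unittest.skipUnless and friends, added in Python 2.7 and 3.1).  Here is a minimal sketch combining that with Guido's suggestion of only running tests when the code they rely on is available; the module name hypothetical_fastlib is made up and stands in for any optional dependency.

```python
import importlib.util
import unittest

# "hypothetical_fastlib" is a made-up name standing in for an optional dependency.
HAVE_FASTLIB = importlib.util.find_spec("hypothetical_fastlib") is not None


class OptionalDependencyTests(unittest.TestCase):
    @unittest.skipUnless(HAVE_FASTLIB, "hypothetical_fastlib is not installed")
    def test_fastlib(self):
        import hypothetical_fastlib  # only imported if the test actually runs
        self.assertTrue(hasattr(hypothetical_fastlib, "__name__"))

    def test_always_runs(self):
        self.assertEqual(sum([1, 2, 3]), 6)


# Run the suite programmatically; the skipped test is reported, not failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalDependencyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful(), len(result.skipped))
```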

===========================
Standard datetime objects?
===========================
Kevin Jacobs asked what stage the new datetime object was at.  Guido said
it is in python/nondist/sandbox/datetime/ in CVS, which also has comments
pointing to a wiki containing the current work on it.  Fred L. Drake, Jr.
is working on the C re-implementation and Guido expects a checkin at any
moment (it hasn't happened as of this writing).

===================
PEP 269 versus 283
===================
Jonathan Riehl noticed that PEP 283 said PEP 269 was dead; not good news,
considering he was close to having a patch for PEP 269 (a pgen module to
interface with the C version).  Guido said he will revive the PEP.  The
patch has since been put on SF at python.org/sf/599331 .

==============================
What is a backport candidate?
==============================
Since Python 2.2 is going to be around for a long time, the question was
brought up of what constitutes code that should be backported.  Guido made
the following three points:

1) code that is trivial to backport should always be backported

2) code patching 2.3-only code should obviously not be backported

3) patches that apply to 2.2 but require changes to the 2.2 code before
they work; there are gradations of this.

So please, when submitting patches, mention whether you think the patch
should be backported to the 2.2 tree and any possible dependencies it
might have in a backport.

=================================
python/nondist/sandbox/spambayes
=================================
In response to Paul Graham's spam filter written using Bayes' Rule
(Slashdot post on it is at
http://developers.slashdot.org/article.pl?sid=02/08/16/1428238&tid=156), a
thread spawned around the checkin of code that followed Graham's
suggestions.  This thread quickly jumped into discussions on data
structures, Bayes' Rule, and a whole lot of talk about spam.  Very
interesting if spam filtering interests you.  Tim Peters has been leading
the drive on this chunk of code (and thanks to an illness that befell him
in late August, which he has since gotten over, he had a few days of
major hacking on it; Tim showed he is a performance stats whore
<wink>).
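As a rough illustration of the Bayesian combining step at the heart of Graham's approach, here is a sketch of his formula for merging per-token spam probabilities into one score, prod(p) / (prod(p) + prod(1 - p)), keeping only the tokens whose probabilities are farthest from the neutral 0.5.  The function name and the top_n=15 default are assumptions for illustration (15 follows Graham's essay); this is not the spambayes sandbox code, which evolved well beyond it.

```python
from math import prod


def combined_spam_probability(token_probs, top_n=15):
    """Combine per-token spam probabilities (each strictly between 0 and 1).

    Keeps the top_n probabilities farthest from 0.5, then applies the
    naive-Bayes style combination described in Graham's essay.
    """
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(interesting)
    ham = prod(1.0 - p for p in interesting)
    return spam / (spam + ham)


print(round(combined_spam_probability([0.9, 0.9]), 4))    # 0.9878
print(round(combined_spam_probability([0.99, 0.01]), 4))  # 0.5 (evidence cancels out)
```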

A very cool quote came out of this thread from Eric S. Raymond when
discussing the spam filter he has been working on: "This is actually the
first new program I've coded in C (rather than Python) in a good four
years or so".

====================
Parsing vs. lexing.
====================
In response to a question by Aahz about the differences between a lexer,
a parser, and a tokenizer, Eric Raymond posted a good overview.  Guido
later commented in an email mentioning SPARK and explained how Python's
parser generator (pgen) works and why he wrote it.  He also made some
other comments on lexers.  Jeremy Hylton pointed out a "neat new paper
about an old algorithm for recursive descent parsers with backtracking and
unlimited lookahead" by Bryan Ford at http://www.brynosaurus.com/pub.html.
Alex Martelli said that this discussion reminded him of "a long-ago
interview with Borland's techies" in which they said they were able to
make Borland PASCAL fit on a floppy while MS PASCAL took multiple
floppies.  Their trick was "we just did everything by the Dragon Book --
except that the parser is a hand-written recursive descent parser [Aho &c
being adamant defenders of Yacc & the like], which buys us a lot".
Someone named Noah also emailed a discussion of lexers and parsers that
pulled in Finite State Machines, Pushdown Automata, and Turing Machines.
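To make the lexer/parser split concrete, here is a small sketch: a regex-based lexer that turns text into tokens, and a hand-written recursive descent parser (the approach the Borland quote praises) that evaluates arithmetic over those tokens.  The grammar and all names are hypothetical; this is not how pgen works (pgen generates an LL(1) parser from a grammar file), nor is it Ford's backtracking algorithm.

```python
import re

# Lexer: one token per match, skipping leading whitespace.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<op>[-+*/()]))")


def tokenize(text):
    """Turn a string like '2 + 3' into [('NUM', 2), ('OP', '+'), ('NUM', 3)]."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        if match.group("num") is not None:
            tokens.append(("NUM", int(match.group("num"))))
        else:
            tokens.append(("OP", match.group("op")))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent parser with one method per grammar rule:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUM | '(' expr ')' | '-' factor
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take()[1]
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, value = self.take()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            result = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return result
        if (kind, value) == ("OP", "-"):
            return -self.factor()
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("unexpected trailing input")
    return result


print(tokenize("2 + 3"))            # [('NUM', 2), ('OP', '+'), ('NUM', 3)]
print(evaluate("2 + 3 * (4 - 1)"))  # 11
print(evaluate("-(1 + 2) * 2"))     # -6
```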

Martin Sjögren said that Haskell's pattern matching and lazy evaluation
make lexers (and even recursive-descent parsers) easy to write, but
unfortunately Haskell does not play nicely with other languages.  Haskell
is where Python got its list comprehension idea.

============================
Fw: Security hole in rexec?
============================
It was brought to the attention of the list that deleting __builtins__
allowed a compromise in rexec.  Guido pointed out that
python.org/sf/577530 reports this.  He also said not to trust rexec.

A patch is going to be submitted to document the view that rexec is really
not that safe.

=================
A `cogen' module
=================
Francois Pinard asked about Cartesian products using the new sets module.
Guido didn't think people would in general need it.  Francois quickly
started this thread to discuss a cogen module for generating Cartesian
products and other ways of operating on sets.
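The core of what such a module would provide can be sketched with a recursive generator that yields the Cartesian product lazily.  The cartesian() name is hypothetical and is not the cogen API; Python 2.6 and later ship this functionality as itertools.product.

```python
def cartesian(*sequences):
    """Lazily yield tuples from the Cartesian product of the given sequences.

    The sequences must be re-iterable (lists, strings, tuples), because the
    remaining sequences are walked once for every item of the first one.
    """
    if not sequences:
        yield ()
        return
    first, rest = sequences[0], sequences[1:]
    for item in first:
        for tail in cartesian(*rest):
            yield (item,) + tail


print(list(cartesian([1, 2], "ab")))  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
```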

=================
Mersenne Twister
=================
Raymond Hettinger volunteered to implement the Mersenne Twister algorithm
(one in Python exists at www.math.keio.ac.jp/~matumoto/emt.html).  While
discussing whether to implement it in C or Python, Guido noticed that
random.Random re-implements whrandom.  Guido then came up with the idea of
writing a base random class whose subclasses implement .random(); Tim
Peters agreed and suggested more methods for subclasses to override.
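This base-class design is still how the random module works: per its documentation, a subclass of random.Random can plug in its own basic generator by overriding random(), seed(), getstate(), and setstate(), and it inherits everything else.  A minimal sketch, assuming a deliberately weak toy linear congruential generator (the LCGRandom class is hypothetical; do not use it for anything real):

```python
import random


class LCGRandom(random.Random):
    """Toy linear congruential generator plugged into random.Random."""

    def seed(self, a=None):
        # Only integer seeds are supported in this sketch; None means 0.
        self._state = int(a or 0) % 2**31

    def random(self):
        # Classic LCG step; returns a float in [0.0, 1.0).
        self._state = (1103515245 * self._state + 12345) % 2**31
        return self._state / 2**31

    def getstate(self):
        return self._state

    def setstate(self, state):
        self._state = state


gen = LCGRandom(42)
print(gen.random())                 # a float in [0.0, 1.0)
print(gen.choice(["a", "b", "c"]))  # choice() is inherited and built on random()
```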

=================================
New PEP Format: reStructuredText
=================================
David Goodger and Barry Warsaw have now made reST usable as a markup
syntax for PEPs.  Read the PEPs on the subject to learn more:

- PEP 12 -- Sample reStructuredText PEP Template
  (http://www.python.org/peps/pep-0012.html)

- PEP 258 -- Docutils Design Specification
  (http://www.python.org/peps/pep-0258.html)

- PEP 287 -- reStructuredText Docstring Format
  (http://www.python.org/peps/pep-0287.html)

====================================
tiny optimization in ceval mainloop
====================================
Jeremy Hylton noticed that in ceval there is a test of whether the ticker
is 0 or things_to_do is true (an explanation of the ticker,
checkinterval, and the GIL follows this paragraph).  Jeremy wondered if we
could just drop the ticker to 0 when things_to_do is true.  Jack Jansen,
though, pointed out that clearing it is not guaranteed since an interrupt
routine may run while "we fiddle things_to_do".  Skip Montanaro then
pointed out that since neither ticker nor things_to_do is touched unless
the GIL is held, they could be made globals instead of having each thread
execute this test; he did a patch that implements this
(python.org/sf/602191).  Guido then said that unless there was a decent
speed improvement, the patch would not be checked in.  He then changed his
mind when it was pointed out that the patch actually simplified the code.
Skip tested anyway, though, and there is a speed improvement.  This also
raised the question of whether the default value of 10 for checkinterval
was reasonable; it was agreed to bump it up to 100.  Jack ran some code
and said he noticed a definite improvement.

Threading in Python is not like threading in C.  There is something called
the GIL (Global Interpreter Lock) which any thread wishing to execute
Python code or play with Python objects must hold.  This means that when
you have Python threads running (using the thread or threading module)
they are usually all waiting in line to get the GIL.  To decide when to
release the GIL so that another thread can grab it, Python uses the
ticker.  The ticker starts at the check interval (which defaulted to 10
and now defaults to 100) and is decremented every time a Python opcode is
executed; when it reaches zero, the GIL is released and the ticker is
reset.  The check interval is what sys.setcheckinterval() sets.
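A toy model of the scheduling described above, written in plain Python.  It models the idea only and is not how ceval.c is implemented: each hypothetical "thread" is a generator, each next() call stands for one opcode, and the task holding the "GIL" runs until its ticker, which starts at check_interval, reaches zero, then goes to the back of the line.  (Modern Python 3 replaced the opcode-counting check interval with the time-based sys.setswitchinterval().)

```python
from collections import deque


def worker(steps):
    """A hypothetical 'thread' that executes `steps` opcodes."""
    for _ in range(steps):
        yield


def run_with_ticker(tasks, check_interval):
    """Return the order in which 'opcodes' ran under a ticker-based GIL model."""
    trace = []
    waiting = deque(tasks)  # (name, generator) pairs waiting in line for the GIL
    while waiting:
        name, task = waiting.popleft()  # this task now holds the GIL
        ticker = check_interval
        while ticker > 0:
            try:
                next(task)  # execute one "opcode"
            except StopIteration:
                break  # task finished: drop it
            trace.append(name)
            ticker -= 1
        else:
            waiting.append((name, task))  # ticker hit zero: release the GIL
    return "".join(trace)


print(run_with_ticker([("A", worker(5)), ("B", worker(5))], check_interval=2))    # AABBAABBAB
print(run_with_ticker([("A", worker(5)), ("B", worker(5))], check_interval=100))  # AAAAABBBBB
```

A larger check interval means fewer GIL handoffs, which is the trade-off behind bumping the default from 10 to 100.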

To get a better understanding of threading under Python I recommend
reading Aahz's tutorials on threading.