[Python-Dev] Python-dev summary for 2002-08-15 - 2002-09-01

Guido van Rossum guido@python.org
Tue, 03 Sep 2002 12:53:45 -0400

> Yes, with Michael's permission, I am attempting to start up the Python-dev
> summaries again.  Below is my attempt at summarizing the last half of
> August.  It's longer than normal summaries, but that is because I bothered
> to include discussions on threads that were not directly relating to the
> Python core but are interesting nonetheless (e.g., the whole spambayes
> thread).
> I am posting to Python-dev first before posting to c.l.py, c.l.py.a (also
> lwn.net and probably Slashdot) because I want to get the general okay from
> the list that I have done a good enough job to send this out; I don't
> want to have a summary that represents the goings-on here without the
> general populace (or just the BDFL since he can overrule =) being okay
> with it.  I am also curious as to whether I should go into more or less
> detail, leave out the summaries that do not directly pertain to the Python
> core, etc.
> So please read the summary and let me know if you are okay with it.  If so
> I will try to do semi-monthly summaries from now on.  Oh, and I am on
> vacation right now and will be doing a lot of travelling in the next two
> months, so I can't guarantee summaries will be this quick to come out for
> a while.  I will do them, though, even if they are a week late.  =)
> Oh, and if I do get the okay to do this, expect a lot of dumb questions
> from me in the future in terms of clarifying things.  Just remember, it is
> for the good of the Python community.  =)

Thanks, Brett.  Minor comments ahead; but basically, go ahead --
don't let striving for perfection keep you from posting something good!

> =======================================
> This is a summary of traffic on the python-dev mailing list between August
> 16, 2002 and September 1, 2002 (exclusive).  It is intended to inform the
> wider Python community of ongoing developments.  To comment, just post to
> python-list@python.org or comp.lang.python in the usual way. Give your
> posting a meaningful subject line, and if it's about a PEP, include the
> PEP number (e.g. Subject: PEP 201 - Lockstep iteration).  All python-dev
> members are interested in seeing ideas discussed by the community, so
> don't hesitate to take a stance on a PEP if you have an opinion.
> This is the first summary written by Brett Cannon.
> Summaries are archived nowhere at the moment.  =)   They will be, though,
> so stay tuned for the URL in future summaries.
>    Posting distribution (with apologies to mbm, but thanks to mwh for the
> code)
>    Number of articles in summary: 585
>     80 |                     [|]
>        |                     [|]
>        |                     [|]
>        |                     [|]
>        | [|]                 [|]
>     60 | [|]             [|] [|]
>        | [|]             [|] [|]
>        | [|]             [|] [|]
>        | [|]             [|] [|]
>        | [|]             [|] [|]                 [|]
>     40 | [|]         [|] [|] [|]                 [|]
>        | [|]         [|] [|] [|]         [|]     [|]         [|]
>        | [|]         [|] [|] [|]         [|]     [|]         [|] [|]
>        | [|]         [|] [|] [|] [|]     [|]     [|]     [|] [|] [|]
>        | [|]         [|] [|] [|] [|]     [|]     [|] [|] [|] [|] [|]
>     20 | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
>        | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
>        | [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|] [|]
>        | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
>        | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
>      0 +-071-025-012-042-063-084-030-021-039-009-047-027-033-041-036-005
>         Fri 16| Sun 18| Tue 20| Thu 22| Sat 24| Mon 26| Wed 28| Fri 30|
>             Sat 17  Mon 19  Wed 21  Fri 23  Sun 25  Tue 27  Thu 29  Sat 31

I'm not sure I care about this diagram.  It's also kind of hard to
read.  I would mind less if it was at the end of the summary.

> ================
> Type Categories
> ================
> This VERY long thread was sparked by Andrew Koenig asking if a discussion
> of making type categories more explicit had ever occurred (Andrew meant for
> category to mean "the set of all types that implement a particular marker
> interface").  As Andrew later pointed out, he was asking about  "a way of
> making notions such as 'file-like object' more formal and/or automatic".
> The discussion quickly started using the term interface to mean defining a
> way to specify that an object implemented certain methods (think of it in
> terms of Java's 'implements' mechanism).  Once that was out of the way,
> the discussion took off.  Zope's implementation was pointed out
> (http://cvs.zope.org/Zope3/lib/python/Interface/) very quickly.  PEP 245
> (Python Interface Syntax) was also brought to the attention of the list.
> The idea of using inheritance to handle interfaces was brought up.  Guido
> said that he hasn't "given up the hope that inheritance and interfaces
> could use the same mechanisms.  But Jim Fulton, based on years of
> experience in Zope, claims they really should be different" in terms of
> how interfaces should be handled in objects.  Jeremy Hylton tried to
> channel Jim's opinion by pointing out that "We'd like to use interfaces to
> make fairly strong claims.  If a class A implements an interface I, then
> we should be able to use an instance of A anywhere that an I is needed."
> But "the inheritance mechanism is too general" because if a class A
> implements interface I and then a class B, which does not implement I,
> subclasses class A we end up with a class B that claims it has a certain
> interface which it doesn't actually have.  Guido understood the point, but
> still thought inheritance could be used "if there was a way to "shut off"
> inheritance as far as isinstance() (or issubclass())" is concerned.  Guido
> asked the simple question, "Why do I keep arguing for inheritance?  (a) the
> need to deny inheritance from an interface, while essential, is relatively
> rare IMO, and in *most* cases the inheritance rules work just fine; (b)
> having two separate but similar mechanisms makes the language larger."
> Samuele Pedroni asked that any implementation "allow also for referring to
> anonymous super-interfaces of an interface in terms of the interface plus
> a subset of its signatures, also e.g. FileLike and just 'write'.  [that
> means an interface can be thought to correspond to a set of
> (tag,signature) tuples, where tag identifies the interface, and one can
> also just consider subsets of it]".  The thread finally seems to have
> stopped (for now) with Guido saying he is mulling the whole thing over in
> the back of his head.  This is a very sticky topic because of the number of
> design decisions required and how it might change the way people program
> in Python.

Please break up that paragraph into pieces shorter than 12 lines
each. :-)
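
The inheritance objection from the Type Categories thread can be made
concrete with a small sketch.  The names FileLike, Logger and
ReadOnlyLogger are hypothetical and come from no posted code, and using
a marker base class is only one assumed way to spell an interface:

```python
class FileLike:
    """Hypothetical marker interface: instances promise a usable write()."""


class Logger(FileLike):
    """Claims FileLike by inheriting from the marker, and honors it."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)


class ReadOnlyLogger(Logger):
    """Inherits the FileLike claim from Logger but breaks the promise."""

    def write(self, text):
        raise NotImplementedError("this logger is read-only")


def claims_filelike(obj):
    # With plain inheritance, isinstance() is the only available check.
    return isinstance(obj, FileLike)
```

Here claims_filelike() is true for a ReadOnlyLogger even though calling
its write() fails, which is the kind of inherited claim that a way to
"shut off" inheritance for isinstance() would be meant to prevent.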

> There was also a partial sub-thread in this whole discussion about
> multimethods; basically a way to do overloading of methods based on
> parameter signature.  Most of the discussion was over syntax and such and
> how to handle resolution order.  It then seemed to fall by the wayside when
> the main part of the thread took over again.
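
For readers who have not met multimethods, here is a minimal sketch of
dispatching on the types of all arguments.  The MultiMethod class and
the combine example are hypothetical, not any syntax proposed in the
thread, and the sketch matches exact types only, so it sidesteps the
resolution-order question entirely:

```python
class MultiMethod:
    """Toy multimethod: picks an implementation by exact argument types."""

    def __init__(self, name):
        self.name = name
        self._impls = {}  # maps a tuple of types to a function

    def register(self, *types):
        def decorator(func):
            self._impls[types] = func
            return func
        return decorator

    def __call__(self, *args):
        key = tuple(type(arg) for arg in args)
        try:
            impl = self._impls[key]
        except KeyError:
            raise TypeError("%s: no match for %r" % (self.name, key))
        return impl(*args)


combine = MultiMethod("combine")


@combine.register(int, int)
def _add_ints(a, b):
    return a + b


@combine.register(str, int)
def _repeat(s, n):
    return s * n
```

A real design would also have to decide what happens for subclasses and
for ambiguous matches, which is where most of the discussion went.
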
> ==============================
> type categories -- an example
> ==============================
> This thread was started when Andrew Koenig said that he brought up his
> type category question because he wanted an easy way to identify members
> of a type.  He now had an example in a program he was writing where the
> type of an argument varied, and what needed to be done to the data
> changed accordingly.  Jeremy Hylton suggested the
> isinstance(obj, type(re.compile(''))) idiom.  Andrew asked if this was
> guaranteed to work, and Jeremy said no.  I asked why this was not
> guaranteed, and Fredrik Lundh said it is because re.compile() is a
> factory function, and a future version could return a different kind of
> object depending on the pattern.
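
A minimal sketch of that idiom in use, assuming a hypothetical helper
as_pattern() that accepts either a pattern string or a compiled pattern.
As noted in the thread, it only holds while re.compile() returns objects
of a single type:

```python
import re

# Type of whatever re.compile() happens to return in this interpreter.
_compiled_type = type(re.compile(''))


def as_pattern(arg):
    """Accept either a pattern string or an already compiled pattern."""
    if isinstance(arg, _compiled_type):
        return arg
    return re.compile(arg)
```
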
> ===============================================
> Python build trouble with the new gcc/binutils
> ===============================================
> Andrew Koenig said that he couldn't compile Python using the newest gcc
> (this was the day after the latest release hit servers).  With help from
> Zack Weinberg of Code Sourcery (who also recently rewrote the tempfile
> module), the problem was tracked down to binutils 2.13; it was not
> Python's fault.
> ===================================
> Last call: mortal interned strings
> ===================================
> The patch python.org/sf/576101 removes the default immortality of interned
> strings.  I believe it was in early August (possibly spilled over from
> late July) when Oren Tirosh proposed the idea and wrote the above
> mentioned patch.  There had been some discussion over whether any 3rd
> party code was reliant upon interned strings being immortal; none was
> found (MacPython was reliant upon it, but since it is under Python core
> control it was considered a moot point since it could be changed).  It has
> been checked in.  With the patch the way to make a string immortal is to
> call PyString_InternImmortal(); no code in the core uses this function.
> =====================================
> PEP 218 (sets); moving set.py to Lib
> =====================================
> Thanks to Greg Wilson (for writing the PEP), Alex Martelli (for writing
> the module initially), and Guido (for refactoring Alex's code) the stdlib

You might add Raymond Hettinger who wrote the docs and did significant
work on the code after me.  Also Tim Peters who added some good speedups.

> has now gained a sets module.  It has both the notion of mutable and
> immutable sets (the latter used when you have a set of sets).  There was
> discussion about how sets should print (sorted or not; unsorted is default
> but option is there to print sorted)

This option is no longer documented though.  It may yet disappear.

>                                      and what operators should be
> overloaded for working on sets (| and & were chosen).  The module is a
> beautiful chunk of code and I highly recommend reading its source.
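
A short sketch of the two chosen operators and of the set-of-sets case.
It uses the built-in set and frozenset types of later Python versions as
a stand-in for the module's mutable and immutable classes; that
substitution is an assumption made so the example runs on a current
interpreter:

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

union = evens | small          # elements in either set
common = evens & small         # elements in both sets

# Only immutable (hashable) sets can be members of another set.
set_of_sets = {frozenset(evens), frozenset(small)}
```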


> ===========================================
> A few lessons from the tempfile.py rewrite
> ===========================================
> Zack Weinberg, after rewriting the tempfile module, brought up three
> points:
> 1) Lack of dummy threads, 2) lack of a pthread_once equivalent, and 3)
> lack of a way to skip tests from unittest.py via some built-in method.
> Guido responded accordingly: 1) since some code uses the idiom of trying
> to import thread and catching the exception if it fails, Guido said he
> would be willing to accept a dummy_thread.py that would allow:
> try:
>     import thread as _thread
> except ImportError:
>     import dummy_thread as _thread
> to work.  No word on whether this is being written at the moment.  2)
> Guido said the method was, in his opinion, overkill.  He said to "be
> Pythonic, live dangerously, accept the risk that a ^C can screw you.  It
> can anyway. :-)".  And as for 3) Guido referred Zack to the PyUnit list
> and Steve Purcell since Python just tracks Steve's code (pyunit.sf.net).
> Guido's suggestion was to stick code that was reliant on some other code
> in a separate testing suite that is only run when the reliant code is
> available.
> ===========================
> Standard datetime objects?
> ===========================
> Kevin Jacobs asked what stage the new datetime object was at.  Guido said
> it is in python/nondist/sandbox/datetime/ in CVS which also has comments
> pointing to a wiki containing the current work on it.  Fred L. Drake, Jr.
> is working on the C re-implementation and Guido expects a checkin at any
> moment (hasn't happened as of this writing).

Has now, in the sandbox (more to come).

> ===================
> PEP 269 versus 283
> ===================
> Jonathan Riehl noticed that PEP 283 said PEP 269 was dead; not good
> considering he was close to having a patch for PEP 269 (pgen module to
> interface with the C version).  Guido said he will revive the PEP.  The
> patch has since been put on SF at python.org/sf/599331 .
> ==============================
> What is a backport candidate?
> ==============================
> Since Python 2.2 is going to be around for a long time, the question was
> brought up of what constitutes code that should be backported.  Guido made
> the following three points:
> 1) code trivial to backport should always be backported
> 2) code patching 2.3 code should obviously not be backported

> 3) if the 2.2 code must be changed before the patch applies, it depends;
> there are gradations of this.
> So please, when submitting patches, mention whether you think the patch
> should be backported to the 2.2 tree and any possible dependencies it
> might have in a backport.
> =================================
> python/nondist/sandbox/spambayes
> =================================
> In response to Paul Graham's spam filter written using Bayes' Rule
> (Slashdot post on it is at
> http://developers.slashdot.org/article.pl?sid=02/08/16/1428238&tid=156), a
> thread spawned around this checkin of code that followed that paper's
> suggestions.  This thread quickly jumped into discussions on data
> structures, Bayes' Rule, and a whole lot of talk about spam.  Very
> interesting if spam filtering interests you.  Tim Peters has been leading
> the drive on this chunk of code (and thanks to an illness that befell him
> in late August, which he has since gotten over, he had a few days of major
> hacking on it; Tim showed he is a performance stats whore <wink>).
> A very cool quote came out of this thread from Eric S. Raymond when
> discussing the spam filter he has been working on: "This is actually the
> first new program I've coded in C (rather than Python) in a good four
> years or so".

(Several of us think even this didn't have to be coded in C after
all. :-)

> ====================
> Parsing vs. lexing.
> ====================
> In response to a question by Aahz about what the differences were between
> a lexer, parser, and tokenizer, Eric Raymond posted a good overview of the
> differences.  Guido later sent an email mentioning SPARK and explaining how
> Python's parser generator (pgen) works and why he wrote it.  He also made some
> other comments on lexers.  Jeremy Hylton pointed out a "neat new paper
> about an old algorithm for recursive descent parsers with backtracking and
> unlimited lookahead" by Bryan Ford at http://www.brynosaurus.com/pub.html
> .  Alex Martelli pointed out that this discussion reminded him of "a
> long-ago interview with Borland's techies"  in which they said they were
> able to make Borland PASCAL fit on a floppy while MS PASCAL took multiple
> floppies.  Their trick was "we just did everything by the Dragon Book --
> except that the parser is a hand-written recursive descent parser [Aho &c
> being adamant defenders of Yacc & the like], which buys us a lot".
> Someone named Noah also emailed a discussion of lexers and parsers, pulling
> in Finite State Machines, Pushdown Automata, and Turing Machines.
> Martin Sj?n says that Haskell's pattern matching and lazy evaluation makes

Come on, you know his real name is Sjögren. :-)

> lexers easy (even a Recursive-Descent parser), but unfortunately Haskell
> does not play with other languages nicely.  Haskell is where Python got
> its list comprehension idea.
> =========================================
> [Python-Dev] Fw: Security hole in rexec?
> =========================================
> It was brought to the attention of the list that deleting __builtins__
> allowed a compromise in rexec.  Guido pointed out that
> python.org/sf/577530 reports this.  He also said don't trust rexec.
> A patch is going to be submitted to document the view that rexec is really
> not that safe.

It was checked in.

> =================
> A `cogen' module
> =================
> Francois Pinard asked about Cartesian products using the new sets module.
> Guido didn't think people would in general need it.  Francois quickly
> started this thread of discussing a cogen module to generate Cartesian
> products and other ways of operating on sets.

Tim Peters quickly posted *his* elaborate state-of-the-art code, which
ended the discussion (as usual, posting code is a good way to stop
discussion :-).
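
For readers wondering what such a module would generate, here is a
minimal sketch of a Cartesian product written as a recursive generator.
It is not the code from the thread; the name cartesian() is hypothetical,
and it assumes every argument can be iterated more than once:

```python
def cartesian(*seqs):
    """Yield every tuple that takes one element from each sequence."""
    if not seqs:
        yield ()                 # the empty product has one empty tuple
        return
    for head in seqs[0]:
        for tail in cartesian(*seqs[1:]):
            yield (head,) + tail
```

Because it is a generator, the tuples are produced one at a time rather
than built up as a full list in memory.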

> =================
> Mersenne Twister
> =================
> Raymond Hettinger volunteered to implement the Mersenne Twister algorithm
> (one in Python exists at www.math.keio.ac.jp/~matumoto/emt.html).  While
> discussing whether to implement it in C or Python, Guido noticed that
> random.Random re-implements whrandom.  Guido then came up with the idea of
> writing a base random class that is subclassed where .random() can be
> implemented; Tim Peters agreed and suggested more methods to subclass.
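
The subclassing idea can be sketched against random.Random as it exists
in current Python.  LCGRandom is a hypothetical name, and its toy linear
congruential generator is only a stand-in for a real algorithm; the
point is that overriding .random() is enough for inherited methods such
as uniform() to use the new source:

```python
import random


class LCGRandom(random.Random):
    """Toy generator: only random() is replaced; the rest is inherited."""

    _MODULUS = 2 ** 32

    def __init__(self, start=1):
        super().__init__()
        self._state = start % self._MODULUS

    def random(self):
        # Advance the toy generator and scale the state into [0.0, 1.0).
        self._state = (1664525 * self._state + 1013904223) % self._MODULUS
        return self._state / self._MODULUS
```

A fuller version would also override seed(), getstate() and setstate()
so that saving and restoring state works for the new generator.
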
> =================================
> New PEP Format: reStructuredText
> =================================
> David Goodger and Barry Warsaw have now gotten reST as a usable syntax for
> PEPs.  Read the PEPs on the subject to learn more:
> - PEP 12 -- Sample reStructuredText PEP Template
>   (http://www.python.org/peps/pep-0012.html)
> - PEP 258 -- Docutils Design Specification
>   (http://www.python.org/peps/pep-0258.html)
> - PEP 287 -- reStructuredText Docstring Format
>   (http://www.python.org/peps/pep-0287.html)
> ====================================
> tiny optimization in ceval mainloop
> ====================================
> Jeremy Hylton noticed that in ceval there is a test of whether the
> ticker was 0 or if things_to_do was set to true (explanation of the
> ticker, checkinterval, and the GIL follows this paragraph).  Jeremy
> wondered if we could just drop the ticker to 0 when things_to_do is true.
> Jack Jansen, though, pointed out that clearing it is not guaranteed since
> there may be an interrupt routine when "we fiddle things_to_do".  Skip
> Montanaro then pointed out that since neither ticker nor things_to_do is
> fiddled with unless the GIL is held, they could be made globals rather
> than having each thread execute this test; he did a patch
> that implements this (python.org/sf/602191).  Guido then said that if
> there wasn't a decent speed improvement, then no patch would be checked
> in.  He then changed his mind when it was pointed out that it actually
> simplified the code.  Skip tested anyway, though, and there is a speed
> improvement.  This also brought up whether the default value of 10 for
> checkinterval was reasonable.  It was then agreed to be bumped up to 100.
> Jack ran some code and said he noticed a definite improvement.
> Python's version of threading is not like in C.  There is something called
> the GIL (Global Interpreter Lock) which any thread wishing to execute
> Python code or play with Python objects must hold.  This means that when
> you have Python threads running (using the thread or threading module)
> they are usually all waiting in line to get the GIL.  Now for Python to
> decide when to release the GIL for another thread to grab it, it uses the
> ticker.  This variable counts down to zero by being decremented every time
> a Python opcode is executed (originally defaulted to 10, now defaulted to
> 100).  The ticker's starting value after each release of the GIL is what
> sys.setcheckinterval() sets.
> To get a better understanding of threading under Python I recommend
> reading Aahz's tutorials on threading.
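
The effect of raising checkinterval from 10 to 100 can be illustrated
with a toy model of the ticker countdown.  This is a pure-Python sketch
of the description above, not the C code in ceval, and the function name
count_gil_checks() is hypothetical:

```python
def count_gil_checks(opcodes, checkinterval=100):
    """Toy model: count how often the ticker reaches zero."""
    ticker = checkinterval
    checks = 0
    for _ in range(opcodes):
        ticker -= 1              # one tick per executed opcode
        if ticker == 0:
            checks += 1          # point where the GIL could be handed over
            ticker = checkinterval
    return checks
```

In this model, 1000 opcodes give 100 check points with an interval of 10
but only 10 with an interval of 100, so the interpreter spends less time
on the bookkeeping around the GIL.
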
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev

All in all, please keep this up!!!

--Guido van Rossum (home page: http://www.python.org/~guido/)