At 08:57 AM 10/20/2005 -0700, Guido van Rossum wrote:
>Whoa, folks! Can I ask the gentlemen to curb their enthusiasm?
>PEP 343 is still (back) on the drawing table, PEP 342 has barely been
>implemented (did it survive the AST-branch merge?), and already you
>are talking about adding more stuff. Please put on the brakes!
Sorry. I thought that 343 was just getting a minor tune-up. In the months
since the discussion and approval (and implementation; Michael Hudson
actually had a PEP 343 patch out there), I've been doing a lot of thinking
about how they will be used in applications, and thought that it would be a
good idea to promote people using task-specific variables in place of
globals or thread-locals.
The conventional wisdom is that global variables are bad, but the truth is
that they're very attractive because they allow you to have one less thing
to pass around and think about in every line of code. Without globals, you
would sooner or later end up with every function taking twenty arguments to
pass through states down to other code, or else trying to cram all this
data into some kind of "context" object, which then won't work with code
that doesn't know about *your* definition of what a context is.
Globals are thus extremely attractive for practical software
development. If they weren't so useful, it wouldn't be necessary to warn
people not to use them, after all. :)
The problem with globals, however, is that sometimes they need to be
changed in a particular context. PEP 343 makes it safer to use globals
because you can always offer a context manager that changes them
temporarily, without having to hand-write a try-finally block. This will
make it even *more* attractive to use globals, which is not a problem as
long as the code has no multitasking of any sort.
Of course, the multithreading scenario is usually fixed by using
thread-locals. All I'm proposing is that we replace thread locals with
task locals, and promote the use of task-local variables for managed
contexts (such as the decimal context) *that would otherwise be a global or
a thread-local variable*. This doesn't seem to me like a very big deal;
just an encouragement for people to make their stuff easy to use with PEP
342 and 343.
By the way, I don't know if you do much with Java these days, but a big
part of the whole J2EE fiasco and the rise of the so-called "lightweight
containers" in Java has all been about how to manage implicit context so
that you don't get stuck with either the inflexibility of globals or the
deadweight of passing tons of parameters around. One of the big selling
points of AspectJ is that it lets you implicitly funnel parameters from
point A to point B without having to modify all the call signatures in
between. In other words, its use is promoted for precisely the sort of
thing that 'with' plus a task variable would be ideal for. As far as I can
tell, 'with' plus a task variable is *much* easier to explain, use, and
understand than an aspect-oriented programming tool is! (Especially from
the "if the implementation is easy to explain, it may be a good idea"
perspective.)
>I know that somewhere in the proto-PEP Phillip argues that the context
>API needs to be made a part of the standard library so that his
>trampoline can efficiently swap implicit contexts required by
>arbitrary standard and third-party library code. My response to that
>is that library code (whether standard or third-party) should not
>depend on implicit context unless it can assume complete
>control over the application.
I think maybe there's some confusion here, at least on my part. :) I see
two ways to read your statement, one of which seems to be saying that we
should get rid of the decimal context (because it doesn't have complete
control over the application), and the other way of reading it doesn't seem
connected to what I proposed.
Anything that's a global variable is an "implicit context". Because of
that, I spent considerable time and effort in PEAK trying to utterly stamp
out global variables. *Everything* in PEAK has an explicit context. But
that then becomes more of a pain to *use*, because you are now stuck with
managing it, even if you cram it into a Zope-style acquisition tree so
there's only one "context" to deal with. Plus, it assumes that everything
the developer wants to do can be supplied by *one* framework, be it PEAK,
Zope, or whatever, which is rarely the case but still forces framework
developers to duplicate everybody else's stuff.
In other words, I've come to realize that the path taken by the major Python
application frameworks is not really Pythonic. A Pythonic framework
shouldn't load you down with new management burdens and keep you from using
other frameworks. It should make life easier, and make your code *more*
interoperable, not less. Indeed, I've pretty much come to agree with
the part of the Python developer community that says Frameworks Are
Evil. A primary source of this evil in the big three frameworks (PEAK,
Twisted, and Zope) stems from their various approaches to dealing with this
issue of context, which lack the simplicity of global (or task-local)
variables.
So, the lesson I've taken from my attempt to make everything explicit is
that what developers *really* want is to have global variables, just
without the downsides of uncontrolled modifications, and inter-thread or
inter-task pollution. Explicit isn't always better than implicit, because
oftentimes the practicality of having implicit things is much more
important than the purity of making them all explicit. Simple is better
than complex, and task-local variables are *much* simpler than trying to
make everything explicit.
>Also, Nick wants the name 'context' for PEP-343 style context
>managers. I think it's overloading too much to use the same word for
>per-thread or per-coroutine context.
Actually, I was the one who originally proposed the term "context manager",
and it doesn't seem like a conflict to me. Indeed, I suggested in the
pre-PEP that "@context.manager" might be where we could put the
decorator. The overload was intentional, to suggest that when creating a
new context manager, it's worth considering whether the state should be
kept in a context variable, rather than a global variable. The naming
choice was for propaganda purposes, in other words. :)
Anyway, I'll withdraw the proposal for now. We can always leave it out of
2.5, I can release an independent implementation, and then submit it for
consideration again in the 2.6 timeframe. I just thought it would be a
no-brainer to use task locals where thread locals are currently being used,
and that's really all I was proposing we do as far as stdlib changes
anyway. I was also hoping to get good input from Python-dev regarding some
of the open issues, to try and build a consensus on them from the beginning.
> Speeding up list append calls
> A `comp.lang.python message from Tim Peters`_ prompted Neal Norwitz
> to investigate how the code that Tim posted could be sped up. He
> hacked the code to replace var.append() with the LIST_APPEND opcode,
Someone want a finite project that would _really_ help their Uncle
Timmy in his slow-motion crusade to get Python on the list of "solved
it!" languages for each problem on that magnificent site?
It turns out that many of the problems there have input encoded as
vast quantities of integers (stdin is a mass of decimal integers on
one or more lines). Most infamous for Python is this tutorial (you
don't get points for solving it) problem, which is _trying_ to test
whether your language of choice can read from stdin "fast enough":
The input begins with two positive integers n k (n, k<=10**7). The
next n lines of input contain one positive integer t_i, not greater
than 10**9, each.
Write a single integer to output, denoting how many integers t_i are
divisible by k.
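For reference, the problem itself is trivial in Python; the hard part is doing it fast enough on ~8MB of input. A sketch (hypothetical function name, modern print syntax):

```python
def count_divisible(data):
    """Count how many of the n integers t_i are divisible by k.

    `data` is the whole of stdin as one string: the first two tokens
    are n and k, and the next n tokens are the t_i values.
    """
    tokens = data.split()
    n, k = int(tokens[0]), int(tokens[1])
    return sum(1 for t in tokens[2:2 + n] if int(t) % k == 0)
```

At top level this would be `print(count_divisible(sys.stdin.read()))`; slurping stdin in one read, rather than line by line, is itself one of the necessary tricks.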
There's an 8-second time limit, and I believe stdin is about 8MB
(you're never allowed to see the actual input they use). They have a
slower machine than you use ;-), so it's harder than it sounds. To
date, 975 people have submitted a program that passed, but only a few
managed to do it using Python. I did, and it required every trick in
the book, including using psyco.
Turns out it's _not_ input speed that's the problem here, and not even
mainly the speed of integer mod: the bulk of the time is spent in
int(string) (and, yes, that's also far more important to the problem
Neal was looking at than list.append time). If you can even track all
the levels of C function calls that ends up invoking <wink>, you find
yourself in PyOS_strtoul(), which is a nifty all-purpose routine that
accepts inputs in bases 2 thru 36, can auto-detect base, and does
platform-independent overflow checking at the cost of a division per
digit. All those are features, but it makes for sloooow conversion.
I assume it's the overflow-checking that's the major time sink, and
it's not correct anyway: it does the check slightly differently for
base 10 than for any other base, explained only in the checkin comment
for rev 2.13, 8 years ago:
For base 10, cast unsigned long to long before testing overflow.
This prevents 4294967296 from being an acceptable way to spell zero!
So what are the odds that base 10 was the _only_ base that had a "bad
input" case for the overflow-check method used? If you thought
"slim", you were right ;-) Here are other bad cases, under all Python
versions to date (on a 32-bit box; if sizeof(long) == 8, there are
different bad cases):
int('102002022201221111211', 3) = 0
int('32244002423141', 5) = 0
int('1550104015504', 6) = 0
int('211301422354', 7) = 0
int('12068657454', 9) = 0
int('1904440554', 11) = 0
int('9ba461594', 12) = 0
int('535a79889', 13) = 0
int('2ca5b7464', 14) = 0
int('1a20dcd81', 15) = 0
int('a7ffda91', 17) = 0
int('704he7g4', 18) = 0
int('4f5aff66', 19) = 0
int('3723ai4g', 20) = 0
int('281d55i4', 21) = 0
int('1fj8b184', 22) = 0
int('1606k7ic', 23) = 0
int('mb994ag', 24) = 0
int('hek2mgl', 25) = 0
int('dnchbnm', 26) = 0
int('b28jpdm', 27) = 0
int('8pfgih4', 28) = 0
int('76beigg', 29) = 0
int('5qmcpqg', 30) = 0
int('4q0jto4', 31) = 0
int('3aokq94', 33) = 0
int('2qhxjli', 34) = 0
int('2br45qb', 35) = 0
int('1z141z4', 36) = 0
IOW, the only bases that _aren't_ "bad" are powers of 2, and 10
because it's special-cased (BTW, I'm not sure that base 10 doesn't
have a different bad case now, but don't care enough to prove it one
way or the other).
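To see why these wrap to zero: each bad case is exactly 2**32 written out in its base, so a 32-bit accumulator that only tests overflow after the combined multiply-and-add can land back on 0 without the check firing. A small simulation (assuming a 32-bit unsigned long; not the actual C code):

```python
MASK = 2**32 - 1  # simulate a 32-bit unsigned long

def wrapping_parse(s, base):
    """Accumulate digits with silent 32-bit wraparound, like the
    unchecked core of the conversion loop."""
    result = 0
    for ch in s:
        result = (result * base + int(ch, 36)) & MASK
    return result
```

The base-3 bad case from the list really is 2**32, so `wrapping_parse('102002022201221111211', 3)` comes out as exactly 0.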
Now fixing that is easy: the problem comes from being too clever,
doing both a multiply and an addition before checking for overflow.
Check each operation on its own and it would be bulletproof, without
special-casing. But that might be even slower (it would remove the
branch special-casing 10, but add a cheap integer addition overflow
check with its own branch).
The challenge (should you decide to accept it <wink>) is to replace
the overflow-checking with something both correct _and_ much faster
than doing n integer divisions for an n-character input. For example,
36**6 < 2**32-1, so whenever the input has no more than 6 digits
overflow is impossible regardless of base and regardless of platform.
That's simple and exploitable. For extra credit, make int(string) go
faster than preparing your taxes ;-)
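The length shortcut above is easy to verify; a sketch (hypothetical names) computing, per base, how many digits can never overflow a 32-bit unsigned long:

```python
ULONG_MAX = 2**32 - 1  # assuming a 32-bit unsigned long

def max_safe_digits(base, limit=ULONG_MAX):
    """Largest n such that any n-digit string in `base` fits in limit,
    i.e. base**n - 1 <= limit; at or below this length, per-digit
    overflow checks can be skipped entirely."""
    n = 0
    while base ** (n + 1) - 1 <= limit:
        n += 1
    return n
```

Base 36 is the worst case at 6 digits, exactly as the text says; for base 10 the first 9 digits are always safe, so most real inputs would never hit the slow path at all.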
BTW, Python as-is can be used to solve many (I'd bet most) of these
problems in the time limit imposed, although it may take some effort,
and it may not be possible without using psyco. A Python triumph I'm
particularly fond of:
The legend at the bottom:
Warning: large Input/Output data, be careful with certain languages
seems to be a euphemism for "don't even think about using Python" <0.9 wink>.
But there's a big difference in this one: it's a _hard_ problem,
requiring graph analysis, delicate computation, greater than
double-precision precision (in the end), and can hugely benefit from
preprocessing a batch of queries to plan out and minimize the number
of operations needed. Five people have solved it to date (click on
"Best Solutions"), and you'll see that my Python entry is the
second-fastest so far, beating 3 C++ entries by 3 excellent C++
programmers. I don't know what they did, but I suspect I was far more
willing to code up an effective but tedious "plan out and minimize"
phase _because_ I was using Python. I sure didn't beat them on
reading the mass quantities of integers from stdin <wink>.
I don't follow why the PEP deprecates catching a category of exceptions
in a different release than it deprecates raising them. Why would a
release allow catching something that cannot be raised? I must be
missing something here.
I find myself occasionally doing this:
... = dirname(dirname(dirname(p)))
I'm always--literally every time-- looking for a more functional form,
something that would be like this:
# apply dirname() 3 times on its results, initializing with p
... = repapply(dirname, 3, p)
There is a way to hack something like that with reduce, but it's not
pretty--it involves creating a temporary list and a lambda function:
... = reduce(lambda x, y: dirname(x), [p] + [None] * 3)
Just wondering, does anybody know how to do this nicely? Is there an
easy form that allows me to do this?
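One straightforward alternative (a sketch; `repapply` is the poster's hypothetical name) is a plain loop, which avoids both the temporary list and the lambda:

```python
import os.path

def repapply(f, n, x):
    """Apply f to its own result n times: repapply(f, 3, p) == f(f(f(p)))."""
    for _ in range(n):
        x = f(x)
    return x

# e.g. three levels up from a path:
three_up = repapply(os.path.dirname, 3, '/a/b/c/d/e.txt')
```

It isn't a one-liner expression, but it names the intent and has no hidden allocation.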
I've had this PEP laying around for quite a few months. It was inspired
by some code we'd written which wanted to be able to get immutable
versions of arbitrary objects. I've finally finished the PEP, uploaded
a sample patch (albeit a bit incomplete), and I'm posting it here to see
if there is any interest.
I tried "svn up" to bring my sandbox up-to-date and got this output:
% svn up
svn: Checksum mismatch for 'Objects/.svn/text-base/unicodeobject.c.svn-base'; expected: '8611dc5f592e7cbc6070524a1437db9b', actual: '2d28838f2fec366fc58386728a48568e'
What's that telling me?
Please bear with me for a few paragraphs ;-)
One aspect of str-type strings is the efficiency afforded when all the encoding really
is ascii. If the internal encoding were e.g. fixed utf-16le for strings, maybe with today's
computers it would still be efficient enough for most actual string purposes (excluding
the current use of str-strings as byte sequences).
I.e., you'd still have to identify what was "strings" (of characters) and what was really
byte sequences with no implied or explicit encoding or character semantics.
Ok, let's make that distinction explicit: Call one kind of string a byte sequence and the
other a character sequence (representation being a separate issue).
A unicode object is of course the prime _general_ representation of a character sequence
in Python, but all the names in python source code (that become NAME tokens) are UIAM
also character sequences, and representable by a byte sequence interpreted according to
the source encoding.
For the sake of discussion, suppose we had another _character_ sequence type that was
the moral equivalent of unicode except for internal representation, namely a str
subclass with an encoding attribute specifying the encoding that you _could_ use
to decode the str bytes part to get unicode (which you wouldn't do except when necessary).
We could call it class charstr(str): ... and have charstr().bytes be the str part and
charstr().encoding specify the encoding part.
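A rough sketch of the idea (hypothetical class; written here in modern Python 3 terms with an explicit bytes attribute rather than actual str subclassing, for clarity):

```python
class charstr:
    """A character sequence stored as raw bytes plus the encoding that
    decodes them to unicode -- decoded only when actually needed."""
    def __init__(self, data, encoding):
        self.bytes = data          # the byte-sequence part
        self.encoding = encoding   # how to decode it

    def decode(self):
        return self.bytes.decode(self.encoding)
```

Two charstrs from differently-encoded source files would have different `.bytes` but compare equal once promoted, which is the whole point of carrying the encoding along.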
In all the contexts where we have obvious encoding information, we can then generate
a charstr instead of a str. E.g., if the source of module_a has
# -*- coding: latin1 -*-
cs = 'über-cool'
type(cs) # => <type 'charstr'>
cs.bytes # => '\xfcber-cool'
cs.encoding # => 'latin-1'
and print cs would act like print cs.bytes.decode(cs.encoding) -- or I guess with
whatever further encoding the output stream requires, plus the newline of the print.
Now if module_b has
# -*- coding: utf8 -*-
cs = 'über-cool'
and we interactively
import module_a, module_b
print module_a.cs + ' =?= ' + module_b.cs
what could happen ideally vs. what we have currently?
UIAM, currently we would just get the concatenation of
the three str byte sequences concatenated to make
'\xfcber-cool =?= \xc3\xbcber-cool'
and that would be printed as whatever that comes out as
without conversion when seen by the output, according to the console's encoding.
But if those cs instances had been charstr instances, the coding cookie
encoding information would have been preserved, and the interactive print could
have evaluated the string expression -- given cs.decode() as sugar for
cs.bytes.decode(cs.encoding or globals().get('__encoding__') or
sys.getdefaultencoding()) -- as
module_a.cs.decode() + ' =?= '.decode() + module_b.cs.decode()
if pairwise terms differ in encoding, as they all might here. If the interactive
session source were e.g. latin-1, like module_a, then
module_a.cs + ' =?= '
would not require an encoding change, because the ' =?= ' would be a charstr instance
with encoding == 'latin-1', and so the result would still be latin-1 that far.
But with module_b.cs being utf8, the next addition would cause the .decode() promotions
to unicode. In a console window, the ' =?= '.encoding might be 'cp437' or such, and
the first addition would then cause promotion (since module_a.cs.encoding != 'cp437').
I have sneaked in run-time access to individual modules' encodings by assuming that
the encoding cookie could be compiled in as an explicit global __encoding__ variable
for any given module (what to have as __encoding__ for built-in modules could vary,
and is left open here).
ISTM this could have use in situations where an encoding assumption is necessary and
currently 'ascii' is not as good a guess as one could make, though I suspect if string
literals became charstr strings instead of str strings, many if not most of those situations
would disappear (I'm saying this because ATM I can't think of an 'ascii'-guess situation that
wouldn't go away ;-)). If there were a charchr() version of chr() that would result in
a charstr instead of a str, IWT one would want an easy-sugar default encoding assumption,
probably based on the same as one would assume for '%c' % num in a given module source
-- which presumably would be '%c'.encoding, where '%c' assumes the encoding of the module
source, normally recorded in __encoding__. So charchr(n) would act like
chr(n).decode().encode(''.encoding) -- or more reasonably charstr(chr(n)), which would be
charstr(chr(n), globals().get('__encoding__') or __import__('sys').getdefaultencoding())
Or some efficient equivalent ;-)
Using strings in dicts requires hashing to find key comparison candidates and comparison to
check for key equivalence. This would seem to point to some kind of normalized hashing, but
not necessarily normalized key representation. Some is apparently happening, since
>>> hash('a') == hash(unicode('a'))
True
I don't know what would be worth the trouble to optimize string key usage where strings are
really all of one encoding vs totally general use vs a heavily biased mix. Or even if it could
be done without unreasonable complexity. Maybe a dict could be given an option to hash all
its keys as unicode vs whatever it does now. But having a charstr subtype of str would improve
the "implicit" conversions to unicode IMO.
Anyway, I wanted to throw in my .02USD re the implicit conversions, taking the view that
much of the implicitness could be based on reliable inferences from source encodings of
string literals or from their effects as format strings.
[not a normal subscriber to python-dev, so I'll have to google for any responses]
I'd like to start the subversion switchover this coming Wednesday,
with a total commit freeze at 16:00 GMT. If you have larger changes
to commit that you would like to commit before the switchover, but
after that date, please let me know.
At that point, I will set the repository to read-only (through a
commitinfo hook), and request that SF rolls a tarfile. I will then
notify you when the Subversion repository is online.
If you have sandboxes with modifications, it might be good to
cvs diff -u them now. I plan to keep the CVS up for a short while
after the switchover (about a month); after that point, you will
need to get the CVS tarball and retarget your sandbox to perform
any further work.
I'm not aware of a procedure to convert a CVS sandbox into an SVN
one, so you will have to recheckout all your sandboxes after the
switchover.
The StreamHandler available under the logging package is currently
catching all exceptions under the emit() method call. In the
Handler.handleError() documentation it's mentioned that it's
implemented like that because users do not care about errors
in the logging system.
I'd like to apply the following patch:
--- Lib/logging/__init__.py (revision 41357)
+++ Lib/logging/__init__.py (working copy)
@@ -738,6 +738,8 @@
self.stream.write(fs % msg.encode("UTF-8"))
+ except KeyboardInterrupt:
+     raise
Anyone against the change?
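The pattern in question, sketched standalone (hypothetical names, not the actual logging code): re-raise KeyboardInterrupt first, and only swallow everything else:

```python
def emit_safely(write, record):
    """Call write(record); let KeyboardInterrupt propagate, but swallow
    any other exception, since logging errors shouldn't crash the app."""
    try:
        write(record)
    except KeyboardInterrupt:
        raise
    except Exception:
        pass  # real code would call handleError(record) here
```

Without the first clause, a bare `except:` in emit() eats Ctrl-C too, which is exactly the behavior the patch fixes.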
After a few hours of tedious and frustrating hacking I've managed to
separate the Python abstract syntax tree parser from the rest of Python
itself. This could be useful for people who may wish to build Python
tools without Python, or tools in C/C++.
In the process of doing this, I came across a comment mentioning that
it would be desirable to separate the parser. Is there any interest in
doing this? I now have a vague idea about how to do this. Of course,
there is no point in making changes like this unless there is some interest.
I will make my ugly hack available once I have polished it a little bit
more. It involved hacking header files to provide a "re-implementation"
of the pieces of Python that the parser needs (PyObject, PyString, and
PyInt). It likely is a bit buggy, and it doesn't support all the types
(notably, it is missing support for Unicode, Longs, Floats, and
Complex), but it works well enough to get the AST from a few simple
strings, which is what I wanted.