On 21-Feb-06, at 11:21 AM, "Almann T. Goo" <almann.goo(a)gmail.com> wrote:
>> Why not just use a class?
>> def incgen(start=0, inc=1):
>>     class incrementer(object):
>>         a = start - inc
>>         def __call__(self):
>>             self.a += inc
>>             return self.a
>>     return incrementer()
>>
>> a = incgen(7, 5)
>> for n in range(10):
>>     print a(),
> Because I think that this is a workaround for a concept that the
> language doesn't support elegantly with its lexically nested scopes.
> IMO, you are emulating name rebinding in a closure by creating an
> object to encapsulate the name you want to rebind--you don't need this
> workaround if you only need to access free variables in an enclosing
> scope. I provided a "lighter" example that didn't need a callable
> object but could use any mutable such as a list.
> This kind of workaround is needed as soon as you want to re-bind a
> parent scope's name, except in the case when the parent scope is the
> global scope (since there is the "global" keyword to handle this).
> It's this dichotomy that concerns me, since it seems to be against the
> elegance of Python--at least in my opinion.
> It seems artificially limiting that the language does not provide for
> rebinding names in an enclosing scope, especially since it does
> provide exactly that for the global scope. In a nutshell, I am
> proposing a solution that makes nested lexical scopes orthogonal with
> the global scope, removing a "wart," as Jeremy put it, in the
> language.
> Almann T. Goo
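[For reference, the "lighter" list-based workaround referred to above
would look something like this -- a reconstruction, since the original
example isn't quoted here; the name is never rebound, only the list it
refers to is mutated:]

    def incgen(start=0, inc=1):
        a = [start - inc]
        def inc_func():
            a[0] += inc     # mutates the list; never rebinds the name 'a'
            return a[0]
        return inc_func

    a = incgen(7, 5)
    for n in range(10):
        print a(),          # prints: 7 12 17 22 27 32 37 42 47 52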
If I may be so bold, couldn't this be addressed by introducing a
"rebinding" operator? So the ' = ' operator would continue to create
a new name in the current scope, and the (say) ' := ' operator would
require an existing name, which it would rebind. The two operators
would highlight the special way Python handles variable / name
assignment, which many newcomers find surprising (speaking as someone
who was surprised by this quirk of Python before).
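[To make the quirk concrete -- a minimal sketch of current behavior,
echoing the earlier incgen example; today any assignment makes the name
local to the inner function, so rebinding an enclosing name fails:]

    def incgen(start=0, inc=1):
        a = start
        def inc_func():
            a += inc        # 'a' is compiled as local to inc_func...
            return a
        return inc_func

    g = incgen(7, 5)
    try:
        g()
    except UnboundLocalError, e:
        print e             # ...so this raises UnboundLocalError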
a while ago, I wrote
> > Hopefully something can get hammered out so that at least the Python
> > 3 docs can premiere having been developed on by the whole community.
> why wait for Python 3 ?
> what's the current release plan for Python 2.5, btw? I cannot find a
> relevant PEP, and the "what's new" says "late 2005":
but I don't think that anyone followed up on this. what's the current
status?
First off, thanks to Neil for writing this all down. The whole thread
of discussion on the bytes type was rather long and thus hard to
follow. Nice to finally have it written down in a PEP.
Anyway, a few comments on the PEP. One, should the hex() method
instead be an attribute, implemented as a property? Seems like static
data that is entirely based on the value of the bytes object and thus
is not properly represented by a method.
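[A sketch of the property spelling being suggested -- 'Bytes' here is a
stand-in class invented for illustration, not the PEP's actual
implementation:]

    class Bytes(object):
        def __init__(self, values):
            self._values = list(values)
        @property
        def hex(self):
            # derived, read-only data: recomputed from the current value
            return ''.join('%02x' % b for b in self._values)

    print Bytes([0, 255, 16]).hex   # prints: 00ff10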
Next, why are the __*slice__ methods to be defined? The docs say they
are deprecated.
And for the open-ended questions, I don't think sort() is needed.
Lastly, maybe I am just dense, but it took me a second to realize that
__str__() will most likely return the ASCII string for use in
something like socket.send(), but that isn't explicitly stated
anywhere. There is a chance someone might think that __str__() will
somehow return the sequence of integers as a string.
Guido van Rossum wrote:
> I think the pattern hasn't been commonly known; people have been
> struggling with setdefault() all these years.
I use setdefault _only_ to speed up the following code pattern:
    if akey not in somedict:
        somedict[akey] = list()
These lines of simple Python are much easier to read and write than
the setdefault() equivalent.
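[The setdefault() spelling being compared, for reference -- 'somedict'
and 'akey' as in the snippet above, 'avalue' invented here; note that
the list() default is evaluated on every call, present key or not:]

    # one call, replacing the two-line pattern:
    somedict.setdefault(akey, list())

    # the common combined form, appending in the same expression:
    somedict.setdefault(akey, list()).append(avalue)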
At 06:14 AM 2/22/2006 -0500, Jeremy Hylton wrote:
>On 2/22/06, Greg Ewing <greg.ewing(a)canterbury.ac.nz> wrote:
> > Mark Russell wrote:
> > > PEP 227 mentions using := as a rebinding operator, but rejects the
> > > idea as it would encourage the use of closures.
> > Well, anything that facilitates rebinding in outer scopes
> > is going to encourage the use of closures, so I can't
> > see that as being a reason to reject a particular means
> > of rebinding. You either think such rebinding is a good
> > idea or not -- and that seems to be a matter of highly
> > individual taste.
>At the time PEP 227 was written, nested scopes were contentious. (I
>recall one developer who said he'd be embarrassed to tell his
>co-workers he worked on Python if it had this feature :-).
Was this because of the implicit "inheritance" of variables from the
enclosing scope?
>Rebinding names in enclosing scopes was more contentious, so the
>feature was left out. I don't think any
>particular syntax or spelling for rebinding was favored more or less.
> > On this particular idea, I tend to think it's too obscure
> > as well. Python generally avoids attaching randomly-chosen
> > semantics to punctuation, and I'd like to see it stay
> > that way.
Note that '.' for relative naming already exists (attribute access), and
Python 2.5 is already introducing the use of a leading '.' (with no name
before it) to mean "parent of the current namespace". So, using that
approach to reference variables in outer scopes wouldn't be without precedent.
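[The Python 2.5 precedent in question, from PEP 328 -- module names
here are invented, and this syntax only works inside a package:]

    # a leading dot already means "relative to the current package":
    from . import sibling_module          # sibling of the current module
    from ..pkg import helper              # name from the parent package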
IOW, I propose no new syntax for rebinding, but instead making variables'
context explicit. This would also fix the issue where right now you have
to inspect a function and its context to find out whether there's a closure
and what's in it. The leading dots will be quite visible.
I'm concerned that the on_missing() part of the proposal is gratuitous. The main use cases for defaultdict have a simple factory that supplies a zero, empty list, or empty set. The on_missing() hook is only there to support the rarer case of needing a key to compute a default value. The hook is not needed for the main use cases.
As it stands, we're adding a method to regular dicts that cannot be usefully called directly. Essentially, it is a framework method meant to be overridden in a subclass. So, it only makes sense in the context of subclassing. In the meantime, we've added an oddball method to the main dict API, arguably the most important object API in Python.
To use the hook, you write something like this:
    def on_missing(self, key):
        self[key] = value = somefunc(key)
        return value
However, we can already do something like that without the hook:
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            self[key] = value = somefunc(key)
            return value
The latter form is already possible, doesn't require modifying a basic API, and is arguably clearer about when it is called and what it does (the former doesn't explicitly show that the returned value gets saved in the dictionary).
Since we can already do the latter form, we can get some insight into whether the need has ever actually arisen in real code. I scanned the usual sources (my own code, the standard library, and my most commonly used third-party libraries) and found no instances of code like that. The closest approximation was safe_substitute() in string.Template where missing keys returned themselves as a default value. Other than that, I conclude that there isn't sufficient need to warrant adding a funky method to the API for regular dicts.
I wondered why the safe_substitute() example was unique. I think the answer is that we normally handle default computations through simple in-line code ("if k in d: do1() else do2()" or a try/except pair). Overriding on_missing() is then really only useful when you need to create a type that can be passed to a client function that was expecting a regular dictionary. So it does come up, but not much.
Aside: Why on_missing() is an oddball among dict methods. When teaching dicts to beginners, all the methods are easily explainable except this one. You don't call this method directly, you only use it when subclassing, you have to override it to do anything useful, it hooks KeyError but only when raised by __getitem__ and not other methods, etc. I'm concerned that even having this method in the regular dict API will create confusion about when to use dict.get(), when to use dict.setdefault(), when to catch a KeyError, or when to LBYL. Adding this one extra choice makes the choice more difficult.
My recommendation: Dump the on_missing() hook. That leaves the dict API unmolested and allows a more straightforward implementation/explanation of collections.default_dict or whatever it ends up being named. The result is delightfully simple and easy to understand/explain.
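[A sketch of the recommended shape -- the class name echoes the
"collections.default_dict or whatever" above, but the code here is an
illustration, not the actual patch:]

    class default_dict(dict):
        def __init__(self, default_factory, *args, **kwds):
            dict.__init__(self, *args, **kwds)
            self.default_factory = default_factory
        def __getitem__(self, key):
            try:
                return dict.__getitem__(self, key)
            except KeyError:
                self[key] = value = self.default_factory()
                return value

    d = default_dict(list)
    d['x'].append(1)        # missing key -> a fresh list, stored in place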
I'm writing a program in python that creates tar files of a certain
maximum size (to fit onto CD/DVD). One of the problems I'm running
into is that when using compression, it's pretty much impossible to
determine if a file, once added to an archive, will cause the archive
size to exceed the maximum size.
I believe that to do this properly, you need to copy the state of the tar
file (basically the current file offset as well as the state of the
compression object), then add the file. If the new size of the archive
exceeds the maximum, you need to restore the original state.
The critical part is being able to copy the compression object.
Without compression it is trivial to determine if a given file will
"fit" inside the archive. When using compression, the compression
ratio of a file depends partially on all the data that has been
compressed prior to it.
The current implementation in the standard library does not allow you
to copy these compression objects in a useful way, so I've made some
minor modifications (patch attached) to the standard 2.4.2 library:
- Add copy() method to zlib compression object. This returns a new
compression object with the same internal state. I named it copy() to
keep it consistent with things like sha.copy().
- Add snapshot() / restore() methods to GzipFile and TarFile. These
work only in write mode. snapshot() returns a state object. Passing
in this state object to restore() will restore the state of the
GzipFile / TarFile to the state represented by the object.
- Decompression objects could use a copy() method too
- Add support for copying bzip2 compression objects
Although this patch isn't complete, does this seem like a good approach?
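[For concreteness, intended usage might look something like this sketch.
snapshot() and restore() are the proposed methods described above, not
existing stdlib API; 'archive_size' is a hypothetical size helper and
the file list is an invented example:]

    import tarfile

    MAX_SIZE = 650 * 1024 * 1024            # e.g. one CD
    filenames = ['a.dat', 'b.dat']          # example inputs

    tar = tarfile.open('backup.tar.gz', 'w:gz')
    for name in filenames:
        state = tar.snapshot()              # proposed: offset + compressor state
        tar.add(name)
        if archive_size(tar) > MAX_SIZE:    # hypothetical size check
            tar.restore(state)              # proposed: roll back the last add
            break
    tar.close()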
Greg Ewing wrote:
> In other words, just because A inherits from B in
> Python isn't meant to imply that an A is a drop-in
> replacement for a B.
Hmm - this is interesting. I'm not arguing Liskov violations or
anything like that here.
However, *because* Python uses duck typing, I tend to feel that
subclasses in Python *should* be drop-in replacements. If it's not a
drop-in replacement, then it should probably not subclass, but just use
duck typing (probably by wrapping).
Subclassing implies a stronger relationship to me. Which is why I think
I prefer using a wrapper for a default dict, rather than a subclass.
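[A sketch of the wrapper approach -- names invented here; it delegates
to a plain dict and exposes only the methods a default dict needs,
rather than inheriting the full dict API:]

    class DefaultDict(object):
        def __init__(self, default_factory):
            self.data = {}
            self.default_factory = default_factory
        def __getitem__(self, key):
            try:
                return self.data[key]
            except KeyError:
                self.data[key] = value = self.default_factory()
                return value
        def __setitem__(self, key, value):
            self.data[key] = value
        def __contains__(self, key):
            return key in self.data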
While playing around with the defaultdict patch, adding __reduce__ to
make defaultdict objects properly copyable through the copy module, I
noticed that copy.py doesn't support copying function objects. This
seems an oversight, since the (closely related) pickle module *does*
support copying functions. The semantics of pickling a function are
that it just stores the module and function name in the pickle; that
is, if you unpickle it in the same process it'll just return a
reference to the same function object. This would translate into
"atomic" semantics for copying functions: the "copy" is just the
original, for shallow as well as deep copies. It's a simple patch:
--- Lib/copy.py (revision 42537)
+++ Lib/copy.py (working copy)
@@ -101,7 +101,8 @@
 for t in (type(None), int, long, float, bool, str, tuple,
-          frozenset, type, xrange, types.ClassType):
+          frozenset, type, xrange, types.ClassType,
+          types.FunctionType):
     d[t] = _copy_immutable
 for name in ("ComplexType", "UnicodeType", "CodeType"):
     t = getattr(types, name, None)
@@ -217,6 +218,7 @@
 d[xrange] = _deepcopy_atomic
 d[types.ClassType] = _deepcopy_atomic
 d[types.BuiltinFunctionType] = _deepcopy_atomic
+d[types.FunctionType] = _deepcopy_atomic

 def _deepcopy_list(x, memo):
     y = []
Any objections? Given that these are picklable, I can't imagine there
are any but I thought I'd ask anyway.
--Guido van Rossum (home page: http://www.python.org/~guido/)
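[With the patch applied, the "atomic" semantics described above would
amount to this -- a quick check, function name invented:]

    import copy

    def f():
        pass

    assert copy.copy(f) is f        # shallow copy: the original object
    assert copy.deepcopy(f) is f    # deep copy: also the original object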