A critique of Guido's blog on Python's lambda
Ken Tilton
kentilton at gmail.com
Mon May 8 14:29:38 EDT 2006
[Sorry, I was just reading comp.lang.lisp and missed the following until
someone mentioned it in email. k]
Alex Martelli wrote:
> Carl Friedrich Bolz <cfbolz at gmx.de> wrote:
> ...
>
>>>an extension that allows the programmer to specify how the value of
>>>some slot (Lisp lingo for "member variable") can be computed. It
>>>frees the programmer from having to recompute slot values since Cells
>
> ...
>
>>I have not looked at Cells at all, but what you are saying here sounds
>>amazingly like Python's properties to me. You specify a function that
>>calculates the value of an attribute (Python lingo for something like a
>
>
> You're right that the above-quoted snippet does sound exactly like
> Python's properties, but I suspect that's partly because it's a very
> concise summary. A property, as such, recomputes the value each and
> every time, whether the computation is necessary or not; in other words,
> it performs no automatic caching/memoizing.
Right, and the first thing we did was simply that: no memoizing. The
second thing we did was memoize without tracking dependencies, updating
everything on each pass through the event loop. We knew neither would
(uh-oh) scale, but we wanted to see if the approach solved the
computational problem that started the whole research programme.
Soon enough we were tracking dependencies.
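That second stage might be sketched like this (a reconstruction in Python, my names and not Cells': a plain @property would recompute on every access, while this memoizes within a pass and throws the whole cache away each time the event loop turns):

```python
class EventLoopMemo:
    """Stage two: memoize slot values with no dependency tracking,
    and invalidate every cached value on each event-loop pass."""
    def __init__(self):
        self._rules = {}    # slot name -> rule computing its value
        self._cache = {}    # slot name -> memoized value for this pass
    def define(self, name, rule):
        self._rules[name] = rule
    def get(self, name):
        if name not in self._cache:            # memoized within a pass
            self._cache[name] = self._rules[name](self)
        return self._cache[name]
    def new_pass(self):
        self._cache.clear()                    # recompute everything next pass

model = EventLoopMemo()
runs = []
model.define("b", lambda m: runs.append("b") or 2)
model.define("a", lambda m: m.get("b") * 10)
assert model.get("a") == model.get("a") == 20
assert runs == ["b"]          # cached within the pass
model.new_pass()
model.get("a")
assert runs == ["b", "b"]     # everything recomputed on the next pass
```

The obvious cost, and the reason this does not scale, is that every slot is recomputed every pass whether its inputs changed or not.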
>
> A more interesting project might therefore be a custom descriptor, one
> that's property-like but also deals with "caching wherever that's
> possible". This adds interesting layers of complexity, some of them not
> too hard (auto-detecting dependencies by introspection), others really
> challenging (reliably determining what attributes have changed since
> last recomputation of a property). Intuition tells me that the latter
> problem is equivalent to the Halting Problem -- if somewhere I "see" a
> call to self.foo.zap(), even if I can reliably determine the leafmost
> type of self.foo, I'm still left with the issue of analyzing the code
> for method zap to find out if it changes self.foo on this occasion, or
> not -- there being no constraint on that code, this may be too hard.
"no constraint on the code" is a sufficient objection but also: if the
rule for the code is (in neutral pseudo-code)
if a is true
then return b
else return c
..you only want to depend on one of b or c at a time. (also, not do the
internal dependency tracking on, say, b if it happens to hold an
immutable value (part of the Cells API). Nice performance win, it turns out.
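To make the branch point concrete, here is a minimal sketch (my names, not the Cells API) of recording dependencies only for the slots a rule actually reads while it runs:

```python
class Tracker:
    """Collects the names of the slots read while a rule runs."""
    def __init__(self):
        self.deps = set()

class Slot:
    def __init__(self, name, value):
        self.name, self.value = name, value
    def get(self, tracker):
        tracker.deps.add(self.name)   # dependency recorded only on a read
        return self.value

a, b, c = Slot("a", True), Slot("b", 1), Slot("c", 2)

def rule(t):
    # "if a is true then return b else return c"
    return b.get(t) if a.get(t) else c.get(t)

t = Tracker()
assert rule(t) == 1
assert t.deps == {"a", "b"}   # c was never read, so no dependency on c

a.value = False               # next run, the other branch is taken
t2 = Tracker()
assert rule(t2) == 2
assert t2.deps == {"a", "c"}  # now b drops out of the dependency set
```

Since the dependency set is rebuilt on each run, a change to c while a is true never triggers a needless recomputation.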
I just keep what I call a "datapulse ID", sequentially growing from
zero, in a global variable. Each ruled Cell keeps track of its memoized
value, its datapulse stamp, and whether it in fact changed value in
reaching its current datapulse stamp. (I can take on the current
datapulse stamp by determining that no dependency (direct, or indirect
recursively down the dependency graph) is both more current than I am
/and/ in fact changed in value getting there.[1] If not, I take on the
current datapulse but never run my rule. Or, if yes, I run my rule but
might compute the same value as last time. Either way, I can flag
myself as current but not-actually-changed.)
So we have push and pull. The whole thing gets kicked off by a setting
operation on some slot, which notifies its dependents that they should
recompute. This cascades into further notifications whenever anyone
notified recomputes and in fact computes a different value. The pull
comes in while rules are running, to make sure no obsolete value gets
used. So the dependency graph gets updated JIT during rule evaluations,
possibly recursively so.
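The push/pull combination might be sketched roughly like this (my code, not Cells; dependencies are wired statically here rather than detected JIT, to keep the sketch short):

```python
PULSE = 0   # global datapulse ID, grown sequentially from zero

class InputCell:
    """A settable slot. Setting it starts a new datapulse and pushes a
    recompute notification to dependents, but only on a real change."""
    def __init__(self, value):
        self.value, self.changed_pulse, self.dependents = value, 0, []
    def get(self):
        return self.value
    def set(self, value):
        global PULSE
        PULSE += 1
        if value != self.value:
            self.value, self.changed_pulse = value, PULSE
            for d in self.dependents:        # push: cascade the change
                d.get()

class RuledCell:
    """Keeps its memoized value, its datapulse stamp, and the datapulse
    of its latest actual change."""
    def __init__(self, rule, deps):
        self.rule, self.deps, self.dependents = rule, deps, []
        for d in deps:
            d.dependents.append(self)
        self.value = rule()
        self.validated_pulse = self.changed_pulse = PULSE
    def get(self):
        if self.validated_pulse < PULSE:
            changed = False
            for d in self.deps:              # pull: freshen dependencies first
                d.get()
            if any(d.changed_pulse > self.validated_pulse for d in self.deps):
                new = self.rule()            # some dependency actually changed
                if new != self.value:
                    self.value, self.changed_pulse, changed = new, PULSE, True
            self.validated_pulse = PULSE     # current, even if unchanged
            if changed:
                for d in self.dependents:    # keep pushing down the graph
                    d.get()
        return self.value

runs = []
a = InputCell(1)
double = RuledCell(lambda: runs.append("x") or a.get() * 2, [a])
quad = RuledCell(lambda: double.get() * 2, [double])

a.set(5)                                  # push: both rules rerun
assert (double.get(), quad.get()) == (10, 20)
a.set(5)                                  # same value: new pulse, no rule runs
assert quad.get() == 20 and runs == ["x", "x"]
```

Note the two outcomes described above: on the second `a.set(5)` every cell takes on the current datapulse without running its rule, because no dependency actually changed in value getting there.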
>
> The practical problem of detecting alterations may be softened by
> realizing that some false positives are probably OK -- if I know that
> self.foo.zap() *MAY* alter self.foo, I might make my life simpler by
> assuming that it *HAS* altered it. This will cause some recomputations
> of property-like descriptors' values that might theoretically have been
> avoided, "ah well", not a killer issue. Perhaps a more constructive
> approach would be: start by assuming the pseudoproperty always
> recomputes, like a real property would; then move toward avoiding SOME
> needless recomputations when you can PROVE they're needless. You'll
> never avoid ALL needless recomputations, but if you avoid enough of them
> to pay for the needed introspection and analysis, it may still be a win.
> As to whether it's enough of a real-world win to warrant the project, I
> pass -- in a research setting it would surely be a worthwhile study, in
> a production setting there are other optimizations that look like
> lower-hanging fruits to me. But, I'm sure the Cells people will be back
> with further illustrations of the power of their approach, beyond mere
> "properties with _some_ automatic-caching abilities".
Who knows, maybe something like that lies in the future of Cells. It
would not be because the need for update is undecidable; it would be
because the decision somehow turned out to be too expensive compared to
"Just Run the Rule!"[2] It seems every new application brings up
interesting new requirements where Cells can usefully be extended with
new capabilities. But over time the code has gotten simpler and simpler,
so I think we are headed in the right direction.
kenny
[1] Aha! I see a flaw. It arises if two datapulses pass before I (a
cell <g>) get read and must determine whether my cache is obsolete, and
some dependency changed in the first datapulse but not the second. I
have to enhance the regression test suite and cure this. One possible
cure is akin to your thinking: if I missed a generation, I assume it
changed in the missed generation. The alternatives are keeping a
history of my datapulses going back no further than the last true
change, or having a Cell keep a separate record of the last value used
from each dependency. The second could get expensive if I consult a
lot of other cells, while the first ... ah, wait, I do not need a
history, I just need one new attribute: datapulse-of-latest-actual-change. Sweet.
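The cure can be checked with a two-pulse toy case (hypothetical variable names, mine rather than Cells'):

```python
# Cell X last validated its cache at pulse 0; a dependency changed at
# pulse 1 but not at pulse 2, and X is only read again at pulse 2.
x_validated_pulse = 0
dep_changed_pulse = 1    # the dependency's datapulse-of-latest-actual-change
current_pulse = 2

# Flawed check: "did the dependency change on the current pulse?"
# This misses the change that happened in the skipped generation.
flawed_stale = (dep_changed_pulse == current_pulse)
assert flawed_stale is False   # cache wrongly treated as fresh

# Cure: compare the dependency's latest-actual-change pulse against our
# own validation stamp; a change in ANY missed generation is caught.
cured_stale = (dep_changed_pulse > x_validated_pulse)
assert cured_stale is True
```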
[2] I am making a mental note to look into that optimization.
More information about the Python-list mailing list