[Python-3000] PEP 3108 - stdlib reorg/cleanup

Alex Martelli aleaxit at gmail.com
Tue Apr 29 17:05:51 CEST 2008


On Tue, Apr 29, 2008 at 5:10 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
   ...
>  Perhaps sched/mutex could be dumped in the Demo directory? Or perhaps we
> should just get rid of them entirely and see if anyone with a real use case
> complains - it's not like the modules will be particularly hard to dig out
> of SVN if we decide we want to keep them after all.

I have real use cases and I'm already complaining;-).  I'd rather not
get into very specific details -- after all for the last 3+ years I've
been working at Google in the area we now call "cluster management",
so you can imagine that my recent and most important use cases may be
systems that play important roles in the innards of Google clusters,
and it's not exactly stuff that Google likes to have me chat about.

But, somewhat abstractly, suppose that as a part of keeping clusters
healthy you are, in a certain generation N of your cluster management
infrastructure, periodically running (e.g. via cron) several scripts
that perform housekeeping "sysadm" tasks -- checking that all vital
services are up and running and healthy, allocating replacement
machines (to ensure redundancy remains good) if some vital server has
keeled over and has been replaced by a "hot spare", ensuring
multiply-redundant backups of important data offsite, and the like. At
the next generation N+1 you rewrite all that morass of scripts
(originally a mix of bash, Perl and Python) into Python -- so far so
good. But then you notice that using cron is far from optimal: tasks
run with different periods and take different amounts of time, so
sometimes they end up overlapping, and that's no good (they should be
sequenced). So you introduce locking between processes, but that
still leaves times when several Python processes are alive (all but
one waiting on a lock), uselessly consuming machine resources (the
machines aren't all that limited, but they ARE live-running servers,
so resources taken by "overhead" sysadm tasks must be minimized).

So, the big breakthrough: you rewrite the whole thing as one Python
daemon based on sched. No more locking, delicate bugs and race
conditions just disappear, more steady and predictable
resource-consumption footprint (you still want to avoid conditions
where the now long-running sysadm daemon grows its memory footprint
and never shrinks again, but for those rare tasks which risk that, you
can fork, do the work in the subprocess while the parent waits for it,
etc), AND, suddenly and wonderfully, a new higher level of testability
-- in a level of tests living between unit-tests and full
system-integration tests, you can use a simulated (accelerated)
timeline for sched, mock out just the parts that get information from
"the outside" and perform actions on it, and exercise the whole system
logic and workflow almost as well as a full system-integration test,
but orders of magnitude faster AND without requiring a whole cluster
to be devoted to the test...
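
A minimal sketch of the simulated-timeline idea, purely illustrative
and not the actual system described above. It assumes Python 3's sched
module; FakeClock, schedule_periodic, check_services and backup_data
are hypothetical names, and the two tasks are stand-ins for mocked-out
sysadm jobs. The scheduler gets a fake clock whose "sleep" merely
advances simulated time, so ten simulated minutes run with no real
waiting:

```python
import sched


class FakeClock:
    """Simulated timeline: sleeping advances the clock instead of blocking."""

    def __init__(self, start=0.0):
        self.now = start

    def time(self):
        return self.now

    def sleep(self, delay):
        # sched calls this to wait for the next event; just jump ahead.
        self.now += delay


def schedule_periodic(scheduler, when, interval, priority, task, horizon):
    """Arm `task` at absolute time `when`, re-arming it every `interval`
    seconds for as long as the next run falls before `horizon`."""

    def fire():
        task()
        next_when = when + interval
        if next_when < horizon:
            schedule_periodic(scheduler, next_when, interval, priority,
                              task, horizon)

    scheduler.enterabs(when, priority, fire, ())


clock = FakeClock()
log = []  # (simulated time, task name), in execution order


# Hypothetical stand-ins for mocked-out housekeeping tasks.
def check_services():
    log.append((clock.time(), "check_services"))


def backup_data():
    log.append((clock.time(), "backup_data"))


# One scheduler in one process: tasks run strictly one at a time,
# so no inter-process locking is needed.
scheduler = sched.scheduler(clock.time, clock.sleep)
schedule_periodic(scheduler, 0.0, 60.0, 1, check_services, 600.0)
schedule_periodic(scheduler, 0.0, 300.0, 2, backup_data, 600.0)
scheduler.run()  # drains the queue without any real waiting

print(len(log), "task runs; simulated clock ended at", clock.now)
```

In a real daemon the same scheduling code would be handed time.time
and time.sleep in place of the fake clock's methods; only the code
that talks to "the outside" needs mocking for the test.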

Sure, if sched were taken away, I could just take it back and make it
part of the specific system rather than using it from the standard
library -- but this argument would apply to a vast majority of library
modules, particularly pure-Python ones; I think (I hope) we're just
"spring-cleaning the cruft", not drastically rethinking the whole idea
of "batteries included", right?-)

Besides Google, I know that many other shops are using Python for
cluster management and system administration tasks -- for example,
Rackspace was a sponsor at PyCon and busy trying to hire Pythonistas,
because their cluster management infrastructure software also appears
to be all-Python. Large shops like Rackspace or Google are least
affected by having to make sched part of their "own" code (though it
WOULD needlessly add one more epsilon to the inevitable resistance
that the 2.* -> 3.* migration will of course encounter, particularly
among conservative, reliability-is-all types such as sysadms), but a
lot of sysadm work happens in far smaller environments, often with
"part-time" admins for whom finding out that sched once existed and
was perfect for replacing cron in so many jobs, and then was removed
and can still be downloaded from X.Y.Z, would be a significant chore.

For tasks unrelated to system administration, consider for example the
very instructive
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413137 : the
original "home cooked" solution (without sched) was really pretty bad
and unreliable -- then Raymond Hettinger came in and saved the day by
showing how to trivially use sched to do it RIGHT -- solid,
lightweight, reliable, no threads and locks and things, what more
could you ask for?

And then, if needed, we can discuss pure simulation (as opposed to
simulation-testing of systems designed to normally use the "real"
sched). But already it seems to me there are plenty of use cases to
justify retaining sched in the library...!


Alex

