Date: Mon, 03 Aug 2009 18:00:07 -0400
From: "Edward Z. Yang" <ezyang(a)MIT.EDU>
Subject: Re: [Twisted-Python] Deferred documentation rewrite
To: Twisted general discussion <twisted-python(a)twistedmatrix.com>
Content-Type: text/plain; charset=UTF-8
> - Asynchronous interaction to synchronous interaction
> - Delocalized execution (the parser example)
> - High level functions in Python review
I don't know if this helps, but here goes: when I was preparing my
PyCon 2008 talk, among the many papers I read were Doug Schmidt's
concurrency design pattern papers, including the original Reactor
Pattern paper. I also read the Microsoft paper "Cooperative Task
Management without Manual Stack Management, or, Event-driven
Programming is Not the Opposite of Threaded Programming". In light of
those papers, this is the way I see things:
1) Reactors provide a portable form of non-preemptive multitasking. By
implication, reactors are schedulers.
2) If you buy the reactor as a scheduler, then a Deferred can be viewed
as representing a thread (or chain) of execution, with each callback
analogous to a continuation: the next address at which to resume
execution when a result is ready.
3) One of the main differences between asynchronous and synchronous
processing then becomes who is responsible for setting up shared state
between the links in the execution chain.
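To make points 1 and 2 concrete, here is a minimal sketch of a reactor acting as a non-preemptive scheduler for chains of continuations. ToyReactor and run_chain are hypothetical names invented for this illustration; this is not how Twisted's reactor or Deferred is implemented.

```python
from collections import deque


class ToyReactor:
    """Hypothetical stand-in for a reactor: a non-preemptive scheduler."""

    def __init__(self):
        self._ready = deque()

    def call_later(self, func, *args):
        # Queue a step; nothing runs until run() picks it up.
        self._ready.append((func, args))

    def run(self):
        # Each step runs to completion before the next one starts.
        while self._ready:
            func, args = self._ready.popleft()
            func(*args)


def run_chain(reactor, steps, value, log):
    """Run one step of a chain, then schedule the rest as its continuation."""
    if not steps:
        log.append(("done", value))
        return
    first, rest = steps[0], steps[1:]
    result = first(value)
    # The remaining steps are where execution resumes once a result is ready.
    reactor.call_later(run_chain, reactor, rest, result, log)


log = []
reactor = ToyReactor()
reactor.call_later(run_chain, reactor, [lambda x: x + 1, lambda x: x * 2], 1, log)
reactor.call_later(run_chain, reactor, [str.upper], "a", log)
reactor.run()
print(log)
```

The two chains interleave step by step, and the shorter one finishes first, even though nothing here is preempted or threaded.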
>Quite frankly, I'm stumped on "defining synchronous and asynchronous."
Simple definition: In a synchronous call, the caller blocks until a
result is ready. Upon return, the next statement is executed (barring
something like an exception). In an asynchronous call, the caller does
not wait: the call returns immediately, the caller continues, and the
result is delivered later, for example through a callback.
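The definition can be sketched with nothing but the standard library. The names lookup, gated_lookup and caller_moved_on are made up for this example, and a thread pool stands in for whatever produces the result later; Twisted itself would not normally use a thread for this.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

events = []
caller_moved_on = threading.Event()


def lookup(key):
    # Hypothetical operation whose result takes a while to produce.
    return key.upper()


def gated_lookup(key):
    # Holds the result back until the caller has moved on, so the
    # ordering recorded in `events` is deterministic.
    caller_moved_on.wait(timeout=5)
    return lookup(key)


# Synchronous call: the next statement runs only once the result is ready.
events.append(("sync result", lookup("a")))

# Asynchronous call: submit() returns at once; the result arrives later
# through the callback.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(gated_lookup, "b")
    future.add_done_callback(lambda f: events.append(("async result", f.result())))
    events.append(("caller continues", None))
    caller_moved_on.set()

print(events)
```

In the asynchronous half, "caller continues" is recorded before the result exists, which is exactly the difference the definition describes.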
> I just don't know what direction people are coming from.
I would suggest most people are trying to solve simple problems and want
the least surprise. Unfortunately, asynchronous programming has lots of
surprises. As for terminology, try looking up how terms like
'asynchronous' or 'synchronous' are used in a few of the more popular
network programming books.
1) I have a proxy server running on my computer on port 8888. It's
listening on localhost.
I am wondering whether twisted.web.client.getPage has some sort of
proxy keyword argument that can do this for me. (I checked the source
and I really doubt it has something like that.)
What are my alternatives? I have looked at low-level alternatives, and
ProxyClient doesn't really seem to fit my situation.
2) I sometimes have 2 proxy servers running. I know urllib has a
method which allows you to build opener, then you can access webpages
Does Twisted have something like this, which would let me keep multiple
"proxy spaces" so that I can access different websites at the same time
through different proxy servers?
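For reference, the urllib approach mentioned in question 2 looks roughly like this in current Python (the module was urllib2 in 2009). This is only the standard-library side, not a Twisted API; opener_for_proxy is a hypothetical helper, and port 8889 for the second proxy is an assumption.

```python
import urllib.request


def opener_for_proxy(proxy_url):
    """Build an opener whose HTTP and HTTPS requests go through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# The first address matches the proxy described above; the second is assumed.
opener_a = opener_for_proxy("http://localhost:8888")
opener_b = opener_for_proxy("http://localhost:8889")

# Each opener keeps its own proxy settings, for example:
# page = opener_a.open("http://example.com/").read()
```

Each opener is an independent object, which is the "proxy space" behaviour being asked for.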
I would like to help Twisted by adding documentation or reviewing existing
documentation. However, I don't see many (any?) unassigned tickets regarding
documentation of specific items, and because I am quite new to Twisted this
makes it hard for me to determine where you wish I would focus my attention.
Where should I focus my attention? Want to open some tickets for me to
claim? Is adding to API docs more important than updating the examples and
It looks like, to add to the API docs, you just update the docstrings for
functions and someone lets pydoctor do its magic later. Is that true? Is
there a special Twisted pydoctor incant to see how they'd look on the web
before doing any hasty patch-submitting?
I'd like to update
http://twistedmatrix.com/trac/wiki/ContributingToTwistedLabs to include some
of this information as well as help I got earlier for updating xhtml
Dear people of the twisted mailing list:
This release announcement is relevant to this list because Tahoe-LAFS
is built on Twisted's networking and concurrency framework. Also, we
strive to emulate the UQDS.
The Tahoe-LAFS team is pleased to announce the immediate availability of
version 1.5 of Tahoe, the Lofty Atmospheric File System.
Tahoe-LAFS is the first cloud storage technology which offers security
and privacy in the sense that the cloud storage service provider itself
can't read or alter your data. Here is the one-page explanation of
its unique security and fault-tolerance properties:
This release is the successor to v1.4.1, which was released April 13,
2009. This is a major new release, improving the user interface and
performance, fixing a few bugs, and adding ports to OpenBSD, NetBSD,
ArchLinux, NixOS, and embedded systems built on ARM CPUs. See the NEWS
file for more information.
In addition to the functionality of Tahoe-LAFS itself, a crop of related
projects have sprung up to extend it and to integrate it into operating
systems and applications. These include frontends for Windows,
Hadoop, and TiddlyWiki, and more. See the Related Projects page on the
Version 1.5 is fully compatible with the version 1 series of
Tahoe-LAFS. Files written by v1.5 clients can be read by clients of all
versions back to v1.0. v1.5 clients can read files produced by clients
of all versions since v1.0. v1.5 servers can serve clients of all
versions back to v1.0 and v1.5 clients can use servers of all versions
back to v1.0.
This is the sixth release in the version 1 series. The version 1 series
of Tahoe-LAFS will be actively supported and maintained for the
foreseeable future, and future versions of Tahoe-LAFS will retain the
ability to read and write files compatible with Tahoe-LAFS v1.
The version 1 series of Tahoe-LAFS is the basis of the consumer backup
product from Allmydata, Inc. -- http://allmydata.com .
WHAT IS IT GOOD FOR?
With Tahoe-LAFS, you can distribute your filesystem across a set of
servers, such that if some of them fail or even turn out to be
malicious, the entire filesystem continues to be available. You can
share your files with other users, using a simple and flexible access
We believe that the combination of erasure coding, strong encryption,
Free/Open Source Software and careful engineering make Tahoe-LAFS safer
than RAID, removable drive, tape, on-line backup or other Cloud storage
This software comes with extensive tests, and there are no known
security flaws which would compromise confidentiality or data integrity
in typical use. (For all currently known issues please see the
known_issues.txt file.)
You may use this package under the GNU General Public License, version 2
or, at your option, any later version. See the file "COPYING.GPL" 
for the terms of the GNU General Public License, version 2.
You may use this package under the Transitive Grace Period Public
Licence, version 1 or, at your option, any later version. (The
Transitive Grace Period Public Licence has requirements similar to the
GPL except that it allows you to wait for up to twelve months after you
redistribute a derived work before releasing the source code of your
derived work.) See the file "COPYING.TGPPL.html" for the terms of
the Transitive Grace Period Public Licence, version 1.
(You may choose to use this package under the terms of either licence,
at your option.)
Tahoe-LAFS works on Linux, Mac OS X, Windows, Cygwin, Solaris, *BSD, and
probably most other systems. Start with "docs/install.html".
HACKING AND COMMUNITY
Please join us on the mailing list. Patches are gratefully accepted
-- the RoadMap page shows the next improvements that we plan to make
and CREDITS lists the names of people who've contributed to the
project. The Dev page contains resources for hackers.
Tahoe-LAFS was originally developed thanks to the sponsorship of
Allmydata, Inc., a provider of commercial backup services.
Allmydata, Inc. created the Tahoe-LAFS project and contributed hardware,
software, ideas, bug reports, suggestions, demands, and money (employing
several Tahoe-LAFS hackers and instructing them to spend part of their
work time on this Free Software project). Also they awarded customized
t-shirts to hackers who found security flaws in Tahoe-LAFS (see
http://hacktahoe.org ). After discontinuing funding of Tahoe-LAFS R&D in
early 2009, Allmydata, Inc. has continued to provide servers, co-lo
space and bandwidth to the open source project. Thank you to Allmydata,
Inc. for their generous and public-spirited support.
This is the second release of Tahoe-LAFS which was created solely as a
labor of love by volunteers; developer time is no longer funded by
allmydata.com (see  for details).
on behalf of the Tahoe-LAFS team
Special acknowledgment goes to Brian Warner, whose superb engineering
skills and dedication are primarily responsible for the Tahoe
implementation, and significantly responsible for the Tahoe design as
well, not to mention most of the docs and tests. Tahoe-LAFS wouldn't
exist without him.
August 1, 2009
Boulder, Colorado, USA
P.S. Just kidding about that acronym. "LAFS" actually stands for
"Lightweight Authorization File System". Or possibly for
"Least-Authority File System". There is no truth to the rumour that it
actually stands for "Long-lived Axe-tolerant File System".
Trying to implement a custom mail service I have run into the problem that
this ticket describes:
Ticket #3472 (new defect)
Opened 10 months ago
Last modified 10 months ago
twisted.mail.smtp sendmail() should [have] parameters to be passed for the
retry and timeout logic supported by SMTPClientFactory:
The method def sendmail() spawns SMTPSenderFactories without a default
timeout value. This causes the factories to wait forever if there is no
response from the remote server. Over prolonged periods of time, stale
SMTPSenderFactories accumulate and cause file descriptors to run out
(Couldn't bind: 24: Too many open files.)
What would be the best way for me to work around this?
Specifically, when I use twisted.mail.smtp sendmail() for a lengthy list, I
fairly quickly get the "too many open files" message.
I tried using a coiterator to send small enough batches of emails, waiting
for each batch to be done before sending the next, because the "too many
files" error seemed to be caused by creating too many deferreds when too
large a batch of emails was sent at once. This worked in a test simulation,
but with real emails the system never completes the batch; it just hangs
waiting for sendmail to return.
Is there a best timeout mechanism I should use to force sendmail's return?
What I can think of so far are these:
option 1: Call setTimeout() on the deferred returned by sendmail despite
this warning in the source for Deferred.setTimeout():
"warnings.warn("Deferred.setTimeout is deprecated. Look for timeout support
specific to the API you are using instead.", DeprecationWarning,
Option 2: Patch a timeout argument into sendmail and have it passed on to
SMTPSenderFactory(from_addr, to_addrs, msg, d)? Would this work? I can't
figure out how, or whether, SMTPSenderFactory would handle a timeout argument.
Sorry for the long message. Thanks for your advice!
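The general shape of what is being asked for (small batches, plus a timeout so that one silent server cannot hold a slot forever) can be sketched as follows. This uses the standard library's asyncio purely as a stand-in, not Twisted; send_all, fake_send and the limit and timeout values are all hypothetical.

```python
import asyncio


async def send_all(messages, send_one, limit=10, timeout=30.0):
    """Send every message, at most `limit` at a time, each bounded by `timeout`."""
    gate = asyncio.Semaphore(limit)

    async def guarded(message):
        async with gate:
            try:
                await asyncio.wait_for(send_one(message), timeout)
                return (message, "sent")
            except asyncio.TimeoutError:
                # The slot is released, so a dead server cannot pin it forever.
                return (message, "timed out")

    return await asyncio.gather(*(guarded(m) for m in messages))


async def fake_send(message):
    # Hypothetical sender: "stuck" never gets a reply from the remote server.
    await asyncio.sleep(3600 if message == "stuck" else 0)


results = asyncio.run(send_all(["a", "stuck", "b"], fake_send, limit=2, timeout=0.05))
print(results)
```

The point of the design is that the timeout lives next to the concurrency limit: a send that never returns is cancelled, its slot is freed, and the batch as a whole still completes.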
I am writing some scraper scripts and need to pass them through an
intercepting proxy. getPage does not support a proxy argument, and this code
I found on the internet won't work with an SSL proxy (it stalls indefinitely):
from twisted.internet import reactor
from twisted.web.client import HTTPClientFactory, _parse

def getPage(url, contextFactory=None, *args, **kwargs):
    scheme, host, port, path = _parse(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    if 0: # use a proxy
        # connect to the proxy and request the absolute URL from it
        host, port = 'localhost', 8080
        factory.path = url
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred
Plain HTTP proxying works. My guess is that there is an issue with the
self-signed or otherwise invalid certificate that the HTTP proxy supplies. Any
Applied IT sorcery.
Excerpts from Itamar Shtull-Trauring's message of Fri Jul 31 22:26:29 -0400 2009:
> The problem with this is that it perpetuates the misunderstanding that
> Deferreds *make* things asynchronous, even with the intro that says
> otherwise. I think it's better to assume already asynchronous code,
> handling the transition from synchronous to async in an intro event loop
Either way, the function that the first segment of the new docs serves
does belong somewhere. The documentation that traditionally served this
purpose has been removed.
As for perpetuating the misunderstanding of Deferreds making things
asynchronous, I completely agree! However, I think this is something
that can be fixed by spelling out the distinction between "writing
asynchronous code" and "interacting with asynchronous code", and not
just omitting the important paradigm shift that comes with sync->async.
> A better comparative exposition might be with normal callbacks, e.g.:
> "def foo(x, gotResultCallback): pass" vs. "def foo(x): # return
> At the very least having that async but callbacky version in the middle
> helps understanding.
I briefly touch on this, but I agree that this is an important point
that could be further expanded. We could have implemented asynchronous
mechanisms using normal callbacks, but we decided to use Deferreds instead.
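A rough side-by-side of the two styles under discussion. MiniDeferred is a toy written for this comparison, not Twisted's Deferred (it has no errbacks, for one), and the pending list is a hypothetical stand-in for whatever event loop eventually delivers results.

```python
class MiniDeferred:
    """Toy stand-in for a Deferred; not Twisted's implementation."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, func):
        if self._fired:
            # Already have a result: run the callback right away.
            self._result = func(self._result)
        else:
            self._callbacks.append(func)
        return self

    def callback(self, result):
        self._fired = True
        self._result = result
        # Each callback receives the previous callback's return value.
        for func in self._callbacks:
            self._result = func(self._result)
        self._callbacks = []


pending = []  # hypothetical event source: work finished later by "the event loop"


def foo_with_callback(x, gotResultCallback):
    # Plain callback style: the caller must hand over its callback up front.
    pending.append(lambda: gotResultCallback(x * 2))


def foo_with_deferred(x):
    # Deferred style: the caller gets an object back and attaches callbacks to it.
    d = MiniDeferred()
    pending.append(lambda: d.callback(x * 2))
    return d


results = []
foo_with_callback(1, results.append)
d = foo_with_deferred(2)
d.addCallback(lambda value: value + 1).addCallback(results.append)

for finish in pending:
    finish()
print(results)
```

Both versions are equally asynchronous; the returned object only changes who wires up the callbacks and when, and lets them be chained.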
It's not clear to me if the common case of confusion about Deferreds occurs
in people who "know callbacks" but "don't know Deferreds." As an incoming
developer who was familiar with asynchronous programming, my primary problem
was the ill-defined behavior of callback chains (which I resolved by
hunkering down and reading the source code) rather than any fundamental
misunderstanding of what Deferreds were supposed to do.
> It also omits half the story: how you *create* Deferreds. There should
> be a section on that as well.
I agree. In fact, it might be worth making the document a little longer
to address this point, because I realize now that even if you're not
writing asynchronous code, you'll often need to baton Deferreds to
make the execution flow work the way you want it to.
> An example involving a parser, where you just wave your hands about who
> pushes data in to the parser exactly (so no need to go into event loop
> details), may work well. In particular, the object that wants the result
> of the parsing wants to get parse errors, *not* whoever pushes data in.
> Often it's same object, but not always. Deferreds help with that.
That's a good distinction. I'll see how I can work that in.
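One possible shape for the parser example, as a sketch only. concurrent.futures.Future from the standard library stands in for a Deferred here, and NumberParser and report are hypothetical names; the point is just that parse errors travel to whoever holds the result object, not to whoever calls feed().

```python
from concurrent.futures import Future


class NumberParser:
    """Hypothetical line parser; `result` stands in for a Deferred."""

    def __init__(self):
        self.result = Future()
        self._buffer = ""

    def feed(self, data):
        # Called by whoever owns the data source. Parse errors are routed
        # to `result` instead of being raised back at this caller.
        self._buffer += data
        if "\n" in self._buffer and not self.result.done():
            line = self._buffer.split("\n", 1)[0]
            try:
                value = int(line)
            except ValueError as error:
                self.result.set_exception(error)
            else:
                self.result.set_result(value)


def report(future, log):
    # The interested party inspects the outcome, success or failure.
    error = future.exception()
    if error is not None:
        log.append(("error", type(error).__name__))
    else:
        log.append(("ok", future.result()))


log = []
for chunks in (["4", "2\n"], ["fo", "ur\n"]):
    parser = NumberParser()
    parser.result.add_done_callback(lambda f: report(f, log))
    for chunk in chunks:
        parser.feed(chunk)  # the pusher never sees a parse error
print(log)
```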