[Python-ideas] asyncore: included batteries don't fit

Fri Oct 5 20:51:21 CEST 2012

On Wed, Oct 3, 2012 at 7:43 AM, chrysn <chrysn at fsfe.org> wrote:
> On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote:
>> Go ahead and read PEP 3153, we will wait.
>>
>> A careful reading of PEP 3153 will tell you that the intent is to make
>> a "light" version of Twisted built into Python. There isn't any
>> discussion as to *why* this is a good idea, it just lays out the plan
>> of action. Its ideas were gathered from the experience of the Twisted
>> folks.
>>
>> Their experience is substantial, but in the intervening 1.5+ years
>> since PyCon 2011, only the barest of abstract interfaces has been
>> defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py),
>> and no discussion has taken place as to forward migration of the
>> (fairly large) body of existing asyncore code.
>
> it doesn't look like twisted-light to me, more like an interface
> suggestion for a small subset of twisted. in particular, it doesn't talk
> about main loops / reactors / registration-in-the-first-place.
>
> you mention interaction with the twisted people. is there willingness,
> from the twisted side, to use a standard python middle layer, once it
> exists and has sufficiently high quality?

>> To the point, Giampaolo already has a reactor that implements the
>> interface (more or less "idea #3" from his earlier message), and it's
>> been used in production (under staggering ftp(s) load). Even better,
>> it offers effectively transparent replacement of the existing asyncore
>> loop, and supports existing asyncore-derived classes. It is available:
>> https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py
>
> i've had a look at it, but honestly can't say more than that it's good
> to have a well-tested asyncore compatible main loop with scheduling
> support, and i'll try it out for my own projects.
>
>> >> Again, at this point in time what you're proposing looks too vague,
>> >> ambitious and premature to me.
>> >
>> > please don't get me wrong -- i'm not proposing anything for immediate
>> > action, i just want to start a thinking process towards a better
>> > integrated stdlib.
>>
>> I am curious as to what you mean by "a better integrated stdlib". A
>> new interface that doesn't allow people to easily migrate from an
>> existing (and long-lived, though flawed) standard library is not
>> better integration. Better integration requires allowing previous
>> users to migrate, while encouraging new users to join in with any
>> later development. That's what Giampaolo's suggested interface offers
>> on the lowest level: something to handle file-handle reactors,
>> combined with a scheduler.
>
> a new interface won't make integration automatically happen, but it's
> something the standard library components can evolve on. whether, for
> example urllib2 will then automatically work asynchronously in that
> framework or whether we'll wait for urllib3, we'll see when we have it.

Things don't "automatically work" without work. You can't just make
urllib2 work asynchronously unless you do the sorts of greenlet-style
stack switching that lies to you about what is going on, or unless you
redesign it from scratch for asynchronous use. That's not to say that greenlets
are bad, they are great. But expecting that a standard library
implementing an updated async spec will all of a sudden hook itself
into a synchronous socket client? I think that expectation is
unreasonable.

> @migrate from an existing standard library: is there a big user base for
> the current asyncore framework? my impression is that it is not
> very well known among python users, and most that could use it use
> twisted.

"Well known" is an interesting claim. I believe it is actually known
by quite a large part of the community, but due to a (perhaps
deserved) reputation (which may or may not still be accurate), it
isn't used as often as Twisted.

But along those lines, there are a few questions that should be asked:
1. Is it desirable to offer users the chance to transition from
asyncore-derived stuff to some new thing?
2. If so, what is necessary for an upgrade/replacement for
asyncore/asynchat in the long term?
3. Would 3rd parties use this as a basis for their libraries?
4. What are the short, mid, and long-term goals?

For my answers:
1. I think it is important to let people who are using a standard
library module continue using a standard library module, if
possible.
2. A transition should offer either an adapter or similar-enough API
equivalency between the old and new.
3. I think that if it offers a reasonable API and good functionality,
and examples are provided (both as part of the stdlib and outside
it), people will see the advantage of maintaining less of their own
custom code. To the point: would Twisted use *whatever* was in the
stdlib? I don't know the answer, but unless the API is effectively
identical to Twisted's, that transition may be delayed significantly.
4. Short: get current asyncore people transitioned to something
demonstrably better that 3rd parties might also use. Mid: pull
parsers/logic out of the cores of methods and make them available for
sync/async/3rd-party parsing/protocol handling (get the best protocol
parsers into the stdlib, separated from the transport). Long: everyone
contributes to/updates the stdlib modules because they have the best
parsers for protocols/formats, which can be used from *anywhere* (sync
or async).
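
Point 2 above can be made concrete with a minimal sketch, assuming the replacement loop is built on the stdlib `selectors` module. `EchoDispatcher` and `run_dispatchers` are hypothetical names (not Giampaolo's ioloop API, and the sketch does not import asyncore itself): the handler keeps asyncore's method names, and the loop re-asks `readable()`/`writable()` on every iteration, so existing handler code could keep its shape while the loop underneath changes.

```python
import selectors
import socket


class EchoDispatcher:
    """Hypothetical handler that keeps asyncore.dispatcher's method names
    (readable, writable, handle_read, handle_write) without importing
    asyncore; it echoes back whatever it receives."""

    def __init__(self, sock):
        self.sock = sock
        self.sock.setblocking(False)
        self.outbuf = b""
        self.eof = False

    def fileno(self):
        # selectors accepts any object with a fileno() method
        return self.sock.fileno()

    def readable(self):
        return not self.eof

    def writable(self):
        return bool(self.outbuf)

    def done(self):
        # finished once the peer has closed and everything was echoed
        return self.eof and not self.outbuf

    def handle_read(self):
        data = self.sock.recv(4096)
        if data:
            self.outbuf += data
        else:
            self.eof = True

    def handle_write(self):
        sent = self.sock.send(self.outbuf)
        self.outbuf = self.outbuf[sent:]


def run_dispatchers(dispatchers, timeout=0.1, max_iterations=1000):
    """Hypothetical adapter loop: drive asyncore-style objects from the
    stdlib selectors module, recomputing interest every iteration the
    way asyncore's poll loop does."""
    with selectors.DefaultSelector() as sel:
        for _ in range(max_iterations):
            live = [d for d in dispatchers if not d.done()]
            if not live:
                return
            for d in live:
                events = (selectors.EVENT_READ if d.readable() else 0) | (
                    selectors.EVENT_WRITE if d.writable() else 0)
                sel.register(d, events)  # never 0: not done() means readable or writable
            for key, mask in sel.select(timeout):
                if mask & selectors.EVENT_READ:
                    key.fileobj.handle_read()
                if mask & selectors.EVENT_WRITE:
                    key.fileobj.handle_write()
            # start the next iteration with a clean registration set
            for key in list(sel.get_map().values()):
                sel.unregister(key.fileobj)


if __name__ == "__main__":
    client, server = socket.socketpair()
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)
    run_dispatchers([EchoDispatcher(server)])
    server.close()
    print(client.recv(4096))
```

Re-registering on every iteration is the simplest way to mirror asyncore's semantics; a real adapter would more likely call `modify()` only when a handler's interest actually changes.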

My long-term dream (which I've had for 6+ years, since I proposed
doing it myself on the python-dev mailing list and was told "no") is
that whether someone uses urllib2, httplib2, smtpd, requests, ftplib,
etc., they all have access to high-quality protocol-level parsers, so
that once one person writes the bit that handles HTTP 30x redirects,
everyone can use it, and when one person writes the gzip + chunked
transfer encoding/decoding, everyone can use it.
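
As an illustration of a parser written once and usable by every client, here is a minimal sketch of an incremental decoder for HTTP/1.1 chunked transfer coding (RFC 9112, section 7.1). It owns no socket: a blocking client, an asyncore-style handler, or any other caller feeds it bytes in whatever pieces it happened to receive. `ChunkedDecoder` is a hypothetical name; trailer fields are skipped rather than returned, and gzip decoding (which could be layered on top, for example with `zlib.decompressobj`) is left out.

```python
class ChunkedDecoder:
    """Hypothetical incremental, transport-agnostic decoder for HTTP/1.1
    chunked transfer coding. feed() accepts bytes in any split and
    returns the body bytes decoded so far."""

    def __init__(self):
        self._buf = b""
        self._state = "size"   # size -> data -> data_crlf -> size ... -> trailer -> done
        self._remaining = 0    # bytes left in the current chunk
        self.done = False

    def feed(self, data):
        self._buf += data
        body = bytearray()
        while True:
            if self._state == "size":
                end = self._buf.find(b"\r\n")
                if end < 0:
                    break  # size line not complete yet
                # drop chunk extensions such as b"5;ext=1"
                size_field = self._buf[:end].split(b";", 1)[0].strip()
                self._buf = self._buf[end + 2:]
                self._remaining = int(size_field, 16)
                self._state = "data" if self._remaining else "trailer"
            elif self._state == "data":
                if not self._buf:
                    break
                piece = self._buf[:self._remaining]
                body += piece
                self._buf = self._buf[len(piece):]
                self._remaining -= len(piece)
                if self._remaining == 0:
                    self._state = "data_crlf"
            elif self._state == "data_crlf":
                if len(self._buf) < 2:
                    break
                if self._buf[:2] != b"\r\n":
                    raise ValueError("missing CRLF after chunk data")
                self._buf = self._buf[2:]
                self._state = "size"
            elif self._state == "trailer":
                end = self._buf.find(b"\r\n")
                if end < 0:
                    break
                line = self._buf[:end]
                self._buf = self._buf[end + 2:]
                if not line:  # empty line ends the message
                    self.done = True
                    self._state = "done"
                # non-empty lines are trailer fields; this sketch discards them
            else:  # "done": any leftover bytes belong to the next message
                break
        return bytes(body)


if __name__ == "__main__":
    wire = b"4\r\nWiki\r\n5;ext=1\r\npedia\r\n0\r\nX-Trailer: y\r\n\r\n"
    decoder = ChunkedDecoder()
    body = b"".join(decoder.feed(wire[i:i + 3]) for i in range(0, len(wire), 3))
    print(body, decoder.done)  # b'Wikipedia' True
```

Because the decoder only consumes bytes and never reads them itself, the same class works whether the caller loops on a blocking `recv()` or receives data in callbacks.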

>> > we've talked about many things we'd need in a python asynchronous
>> > interface (not implementation), so what are the things we *don't* need?
>> > (so we won't start building a framework like twisted). i'll start:
>> >
>> > * high-level protocol handling (can be extra modules atop of it)
>> > * ssl
>> > * something like the twisted delayed framework (not sure about that, i
>> >   guess the twisted people will have good reason to use it, but i don't
>> >   see compelling reasons for such a thing in a minimal interface from my
>> >   limited pov)
>> > * explicit connection handling (retries, timeouts -- would be up to the
>> >   user as well, eg urllib might want to set up a timeout and retries for
>> >   asynchronous url requests)
>>
>> I disagree with the last 3. If you have an IO loop, more often than
>> not you want an opportunity to do something later in the same context.
>> This is commonly the case for bandwidth limiting, connection timeouts,
>> etc., which are otherwise *very* difficult to do at a higher level
>> (which is why schedulers are built into IO loops).
>> Further, SSL in async can be tricky to get right. Having the 20-line
>> SSL layer as an available class is a good idea, and will save people
>> time by not having them re-invent it (poorly or incorrectly) every
>> time.
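
The point that timeouts are hard to do above the loop can be illustrated with a minimal sketch, assuming the stdlib `selectors` module for readiness and a heap for timers. `Reactor`, `call_later`, and `watch_with_idle_timeout` are hypothetical names, not the API of pyftpdlib's ioloop or of PEP 3153. Because timers and sockets are serviced in the same loop iteration, an idle timeout is just a timer that the read handler cancels, and a timer handler can in turn unregister the socket.

```python
import heapq
import itertools
import selectors
import socket
import time


class Reactor:
    """Hypothetical minimal IO loop: a selector for file handles plus a
    heap of timed callbacks, both serviced in the same loop iteration."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self._timers = []               # heap of (deadline, seq, callback)
        self._seq = itertools.count()   # tie-breaker so callbacks are never compared
        self._cancelled = set()

    def call_later(self, delay, callback):
        seq = next(self._seq)
        heapq.heappush(self._timers, (time.monotonic() + delay, seq, callback))
        return seq  # handle for cancel()

    def cancel(self, handle):
        self._cancelled.add(handle)  # skipped lazily when it comes due

    def run(self):
        while self.selector.get_map() or self._timers:
            timeout = None
            if self._timers:
                timeout = max(0.0, self._timers[0][0] - time.monotonic())
            if self.selector.get_map():
                for key, _mask in self.selector.select(timeout):
                    key.data()  # the callback passed to register()
            elif timeout:
                time.sleep(timeout)  # only timers are left
            now = time.monotonic()
            while self._timers and self._timers[0][0] <= now:
                _deadline, seq, callback = heapq.heappop(self._timers)
                if seq in self._cancelled:
                    self._cancelled.discard(seq)
                else:
                    callback()


def watch_with_idle_timeout(reactor, sock, timeout, log):
    """Read one message from sock, or close it if nothing arrives within
    `timeout` seconds."""
    def on_timeout():
        reactor.selector.unregister(sock)
        sock.close()
        log.append("timed out")

    def on_readable():
        reactor.cancel(handle)
        reactor.selector.unregister(sock)
        log.append(sock.recv(4096))
        sock.close()

    sock.setblocking(False)
    handle = reactor.call_later(timeout, on_timeout)
    reactor.selector.register(sock, selectors.EVENT_READ, on_readable)


if __name__ == "__main__":
    reactor, log = Reactor(), []
    quiet_peer, quiet = socket.socketpair()
    chatty_peer, chatty = socket.socketpair()
    watch_with_idle_timeout(reactor, quiet, 0.1, log)
    watch_with_idle_timeout(reactor, chatty, 0.1, log)
    chatty_peer.sendall(b"ping")
    reactor.run()
    print(log)  # [b'ping', 'timed out']
```

Bandwidth limiting fits the same shape: instead of writing immediately, a handler can `call_later()` its next send.
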
>
> i see; those should be provided, then.
>
> i'm afraid i don't completely get the point you're making, sorry for
> that, maybe i've missed important statements or lack sufficiently deep
> knowledge of topics affected and got lost in details.
>
> what is your opinion on the state of asynchronous operations in python,
> and what would you like it to be?

I think it is functional, but flawed. I also think that every 3rd-party
library that implements network-level protocols is a different mix of
functional and flawed. I think there is repeated and often wasted
effort, with folks writing different and invariably crappy (to some
extent) protocol parsers and network handlers. I think that whenever
possible, that should stop, and the highest-quality protocol parsing
functions/methods should be available in the Python standard library,
callable from any library, whether sync, async, stdlib, or 3rd party.
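
A minimal sketch of that separation, assuming a simple line-oriented protocol: a hypothetical `LineParser` holds all of the parsing logic (roughly the job asynchat's terminator handling does), and two thin drivers feed it, one a blocking `recv()` loop and one an `asyncio.Protocol`. asyncio is used here only as a widely available async framework for the example; the parser itself knows nothing about it.

```python
import asyncio
import socket


class LineParser:
    """Hypothetical transport-free parser: splits a byte stream on a
    terminator and owns no socket."""

    def __init__(self, terminator=b"\r\n"):
        self.terminator = terminator
        self._buf = b""

    def feed(self, data):
        """Add bytes; return complete lines without the terminator."""
        self._buf += data
        *lines, self._buf = self._buf.split(self.terminator)
        return lines


def read_lines_blocking(sock, parser):
    """Sync driver: a plain blocking recv() loop until the peer closes."""
    lines = []
    while True:
        data = sock.recv(4096)
        if not data:
            return lines
        lines.extend(parser.feed(data))


class LineProtocol(asyncio.Protocol):
    """Async driver: the same parser, fed from asyncio callbacks."""

    def __init__(self, parser):
        self.parser = parser
        self.lines = []
        self.closed = asyncio.get_running_loop().create_future()

    def data_received(self, data):
        self.lines.extend(self.parser.feed(data))

    def connection_lost(self, exc):
        if not self.closed.done():
            self.closed.set_result(None)


async def read_lines_async(sock, parser):
    loop = asyncio.get_running_loop()
    _transport, protocol = await loop.create_connection(
        lambda: LineProtocol(parser), sock=sock)
    await protocol.closed  # peer EOF closes the transport by default
    return protocol.lines


if __name__ == "__main__":
    payload = b"USER alice\r\nPASS secret\r\nQUIT\r\n"

    peer, sock = socket.socketpair()
    peer.sendall(payload)
    peer.close()
    print(read_lines_blocking(sock, LineParser()))
    sock.close()

    peer, sock = socket.socketpair()
    peer.sendall(payload)
    peer.close()
    print(asyncio.run(read_lines_async(sock, LineParser())))
```

Both drivers produce the same list of lines; a bug fixed in `LineParser.feed()` is fixed for both at once.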

Now, my discussions in the context of asyncore-related upgrades may
seem like a strange leap, but some of these lesser-quality parsing
routines live in asyncore-derived classes, as well as in
non-asyncore-derived classes. If we make an effort on the asyncore
side of things, under the auspices of improving one stdlib module and
offering additional functionality, the need for protocol-level parsers
shared between sync and async code should become obvious to
*everyone*. (I suspect it isn't obvious now because the communities
don't spend a lot of time cross-pollinating, because people like
writing parsers - I do too ;) - or because the sync folks end up going
the greenlet route if/when threading bites them on the ass.)

Regards,
 - Josiah