[Python-Dev] API design: where to add async variants of existing stdlib APIs?
Nathaniel Smith
njs at pobox.com
Tue Mar 7 19:17:03 EST 2017
On Tue, Mar 7, 2017 at 9:41 AM, Brett Cannon <brett at python.org> wrote:
> I don't think a common practice has bubbled up yet for when there are both
> synchronous and asynchronous versions of an API (closest I have seen is
> appending an "a" to the async version but that just looks like a spelling
> mistake to me most of the time). This is why the question of whether
> separate modules are a better idea is coming up.
For the CSV case, it might be sensible to factor out the I/O. For
example, provide an API that looks like:

    pushdictreader = csv.PushDictReader()
    while pushdictreader:
        chunk = read_some(...)
        pushdictreader.push(chunk)
        for row in pushdictreader:
            ...

This API can now straightforwardly be used with sync and async code.
Of course you'd want to wrap it up in a nicer interface, somewhere in
the ballpark of:

    def sync_rows(read_some):
        pushdictreader = csv.PushDictReader()
        while pushdictreader:
            chunk = read_some(...)
            pushdictreader.push(chunk)
            for row in pushdictreader:
                yield row

    async def async_rows(read_some):
        pushdictreader = csv.PushDictReader()
        while pushdictreader:
            chunk = await read_some(...)
            pushdictreader.push(chunk)
            for row in pushdictreader:
                yield row

So there'd still be a bit of code duplication, but much much less.
Essentially the idea here is to convert the csv module to sans-io
style (http://sans-io.readthedocs.io/).
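
To make the push-style idea a bit more concrete, here is a minimal
sketch of what such a reader might look like. None of this exists in
the csv module: the PushDictReader class, its push() method, and the
"an empty chunk means end of input" convention are all assumptions for
illustration. It also assumes quoted fields never contain newlines,
which a real implementation would have to handle.

```python
import asyncio
import csv


class PushDictReader:
    """Hypothetical sans-io reader: the caller does the I/O and pushes text in."""

    def __init__(self):
        self._buffer = ""        # text received but not yet split into lines
        self._lines = []         # complete lines waiting to be parsed
        self._fieldnames = None  # taken from the first non-blank row
        self._eof = False

    def __bool__(self):
        # True while the reader still wants more input.
        return not self._eof

    def push(self, chunk):
        # Assumed convention: an empty chunk signals end of input.
        if not chunk:
            self._eof = True
            if self._buffer:
                self._lines.append(self._buffer)
                self._buffer = ""
            return
        self._buffer += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, self._buffer = self._buffer.split("\n")
        self._lines.extend(complete)

    def __iter__(self):
        # Yield only the rows that can be parsed from the data pushed so far.
        lines, self._lines = self._lines, []
        for fields in csv.reader(lines):
            if not fields:
                continue  # skip blank lines
            if self._fieldnames is None:
                self._fieldnames = fields
            else:
                yield dict(zip(self._fieldnames, fields))


def sync_rows(read_some):
    reader = PushDictReader()
    while reader:
        reader.push(read_some())
        yield from reader


async def async_rows(read_some):
    reader = PushDictReader()
    while reader:
        reader.push(await read_some())
        for row in reader:
            yield row


# Drive both wrappers from the same pre-split chunks ("" marks end of input).
CHUNKS = ["name,age\nAl", "ice,30\nBob,", "25\n", ""]

sync_source = iter(CHUNKS)
sync_result = list(sync_rows(lambda: next(sync_source)))


async def main():
    async_source = iter(CHUNKS)

    async def read_some():
        return next(async_source)

    return [row async for row in async_rows(read_some)]


async_result = asyncio.run(main())
```

The parsing logic lives in one place, and the two wrappers differ only
in whether they await the read. Note that async_rows is an async
generator, so that wrapper needs Python 3.6 or later.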
Another option is to make it all-async internally, and then offer a
sync facade around it. So like start with the natural all-async
interface:

    class AsyncFileLike(ABC):
        async def async_read(...):
            ...

    class AsyncDictReader:
        def __init__(self, async_file_like):
            self._async_file_like = async_file_like

        async def __anext__(self):
            ...

And (crucially!) let's assume that the only way AsyncDictReader
interacts with the coroutine runner is by calls to
self._async_file_like.async_read. Now we can pass in a
secretly-actually-synchronous AsyncFileLike and make a synchronous
facade around the whole thing:

    class AsyncSyncAdapter(AsyncFileLike):
        def __init__(self, sync_file_like):
            self._sync_file_like = sync_file_like

        # Technically an async function, but guaranteed to never yield
        async def async_read(self, *args, **kwargs):
            return self._sync_file_like.read(*args, **kwargs)

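    # Minimal coroutine supervisor: runs async_fn(*args, **kwargs), which
    # must never yield
    def syncify(async_fn, *args, **kwargs):
        coro = async_fn(*args, **kwargs)
        it = coro.__await__()
        try:
            next(it)
        except StopIteration as exc:
            # A coroutine delivers its return value via StopIteration
            return exc.value
        raise RuntimeError("coroutine yielded; it must run to completion")
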
    class DictReader:
        def __init__(self, sync_file_like):
            # Technically an AsyncDictReader, but guaranteed to never yield
            self._async_dict_reader = AsyncDictReader(
                AsyncSyncAdapter(sync_file_like))

        def __next__(self):
            try:
                return syncify(self._async_dict_reader.__anext__)
            except StopAsyncIteration:
                # Translate end-of-iteration for synchronous callers
                raise StopIteration from None

So here we still have some goo around the edges of the module, but the
actual CSV logic only has to be written once, and can still be written
in a "pull" style where it does its own I/O, just like it is now.
This is basically another approach to writing sans-io protocols, with
the annoying trade-off that it means even your synchronous version
requires Python 3.5+. But for a stdlib module that's no big deal...
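
For what it's worth, here is a small end-to-end sketch of that second
approach that actually runs. It is only an illustration under some
assumptions: the AsyncDictReader below is a deliberately simplified
stand-in (it assumes no newlines inside quoted fields), the chunk_size
parameter and the _next_line helper are made up for the example, and
the ABC is left out to keep it short.

```python
import csv
import io


class AsyncSyncAdapter:
    """Wraps a synchronous file; async_read never actually suspends."""

    def __init__(self, sync_file_like):
        self._sync_file_like = sync_file_like

    async def async_read(self, size=-1):
        return self._sync_file_like.read(size)


class AsyncDictReader:
    """Simplified all-async core; its only await is on async_read()."""

    def __init__(self, async_file_like, chunk_size=4096):
        self._async_file_like = async_file_like
        self._chunk_size = chunk_size  # hypothetical tuning knob
        self._buffer = ""
        self._eof = False
        self._fieldnames = None

    async def _next_line(self):
        # Pull more text until a full line is buffered or input ends.
        while "\n" not in self._buffer and not self._eof:
            chunk = await self._async_file_like.async_read(self._chunk_size)
            if chunk:
                self._buffer += chunk
            else:
                self._eof = True
        if not self._buffer:
            raise StopAsyncIteration
        line, _, self._buffer = self._buffer.partition("\n")
        return line

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._fieldnames is None:
            self._fieldnames = next(csv.reader([await self._next_line()]))
        fields = next(csv.reader([await self._next_line()]))
        return dict(zip(self._fieldnames, fields))


def syncify(async_fn, *args, **kwargs):
    """Run a coroutine that is expected to finish without suspending."""
    coro = async_fn(*args, **kwargs)
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value  # the coroutine's return value
    coro.close()
    raise RuntimeError("coroutine tried to suspend")


class DictReader:
    """Synchronous facade over the async core."""

    def __init__(self, sync_file_like):
        self._async_dict_reader = AsyncDictReader(
            AsyncSyncAdapter(sync_file_like))

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return syncify(self._async_dict_reader.__anext__)
        except StopAsyncIteration:
            raise StopIteration from None


rows = list(DictReader(io.StringIO("name,age\nAlice,30\nBob,25\n")))
```

The same AsyncDictReader could be handed a genuinely asynchronous
file-like object and consumed with "async for" under a real event
loop; only the adapter and the facade are specific to the sync case.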
-n
> On Tue, 7 Mar 2017 at 02:24 Michel Desmoulin <desmoulinmichel at gmail.com>
> wrote:
>>
>> Last week I had to download a CSV from an FTP server and push any update
>> to it using a websocket, so asyncio was a natural fit and the network part
>> went well.
>>
>> The surprise was that the CSV part would not work as expected. Usually I
>> read CSV like this:
>>
>> import csv
>>
>> file_like_object = csv_crawler.get_file()
>> for row in csv.DictReader(file_like_object):
>>
>> But it didn't work because file_like_object.read() was a coroutine which
>> the csv module doesn't handle.
>>
>> So I had to do:
>>
>> import csv
>> import io
>>
>> raw_bytes = await stream.read(10000000)
>> wrapped_bytes = io.BytesIO(raw_bytes)
>> text = io.TextIOWrapper(wrapped_bytes, encoding=encoding,
>> errors='replace')
>>
>> for i, row in enumerate(csv.DictReader(text)):
>>
>> It turns out I have used asyncio a bit, and I know the stdlib, the io API,
>> etc. But for somebody who doesn't, it's not very easy to figure out. Plus
>> it's not as elegant as traditional Python. Not to mention it loads the
>> entire CSV in memory.
>>
>> So I wondered if I could fix the csv module so it accepts async. But the
>> question arose: where should I put it?
>>
>> - Create AsyncDictReader and AsyncReader?
>> - Add inspect.iscoroutine calls within the regular Readers, plus some
>> __aiter__ and __aenter__?
>> - Add a csv.async namespace?
>>
>> What API design are we recommending for exposing both sync and async
>> behaviors?
>>
>>
>> On 07/03/2017 at 03:08, Guido van Rossum wrote:
>> > On Mon, Mar 6, 2017 at 5:57 PM, Raymond Hettinger
>> > <raymond.hettinger at gmail.com> wrote:
>> >
>> > Of course, it makes sense that anything not specific to asyncio
>> > should go outside of asyncio.
>> >
>> > What I'm more concerned about is what the other places actually
>> > are. Rather than putting async variants of everything sprinkled
>> > all over the standard library, I suggest collecting them all
>> > together, perhaps in a new asynctools module.
>> >
>> >
>> > That's a tough design choice. I think neither extreme is particularly
>> > attractive -- having everything in an asynctools package might also
>> > bundle together things that are entirely unrelated. In the extreme it
>> > would be like proposing that all metaclasses should go in a new
>> > "metaclasstools" package. I think we did a reasonable job with ABCs:
>> > core support goes in abc.py, support for collections ABCs goes into the
>> > collections package (in a submodule), and other packages and modules
>> > sometimes define ABCs for their own users.
>> >
>> > Also, in some cases I expect we'll have to create a whole new module
>> > instead of updating some ancient piece of code with newfangled async
>> > variants to its outdated APIs.
>> >
>> > --
>> > --Guido van Rossum (python.org/~guido)
>> >
>> >
>> > _______________________________________________
>> > Python-Dev mailing list
>> > Python-Dev at python.org
>> > https://mail.python.org/mailman/listinfo/python-dev
>> > Unsubscribe:
>> > https://mail.python.org/mailman/options/python-dev/desmoulinmichel%40gmail.com
>> >
--
Nathaniel J. Smith -- https://vorpus.org