Mailman 3 make Connections iterable - Python-ideas

make Connections iterable

Oscar Smith

Jan. 8, 2018

8:17 a.m.

I am currently working on a program where it would be really useful if a connection had a __next__ method, because then it would be much easier to iterate over. It would just be an alias to recv, but would allow you to do things like merging the results of connections using heapq.merge that currently are highly non-trivial to accomplish. Is there a reason this API isn't supported? Oscar Smith


Terry Reedy

January 2018

noon

On 1/8/2018 11:17 AM, Oscar Smith wrote:

...

The reference to recv says that you must be talking about multiprocessing.Connection rather than sqlite3.Connection. Since recv raises EOFError when done, an alias does not work. Try the following generator adaptor.

    def connect_gen(connection):
        try:
            while True:
                yield connection.recv()
        except EOFError:
            pass

You could make the above the .__iter__ method of a MyConnection subclass.

-- Terry Jan Reedy
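As a rough sketch of the __iter__ idea above: the class name IterableConnection is made up for illustration and is not part of multiprocessing. It wraps a connection instead of subclassing it, on the assumption that the connection objects come from multiprocessing.Pipe(), which returns ready-made instances.

```python
from multiprocessing import Pipe


class IterableConnection:
    """Hypothetical wrapper (not a stdlib class) that adds __iter__."""

    def __init__(self, connection):
        self._connection = connection

    def __iter__(self):
        # Yield messages until the sending end is closed, at which
        # point recv() raises EOFError and the iteration ends.
        try:
            while True:
                yield self._connection.recv()
        except EOFError:
            return

    def __getattr__(self, name):
        # Delegate everything else (send, poll, close, ...) to the
        # wrapped connection object.
        return getattr(self._connection, name)


reader, writer = Pipe(duplex=False)
for item in ("a", "b", "c"):
    writer.send(item)
writer.close()  # closing the writer is what ends the iteration

received = list(IterableConnection(reader))
```

Note that OSError from recv() is deliberately not caught here; it propagates to whoever is iterating.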

Steven D'Aprano

5:25 p.m.

On Mon, Jan 08, 2018 at 10:17:30AM -0600, Oscar Smith wrote:

...

I am currently working on a program where it would be really useful if a connection had a __next__ method, because then it would be much easier to iterate over.

What sort of connection are you referring to?

...

It would just be an alias to recv, but would allow you to do things like merging the results of connections using heapq.merge that currently are highly non-trivial to accomplish.

This gives you an iterator which repeatedly calls connection.recv until it raises a FooException, then ends.

    def conn_iter(connection):
        try:
            while True:
                yield connection.recv()
        except FooException:  # FIXME -- what does recv actually raise?
            return

Doesn't seem "highly non-trivial" to me. Have I missed something?
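For illustration, here is how such a generator might be combined with heapq.merge, the use case from the original post. This sketch assumes the connections are multiprocessing connections (so the exception is EOFError) and that each one delivers its values already sorted; make_source is a hypothetical helper that stands in for a real producer process.

```python
import heapq
from multiprocessing import Pipe


def conn_iter(connection):
    # Assumes a multiprocessing connection, whose recv() raises
    # EOFError once the other end has been closed.
    try:
        while True:
            yield connection.recv()
    except EOFError:
        return


def make_source(values):
    # Hypothetical helper: pre-fills a pipe with a few already-sorted
    # values, then closes the writing end.
    reader, writer = Pipe(duplex=False)
    for value in values:
        writer.send(value)
    writer.close()
    return reader


left = make_source([1, 4, 7])
right = make_source([2, 3, 9])

# heapq.merge pulls lazily from each iterator and yields a sorted stream.
merged = list(heapq.merge(conn_iter(left), conn_iter(right)))
```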

...

Is there a reason this API isn't supported?

You are asking the wrong question. Adding APIs isn't "default allow", where there has to be a reason to *not* support it otherwise it gets added. It is "default deny" -- there has to be a good reason to add it, otherwise it gets left out. YAGNI is an excellent design principle, as it is easier to add a useful API later than to remove an unnecessary or poorly designed one.

So the question needs to be:

"Is this a good enough reason to support this API?"

Maybe, maybe not. Not every trivial wrapper function needs to be a method.

But perhaps this is an exception: perhaps iterability is such a common and useful API for connections that it should be added, for the same reason that files are iterable.

Care to elaborate on why this would be useful and why the generator I showed above isn't satisfactory?

-- Steve

Oscar Smith

7:05 p.m.

The argument for including this API is that it allows easy iteration over the results of a connection, so that a connection can be used with any of the features of itertools or any other library accepting iterables. recv is only used in places where the iterable protocol could be used, so it makes sense for consistency to use the API shared by the rest of Python.

Oscar Smith

On Mon, Jan 8, 2018 at 7:25 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Amit Green

7:27 p.m.

An argument against this API is that any caller of recv should be doing error handling (i.e., catching exceptions from the socket). Changing it into an iterator makes it less likely that error handling will be properly coded, and makes the error handling more obscure.

Thus, although the API would make the code more readable for the [wrong case] of not handling errors, the real issue is that it would make the code more obscure for the proper case of error handling.

We should focus on the proper use case, using recv with error handling, and thus not add this API.

On Mon, Jan 8, 2018 at 10:05 PM, Oscar Smith <oscardssmith@gmail.com> wrote:

...


Nick Coghlan

7:34 p.m.

On 9 January 2018 at 13:27, Amit Green <amit.mixie@gmail.com> wrote:

...

It could be useful to include a recipe in the documentation that shows a generator with suitable error handling (taking the generic connection errors and adapting them to app specific ones) while also showing how to adapt the connection to the iterator protocol, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
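A recipe along those lines might look like the sketch below. ChannelError and messages are hypothetical names chosen for illustration; the only assumption about the connection is that recv() raises EOFError at end of stream and OSError on a system-level failure.

```python
from multiprocessing import Pipe


class ChannelError(Exception):
    """Hypothetical application-specific error for a failed receive."""


def messages(connection):
    while True:
        try:
            message = connection.recv()
        except EOFError:
            return  # clean end of stream: stop iterating
        except OSError as exc:
            # Adapt the generic error to an app-specific one, keeping
            # the original exception available as __cause__.
            raise ChannelError("receive failed") from exc
        # Yield outside the try block so only recv() is guarded.
        yield message


reader, writer = Pipe(duplex=False)
writer.send({"job": 1})
writer.send({"job": 2})
writer.close()

received = list(messages(reader))
```

Because read failures surface as ChannelError, an OSError raised by the caller's own writes inside the loop body stays distinguishable from them.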

Nathaniel Smith

9:22 p.m.

On Mon, Jan 8, 2018 at 7:27 PM, Amit Green <amit.mixie@gmail.com> wrote:

...

An argument against this API, is that any caller of recv should be doing error handling (i.e.: catching exceptions from the socket).

It's still not entirely clear, but I'm pretty sure this thread is talking about multiprocessing.Connection objects, which don't have anything to do with sockets. (I think. They might use sockets internally on some platforms.)

The only documented error from multiprocessing.Connection.recv is EOFError, which is basically equivalent to a StopIteration.

I'm surprised that multiprocessing.Connection isn't iterable -- it seems like an obvious oversight.

-n

-- Nathaniel J. Smith -- https://vorpus.org

Antoine Pitrou

2:07 a.m.

On Mon, 8 Jan 2018 21:22:56 -0800 Nathaniel Smith <njs@pobox.com> wrote:

...

The only documented error from multiprocessing.Connection.recv is EOFError, which is basically equivalent to a StopIteration.

Actually recv() can raise an OSError corresponding to any system-level error.

...

I'm surprised that multiprocessing.Connection isn't iterable -- it seems like an obvious oversight.

What is obvious about making a connection iterable? It's the first time I see someone requesting this. Regards Antoine.

Nick Coghlan

2:46 a.m.

On 9 January 2018 at 20:07, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

If you view them as comparable to subprocess pipes, then it can be surprising that they're not iterable when using a line-oriented protocol. If you instead view them as comparable to socket connections, then the lack of iteration support seems equally reasonable. Hence my suggestion of providing a docs recipe showing an example of wrapping a connection in a generator in order to define a suitable way of getting from a raw bytestream to iterable chunks. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Antoine Pitrou

3:02 a.m.

On Tue, 9 Jan 2018 20:46:35 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

multiprocessing connections are actually message-oriented. So perhaps it could make sense for them to be iterable. But they are also quite low-level (often you wouldn't use them directly, but instead rely on multiprocessing.Queue).

...

Well... if someone needs a doc recipe for this, they shouldn't use the lower-level functionality and instead stick to multiprocessing.Queue. (this begs the question: should multiprocessing.Queue be iterable? well, it's modeled on queue.Queue which isn't iterable) Regards Antoine.
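For what it's worth, a queue can already be consumed in a for loop with the two-argument form of iter(), provided the producer puts an agreed end marker on the queue. The sketch below uses queue.Queue and None as the sentinel; both choices are assumptions for the example. None is used because it also survives pickling, which matters if the same idiom is applied to multiprocessing.Queue.

```python
import queue

SENTINEL = None  # assumed end-of-stream marker agreed on by both sides

q = queue.Queue()
for item in (1, 2, 3):
    q.put(item)
q.put(SENTINEL)

# iter(callable, sentinel) calls q.get() repeatedly and stops as soon
# as the returned value equals the sentinel.
drained = list(iter(q.get, SENTINEL))
```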

Random832

4:12 a.m.

On Tue, Jan 9, 2018, at 05:46, Nick Coghlan wrote:

...

Sockets are files - there's no fundamental reason a stream socket using a line-oriented protocol (which is a common enough case), or a datagram socket, shouldn't be iterable. Why aren't they? Making sockets iterable would be a separate discussion, but I don't think this is necessarily an argument. And saying "I think you should be handling errors in some particular way, so we'll make the API more difficult to encourage this" seems a non-sequitur. The whole point of exceptions is that the error handling code doesn't need to be directly at the point of use but can be, say, a try/catch wrapped around the inner loop.

Chris Angelico

8:27 a.m.

On Tue, Jan 9, 2018 at 11:12 PM, Random832 <random832@fastmail.com> wrote:

...

Only in POSIX. On other platforms, sockets are most definitely NOT files. And datagram sockets don't really make sense to iterate over. Part of the problem with even POSIX stream sockets (either TCP or Unix domain) is what you do when there's nothing to read. Do you block, waiting for a line? Do you raise StopIteration and then subsequently become un-finished again (which, according to Python semantics, is a broken iterator)? Do you yield a special value that says "no data yet but more maybe later"?? Blocking is the only one that makes sense, and that only if you run two threads, one for reading and one for writing. (Unless you're using a unidirectional socket, basically a TCP-enabled or filesystem-named pipe. Far from common.) ChrisA

Ethan Furman

8:40 a.m.

On 01/09/2018 08:27 AM, Chris Angelico wrote:

...

This. Although it is technically broken, this is how reading from the console works. I believe that falls under practicality beats purity. ;) -- ~Ethan~

Nathaniel Smith

8:39 a.m.

On Jan 9, 2018 04:12, "Random832" <random832@fastmail.com> wrote: On Tue, Jan 9, 2018, at 05:46, Nick Coghlan wrote:

...

Sockets are files - there's no fundamental reason a stream socket using a line-oriented protocol (which is a common enough case), or a datagram socket, shouldn't be iterable. Why aren't they?

Supporting line iteration on sockets would require adding a whole buffering layer, which would be a huge change in semantics. Also, due to the way the BSD socket API works, stream and datagram sockets are the same Python type, so which one would socket.__next__ assume? (Plus datagrams are a bit messy anyway; you need to know the protocol's max size before you can call recv.)

I know this was maybe a rhetorical question, but this particular case does have an answer beyond "we never did it that way before" :-).

-n

Antoine Pitrou

8:49 a.m.

On Tue, 9 Jan 2018 08:39:06 -0800 Nathaniel Smith <njs@pobox.com> wrote:

...

The buffering layer already exists. Just call socket.makefile() and you've got your iterable object :-) https://docs.python.org/3/library/socket.html#socket.socket.makefile Regards Antoine.
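A small sketch of that approach, using socket.socketpair() so it runs in a single process; the variable names and the sample payload are made up for the example.

```python
import socket

left, right = socket.socketpair()

# One end writes a few newline-terminated lines and then closes.
right.sendall(b"first\nsecond\nthird\n")
right.close()

# makefile() wraps the socket in a buffered file object, and file
# objects are iterable line by line.
with left, left.makefile("r", encoding="utf-8", newline="\n") as lines:
    received = [line.rstrip("\n") for line in lines]
```

The iteration ends when the peer closes its end, much as EOFError ends a loop over recv() on a multiprocessing connection.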

Serhiy Storchaka

4:51 a.m.

On 09.01.18 12:46, Nick Coghlan wrote:

...

recv() can raise OSError, and this is more likely than readline() on a file raising OSError. The user code inside the loop can also perform writes and can raise OSError. Thus in the case of line-oriented files there is a single main source of OSError, but in the case of Connections there are two possible sources, and you need a way to distinguish errors raised by recv() from errors raised by writing. Currently you just use two different try-except statements.

    while True:
        try:
            data = conn.recv()
        except EOFError:
            break
        except OSError:
            ...  # error on reading
        try:
            ...  # user code
        except OSError:
            ...  # error on writing

It is very easy to extend the case that doesn't handle OSError into the case that does. If Connections were iterable, the code that doesn't handle errors would look simple:

    for data in conn:
        ...  # user code

But this simple code is not correct. When you add error handling it will look like:

    while True:
        try:
            data = next(conn)
        except StopIteration:
            break
        except OSError:
            ...  # error on reading
        try:
            ...  # user code
        except OSError:
            ...  # error on writing

This is not too different from the first example and very different from the second. This feature is not useful if you handle errors properly; it only simplifies quick code that doesn't handle errors or handles them improperly.

One more concern: the Connection object has a send() method, and generator objects also have send() methods, but with different semantics. This may be confusing.

Nathaniel Smith

3:24 a.m.

On Tue, Jan 9, 2018 at 2:07 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

On the receive side, it's a stream of incoming objects that you fetch one at a time until you get to the end, probably processed with a loop like:

    while True:
        try:
            next_message = conn.recv()
        except EOFError:
            break
        ...

Why wouldn't it be iterable?

-n

-- Nathaniel J. Smith -- https://vorpus.org
