[Python-ideas] Iterating non-newline-separated files should be easier

Andrew Barnert abarnert at yahoo.com
Sun Jul 20 13:53:01 CEST 2014


On Jul 20, 2014, at 0:50, Wichert Akkerman <wichert at wiggy.net> wrote:

> 
>> On 19 Jul 2014, at 22:05, Guido van Rossum <guido at python.org> wrote:
>> 
>> I don't have time for this thread.
>> 
>> I never meant to suggest anything that would require pushing back data into the buffer (you must have misread me).
>> 
>> I don't like changing the meaning of the newline argument to open (and it doesn't solve enough use cases any way).
> 
> I see another problem with doing this by modifying the open() call: it does not work for filehandles creates using other methods such as pipe() or socket(), either used directly or via subprocess. There are have real-world examples of situations where that is very useful.

A socket() is not a python file object, doesn't have a similar API, and doesn't have a readline method.

The result of calling socket.makefile, on the other hand, is a file object--and it's created by calling open.* And I'm pretty sure socket.makefile already takes a newline argument and just passes it along, in which case it will magically work with no changes at all.**

IIRC, os.pipe() just returns a pair of fds (integers), not a file object at all. It's up to you to wrap that in a file object if you want to--which you do by passing it to the open function.

So, neither of your objections works.

There are some better examples you could have raised, however. For example, a bz2.BzipFile is created with bz2.open. And, while the file delegates to a BufferedReader or TextIOWrapper, bz2.open almost certainly validates its inputs and won't pass newline on to the BufferedReader in binary mode. So, it would have to be changed to get the benefit.

However, given that there's no way to magically make every file-like object anyone has ever written automatically grow this new functionality, having the API change on the constructors, which are not part of any API and not consistent, is better than having it on the readline method. Think about where you'd get the error in each case: before even writing your code, when you look up how BzipFile instances are created and see there's no way to pass a newline argument, or deep in your code when you're using a file object that came from who knows where and it's readline method doesn't like the standard, documented newline argument?


* Or maybe it's created by constructing a BufferedReader, BufferedWriter, BufferedRandom, or TextIOWrapper directly. I don't remember off hand. But it doesn't matter, because the suggestion is to put the new parameter in those constructors, and make open forward to them, so whether makefile calls them directly or via open, it gets the same effect.

** Unless it validates the arguments before passing them along. I looked over a few stdlib classes, and there was at least one that unnecessarily does the same validation open is going to do anyway, so obviously that needs to be removed before the class magically benefits.


In some cases (like tempfile.NamedTemporaryFile), even that isn't necessary, because the implementation just passes through all  **kwargs that it doesn't want to handle to the open or constructor call.


More information about the Python-ideas mailing list