
Is there a way to check in Python whether an object is file-like (like the objects returned by `open`, `StringIO`, etc.)? I would have expected an ABC in collections.abc to be the standard solution, as there is for so many other data types, but I can't find one.

Thanks,
Ram.

On 17 June 2016 at 23:21, Bar Harel <bzvi7919@gmail.com> wrote:
> +1 although isn't io.IOBase already an ABC for file-like objects?

To an extent, but it would likely be an error to typecheck for it. There's a huge history of people implementing objects with (for example) just a read() method and expecting it to work in functions that need files but only actually use that one method. The idea of a "file-like object" predates ABCs by such a long time that code that defines "file-like" via any ABC will be seen as "broken" by someone (for public libraries at least - for private code you can of course follow whatever conventions you want).

> Perhaps there should be some kind of reference to it in collections.abc? Maybe an alias even?

I would think that would be a mistake. Far too often it wouldn't do what people would expect, so it'd end up being more of a hindrance than a help.

Paul
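A minimal sketch of the point about typechecking against io.IOBase. The `OnlyRead` class is a hypothetical example, not something from the thread; `json.load` is used only because it is a stdlib function that calls nothing but `read()` on its argument.

```python
import io
import json


class OnlyRead:
    """Hypothetical minimal 'file-like' object: read() and nothing else."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text


source = OnlyRead('{"answer": 42}')

# An ABC-based check rejects this object...
passes_abc_check = isinstance(source, io.IOBase)

# ...yet json.load() only calls read(), so the object works anyway.
data = json.load(source)

print(passes_abc_check)  # False
print(data)              # {'answer': 42}
```

A library that guarded `json.load`-style code with `isinstance(fp, io.IOBase)` would turn this working call into an error, which is the breakage described above.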

On 18 June 2016 at 12:16, Bar Harel <bzvi7919@gmail.com> wrote:
> Hmm... If the term "file-like object" is so ambiguous, why does the STL use it so much?

Not sure what you mean by the STL? Do you mean the standard library? Because it was an easily-understandable term to use, even though it does require context (and sometimes a little bit of trial and error, or reading the source) to understand the precise requirements. Historically, the stdlib docs were somewhat more informal in their language than people seem to expect these days. (Personally, I think a bit of informality is fine, but opinions differ.)

> Perhaps a new term should be introduced clearly stating the required methods.

The point is there would be many separate terms, each stating different sets of methods. Or people would be required to implement unnecessary methods just to satisfy the constraints of an overly-strict type check. This is the fundamental idea behind duck typing - you only need to implement the methods needed by the code you're calling.

Paul

On Sat, Jun 18, 2016 at 2:27 PM Paul Moore <p.f.moore@gmail.com> wrote:
> This is the fundamental idea behind duck typing - you only need to implement the methods needed by the code you're calling.

But then again, the problem is that you don't know what methods are needed by the code. Even if you don't type-check, an ABC gives you a general idea of the methods needed for the code to work, whether you inherit from it or just look at it. I believe trial & error, and looking through the source code, is a poor solution compared to precise documentation of the required methods.
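One way to state required methods precisely without demanding inheritance is a structural type. The sketch below assumes Python 3.8 or later, where `typing.Protocol` is available; that feature postdates this thread, and the names `SupportsRead`, `OnlyRead`, and `first_chars` are hypothetical, chosen for illustration.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsRead(Protocol):
    """Hypothetical protocol: anything with a read() method."""

    def read(self, size: int = -1) -> str: ...


class OnlyRead:
    """Does not inherit from anything in io, but has read()."""

    def read(self, size=-1):
        return "data"


def first_chars(stream: SupportsRead, n: int) -> str:
    # The annotation documents the single method this function relies on.
    return stream.read(n)


# isinstance() against a runtime-checkable protocol only checks that the
# method exists; it does not verify signatures or return types.
duck_ok = isinstance(OnlyRead(), SupportsRead)
stringio_ok = isinstance(io.StringIO("hello"), SupportsRead)
plain_object_ok = isinstance(object(), SupportsRead)
prefix = first_chars(io.StringIO("hello"), 2)
```

Each function can declare its own small protocol, so this approach still yields many separate interfaces rather than one definition of "file-like".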

On Sat, Jun 18, 2016 at 7:47 AM Bar Harel <bzvi7919@gmail.com> wrote:
Pass a stub object into the function and it will raise AttributeError or TypeError. That's the best way to learn about what interface is necessary.

> I believe trial & error, and looking through the source code is a poor solution compared to the precise documentation of the required methods.

Documentation can go stale very easily. An AttributeError never lies; it tells you exactly which method you need to implement. Take a look at IOBase. There's a whole lot going on there that's unnecessary for a function that just expects a ``read`` method.
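The stub approach can be sketched as follows. `Stub` is a hypothetical empty class, and `json.load` merely stands in for "a function that wants a file-like object".

```python
import json


class Stub:
    """Deliberately empty object with no file methods at all."""


missing = None
try:
    json.load(Stub())
except AttributeError as exc:
    # The exception message names the attribute the function looked up.
    missing = str(exc)

print(missing)
```

Note that this reveals one missing method per attempt, and only for the code path that actually ran, so a rarely-taken branch that calls `seek()` would go unnoticed.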

On Sat, Jun 18, 2016 at 11:16:51AM +0000, Bar Harel wrote:
> Hmm... If the term "file-like object" is so ambiguous, why does the STL use it so much?

I don't know. What is the STL and how does it relate to Python? That's a rhetorical question. I know that the STL is C++'s Standard Template Library, which has very little to do with Python. Their use of the term "file-like object" is not necessarily identical to Python's use of it.

> Perhaps a new term should be introduced clearly stating the required methods.

But that's the whole point -- there is no one standard set of required methods. Some uses of a "file-like object" just require a read() or write() method. Some might require a close() method, or a flush(), or both. Some might need seek() and/or tell(). There's very little point in insisting on the full interface provided by open(...) if all you need is to call the write() method.

--
Steve
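As an illustration of "all you need is write()", here is a small sketch using the built-in `print()`, which only calls `write()` on its `file` argument unless `flush=True` is passed. The `Collector` class is hypothetical.

```python
class Collector:
    """Hypothetical write-only sink: no read(), close(), seek() or flush()."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)


sink = Collector()
print("hello", "world", file=sink)

result = "".join(sink.chunks)
has_seek = hasattr(sink, "seek")

print(repr(result))  # 'hello world\n'
print(has_seek)      # False
```

The same object would fail in any function that needs `seek()` or `close()`, which is why no single method list fits every use of the term.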

On Sat, Jun 18, 2016 at 10:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
So then the question is: What can I provide to something which wants a "file-like object"? Is an io.StringIO valid for this function, or for that function? It's the most file-like non-file that I can think of, so I'd generally expect it to work in most situations; it appears to support a lot of operations. What about io.BytesIO? Right there, we have to distinguish between "binary file-like objects" and "text file-like objects". And with other classes, how do you know what you need? The term "file-like object" is unfortunately made somewhat useless by its variant uses.

ChrisA
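A short sketch of the text/binary split mentioned above, using only the two in-memory classes named in the message. Both are "file-like", but they are not interchangeable.

```python
import io

text_stream = io.StringIO()
binary_stream = io.BytesIO()

text_stream.write("abc")      # str goes into a text stream
binary_stream.write(b"abc")   # bytes go into a binary stream

# Writing str to a binary stream is rejected with TypeError.
try:
    binary_stream.write("abc")
    mixed_write_ok = True
except TypeError:
    mixed_write_ok = False

# The io ABCs do capture this particular distinction.
is_text = isinstance(text_stream, io.TextIOBase)
is_binary = isinstance(binary_stream, io.BufferedIOBase)
binary_is_text = isinstance(binary_stream, io.TextIOBase)
```

So a function documented as taking "a file-like object" may still accept only one of these two, depending on whether it handles str or bytes.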

I think it may be time to gently phase out "file-like" from the stdlib documentation, in favor of more precise terms like (text | binary) (input | output) stream.

There are probably a handful of cases where it's worth restricting the (duck) type more, e.g. to "an object with a readline() method returning str". But in most cases for the stdlib I favor specifying a relatively wide interface so that future evolution of the documented module is not constrained by old documentation. For example, an API that takes a readable binary stream might currently only call .readline() on that stream, but in the future might find it more convenient to use .readlines() or .read().

As to whether we should just stick with the ABCs from 'io', there are a few problems with that, so I don't want to go there. First, these ABCs don't distinguish between input and output. This is intentional because otherwise there would be a bewildering set of combinations, since each form would have to support input, output, or both -- and we already have more than enough ABCs here (IOBase, RawIOBase, BufferedIOBase, TextIOBase -- I really don't want to have to remember 12 of these).

We also need to be sensitive to the use of duck typing (which is particularly common for I/O streams) -- while it's fine to formally define an API as taking e.g. a text input stream, it's usually not fine to start asserting that it must be an instance of TextIOBase -- ABC.register() notwithstanding, that would break a lot of code. (Type annotations are a different story -- but except for brand new APIs, the stdlib should not yet start using those.)

PS. I assume the reference to 'STL' was meant to be 'stdlib', an innocent mistake by someone more familiar with C++, not something to be picked apart.

--
--Guido van Rossum (python.org/~guido)
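The remark that the io ABCs do not distinguish input from output can be sketched like this. The file name `example.txt` and the temporary directory are illustrative assumptions; `readable()` and `writable()` are the standard IOBase methods for querying direction at run time.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name

    with open(path, "w") as out_stream:
        out_stream.write("line\n")
        out_info = (
            isinstance(out_stream, io.TextIOBase),
            out_stream.readable(),
            out_stream.writable(),
        )

    with open(path) as in_stream:
        in_info = (
            isinstance(in_stream, io.TextIOBase),
            in_stream.readable(),
            in_stream.writable(),
        )

# Both streams are TextIOBase instances; only the method calls
# reveal which direction each one supports.
print(out_info)  # (True, False, True)
print(in_info)   # (True, True, False)
```

An `isinstance` check against TextIOBase therefore cannot tell a text input stream from a text output stream, and it still rejects duck-typed objects that never registered with the ABC.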

participants (7)
- Bar Harel
- Chris Angelico
- Guido van Rossum
- Michael Selik
- Paul Moore
- Ram Rachum
- Steven D'Aprano