Sharing: File Reader Generator with & w/o Policy
Mark H Harris
harrismh777 at gmail.com
Sun Mar 16 04:34:52 CET 2014
On 3/15/14 9:01 PM, Steven D'Aprano wrote:
> Reading from files is already pretty simple. I would expect that it will
> be harder to learn the specific details of custom, specialised, file
> readers that *almost*, but not quite, do what you want, than to just
> write a couple of lines of code to do what you need when you need it.
> Particularly for interactive use, where robustness is less important than
> ease of use.
Yes. What I'm finding is that I'm coding the same 4-6 lines of code
with every file open (I do want error handling, at least for
FileNotFoundError) and I only want it to be two lines: read the file
into a list, with error handling.
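
A minimal sketch of the kind of helper described above. The name read_lines() is hypothetical (it does not appear in this thread), and returning an empty list for a missing file is an assumption about the desired behaviour, not a recommendation:

```python
def read_lines(path):
    """Return the lines of `path` as a list, without line ends.

    read_lines is a hypothetical name used only for illustration.
    A missing file is reported with a short message and yields [].
    """
    try:
        with open(path) as f:
            return [line.rstrip("\n") for line in f]
    except FileNotFoundError as err:
        # Only the "file does not exist" case is handled here;
        # every other exception still propagates to the caller.
        print("File not found:", err.filename)
        return []
```

With something like this in place, the call site shrinks to one line, for example lines = read_lines(path), followed by whatever loop the task needs.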
> What's "policy"?
That's something I personally struggle with (frequently): do I place
the policy in the generator, or do I handle it on the outside? For
instance, I normally strip the line-end and I want to know the record
lengths. I also may want to know the record number from arrival
sequence. This policy can be handled in the generator; although, I could
have handled it outside too.
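
To make the two placements concrete, here is a rough sketch. Both function names are hypothetical, and "policy" is taken to mean exactly what is listed above: strip the line end, report the record length, and report the record number in arrival sequence.

```python
def numbered_records(path):
    """Policy inside the generator: strip the line end, then yield
    (record number, record length, record)."""
    with open(path) as f:
        for recno, line in enumerate(f):
            record = line.rstrip("\n")
            yield recno, len(record), record


def raw_lines(path):
    """No policy: yield each line exactly as the file object produces it."""
    with open(path) as f:
        yield from f
```

With raw_lines() the caller applies the same policy at the call site (enumerate, rstrip, len), so the choice is about where the policy lives, not about what it computes.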
> for i, line in enumerate(open(pathname + "my_fox")):
>     print((i, len(line), line))
I like it... and this is where I've always been, when I finally said to
myself, yuk. Yes, it technically works very well. But it's ugly. And I
don't mean it's technically ugly, I mean it's aesthetically ugly and not
user-easy-to-read. (I know that's all subjective.)
for line in getnumline(path+"my_foxy"):
In this case getnumline() is a generator wrapper around fName(). It of
course doesn't do anything different than the two lines you listed, but
it is immediately easier to tell what is happening; even if you're not
an experienced python programmer.
> [Aside: I don't believe that insulating programmers from tracebacks does
> them any favours.
Yes. I think you're right about that. But what if they're not
programmers; what if they're just application users that don't have a
clue what a traceback is, and just want to know that the file does not
exist? And right away they realize that, oops, I spelled the filename
wrong. Yeah, I struggle with this as I'm trying to simplify, because
personally I want to see the traceback info.
> Worse, it's inconsistent! Some errors are handled normally, with an
> exception. It's only FileNotFoundError that is captured and printed. So
> if the user wants to re-use this function and do something with any
> exceptions, she has to use *two* forms of error handling:
Yes. The exception handling needs to handle all normal errors.
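
One possible way to treat the common failures uniformly is to catch OSError, which is the base class of FileNotFoundError, PermissionError and IsADirectoryError. This is only a sketch: open_or_report() is a hypothetical name, and printing to stderr and returning None is an assumed convention for interactive use, not something settled in this thread.

```python
import sys


def open_or_report(path):
    """Open `path` for reading, or print a short message and return None.

    Hypothetical helper for illustration only.
    """
    try:
        return open(path)
    except OSError as err:
        # OSError covers FileNotFoundError, PermissionError,
        # IsADirectoryError and the other OS-level open() failures.
        print(f"Cannot open {path!r}: {err.strerror}", file=sys.stderr)
        return None
```

A caller who wants the traceback instead can simply call open() directly; the helper is meant for the application-user case only.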
> (1) wrap it in try...except handler to capture any exception other
> than FileNotFoundError; and
> (2) intercept writes to standard out, capture the error message, and
> reverse-engineer what went wrong.
> Apart from stripping newlines, which is surely better left to the user
> (what if they need to see the newline? by stripping them automatically,
> the user cannot distinguish between a file which ends with a newline
> character and one which does not), this part is just a re-invention of
> the existing wheel. File objects are already iterable, and yield the
> lines of the file.
Yes, this is based on my use case, which never needs the line-ends, in
fact they are a pain. These files are variable record length and the
only thing the newline is used for is delimiting the records.
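
If the distinction raised in the quoted text ever matters, one option is to strip the delimiter but report whether it was there. A small sketch, with split_record() as a hypothetical name:

```python
def split_record(line):
    """Return (record, had_line_end) for one line read from a text file.

    Only a single trailing newline is removed; other trailing
    whitespace is part of the record and is kept.
    """
    if line.endswith("\n"):
        return line[:-1], True
    return line, False
```

Only the last line of a file can come back with had_line_end set to False, so a caller who never needs that flag can ignore it.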
> def fnName(filename):
>     for count, line in enumerate(fName(filename)):
>         yield (count, len(line), line)
I like this, thanks! enumerate and I are becoming friends.
I like this case philosophically because it is a both | and. The policy
is contained in the wrapper generator using enumerate() and len()
leaving the fName() generator to produce the line.
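
Put together, the layering might look like the sketch below. The body of fName() is not shown in this message, so the version here is an assumption (a generator that yields each line without its line end); getnumline() follows the quoted wrapper, only renamed.

```python
def fName(filename):
    """Assumed shape of fName(): yield each line without its line end."""
    with open(filename) as f:
        for line in f:
            yield line.rstrip("\n")


def getnumline(filename):
    """Policy wrapper: yield (record number, record length, record)."""
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)
```

The inner generator only produces records; everything about numbering and lengths sits in the wrapper, so a different policy means a different wrapper, not a different reader.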
And you are right about another thing, I just want to use this thing
over and over.
for line in getnumline(filename):
There seems to be just one way of doing this (file reads), but there
are actually many ways of doing it. Is a file object really better than
a generator? Are there good reasons for using the generator? Are there
absolute cases for using a file object?