[Tutor] New file stuff (formerly inoffensive non-commercial mail)
Scot W. Stevenson
scot@possum.in-berlin.de
Tue, 13 Aug 2002 23:28:51 +0200
Hello Alan,
> Yep, so maybe there's a reason...
There usually is. You know, back when I was 17, I knew it all, and ever
since then, I seem to have become progressively more stupid =X)...
> Actually altho' Python does wrap the C stuff the file
> methods are very similar to every mainstream programming
> language around from ADA to Lisp, to Smalltalk...
Well, yes, but just because it is the way that it has always been done by
computer scientists doesn't mean that a different way might not be easier
for people who don't tend to start counting with 0. Look at indentation
and (new) division, two places where Python is now marching to its own
drummer. Asking somebody to remember "br+" or whatever it is to open a
file just is not the way things are usually done in the language.
> Pascal kind of makes that distinction by having text
> files as a special category. But binary files are
> broken into myriad types... FILE OF <FOO>
If anybody wanted to do that (deal with a file as a collection of 64-bit
words instead of bytes, for example), they could presumably subclass the
binfile object. Or something like that. I'm guessing that people use text
or text-like files (HTML, XML, whatnot) so much that having a set of
commands for text files is worth the effort.
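A rough sketch of the "file as a collection of 64-bit words" idea. Python has no `binfile` object to subclass, so `WordFile` below is a hypothetical stand-alone class. It assumes little-endian, unsigned words and a file size that is a multiple of 8 bytes, and it uses the standard `struct` module to do the unpacking.

```python
import os
import struct
import tempfile


class WordFile:
    """Hypothetical read-only view of a binary file as unsigned 64-bit words.

    Assumes little-endian byte order and a file size that is a multiple of 8.
    """

    WORD = struct.Struct("<Q")  # one little-endian unsigned 64-bit integer

    def __init__(self, path):
        with open(path, "rb") as f:
            data = f.read()
        if len(data) % self.WORD.size:
            raise ValueError("file size is not a multiple of 8 bytes")
        # iter_unpack walks the buffer in 8-byte steps, yielding 1-tuples
        self._words = [w for (w,) in self.WORD.iter_unpack(data)]

    def __len__(self):
        return len(self._words)

    def __getitem__(self, index):
        # Delegates to the list, so both indexing and slicing work
        return self._words[index]


# Usage: write three words to a scratch file, then read them back
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<3Q", 1, 2, 2**64 - 1))
words = WordFile(path)
first, tail, count = words[0], words[1:], len(words)
os.remove(path)
```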
> > With this type of file, we can iterate, splice, or index the content
> > without having to explicitly tell the Elves that we want to
> > read or write or whatnot:
> That's actually quite tricky to do. Why not try implementing
> the interface in Python to see what's involved...
Yes, that would be the next logical step in my argument, wouldn't
it...argh. I'll have to see what I can come up with (did I mention I'm
just learning Python <g>) next week when I have some time on my hands...
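One possible shape for such an interface, as a starting point rather than a finished design: the hypothetical `TextFile` class reads every line into a list when it is created, so indexing, slicing, iteration and assignment all work on memory, and nothing is written back until an explicit `save()` call. It assumes UTF-8 text files small enough to fit in RAM.

```python
import os
import tempfile


class TextFile:
    """Hypothetical line-oriented file: index, slice, iterate, assign.

    Reads the whole file into a list of lines up front, so no read/write
    mode has to be chosen until save() rewrites the file.
    """

    def __init__(self, path):
        self.path = path
        with open(path, encoding="utf-8") as f:
            # Text mode turns \r\n into \n, so stripping \n is enough
            self._lines = [line.rstrip("\n") for line in f]

    def __len__(self):
        return len(self._lines)

    def __iter__(self):
        return iter(self._lines)

    def __getitem__(self, index):
        return self._lines[index]  # works for ints and slices

    def __setitem__(self, index, value):
        self._lines[index] = value  # changes stay in memory until save()

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in self._lines)


# Usage: edit the second line of a small file and write it back
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")
tf = TextFile(path)
tf[1] = "BETA"
tf.save()
with open(path, encoding="utf-8") as f:
    saved = f.read()
head = tf[:2]
os.remove(path)
```

Reading everything up front sidesteps the question of which mode to open the file in, at the cost of memory and of losing unsaved changes if the program dies before `save()`.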
> One problem is that under the covers you have to figure out predictively
> what mode to open the raw file in - what
> does the user want to do with it. Otherwise you have to
> open/close the file after each operation and keep track
> of where the last access was etc etc...
This is where I run up against my lack of background knowledge on operating
system basics - why can't you just read the whole file into a buffer and
manipulate that, with occasional flushes to a backup version? If I
understand Linux correctly, this is what the operating system does anyway,
or at least that is the excuse everybody keeps giving me when I ask why
"free" shows me that all my nice RAM is being used for buffers and caches
and stuff like that.
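For the "whole file in a buffer" approach, the standard `mmap` module already lets the operating system do the buffering: a memory-mapped file can be indexed and sliced like a `bytes` object, and with `access=mmap.ACCESS_WRITE` slice assignment changes the file itself. A minimal sketch; note that a mapping cannot grow or shrink the file, slice assignments must keep the same length, and mapping an empty file raises an error.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"spam and eggs")

with open(path, "r+b") as f:
    # Length 0 maps the whole file; the file must not be empty
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as buf:
        first_byte = buf[0]   # indexing gives an int in Python 3
        word = buf[9:13]      # slicing gives bytes
        buf[0:4] = b"SPAM"    # same-length slice assignment only
        buf.flush()           # push the change out to the file

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```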
The trick (I guess) would be to make sure at all times that the file is not
corrupted when the system crashes (this seems to be more a constant worry
with Windows and (old) Mac users, but I also remember being told that
Murphy was a computer scientist at heart). Include a buffer flush command
after every write? Once you have everything in a buffer, you can do
everything you want rather quickly (the end version has to be in C anyway
for speed).
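On the "flush after every write" question: `flush()` only hands Python's buffer to the operating system, and `os.fsync()` asks the operating system to put the data on disk. A common pattern for never leaving a half-written file behind is to write a complete new copy to a temporary file and then rename it over the original. The helper name `atomic_write` is hypothetical; `os.replace` may fail across filesystems, and its atomic rename is a POSIX guarantee, which is why the temporary file is created in the target's own directory.

```python
import os
import tempfile


def atomic_write(path, data):
    """Hypothetical helper: replace path's contents so a crash leaves
    either the old data or the new data, never a mixture.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> operating system
            os.fsync(tmp.fileno())  # operating system -> disk
        os.replace(tmp_path, path)  # rename over the original in one step
    except BaseException:
        os.remove(tmp_path)         # do not leave the temporary file behind
        raise


# Usage: save twice; only the final version and no temp files remain
target = os.path.join(tempfile.mkdtemp(), "notes.txt")
atomic_write(target, b"first version\n")
atomic_write(target, b"second version\n")
with open(target, "rb") as f:
    saved = f.read()
leftovers = os.listdir(os.path.dirname(target))
```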
If you do decide to do everything directly, yes, you might have to reopen
and close the file a few times. But if speed is the problem, you can
always go to the os module and do it the hard way. I'm assuming here that
the lowest level you can get to are the POSIX calls (was that the name?)
to the operating system, and that they force you to decide if you want to
read or write? So you couldn't just write a new C library for opening and
closing files?
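At the POSIX level the `open()` call does ask for intent, but one of the choices is `O_RDWR`, which gives a single descriptor for both reading and writing. Python exposes these calls through `os.open`, `os.read`, `os.write` and `os.lseek`. A small sketch; the file name comes from `tempfile`, and `O_BINARY` only exists on Windows, hence the `getattr` fallback.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# O_RDWR asks for one descriptor that can both read and write;
# O_CREAT would create the file if it did not already exist.
flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
fd = os.open(path, flags)
try:
    os.write(fd, b"abcdef")          # descriptor position is now 6
    os.lseek(fd, 0, os.SEEK_SET)     # rewind to the start
    first_three = os.read(fd, 3)     # reads b"abc"
    os.lseek(fd, 0, os.SEEK_SET)
    os.write(fd, b"X")               # overwrite the first byte in place
    os.lseek(fd, 0, os.SEEK_SET)
    everything = os.read(fd, 100)    # at most 100 bytes
finally:
    os.close(fd)
os.remove(path)
```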
> Not too hard if it's text and you assume line by line
> access rather than characters, binary presumably
> returns bytes?
Yes - with maybe an option for multiples of bytes, but that would be for
somebody to decide who knows more about the uses of binary files. I don't
think I have ever accessed one in Python, but then I've heard that they
are more common with Windows and (old) Macs than with Linux.
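Reading a binary file in fixed-size multiples of bytes already works with plain `read(n)`. A sketch of a generator that yields chunks of a chosen size (the function name `read_chunks` is hypothetical), using the two-argument form of `iter()` to stop at end of file:

```python
import functools
import os
import tempfile


def read_chunks(path, size):
    """Yield successive size-byte chunks of a binary file (last may be shorter)."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(size) until it returns b""
        yield from iter(functools.partial(f.read, size), b"")


# Usage: ten bytes read four at a time
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(10)))
chunks = list(read_chunks(path, 4))
os.remove(path)
```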
[splices and indices]
> Ah, but now try implementing that on a binary file.
> But I guess you could just seek(0) after each
> operation... or could you? It might depend on the
> current mode...
Worst case would probably be close file, open file, seek(0). That certainly
would not be fast in relative terms; the question is, how fast is this
going to be in human terms? I'm assuming the heavy-lifting people will
want to use the old version with the os module anyway, because staying
close to C (or Java in the case of Jython) is always going to be faster
than anything one level up.
> BTW Have you looked at the fileinput module which
> does a little bit of what you want I think....
No, I hadn't, thank you for the reference. Will read it.
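For reference, a small example of what `fileinput` offers: it treats several files as one stream of lines and can report which file and line number the current line came from. The file names here are made up for the example.

```python
import fileinput
import os
import shutil
import tempfile

# Two small files to read as one continuous stream of lines
directory = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    p = os.path.join(directory, name)
    with open(p, "w") as f:
        f.write(text)
    paths.append(p)

seen = []
with fileinput.input(files=paths) as lines:
    for line in lines:
        # filename() and filelineno() describe the line just read
        seen.append((os.path.basename(lines.filename()),
                     lines.filelineno(), line.rstrip("\n")))

shutil.rmtree(directory)
```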
I'll see about throwing together an interface for the new versions as a
first step; tho I should warn everybody right away that I'll need a bit of
help here...
Y, Scot
--
Scot W. Stevenson wrote me on Tuesday, 13. Aug 2002 in Zepernick, Germany
on his happy little Linux system that has been up for 1363 hours
and has a CPU that is falling asleep at a system load of 0.00.