[Tutor] I am a friendly inoffensive non-commercial subject line
Scot W. Stevenson
scot@possum.in-berlin.de
Tue, 13 Aug 2002 11:11:34 +0200
[This did have the subject line "High level file handling in the rain" but
the SMTP mail dragon at mail.python.org didn't like that]
Hi there,
While wading thru the rain that is slowly turning most of Central Europe
into one big lake, my mind went from water ditches to water pipes to FIFO
pipes to file handling, and I finally realized why I have always felt
that there is something fishy about the way Python handles files: It seems
so low-level to me, so downright gutteresque compared with everything
else. Let me explain while I'm waiting for my hair to dry (not to mention
the cat).
The current high-level file commands force you to deal with a class of
questions that the Python Elves usually keep submerged: You have to /open/
the things, and you even have to tell the Elves if it is a file you only
want to read from, or maybe write to, and if you think it is binary, which
is sort of like having to decide beforehand if the variable 'lifejacket'
is going to be an integer or a (groan) float. Even worse, this information
is transferred by magic characters like 'r+', 'br', 'a+', which is not the
way Python usually does stuff. On top of that, you seem to need different
magic depending on which operating system you use, because Windows and
Macs don't do files the same way that Unix does.
Then you have a bunch of methods that only really make sense if you know
how hard disks physically operate:
/seek/ - A synonym for 'search'. So does "filename.seek(0)" search for the
occurrence of the number 0 in the file and return the position? (the user
shouldn't be forced to know what a 'head seek' is to read stuff)
/flush/ - But I always do! No, wait - the docs say this is for "buffers",
so it probably is used by blond female teenagers to pound wooden stakes
into vampires. (the user should not be forced to deal with buffers on the
highest level)
/tell/ - The docs say that this gives me "the file's position", which is
strange, because it should be on the hard disk in '/home/scot/python/'
(more 'head seek' hardware stuff)
And so on. After you've done all of that, you are supposed to 'close' a
file, which is sort of like asking me to call a destructor or run the
garbage collector by hand. Look, let's face it: If we wanted to be forced
to clean up after ourselves, we would a) still be living with our parents
and b) be using C or some other low(er) level language, not Python.
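For readers who have not met these methods yet, here is a minimal sketch of what they actually do. It is written for current Python 3 rather than the 2.2 discussed in this post, and the file name and contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "wetcat.txt")

f = open(path, "w+")      # 'w+' = create or truncate, and allow reading too
f.write("drip\ndrip drip\n")
f.flush()                 # push buffered data out to the operating system
f.seek(0)                 # move the file position to the start; nothing is searched
start = f.tell()          # tell() reports the current position in the file
first_line = f.readline()
after_first = f.tell()    # the position has moved past the first line
f.close()                 # cleaning up is left to the caller

print(start, repr(first_line), after_first)
```

So 'seek' and 'tell' are about a position inside the file's contents, not about where the file lives on disk.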
Note that 'open', 'seek', 'tell', 'flush', and 'close' are terribly general
verbs and don't give you any hint that you are working with a file (though
'open' has been replaced by 'file' in Python 2.2). This is a whole
different nomenclature compared with the usual Python objects: If a file
is a sequence of characters or bytes or whatnot, why can't I slice files
with [n:m] to get a bunch of lines or use the normal [n] to index a
certain byte like I can with everything else?
Of course it's good to have the choice of low-level, byte-by-byte control,
but that is what the os module is for. The top-level, built-in commands
should protect the casual user from all of this - at least Python does so with
everything else, including division. I realize that when Python was being
invented, throwing a thin wrapper around the C functions that everybody
involved knew was a good idea to just be able to move on to more important
stuff, but it still is just a thin wrapper that exposes all of the
squishy, wet, slimy C bits I don't really want to know about.
(And to think a week ago I was worried about the grass not getting enough
water.)
Just for the entertainment value, and because the cat is still throwing her
wet body against my leg, what would be wrong with the following file
handling system:
We recognize two different types of files:
1) "Normal" files which consist of lines, in other words, of strings. This
gives us strings that are sequences of characters and files that are
sequences of strings, a nice pyramid. We'll use the 'file'-keyword for
this type.
2) "Binary" files which consist of bytes. Note that every 'normal' file can
be accessed as a binary file, but the reverse is not true. We'll use the
'binfile'-keyword for this type.
Now when we want to access a file, we don't "open" it - that's the Elves'
job - we just get right down to it:
    filehandle = file('/home/scot/python/wetcat.text')
or
    binhandle = binfile('flooddata.dat')
With this type of file, we can iterate, slice, or index the content
without having to explicitly tell the Elves that we want to read or write
or whatnot:
    >>> filehandle[2]
    'drip drip drip'
    >>> filehandle[0:2]
    ['drip', 'drip drip']
    >>> binhandle[2]
    4
    >>> binhandle[0:2]
    (1, 2)
(Or maybe both should return lists, I'm not sure what would be better).
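The line-based half of this idea can be sketched in a few lines of current Python 3. The class name `LineFile` is hypothetical and not part of any library; the sketch is read-only and re-reads the whole file on every access, which keeps it simple but would be slow for large files.

```python
import os
import tempfile


class LineFile:
    """Hypothetical read-only view of a text file as a sequence of lines."""

    def __init__(self, path):
        self.path = path

    def _lines(self):
        # Open, read and close on every access, so the caller never has to.
        with open(self.path) as handle:
            return handle.read().splitlines()

    def __len__(self):
        return len(self._lines())

    def __getitem__(self, key):
        # A list already handles ints, negative indices and slices.
        return self._lines()[key]

    def __iter__(self):
        return iter(self._lines())


# Demo with made-up data in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "wetcat.text")
with open(path, "w") as out:
    out.write("drip\ndrip drip\ndrip drip drip\n")

filehandle = LineFile(path)
third = filehandle[2]
first_two = filehandle[0:2]
```

Here `filehandle[2]` and `filehandle[0:2]` behave like the session above, with no explicit open, mode string, or close in the calling code.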
Since we have direct (random) access with slices and indices, we don't
need to 'seek' and 'tell' anymore, and we progress string by string or
byte by byte with iteration. Stuff like 'flush' and 'close' is Elves'
work. Still, you'd want to keep the following methods:
    one_string = file('damp.txt').read()
    all_strings_as_a_list = file('damp.txt').readall()
    one_byte = binfile('humidity.dat').read()
    all_bytes_as_a_list_or_tuple = binfile('humidity.dat').readall()
You also probably want to keep '.readline()' and '.readlines()' around as
synonyms. Reading is easy; writing gets you into trouble, because you
don't want to go around casually overwriting files the way you do
variables. There are four cases:
1) File exists: Overwrite anyway
2) File exists: Raise error
3) File doesn't exist: Make new file, write
4) File doesn't exist: Raise error
For example, we could define
    filename.forcewrite(data): 1 + 3
    filename.writenew(data): 2 + 3
    filename.overwrite(data): 1 + 4
or whatever combination seems to make sense. Other useful methods we can
just steal from the 'list' type: append, insert, replace, index, remove...
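As a rough sketch, the three write variants could be built on top of the existing open() modes in current Python 3. The function names come from the list above; writing them as plain functions that take a path is a simplification, and `overwrite` is taken here to mean cases 1 and 4 (replace an existing file, raise an error if there is none), which is an assumption about the intent.

```python
def forcewrite(path, data):
    """Cases 1 + 3: write whether or not the file already exists."""
    with open(path, "w") as handle:
        handle.write(data)


def writenew(path, data):
    """Cases 2 + 3: create the file; raises FileExistsError if it exists."""
    with open(path, "x") as handle:    # 'x' = exclusive creation
        handle.write(data)


def overwrite(path, data):
    """Cases 1 + 4 (assumed): replace an existing file; raises
    FileNotFoundError if there is nothing to replace."""
    with open(path, "r+") as handle:   # 'r+' refuses to create a new file
        handle.write(data)
        handle.truncate()              # drop whatever was left of the old content
```

The magic characters are still there, but they are hidden inside three functions whose names say what happens to an existing file.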
The idea behind this is that newbies don't have to take a crash course in
hard disk mechanics, don't have to memorize magic characters that look
like ex commands, don't have to decide beforehand if they want to read and
write, don't have to close stuff afterwards, and can just access files in
the same way they would a list or any other sequence.
So instead of
    outputfile = open('poldern.txt', 'w')
    inputfile = open('floodgates.txt', 'r+')
    for line in inputfile:
        outputfile.write(line)
    outputfile.close()
    inputfile.close()
you'd have
    for line in file('floodgates.txt'):
        file('poldern.txt').append(line)
Now doesn't that look a lot cleaner? /Watered down/, so to speak...
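For comparison, here is roughly how close current Python 3 gets to the short version. This is a sketch with made-up file contents in a temporary directory; the mode string is still needed, but the `with` statement takes over the closing.

```python
import os
import tempfile

# Made-up input file in a temporary directory, used only for this demo.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "floodgates.txt")
target = os.path.join(workdir, "poldern.txt")
with open(source, "w") as handle:
    handle.write("drip\ndrip drip\n")

# Both files are closed automatically when the block ends.
with open(source) as inputfile, open(target, "a") as outputfile:
    for line in inputfile:
        outputfile.write(line)

with open(target) as handle:
    copied = handle.read()
```

Mode 'a' appends, which matches the `.append(line)` call in the proposal rather than the 'w' in the long version.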
Okay, this is probably more than enough gushing from me today, and the cat
has fallen asleep. I'd be interested to hear what the people with more
experience in design concepts and low-level file handling think of this -
or if it is just me who thinks that the current way of doing things is not
quite as high level as the rest of the language is.
Dry feet to all,
Y, Scot
--
Scot W. Stevenson wrote this on Tuesday, 13. Aug 2002 in Berlin, Germany
on his happy little Linux system that has been up for 1342 hours
and has a CPU that is falling asleep at a system load of 0.76.