[Tutor] RE: New file stuff

alan.gauld@bt.com alan.gauld@bt.com
Wed, 14 Aug 2002 14:31:12 +0100


> Well, yes, but just because it is the way that it has always 
> been done by computer scientists doesn't mean that a 
> different way might not be easier 

Oh I agree, the point I was making is that lots of 
language designers have tried and they keep coming 
back to variations on a theme...

> > Pascal kind of makes that distinction by having text
> > files as a special category. But binary files are
> > broken into myriad types... FILE OF <FOO>
> 
> If anybody wanted to do that (deal with a file as a 
> collection of 64-bit words instead of bytes, for example

I didn't explain that properly.
Pascal has the rather nice feature of defining binary 
files in terms of the high level data types you want to 
store. Thus if we define a class Foo we can declare a 
FILE OF FOO

Then we can read/write Foo objects to the file as 
binary chunks. This is a very nice feature, although 
still not perfect since it can't handle polymorphic 
collections etc. But it's a lot better than writing 
sizeof(Foo) bytes each time...
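For comparison, here's a hedged sketch of the same idea in Python using the struct module: fixed-size binary records that you can seek to by index, much as Pascal's typed files allow. The record layout (a 4-byte int plus an 8-byte double) and the function names are invented for illustration.

```python
import struct

# Fixed 12-byte record: little-endian int + double (layout is made up)
RECORD = struct.Struct("<id")

def write_records(path, records):
    """Write (int, float) tuples as fixed-size binary records."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))

def read_record(path, index):
    """Seek straight to record number `index` - possible only because
    every record has the same size, as in Pascal's FILE OF <type>."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))
```

Like Pascal's version it only works for fixed-size, homogeneous records - which is exactly the polymorphism limitation mentioned above.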

> > Thats actually quite tricky to do. Why not try implememting
> > the interface in Python to see whats involved.... 

I don't mean define the interface, I mean actually write 
a prototype of the basic open/read/write/slice operations
as a class.

See how many special cases you have to deal with etc.
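To show what such a prototype might look like, here is a toy sketch (the class and method names are invented, not any real library's API) of a file with slice-style access - the comments mark the special cases that start piling up immediately:

```python
class SliceableFile:
    """A toy prototype of a file object with slice-style random
    access, to show how quickly the special cases accumulate."""

    def __init__(self, path, mode="r+b"):
        self._f = open(path, mode)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Special case already: negative indices and step values
            # would need the file size, i.e. an extra stat or seek.
            start = key.start or 0
            self._f.seek(start)
            if key.stop is None:
                return self._f.read()
            return self._f.read(key.stop - start)
        self._f.seek(key)
        return self._f.read(1)

    def __setitem__(self, key, data):
        # Another special case: writing past end-of-file silently
        # creates a 'hole' on most filesystems.
        if isinstance(key, slice):
            start = key.start or 0
        else:
            start = key
        self._f.seek(start)
        self._f.write(data)

    def close(self):
        self._f.close()
```

Even this much ignores inserting (rather than overwriting) bytes, slices with steps, and concurrent access - each of which is a project in itself.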

> system basics - why can't you just read the whole file into a 
> buffer and manipulate that, which occasional flushes to a 
> backup version? 

Usually that's what the programmer will do, but for big files
(several hundred megabytes) that's not really an option.
It's coping with these special cases that makes file 
handling difficult, because at the end of the day you 
come up against the hardware, which is essentially 
a long sequence of bytes on a disk or tape!
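The usual workaround for big files is to process them a chunk at a time, so only one chunk is ever in memory. A minimal sketch (the checksum is just a stand-in payload; the function name is invented):

```python
def checksum(path, chunk_size=64 * 1024):
    """Process a file in fixed-size chunks so at most chunk_size
    bytes are in memory at once - the standard pattern when the
    file is too big to read into one buffer."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            total = (total + sum(chunk)) % (2 ** 32)
    return total
```

The same loop shape works for copying, searching, or transforming a file of any size.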

> understand Linux correctly, this is what the operating system 
> does anyway, or at least that is the excuse everybody keeps 

Nope, Linux is smarter than that. It reads in a bit of 
the file (a page) and then as you read through the data it 
'pages in' the next segment ready for you to move onto. 
Once you leave the first page it gets discarded and the 
space is used for the next page, and so on... But this is 
still basically a sequential read of the disk/tape.
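You can see this paging view directly from Python via the mmap module: the file appears as one big byte string, but the kernel only faults pages in as they are actually touched. A small sketch (the helper name is invented; note mmap needs a non-empty file):

```python
import mmap

def head_via_mmap(path, n):
    """Return the first n bytes of a file through a memory map.
    Only the pages actually touched get read from disk - the
    kernel's demand paging, as described above."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[:n])
```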

> "free" shows me that all my nice RAM is being used for 
> buffers and caches and stuff like that..

Yes, the pages are stored in buffers, and the output 
is written to a page buffer before eventually being 
flushed to disk - that's why files need a flush 
operation...
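The two buffer layers show up directly in Python. A sketch (the function name is invented): flush() empties the user-space buffer into the OS page cache, and os.fsync() asks the kernel to push the page cache out to the disk itself.

```python
import os

def durable_write(path, data):
    """Write data and push it through both buffer layers."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # user-space buffer -> OS page cache
        os.fsync(f.fileno())   # page cache -> physical disk
```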

To compound matters, the underlying limitations tend 
to be, as you say, the Posix-level read/write calls, 
but even they are shaped largely by the hardware 
device driver I/O routines, which also operate in 
the same way (the BIOS on a PC).

To radically change how we use files we need to change 
how we design the hardware!

After all, even our applications (Word etc.) use the 
same metaphor - open a file, read it in, modify it, 
write it out...

Alan g.
Author of the 'Learning to Program' web site
http://www.freenetpages.co.uk/hp/alan.gauld