[Edu-sig] Teaching about files

Thu Nov 11 20:10:18 CET 2004

Danny Yoo wrote:
> 
> On Sun, 7 Nov 2004, Kent Johnson wrote:
> 
>>So my question is, am I missing something here? Is f.read(n) important?
> 
> Hi Kent,
> 
> The most common use for f.read(n) in my personal experience has been in
> conjunction with the 'md5' module on really large files.  I have sometimes
> done read(1), for character-by-character stuff.  But otherwise, I tend to
> use files as iterators.
> 
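For readers who haven't seen it, the chunked ``f.read(n)`` pattern Danny mentions looks roughly like this. It is a minimal sketch that assumes Python 3. The old ``md5`` module no longer exists there, so this swaps in ``hashlib.md5``. The function name ``md5_hexdigest`` and the 64 KiB chunk size are illustrative choices, not something from the thread.

```python
import hashlib


def md5_hexdigest(f, chunk_size=64 * 1024):
    """Return the MD5 hex digest of binary file object f (hypothetical helper).

    Only chunk_size bytes are held in memory at a time, so this works on
    files much larger than RAM.
    """
    digest = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)  # at most chunk_size bytes; b"" at end of file
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()
```

Typical use is ``with open(path, "rb") as f: print(md5_hexdigest(f))``. Note that the file is opened in binary mode.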
I once did some profiling on character-by-character processing and
discovered that simply iterating over ``f.read()`` is much faster
than repeated ``f.read(1)`` calls, somewhat faster than reading chunks with
``f.read(n)`` and iterating over each one (a nested loop), and no slower
than ``mmap.mmap(f.fileno(), 0)``.  I was using it on files of several MB,
and I had enough RAM to hold them entirely.  Memory-mapping is probably the
best approach for heavy processing, but it must be used conditionally,
since it isn't always available.  So I went with ``f.read()`` for simplicity.
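The variants compared above can be sketched as follows. This assumes Python 3 and a text-mode file object for the three character iterators. All function names are hypothetical. ``count_newlines`` is only an illustrative workload. It shows how to use ``mmap`` when the module can be imported and fall back to plain ``f.read()`` otherwise.

```python
import os

try:
    import mmap  # not available on every platform, so treat it as optional
except ImportError:
    mmap = None


def chars_read1(f):
    """Yield characters with one f.read(1) call each (slowest in the profiling above)."""
    while True:
        c = f.read(1)
        if not c:  # an empty string signals end of file
            return
        yield c


def chars_chunked(f, chunk_size=8192):
    """Yield characters by reading chunks with f.read(n) and looping over each chunk."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield from chunk


def chars_whole(f):
    """Read everything with f.read() and iterate over it; needs RAM for the whole file."""
    return iter(f.read())


def _count_byte(data, target):
    """Count occurrences of target using find(), which bytes and mmap both provide."""
    count, pos = 0, data.find(target)
    while pos != -1:
        count += 1
        pos = data.find(target, pos + 1)
    return count


def count_newlines(path):
    """Count newline bytes, memory-mapping the file when mmap is available."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if mmap is not None and size > 0:  # mapping an empty file raises ValueError
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
                return _count_byte(data, b"\n")
        return _count_byte(f.read(), b"\n")  # fallback: plain f.read()
```

All three iterators yield the same characters. They differ only in how many read calls they make and how much memory they need at once.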

> I try to deemphasize read() and readlines(), as sucking a whole file as a
> list isn't a technique that will scale well with large inputs.
> 
``f.read()`` is very useful for simple tasks.  Show the simplest way to
do something first, and optimize only when needed.
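As an example of the kind of simple task where ``f.read()`` is perfectly adequate, here is a hypothetical word counter. It is a sketch that assumes the file comfortably fits in memory.

```python
def word_count(path):
    """Count whitespace-separated words by reading the whole file at once."""
    with open(path) as f:
        return len(f.read().split())  # split() with no argument splits on any whitespace
```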

``f.readlines()`` is indeed not very useful, because in almost all cases
where it is applicable, ``for line in f`` is sufficient.
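A short sketch of the ``for line in f`` idiom follows, using a hypothetical ``count_nonblank_lines`` helper. Writing ``f.readlines()`` in place of ``f`` would give the same answer, but it would first build a list holding every line.

```python
def count_nonblank_lines(f):
    """Count lines that contain something besides whitespace."""
    # Iterating over the file object yields one line at a time, so memory
    # use does not grow with the size of the file.
    return sum(1 for line in f if line.strip())
```

It accepts any iterable of lines, such as an open text file or an ``io.StringIO``.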

BTW, the ``fileinput`` module should definitely be mentioned; it's very
handy for writing useful scripts.
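A minimal grep-like sketch of ``fileinput`` follows. It assumes Python 3, where the object returned by ``fileinput.input`` works as a context manager. The ``grep`` name and its return format are hypothetical.

```python
import fileinput


def grep(pattern, files):
    """Return (filename, line number within that file, line) for each matching line.

    files is a list of file names; the name "-" stands for standard input.
    """
    matches = []
    with fileinput.input(files=files) as lines:
        for line in lines:
            if pattern in line:
                matches.append((lines.filename(), lines.filelineno(), line.rstrip("\n")))
    return matches
```

From a script, ``grep(sys.argv[1], sys.argv[2:] or ["-"])`` searches the named files one after another as a single stream of lines, or standard input when no files are given. Passing ``files=None`` would instead make ``fileinput`` read every name in ``sys.argv[1:]``, including the pattern itself.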

-- 
And data is not difficult.  It's only data.  If you have
too much, filter it.  If it's not what you want, map it.
-- Dive Into Python, chapter 16.5 "Data-centric programming"