[Edu-sig] Teaching about files
Beni Cherniavsky
cben at users.sf.net
Thu Nov 11 20:10:18 CET 2004
Danny Yoo wrote:
>
> On Sun, 7 Nov 2004, Kent Johnson wrote:
>
>>So my question is, am I missing something here? Is f.read(n) important?
>
> Hi Kent,
>
> The most common use for f.read(n) in my personal experience has been in
> conjunction with the 'md5' module on really large files. I have sometimes
> done read(1), for character-by-character stuff. But otherwise, I tend to
> use files as iterators.
>
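For reference, the large-file checksum case quoted above might look roughly like this. It is only a sketch: it assumes ``hashlib.md5`` (the current home of what the old ``md5`` module provided), and the helper name ``md5_of_file`` and the 64 KB chunk size are arbitrary choices for illustration.

```python
import hashlib
import io

def md5_of_file(f, chunk_size=64 * 1024):
    """Hash a binary file object in fixed-size chunks (hypothetical helper)."""
    digest = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)   # at most chunk_size bytes per call
        if not chunk:                # empty bytes object means end of file
            break
        digest.update(chunk)
    return digest.hexdigest()

# Demo on an in-memory "file"; a real script would use open(path, "rb").
data = b"x" * 200000
chunked_digest = md5_of_file(io.BytesIO(data), chunk_size=4096)
```

Only one chunk is held in memory at a time, which is the reason to reach for ``f.read(n)`` here.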
I once profiled character-by-character processing and found that simply
iterating over ``f.read()`` is much faster than repeated ``f.read(1)``,
somewhat faster than reading chunks with ``f.read(n)`` and iterating over
each one (a nested loop), and no slower than ``mmap.mmap(f)``. The files
were several MB each and I had enough RAM to hold them. Memory-mapping is
probably the best approach for heavy processing, but its use must be
conditional -- it isn't always available. So I went with ``f.read()`` for
simplicity.
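The three read-based variants compared above can be sketched as follows. The function names are made up for illustration and this is not the code that was profiled; it only shows the shape of each loop.

```python
import io

def chars_whole(f):
    """Read the whole file once, then iterate over the string."""
    for ch in f.read():
        yield ch

def chars_one_by_one(f):
    """Call f.read(1) repeatedly until it returns an empty string."""
    while True:
        ch = f.read(1)
        if not ch:
            break
        yield ch

def chars_chunked(f, n=4096):
    """Read chunks of up to n characters and iterate over each (nested loop)."""
    while True:
        chunk = f.read(n)
        if not chunk:
            break
        for ch in chunk:
            yield ch

# All three yield the same characters; only the number of read() calls differs.
sample = "hello\nworld\n"
results = [list(fn(io.StringIO(sample)))
           for fn in (chars_whole, chars_one_by_one, chars_chunked)]
```

The first variant makes a single ``read()`` call, which is a plausible reason for it doing well when the file fits in memory.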
> I try to deemphasize read() and readlines(), as sucking a whole file as a
> list isn't a technique that will scale well with large inputs.
>
``f.read()`` is very useful for simple tasks. Show the simplest way to
do something first, and optimize only when needed.
``readlines()`` is indeed not very useful, because in almost all cases
where it applies, ``for line in f`` is sufficient.
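A small illustration of the ``for line in f`` style, assuming a made-up task (finding the longest line); the function name is hypothetical.

```python
import io

def longest_line(f):
    """Return the longest line in f, without its trailing newline."""
    longest = ""
    for line in f:                    # one line at a time, no list is built
        stripped = line.rstrip("\n")
        if len(stripped) > len(longest):
            longest = stripped
    return longest

# Demo on an in-memory file; any object returned by open() works the same way.
result = longest_line(io.StringIO("a\nbcd\nef\n"))
```

Replacing ``f`` with ``f.readlines()`` in the loop would give the same answer but would hold every line in memory first.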
BTW, the ``fileinput`` module should definitely be mentioned -- it's very
handy for writing useful scripts.
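A minimal sketch using the standard library's ``fileinput`` module, which iterates over the lines of several files as one stream. The helper name and the temporary demo files are invented for illustration.

```python
import fileinput
import os
import tempfile

def numbered_lines(paths):
    """Return (file name, line number within that file, text) for every line."""
    rows = []
    with fileinput.input(files=paths) as fi:
        for line in fi:
            rows.append((os.path.basename(fi.filename()),
                         fi.filelineno(),
                         line.rstrip("\n")))
    return rows

# Demo with two throwaway files; a script would normally pass file names
# taken from the command line instead.
with tempfile.TemporaryDirectory() as d:
    paths = []
    for name, text in (("a.txt", "one\ntwo\n"), ("b.txt", "three\n")):
        p = os.path.join(d, name)
        with open(p, "w") as fh:
            fh.write(text)
        paths.append(p)
    rows = numbered_lines(paths)
```

Called with no file list, ``fileinput.input()`` falls back to the names in ``sys.argv[1:]`` and then to standard input, which is what makes it convenient for small filter scripts.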
--
And data is not difficult. It's only data. If you have
too much, filter it. If it's not what you want, map it.
-- Dive Into Python, chapter 16.5 "Data-centric programming"