[Python-Dev] Re: Use for enumerate

Ian Kjos ikjos@email.uophx.edu
Sat, 27 Apr 2002 12:50:39 -0500


Towards the "short code" challenge:

def g(n, x):
    if x <= 0: return ''
    f = file(n)
    for i in range(x): line = f.readline()  # readline() gives '' at EOF
    f.close()
    return line


It's tested on 2.2.1, and correctly returns '' when the line number is out
of range. It won't stop at EOF, though: it keeps calling readline() until
the loop has run x times.
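
For comparison, here is a minimal sketch of the same idea that does stop at
EOF. It is written for Python 3 (open() instead of file()), which is an
assumption on my part and not what was tested above; the names g, n and x
are kept from the original.

```python
def g(n, x):
    """Return line x (1-based) of the file named n, or '' if out of range."""
    if x <= 0:
        return ''
    f = open(n)
    try:
        for _ in range(x):
            line = f.readline()
            if not line:  # readline() returns '' only at EOF, so stop early
                break
    finally:
        f.close()  # runs even if readline() raises
    return line
```

A blank line in the file reads back as '\n', which is truthy, so the early
exit only triggers at the real end of the file.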

PS: This is a good argument for pre-object-death destructors à la C++. I
think resources should be finalized as soon as they are no longer accessible
through other means. I realize that there are issues to be resolved; Perl
does it by following the law of most (!) astonishment. /me dives into some
documentation.
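
As a rough illustration of scope-bound finalization, later Python versions
(2.5 onward) have a `with` statement that releases a resource when the block
exits, independent of when the object itself dies. The sketch below uses a
made-up class, Tracked, purely to record the order of events; it is not part
of any library.

```python
class Tracked:
    """Hypothetical resource that records when it is acquired and released."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append('acquired')
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append('released')  # runs as soon as the block is left
        return False                 # do not swallow exceptions


def demo():
    log = []
    with Tracked(log):
        log.append('in use')
    log.append('after block')
    return log
```

Calling demo() shows the release happening before control reaches the code
after the block. File objects support the same protocol, so
`with open(n) as f:` closes the file on exit without an explicit f.close().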


----- Original Message -----
From: <python-dev-request@python.org>
To: <python-dev@python.org>
Sent: Saturday, April 27, 2002 11:00 AM
Subject: Python-Dev digest, Vol 1 #2187 - 5 msgs


> Today's Topics:
>
>    1. Re: Use for enumerate() (Raymond Hettinger)
>    2. RE: Use for enumerate() (Tim Peters)
>    3. Re: Use for enumerate() (Michael Gilfix)
>    4. Re: Use for enumerate() (holger krekel)
>    5. Re: Use for enumerate() (Guido van Rossum)
>
> --__--__--
>
> Message: 1
> From: "Raymond Hettinger" <python@rcn.com>
> To: <python-dev@python.org>
> Subject: Re: [Python-Dev] Use for enumerate()
> Date: Sat, 27 Apr 2002 01:08:43 -0400
>
> > Challenge 3: do it faster and with less code.
>
> def getline(filename, lineno):
>     if lineno < 1:
>         return ''
>     f = open(filename)
>     i, line = zip(xrange(lineno), f)[-1]
>     f.close()
>     if i+1 == lineno:
>         return line
>     return ''
>
> To keep to the spirit of the challenge, I'm ignoring that
> the function is i/o bound which would lead to using
> an 'rb' read and doing .finds or .counts on '\n'.
>
> The approach is to vectorize, trading away memory
> allocation time and xrange time to save the overhead
> of the pure Python loop and test cycle.
>
> The test is saved by taking advantage of zip's feature
> which stops when the first iterator is exhausted.
>
>
> Raymond Hettinger
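
The zip trick can be tried without a file. This is a sketch under Python 3
assumptions (range instead of xrange, and zip wrapped in list() because it
is lazy there); getline_zip and its `lines` parameter are hypothetical names
of mine. It also guards against empty input, where indexing [-1] would
otherwise raise IndexError.

```python
def getline_zip(lines, lineno):
    """Return item number lineno (1-based) from an iterable of lines, or ''."""
    if lineno < 1:
        return ''
    # zip stops as soon as range(lineno) or lines runs out, whichever is first
    pairs = list(zip(range(lineno), lines))
    if not pairs:  # empty input: there is no last pair to look at
        return ''
    i, line = pairs[-1]
    # if lines ran out first, the last index falls short of lineno
    return line if i + 1 == lineno else ''
```

For example, getline_zip(['a\n', 'b\n', 'c\n'], 2) gives 'b\n', while asking
for line 5 of the same list gives ''. Like the original, it still builds the
whole list of pairs up to lineno.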
>
>
>
>
>
> --__--__--
>
> Message: 2
> Date: Sat, 27 Apr 2002 01:56:56 -0400
> From: Tim Peters <tim.one@comcast.net>
> Subject: RE: [Python-Dev] Use for enumerate()
> To: python-dev@python.org
>
> >> Challenge 3: do it faster and with less code.
>
> [Raymond Hettinger]
> > def getline(filename, lineno):
> >     if lineno < 1:
> >         return ''
> >     f = open(filename)
> >     i, line = zip(xrange(lineno), f)[-1]
> >     f.close()
> >     if i+1 == lineno:
> >         return line
> >     return ''
>
> Hmm.  On my box it's a little slower than Guido's getline on my standard
> <wink> test, here calling that function g3 (g2 and the timing driver were
> posted before; the input is Zope's DateTime.py, a 1657-line Python source
> file):
>
> getline 4.85231314638
> g2 2.8915829967
> g3 5.19037613772
>
> That's a curious result, since, as you say:
>
> > The approach is to vectorize, trading away memory
> > allocation time and xrange time to save the overhead
> > of the pure Python loop and test cycle.
>
> It gets a speed boost to below 5.0 if I use range instead of xrange.
>
> It suggests this alternative, which is a tiny bit shorter and significantly
> faster than Guido's:
>
> def g4(filename, lineno):
>     if lineno < 1:
>         return ''
>     f = open(filename)
>     get = iter(f).next
>     try:
>         for i in range(lineno): line = get()
>     except StopIteration:
>         pass
>     f.close()
>     return line
>
> That weighs in at 4.04 seconds on my test case.
>
> I think the lesson to take is that building gobs of 2-tuples is more
> expensive than taking the same number of quick trips around the eval loop.
> Guido's and your function both build gobs of 2-tuples, while the zippier g4
> and much zippier g2 avoid that.
>
> > ...
> > The test is saved by taking advantage of zip's feature
> > which stops when the first iterator is exhausted.
>
> It is clever!  Too bad it's pig slow <wink>.
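
Anyone who wants to re-measure the tuples-versus-loop-trips point could
start from something like the sketch below. It is Python 3, the function
names are mine, and nth_next differs from g4 in that it returns '' when the
input runs out. Timings are machine- and version-specific, so I make no
claim about which version wins on a current interpreter.

```python
import timeit


def nth_enumerate(lines, lineno):
    """Pair-per-step version: enumerate yields an (index, line) pair each time."""
    for i, line in enumerate(lines):
        if i + 1 == lineno:
            return line
    return ''


def nth_next(lines, lineno):
    """Plain-loop version in the spirit of g4: call next() lineno times."""
    if lineno < 1:
        return ''
    get = iter(lines).__next__
    try:
        for _ in range(lineno):
            line = get()
    except StopIteration:
        return ''  # ran out of lines before reaching lineno
    return line


def compare(lines, lineno, number=1000):
    """Return seconds taken by each version for `number` calls."""
    return {
        'enumerate': timeit.timeit(lambda: nth_enumerate(lines, lineno), number=number),
        'next': timeit.timeit(lambda: nth_next(lines, lineno), number=number),
    }
```

compare(lines, lineno) takes an in-memory list of lines, which keeps file
I/O out of the measurement.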
>
>
>
>
> --__--__--
>
> Message: 3
> Date: Sat, 27 Apr 2002 02:11:56 -0400
> From: Michael Gilfix <mgilfix@eecs.tufts.edu>
> To: Tim Peters <tim.one@comcast.net>
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] Use for enumerate()
> Reply-To: mgilfix@eecs.tufts.edu
>
> On Sat, Apr 27 @ 00:24, Tim Peters wrote:
> > Of course Guido never explicitly said it had to return a correct answer, in
> > which case I vote for
> >
> >     g=lambda*a:''
> >
> > as both shortest and fastest by any measures <wink>.
>
>   Just return before you start :)
>
>          -- Mike
>
> --
> Michael Gilfix
> mgilfix@eecs.tufts.edu
>
> For my gpg public key:
> http://www.eecs.tufts.edu/~mgilfix/contact.html
>
>
>
> --__--__--
>
> Message: 4
> Date: Sat, 27 Apr 2002 12:23:00 +0200
> From: holger krekel <pyth@devel.trillke.net>
> To: Tim Peters <tim.one@comcast.net>
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] Use for enumerate()
>
> On Sat, Apr 27, 2002 at 12:24:46AM -0400, Tim Peters wrote:
> > [holger krekel]
> > > measured with your driver script the following code
> > > is about 10-20% faster than 'g2' and it gets close to
> > >
> > > > > Challenge 3: do it faster and with less code.
> > >
> > > def g3(filename, lineno):
> > >     if lineno>0:
> > >         f = file(filename)
> > >         while lineno>0:
> > >             read = f.read(1024)
> > >             count = read.count('\n')
> > >             lineno-=count or lineno
> > >         f.close()
> > >         if lineno<count:
> > >             return read.split('\n')[lineno+count-1]
> > >     return ''
> > >
> > > vertically it's one line less but it stretches
> > > a bit horizontally. But it is more portable :-)
> >
> > What if (the simplest example of what can go wrong) the first line is more
> > than 1024 characters long, and the caller asks for lineno 1?
>
> right. i noticed after sending it that you also need to
> change
>          lineno-= count or lineno
> into
>          lineno-= count or read and lineno
>
> and the problem of lines longer than 1024 characters, or straddling a
> 1024-character boundary, remains. Solving this probably blows up the code
> by several lines.
>
> But anyway, isn't reading and counting still a faster technique than
> using readlines() esp. for larger files?
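
Here is one way the read-and-count idea might be made to cope with long
lines and chunk boundaries. It is only a sketch, in Python 3 and in binary
mode, so line endings come back untranslated and the result is decoded as
UTF-8 (both assumptions of mine). The name getline_chunked and the
chunk_size parameter are hypothetical, and it does grow by "several lines".

```python
def getline_chunked(filename, lineno, chunk_size=1024):
    """Return line lineno (1-based) by counting newlines in fixed-size chunks."""
    if lineno < 1:
        return ''
    remaining = lineno - 1  # newlines to skip before the wanted line starts
    buf = b''
    with open(filename, 'rb') as f:
        # Phase 1: skip whole chunks until the wanted line starts in one.
        while remaining > 0:
            chunk = f.read(chunk_size)
            if not chunk:
                return ''  # EOF before the wanted line began
            count = chunk.count(b'\n')
            if count < remaining:
                remaining -= count
                continue
            pos = -1
            for _ in range(remaining):  # find the last newline to skip
                pos = chunk.index(b'\n', pos + 1)
            buf = chunk[pos + 1:]
            remaining = 0
        # Phase 2: keep reading until the line's own newline, or EOF.
        while b'\n' not in buf:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
    end = buf.find(b'\n')
    line = buf if end < 0 else buf[:end + 1]
    return line.decode()
```

Only the wanted line is accumulated in memory; everything before it is
discarded one chunk at a time.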
>
>      holger
>
>
>
> --__--__--
>
> Message: 5
> To: "Raymond Hettinger" <python@rcn.com>
> cc: python-dev@python.org
> Subject: Re: [Python-Dev] Use for enumerate()
> From: Guido van Rossum <guido@python.org>
> Date: Sat, 27 Apr 2002 09:26:17 -0400
>
> > > Challenge 3: do it faster and with less code.
> >
> > def getline(filename, lineno):
> >     if lineno < 1:
> >         return ''
> >     f = open(filename)
> >     i, line = zip(xrange(lineno), f)[-1]
> >     f.close()
> >     if i+1 == lineno:
> >         return line
> >     return ''
>
> Cute, but it builds up a list containing all the lines up to lineno.
> An implicit part of the exercise (sorry for not making this explicit)
> was to avoid this -- IOW it should work even if the file is too large
> to fit in memory (as long as each individual line fits in memory).
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
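
For the memory constraint, one possible sketch uses itertools.islice, which
steps through the file lazily and keeps only the current line. This is
Python 3 and the name getline_islice is mine; it is not one of the entries
posted in the thread.

```python
from itertools import islice


def getline_islice(filename, lineno):
    """Return line lineno (1-based), or '' if out of range, one line at a time."""
    if lineno < 1:
        return ''
    with open(filename) as f:
        # islice skips lineno - 1 lines and yields at most one more;
        # next() falls back to '' if the file is too short
        return next(islice(f, lineno - 1, lineno), '')
```

Nothing is accumulated, so it should work on a file larger than memory as
long as each individual line fits.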
>
>
>
>