urllib2 - iteration over non-sequence
Erik Max Francis
max at alcyone.com
Sun Jun 10 07:54:47 CEST 2007
Gary Herron wrote:
> Certainly there are cases where xreadlines or read(bytecount) are
> reasonable, but only if the total page size is *very* large. But for
> most web pages, you guys are just nit-picking (or showing off) to
> suggest that the full read implemented by readlines is wasteful.
> Moreover, the original problem was with sockets -- which don't have
> xreadlines. That seems to be a method on regular file objects.
> For simplicity, I'd still suggest my original use of readlines. If
> and when you find you are downloading web pages with sizes that are
> putting a serious strain on your memory footprint, then one of the other
> suggestions might be indicated.
It isn't nitpicking to point out that you're making something that will
consume vastly more memory than it could possibly need. And insisting
that pages aren't _always_ huge is just a silly cop-out; of course pages
get very large.
There is absolutely no reason to read the entire file into memory (which
is what you're doing) before processing it. This is a good example of
the principle that there is one obvious right way to do it -- and it
isn't to read the whole thing in first for no reason whatsoever other
than to avoid an `x`.
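For processing that isn't line-oriented, the incremental approach being argued for here can be sketched with a chunked reader; the helper name and chunk size are illustrative, and `io.BytesIO` again stands in for the response object:

```python
import io

def read_in_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks so the whole body never sits in memory."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Stand-in for an HTTP response: anything with a .read() method works.
response = io.BytesIO(b"line one\nline two\nline three\n")

# Process each chunk as it arrives instead of calling readlines() first.
total = sum(len(chunk) for chunk in read_in_chunks(response, chunk_size=8))
```

Peak memory is bounded by the chunk size rather than the page size, which is the whole point of the objection above.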
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
The more violent the love, the more violent the anger.
-- _Burmese Proverbs_ (tr. Hla Pe)