[Tutor] code improvement for beginner ?

lmac lopoff at gmx.net
Mon Oct 10 21:50:37 CEST 2005


Danny Yoo wrote:
> 
> On Sat, 8 Oct 2005, lmac wrote:
> 
> 
>>Ok. Here we go. Wanted to start my page long ago. Now is the right time.
>>
>>http://daderoid.freewebspace24.de/python/python1.html
> 
> 
> Hi lmac,
> 
> I'll pick out some stuff that I see; I'm sure others will be happy to give
> comments too.  I'll try to make sure that all the criticism I give is
> constructive in nature, and if you have questions on any of it, please
> feel free to ask about it.
> 
> 
> I'll concentrate on the imgreg() function first.  The declaration of
> 'images' as a global variable looks a little weird.  I do see that
> 'imgreg' feeds into 'images'.  Not a major issue so far, but you might
> want to see if it's possible to do without the global, and explicitly pass
> in 'images' as another parameter to imgreg.  Globals just bother me on
> principle.  *grin*
> 
The thing is i want to download the images after i got the pages. So i
thought i use an global var that it is in scope at the end of the script.
> 
> You may want to document what 'patt' and 'search' are meant to be.  A
> comment at the top of imgreg, like:
> 
>      """imgreg searches for a pattern 'patt' within the text 'search'.  If
>      a match exists, adds it to the set of images, and returns 1.  Else,
>      returns 0."""
> 
> will help a lot.  Documenting the intent of a function is important,
> because people are forgetful.  No programming language can prevent memory
> loss:  what we should try to do is to compensate for our forgetfulness.
> *grin*
> 
> 
> Looking at pageimgs(): I'm not sure what 't' means in the open statement:
> 
> 	f = open(filename, "rt")
> 
> and I think that 't' might be a typo: I'm surprised that Python doesn't
> complain.  Can anyone confirm this?  I think you may have tried to do "r+"
> mode, but even then, you probably don't: you're just reading from the
> file, and don't need to write back to it.
> 
> 
> Looking further into pageimgs(): again, I get nervous about globals.  The
> use of the 'r1' global variable is mysterious.  I had to hunt around to
> figure out what it was it near the middle of the program.
> 
> If anything, I'd recommend naming your global variables with more meaning.
> A name like 'image_regex_patterns' will work better than 'r1'.  Also, it
> looks like pageimgs() is hardcoded to assume 'r1' has three regular
> expressions in it, as it calls imgreg three times for each pattern in r1.
> 
> 		if imgreg(r1[0],a) == 1:
> 			continue
> 		if imgreg(r1[1],a) == 1:
> 			continue
> 		imgreg(r1[2],a)
> 
> and that looks peculiar.  Because this snippet of code is also
> copy-and-pasted around line 106, it appears to be a specific kind of
> conceptual task that you're doing to register images.
> 
> I think that the use of 'r1' and 'imgreg' should be intrinsically tied.
> I'd recommend revising imgreg() so that when we register images, we don't
> have to worry that we've called it on all the regular expressions in r1.
> That is, let imgreg worry about it, not clients: have imgreg go through
> r1[0:3] by itself.
> 
> 
> If we incorporate these changes, the result might look something like
> this:
> 
> ###################################################
> image_regex_patterns = map(re.compile,
>                            [r'http://\w+.\w+.\w+/i.+.gif',
>                             r'http://\w+.\w+.\w+/i.+.jpg',
>                             r'http://\w+.\w+.\w+/i.+.png'])
This one is very good. I stumbled sometimes over map() but didn't know
how to use it. This makes it easier.
> def imgreg(search):
>     """Given a search text, looks for image urls for registration.  If
> a new one can be found, returns 1.  Otherwise, returns 0.
> 
> Possible bug: does not register all images in the search text, but only
> the first new one it can find.
> """
>     for r in image_regex_patterns:
>         z = r.search(search)
>         if z != None:
>             x = z.group(0)
>             if x not in images:
>                 images.append(x)
>                 return 1
>     return 0
> ###################################################
The purpose for storing the images in an global list was to download all
images after the pages were saved and don't download an image
again if it was already saved and downloaded on disk. So with
this list i have an good overview.
> 
> Does this make sense?
> 
Yes. Thats good.
> The point of this restructuring is to allow you to add more image types
> without too much pain, since there's no more hardcoded array indexing
> against r1.  It also simplifies to calls to imgreg from:
> 
>     if imgreg(r1[0],a) == 1:
>         continue
>     if imgreg(r1[1],a) == 1:
>         continue
>     imgreg(r1[2],a)
> 
> to the simpler:
> 
>     imgreg(a)
> 
> 
> I think I'll stop there and look at the program again later.  *grin* Hope
> this helps!
> 
> 
I made some significant changes ;-)

http://daderoid.freewebspace24.de/python/python1.html#version2

---------------

The problem with downloading the images is this:

-------------
http://images.nfl.com/images/globalnav-shadow-gray.gif
Traceback (most recent call last):
  File "/home/internet/bin/nflgrab.py", line 167, in ?
    urllib.urlretrieve(img,img[f:])
  File "/usr/lib/python2.3/urllib.py", line 83, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.3/urllib.py", line 216, in retrieve
    tfp = open(filename, 'wb')
IOError: [Errno 13] Permission denied: '/globalnav-shadow-gray.gif'
-------------

Is there any solution to know if i can download the image ?

Thanks.





More information about the Tutor mailing list