please critique my thread code

MRAB google at mrabarnett.plus.com
Mon Jun 16 01:48:42 CEST 2008


On Jun 15, 2:29 pm, wins... at cs.wisc.edu wrote:
> I wrote a Python program (103 lines, below) to download developer data
> from SourceForge for research about social networks.
>
> Please critique the code and let me know how to improve it.
>
> An example use of the program:
>
> prompt> python download.py 1 240000
>
> The above command downloads data for the projects with IDs between 1
> and 240000, inclusive. As it runs, it prints status messages, with a
> plus sign meaning that the project ID exists. Else, it prints a minus
> sign.
>
> Questions:
>
> --- Are my setup and use of threads, the queue, and "while True" loop
> correct or conventional?
>
> --- Should the program sleep sometimes, to be nice to the SourceForge
> servers, and so they don't think this is a denial-of-service attack?
>
> --- Someone told me that popen is not thread-safe, and to use
> mechanize. I installed it and followed an example on the web site.
> There wasn't a good description of it on the web site, or I didn't
> find it. Could someone explain what mechanize does?
>
> --- How do I choose the number of threads? I am using a MacBook Pro
> 2.4GHz Intel Core 2 Duo with 4 GB 667 MHz DDR2 SDRAM, running OS
> 10.5.3.
>
> Thank you.
>
> Winston
>
[snip]
String methods are usually quicker than regular expressions, so don't
use regular expressions where string methods are perfectly adequate.
For example, you can replace:

error_pattern = re.compile(".*\n<!--pageid login -->\n.*", re.DOTALL)
...
valid_id = not error_pattern.match(text)

with:

error_pattern = "\n<!--pageid login -->\n"
...
valid_id = error_pattern not in text
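
If you want to convince yourself that the two tests agree, and see the
speed difference on your own machine, here is a minimal sketch. It is
written for Python 3. The sample page texts, the function names and the
name MARKER are made up for illustration and are not from your program.

```python
import re
import timeit

# The marker that appears in the page when the project ID does not exist.
MARKER = "\n<!--pageid login -->\n"
error_regex = re.compile(".*\n<!--pageid login -->\n.*", re.DOTALL)

def valid_id_regex(text):
    # Original approach: regular expression anchored at the start of the text.
    return not error_regex.match(text)

def valid_id_substring(text):
    # Suggested approach: plain substring test.
    return MARKER not in text

# Hypothetical page contents, not real SourceForge output.
samples = [
    "<html>\n<!--pageid login -->\n</html>",     # marker present
    "<html>\n<!--pageid projects -->\n</html>",  # marker absent
    "",                                          # empty page
]

# Each pair holds (regex result, substring result) for one sample.
results = [(valid_id_regex(s), valid_id_substring(s)) for s in samples]
print(results)

# Rough timing on a larger text; the numbers vary by machine.
big_text = "x" * 10000 + samples[0]
for func in (valid_id_regex, valid_id_substring):
    seconds = timeit.timeit(lambda: func(big_text), number=1000)
    print("%s: %.4f s" % (func.__name__, seconds))
```

Both functions should return the same answer for every sample; only the
time taken should differ.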


