handling PAMIE and lxml

Simon Forman sajmikins at gmail.com
Mon Oct 26 14:39:29 EDT 2009


On Mon, Oct 26, 2009 at 3:05 AM, elca <highcar at gmail.com> wrote:
>
> Hello,
> I opened another new thread; the old thread is too long.

Too long for what?

> First of all, I really appreciate the help from the many people in this newsgroup.
> I'm making a web scraper now,
> but I still have a problem with my script source.
> http://elca.pastebin.com/m52e7d8e0
> I uploaded my script here
> so anybody can modify it.
> The main problem is, if you see lines 74 and 75 in my source,
> you can see this line: "thepage = urllib.urlopen(theurl).read()".
> I want to change this line to work with Pamie, not urllib.

Why?

> If anyone can help me, I'd really appreciate it.
> Thanks in advance.

I just took a look at your code.  I don't want to be mean but your
code is insane.

1.) You import HTMLParser and fromstring but don't use them.

2.) the page_check() function is useless.  All it does is sleep for
len("www.naver.com") seconds.  Why are you iterating through the
characters in that string anyway?

3.) On line 21 you have a pointless pass statement.

4.) The whole "if x:" statement on line 19 is pointless because both
branches do exactly the same thing.

5.) For the variables start_line and end_line you're using strings.  This
is not PHP.  Strings are not automatically converted to integers.

6.) Because you never change end_line anywhere, and because you don't
use break anywhere in the loop body, the while loop on line 39 will
never end.

7.) The while loop on line 39 defines the getit() function (over and
over again) but never calls it.

8.) On line 52 you define a list called "results" and then never use it anywhere.

9.) In getit() the default value for howmany is 0, but on line 68 you
subtract 1 from it and on the next line you return if not howmany.  This
means that if you ever forget to call getit() with a value of howmany
above zero, howmany goes negative (which still counts as true) and that
if statement will never return.

10.) In the for loop on line 54, in the while loop on line 56, you
recursively call getit() on line 76.  wtf?  I suspect lines 73-76 are
at the wrong indentation level.

11.) On line 79 you have a "bare" except, which just calls exit(1) on
the next line.  This replaces the exception you had (which contains
important information about the error encountered) with a SystemExit
exception (which does not).  Note that an uncaught exception will exit
your script with a non-zero return code, so all you're doing here is
throwing away debugging information.

12.) On line 81 you have 'return()'.  This line will never be reached
because you just called exit() on the line before.  Also, return is
not a function; you do not need '()' after it.

13.) Why do you sleep for half a second on line 83?
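
To make the points about start_line/end_line, the never-ending while
loop and the howmany default concrete, here is a minimal sketch.  I am
guessing at what the script is supposed to do, so line_numbers() and
count_down() are made-up names for illustration, not a rewrite of your
code.

```python
def line_numbers(start_line, end_line):
    """Return the integers from start_line up to and including end_line."""
    # int() makes the conversion explicit; Python will not turn "3" into 3.
    current = int(start_line)
    last = int(end_line)
    numbers = []
    while current <= last:
        numbers.append(current)
        current += 1  # without this the condition never changes
    return numbers


def count_down(howmany=0):
    """Return how many steps were taken before stopping."""
    steps = 0
    # "howmany > 0" stops at zero and also for negative values.
    # "if not howmany" would treat -1 as true and keep going.
    while howmany > 0:
        howmany -= 1
        steps += 1
    return steps


print(line_numbers("3", "6"))
print(count_down())      # safe even with the default of 0
print(count_down(4))
print(bool(-1))          # negative numbers are true
```

The loop ends because something in its body changes the value the
condition tests, and the countdown is safe because it compares against
zero instead of relying on the value being exactly zero.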


I cannot believe that this script does anything useful.  I would
recommend playing with the interactive interpreter for a while until
you understand Python and what you're doing.  Then worry about Pamie
vs. urllib.
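
As something to try in the interpreter, here is a small sketch of the
bare-except problem mentioned above.  The fetch() function and its
error are invented for illustration; they stand in for whatever
actually fails in your script.

```python
import sys
import traceback


def fetch(url):
    """Hypothetical stand-in for the real download; it always fails."""
    raise ValueError("could not fetch %s" % url)


def quiet(url):
    # The pattern from the script: a bare except swallows the real error
    # and replaces it with SystemExit.
    try:
        return fetch(url)
    except:
        sys.exit(1)


def loud(url):
    # Catch only what you expect, print the traceback, and re-raise
    # so the original exception is not lost.
    try:
        return fetch(url)
    except ValueError:
        traceback.print_exc()
        raise


try:
    quiet("http://example.com/")
except SystemExit as e:
    print("quiet() left us with: %r" % (e,))

try:
    loud("http://example.com/")
except ValueError as e:
    print("loud() left us with: %r" % (e,))
```

With quiet() all you learn is the exit code.  With loud() you still
have the exception and its message, plus a traceback on stderr.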


