handling PAMIE and lxml

elca highcar at gmail.com
Mon Oct 26 23:15:39 CET 2009

Simon Forman-2 wrote:
> On Mon, Oct 26, 2009 at 3:05 AM, elca <highcar at gmail.com> wrote:
>> Hello,
>> I opened another new thread; the old thread is too long.
> Too long for what?
>> First of all, I really appreciate the help from the many people in this
>> newsgroup.
>> I'm making a web scraper now,
>> but there is still a problem with my script source.
>> http://elca.pastebin.com/m52e7d8e0
>> I uploaded my script here
>> so anybody can modify it.
>> The main problem is on lines 74 and 75 of my source,
>> where you can see the line "thepage = urllib.urlopen(theurl).read()".
>> I want to change this line to work with Pamie, not urllib.
> Why?
>> If anyone can help me, I'd really appreciate it.
>> Thanks in advance.
> I just took a look at your code.  I don't want to be mean but your
> code is insane.
> 1.) you import HTMLParser and fromstring but don't use them.
> 2.) the page_check() function is useless.  All it does is sleep for
> len("www.naver.com") seconds.  Why are you iterating through the
> characters in that string anyway?
> 3.) On line 21 you have a pointless pass statement.
> 4.) The whole "if x:" statement on line 19 is pointless because both
> branches do exactly the same thing.
> 5.) For the variables start_line and end_line you're using strings.  This
> is not PHP. Strings are not automatically converted to integers.
> 6.) Because you never change end_line anywhere, and because you don't
> use break anywhere in the loop body, the while loop on line 39 will
> never end.
> 7.) The while loop on line 39 defines the getit() function (over and
> over again) but never calls it.
> 8.) On line 52 you define a list called "results" and then never use it
> anywhere.
> 9.) In getit() the default value for howmany is 0, but on line 68 you
> subtract 1 from it and on the next line you return if not howmany.  This
> means that if you ever forget to call getit() with a value of howmany
> above zero, that if statement will never return.
> 10.) In the for loop on line 54, in the while loop on line 56, you
> recursively call getit() on line 76.  wtf?  I suspect lines 73-76 are
> at the wrong indentation level.
> 11.) On line 79 you have a "bare" except, which just calls exit(1) on
> the next line.  This replaces the exception you had (which contains
> important information about the error encountered) with a SystemExit
> exception (which does not.)  Note that an uncaught exception will exit
> your script with a non-zero return code, so all you're doing here is
> throwing away debugging information.
> 12.) On line 81 you have 'return()'.  This line will never be reached
> because you just called exit() on the line before.  Also, return is
> not a function, you do not need '()' after it.
> 13.) Why do you sleep for half a second on line 83?
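The point above about the bare except can be illustrated with a small sketch. It is not taken from the pastebin script: fetch(), run_swallowing() and run_reporting() are hypothetical names, fetch() is a stand-in that always fails, and the code is written so that it also runs on Python 3.

```python
import sys
import traceback


def fetch(url):
    # Hypothetical stand-in for the real download; it always fails here.
    raise IOError("could not open %s" % url)


def run_swallowing(url):
    # The criticized pattern: a bare except followed by exit(1).
    # The IOError and its message are replaced by a SystemExit.
    try:
        return fetch(url)
    except:
        sys.exit(1)


def run_reporting(url):
    # One possible alternative: catch only the expected error,
    # print the full traceback, then exit with a non-zero code.
    try:
        return fetch(url)
    except IOError:
        traceback.print_exc()
        sys.exit(1)
```

Simply removing the try/except would also work: an uncaught exception prints its traceback and ends the script with a non-zero return code.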
> I cannot believe that this script does anything useful.  I would
> recommend playing with the interactive interpreter for a while until
> you understand Python and what you're doing.  Then worry about Pamie
> vs. urllib.
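As an example of the kind of experiment the interpreter is good for, here is a minimal sketch of two of the issues raised in the review: string-valued start_line and end_line, and a howmany that defaults to 0. The values and the helper steps_until_zero() are assumptions made up for illustration, not code from the original script.

```python
# Strings are not converted to integers automatically.
start_line = "1"    # hypothetical values, stored as strings
end_line = "10"

print(start_line == 1)            # False: "1" is not the integer 1
print(end_line < "9")             # True: strings compare character by character
print(int(end_line) < int("9"))   # False: convert explicitly to compare numbers


def steps_until_zero(howmany=0, limit=5):
    """Count decrements until `not howmany` is true; give up after `limit`."""
    steps = 0
    while steps < limit:
        howmany -= 1
        steps += 1
        if not howmany:     # only true when howmany is exactly 0
            return steps
    return None             # zero was never reached within the limit


print(steps_until_zero(3))  # 3
print(steps_until_zero())   # None: 0 - 1 is -1, and -1 counts as true
```

The limit parameter exists only so the sketch terminates; without it, the default call would loop forever, which is the behaviour the review warns about.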
> -- 
> http://mail.python.org/mailman/listinfo/python-list

Thanks for your advice;
everything you said is correct.
First of all, I would like to say
that the script source is not a finished version.
I just gathered pieces from here and there and put them together.
I'm not familiar with Python; I'm currently learning it.
But where is the end of learning Python? Is there ever an end to learning
Python? I don't think so. Also, I do at least know what I'm doing with my script.
And what is 'wtf'? :)

