[Tutor] Not understanding a bit of code behavior
Bill Allen
wallenpb at gmail.com
Tue Jan 25 07:05:42 CET 2011
By the way, my guess as to why this is working for me the way it does is
that the statement
out_list = part_list
is actually linking these two objects, making them one. My intention had
been to just assign values from one to the other, but I think I have done
far more than that. In this case, if that is true, then it has worked out
well for me, giving me a feedback loop through the data. However, I can see
that it could also be a pitfall if this behavior is not clearly understood.
Am I right? Am I way off base? Either way, I could use some elaboration
about it.
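
Here is a minimal sketch of what I suspect is happening, assuming plain
Python lists. The part numbers and the name copy_list are made up for
illustration and are not from my real code. If the guess is right, the
assignment does not copy any values; it only gives the same list a second
name, whereas list(...) builds a separate list:

```python
# Hypothetical part numbers, for illustration only.
part_list = ["A100"]

out_list = part_list            # no copy: a second name for the same list object
out_list.append("B200")         # so this append is visible through part_list too

same_object = out_list is part_list   # identity check: one list, two names

copy_list = list(part_list)     # shallow copy: a separate list object
copy_list.append("C300")        # does not touch part_list

print(same_object)
print(part_list)
print(copy_list)
```

If that holds, then using out_list = list(part_list) in the function would
give me the "assign the values" behavior I originally had in mind.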
--Bill
On Mon, Jan 24, 2011 at 23:56, Bill Allen <wallenpb at gmail.com> wrote:
> This is a bit embarrassing, but I have crafted a bit of code that does
> EXACTLY what I want, but I am now a bit baffled as to precisely why. I have
> written a function to do a bit of webscraping by following links for a
> project at work. If I leave the code as is, it behaves like it is
> recursively passing through the data tree, which is what I want. However,
> if I change it only slightly, it makes only one pass through the top level
> data. What I do not understand is why it ever behaves as if it is recursive,
> as the function is only called once.
>
> If I comment out out_list=[] and let out_list=part_list be used, the
> following parses through the whole tree of data as if recursive. If I use
> out_list=[] and comment out out_list=part_list, it only processes the top
> level of the data tree.
>
> The function is called only once as: Exploded_BOM_List =
> get_BOM(first_num) in which I pass it a single part number to start with.
> The webscraping bit goes to a particular webpage about that part where it
> then picks up more part numbers and repeats the process.
>
> So can anyone help me understand why this actually works? Certainly no
> complaints here about it, but I would like to better understand why this
> changes the behavior so profoundly. All the print statements are just so I
> could follow the data flow while working on this. By following the data flow,
> I am finding that part_list is actually having values added to it during the
> time the function is running. Problem is, I don't see clearly why that
> should be so.
>
> import re
> import urllib
>
> def get_BOM(part_list):
>     x=re.compile('part='+'.*?'+'>')
>     BOM_List = []
>
>     # out_list = []
>     out_list = part_list
>     print("called get_BOM")
>     pass_num = 0
>     for part_num in part_list:
>         mypath = "http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part=" + part_num
>         mylines = urllib.urlopen(mypath).readlines()
>         print("pass number ", pass_num)
>         print(mypath)
>         print("PL:",part_list)
>         for item in mylines:
>             if "http://" in item:
>                 if "part=" in item:
>                     xstring=str(x.findall(item)).strip('"[\'part=>\']"')
>                     BOM_List.append(xstring)
>         print("BL:",BOM_List)
>         for bom_item in BOM_List:
>             if bom_item not in out_list:
>                 out_list.append(bom_item)
>         print("OL:",out_list)
>         pass_num += 1
>     return(out_list)
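
The quoted function needs the real web pages, so below is a small offline
sketch of the same loop shape, assuming the guess about the shared list is
right. FAKE_PAGES, walk, and the alias flag are hypothetical stand-ins (they
are not in my real code); the dictionary plays the role of the pages, mapping
each part to the parts it links to. A for loop over a list looks up the next
position on every pass, so items appended to that same list while the loop is
running get visited as well:

```python
# Hypothetical stand-in for the web pages: part -> parts found on its page.
FAKE_PAGES = {
    "TOP": ["A", "B"],
    "A": ["C"],
    "B": [],
    "C": [],
}

def walk(part_list, alias):
    # alias=True mimics "out_list = part_list"; alias=False mimics "out_list = []"
    out_list = part_list if alias else []
    visited = []
    for part_num in part_list:          # sees items appended to part_list mid-loop
        visited.append(part_num)
        for child in FAKE_PAGES[part_num]:
            if child not in out_list:
                out_list.append(child)  # with alias=True this also grows part_list
    return visited, out_list

aliased_visited, aliased_out = walk(["TOP"], alias=True)
fresh_visited, fresh_out = walk(["TOP"], alias=False)

print(aliased_visited, aliased_out)
print(fresh_visited, fresh_out)
```

With the shared list the loop keeps finding new parts until nothing more is
appended, which looks like recursion even though the function is called only
once. With a fresh list the loop sees only the part it was started with.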