[Tutor] Not understanding a bit of code behavior
Bill Allen
wallenpb at gmail.com
Tue Jan 25 07:37:31 CET 2011
Ok, I have definitely verified this to myself. The following works
perfectly and is a little easier to understand. In this version, I am
plainly modifying my part_list list while the for loop is iterating over
it, producing the effect of a sequence that grows over the course of the
operation of the code. So, I am convinced that I had previously assigned
part_list to out_list by reference, not by value as I mistakenly thought
when I first wrote the code, which explains it. It was a silly mistake born
from still being new to Python and thinking in terms of another language I
know that typically assigns by value instead. It had not occurred to me
initially that it was possible to modify a list in this way while looping
over it. I do not think most languages would allow this.
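Here is a minimal sketch of what I believe is happening, stripped of the web scraping. The `children` dict and the part numbers are made up for illustration and are not my real data. It shows that plain assignment gives one list two names, and that a for loop over a list picks up items appended during the loop.

```python
# Hypothetical data: which parts each part links to (not the real BOM).
children = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}

part_list = ["A"]
out_list = part_list          # binds a second name to the SAME list object
print(out_list is part_list)  # True: one list, two names

visited = []
for part_num in part_list:    # the loop reads from the live, growing list
    visited.append(part_num)
    for child in children[part_num]:
        if child not in out_list:
            out_list.append(child)  # this also grows part_list

print(visited)    # ['A', 'B', 'C', 'D']
print(part_list)  # ['A', 'B', 'C', 'D']
```

With `out_list = []` instead, nothing is ever appended to part_list, so the loop stops after the starting item.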
Question: is it possible to copy the values from one object to another in
such a way that they are not just references to each other?
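As far as I can tell, the answer is yes. For a flat list of strings, `list(part_list)` or the slice `part_list[:]` builds a new list, and the standard `copy` module offers `copy.copy` and `copy.deepcopy` for the general case. A short sketch with made-up values:

```python
import copy

part_list = ["A", "B"]
alias = part_list            # same object, just another name
shallow = list(part_list)    # new list; part_list[:] or copy.copy() also work
shallow.append("C")          # does not touch part_list
alias.append("Z")            # does touch part_list

print(part_list)  # ['A', 'B', 'Z']
print(shallow)    # ['A', 'B', 'C']

# With nested lists, a shallow copy still shares the inner lists.
nested = [["A", "B"], ["C"]]
shallow_nested = copy.copy(nested)
deep_nested = copy.deepcopy(nested)
nested[0].append("X")
print(shallow_nested[0])  # ['A', 'B', 'X']
print(deep_nested[0])     # ['A', 'B']
```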
Sorry about asking questions and then answering them. Things began to
become more clear with each question I asked.
import re
import urllib  # Python 2; on Python 3 urlopen lives in urllib.request

def get_BOM(part_list):
    # Matches the 'part=...>' fragment of each link on a part's page.
    x=re.compile('part='+'.*?'+'>')
    BOM_List = []
    pass_num = 0
    # part_list grows inside this loop, so newly found parts get visited too.
    for part_num in part_list:
        mypath = "http://172.25.8.13/cgi-bin/search/part-url.cgi?part=" + part_num
        mylines = urllib.urlopen(mypath).readlines()
        for item in mylines:
            if "http://" in item:
                if "part=" in item:
                    xstring=str(x.findall(item)).strip('"[\'part=>\']"')
                    BOM_List.append(xstring)
        for bom_item in BOM_List:
            if bom_item not in part_list:
                part_list.append(bom_item)
        pass_num += 1
    return(part_list)
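For comparison, here is a sketch of the same traversal that works on a copy, so the list passed in by the caller is left alone. `fetch_links` is a hypothetical stand-in for the urlopen-and-regex step above, and the `pages` data is invented so the example runs without the web server.

```python
def get_bom(start_parts, fetch_links):
    """Return start_parts plus every part reachable from them.

    fetch_links is a hypothetical callable: part number -> list of part
    numbers found on that part's page (stands in for the scraping step).
    """
    found = list(start_parts)       # copy, so the caller's list is untouched
    for part_num in found:          # found grows while we iterate, on purpose
        for child in fetch_links(part_num):
            if child not in found:
                found.append(child)
    return found


# Invented sample data in place of the real web pages.
pages = {"100": ["200", "300"], "200": ["400"], "300": ["200"], "400": []}
first = ["100"]
result = get_bom(first, lambda p: pages.get(p, []))
print(result)  # ['100', '200', '300', '400']
print(first)   # ['100']
```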
On Tue, Jan 25, 2011 at 00:05, Bill Allen <wallenpb at gmail.com> wrote:
> By the way, my guess as to why this is working for me the way it does is
> that the statement
>
> out_list = part_list
>
> is actually linking these two objects, making them one. My intention had
> been to just assign values from one to the other, but I think I have done
> far more than that. In this case, if that is true, then it has worked out
> well for me, giving me a feedback loop through the data. However, I can see
> that it could also be a pitfall if this behavior is not clearly understood.
> Am I right? Am I way off base? Either way, I could use some elaboration
> about it.
>
>
> --Bill
>
> On Mon, Jan 24, 2011 at 23:56, Bill Allen <wallenpb at gmail.com> wrote:
>
>> This is a bit embarrassing, but I have crafted a bit of code that does
>> EXACTLY what I want, but I am now a bit baffled as to precisely why. I have
>> written a function to do a bit of webscraping by following links for a
>> project at work. If I leave the code as is, it behaves like it is
>> recursively passing through the data tree, which is what I want. However,
>> if I change it only slightly, it makes only one pass through the top level
>> data. What I do not understand is why it ever behaves as if it is recursive,
>> as the function is only called once.
>>
>> If I comment out out_list=[] and let out_list=part_list be used, the
>> following parses through the whole tree of data as if recursive. If I use
>> out_list=[] and comment out out_list=part_list, it only processes the top
>> level of the data tree.
>>
>> The function is called only once as: Exploded_BOM_List =
>> get_BOM(first_num) in which I pass it a single part number to start with.
>> The webscraping bit goes to a particular webpage about that part where it
>> then picks up more part numbers and repeats the process.
>>
>> So can anyone help me understand why this actually works? Certainly no
>> complaints here about it, but I would like to better understand why this
>> change alters the behavior so profoundly. All the print statements are just
>> so I could follow the data flow while working on this. By following the
>> data flow, I am finding that part_list is actually having values added to
>> it during the time the function is running. Problem is, I don't see clearly
>> why that should be so.
>>
>> def get_BOM(part_list):
>>     x=re.compile('part='+'.*?'+'>')
>>     BOM_List = []
>>
>>     # out_list = []
>>     out_list = part_list
>>     print("called get_BOM")
>>     pass_num = 0
>>     for part_num in part_list:
>>         mypath = "http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part=" + part_num
>>         mylines = urllib.urlopen(mypath).readlines()
>>         print("pass number ", pass_num)
>>         print(mypath)
>>         print("PL:",part_list)
>>         for item in mylines:
>>             if "http://" in item:
>>                 if "part=" in item:
>>                     xstring=str(x.findall(item)).strip('"[\'part=>\']"')
>>                     BOM_List.append(xstring)
>>         print("BL:",BOM_List)
>>         for bom_item in BOM_List:
>>             if bom_item not in out_list:
>>                 out_list.append(bom_item)
>>         print("OL:",out_list)
>>         pass_num += 1
>>     return(out_list)
>>