[Tutor] Not understanding a bit of code behavior

Tue Jan 25 07:37:31 CET 2011

OK, I have definitely verified this to myself.  The following works
perfectly and is a little easier to understand.  In this version, I am
plainly modifying part_list, the list I am iterating over, thus producing
the effect of a loop that keeps growing over the course of the operation of
the code.  So, I am convinced that I had previously assigned part_list to
out_list by reference, not by value as I mistakenly thought when I first
wrote the code, which explains it.  It was a silly mistake born from still
being new to Python and thinking in terms of another language I know that
typically assigns by value instead.  It had not occurred to me initially
that it was possible to modify a list while iterating over it in this way.
I do not think most languages would allow this.
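
To make the point above concrete, here is a minimal sketch of what I believe
is happening.  The part numbers and the `children` table are made up for
illustration; they are not from my real data.

```python
# Assignment binds a second name to the same list; nothing is copied.
part_list = ["1000"]
out_list = part_list
out_list.append("1001")

same_object = out_list is part_list   # both names refer to one list
seen_through_other_name = part_list   # the append is visible here too

# A for loop over a list looks up items by position on each step, so
# items appended during the loop are visited as well.
children = {"top": ["sub1", "sub2"], "sub1": ["sub3"]}  # hypothetical data
queue = ["top"]
visited = []
for part in queue:
    visited.append(part)
    for child in children.get(part, []):
        if child not in queue:
            queue.append(child)   # grows the list being looped over
```

After this runs, `visited` holds all four names even though `queue` started
with one, which is the same "looks recursive" effect I was seeing.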

Question, is it possible to copy values from one object to another in such a
way as they are not just references one to the other?
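
As far as I can tell, the usual answers for a list are a shallow copy or a
deep copy.  This is only a sketch with made-up values, not something taken
from my real code:

```python
import copy

original = ["1000", ["1001", "1002"]]   # made-up nested data

alias = original                  # same object, no copy at all
shallow = list(original)          # new outer list; original[:] does the same
deep = copy.deepcopy(original)    # new outer list and new inner lists

original.append("1003")           # changes the outer list
original[1].append("1004")        # changes the shared inner list

# alias   sees both changes
# shallow misses the outer append but still shares the inner list
# deep    sees neither change
```

For a flat list of part-number strings, the shallow copy looks like enough,
since strings cannot be modified in place.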

Sorry about asking questions and then answering them.  Things began to
become more clear with each question I asked.

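import re
import urllib   # Python 2; urllib.urlopen is used below

def get_BOM(part_list):
    # Matches text such as 'part=12345>' inside a link.
    x=re.compile('part='+'.*?'+'>')
    BOM_List = []

    pass_num = 0
    # part_list grows inside this loop, so the loop also visits
    # every part number appended further down.
    for part_num in part_list:
        mypath = ("http://172.25.8.13/cgi-bin/search/part-url.cgi?part=" +
                  part_num)
        mylines = urllib.urlopen(mypath).readlines()
        for item in mylines:
            if "http://" in item:
                if "part=" in item:
                    # Strip the list brackets, quotes, 'part=' and '>'
                    # from the printed match, leaving the part number.
                    xstring=str(x.findall(item)).strip('"[\'part=>\']"')
                    BOM_List.append(xstring)
        # Queue any newly found part numbers for a later pass.
        for bom_item in BOM_List:
            if bom_item not in part_list:
                part_list.append(bom_item)
        pass_num += 1
    return(part_list)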
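
For anyone who wants to try the idea without my web server, here is a rough
offline sketch.  It makes several assumptions: `FAKE_PAGES`, `fetch_lines`
and the example.invalid links are invented stand-ins for the real pages, the
regular expression uses a capture group instead of the strip() cleanup, and
the function works on a copy so the caller's list is left alone.

```python
import re

# Capture group returns just the part number, so no strip() is needed.
PART_RE = re.compile(r"part=(.*?)>")

# Hypothetical page contents, keyed by part number.
FAKE_PAGES = {
    "1000": ["<a href=http://example.invalid/part-url.cgi?part=1001>1001</a>",
             "<a href=http://example.invalid/part-url.cgi?part=1002>1002</a>"],
    "1001": ["<a href=http://example.invalid/part-url.cgi?part=1003>1003</a>"],
    "1002": ["<a href=http://example.invalid/part-url.cgi?part=1001>1001</a>"],
}

def fetch_lines(part_num):
    """Stand-in for urlopen(...).readlines(); returns fake page lines."""
    return FAKE_PAGES.get(part_num, [])

def get_bom(start_parts, fetch=fetch_lines):
    parts = list(start_parts)        # copy, so the argument is not modified
    for part_num in parts:           # parts grows while this loop runs
        for line in fetch(part_num):
            if "http://" in line and "part=" in line:
                for found in PART_RE.findall(line):
                    if found not in parts:
                        parts.append(found)
    return parts

first = ["1000"]
exploded = get_bom(first)
```

Here `exploded` ends up with all four part numbers while `first` still holds
only the starting one.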

On Tue, Jan 25, 2011 at 00:05, Bill Allen <wallenpb at gmail.com> wrote:

> By the way, my guess as to why this is working for me the way it does is
> that the statement
>
> out_list = part_list
>
> is actually linking these two objects, making them one.   My intention had
> been to just assign values from one to the other, but I think I have done
> far more than that.   In this case, if that is true, then it has worked out
> well for me, giving me a feedback loop through the data.  However, I can see
> that it could also be a pitfall if this behavior is not clearly understood.
> Am I right?   Am I way off base?  Either way, I could use some elaboration
> about it.
>
>
> --Bill
>
> On Mon, Jan 24, 2011 at 23:56, Bill Allen <wallenpb at gmail.com> wrote:
>
>> This is a bit embarrassing, but I have crafted a bit of code that does
>> EXACTLY what I want, but I am now a bit baffled as to precisely why.  I have
>> written a function to do a bit of webscraping by following links for a
>> project at work.  If I leave the code as is, it behaves like it is
>> recursively passing through the data tree- which is what I want.  However,
>> if I change it only slightly, it makes only one pass through the top level
>> data.  What I do not understand is why it ever behaves as if it is recursive
>> as the function is only called once.
>>
>> If I comment out out_list=[] and let out_list=part_list be used, the
>> following parses through the whole tree of data as if recursive.  If I use
>> out_list=[] and comment out out_list=part_list, it only processes the top
>> level of the data tree.
>>
>> The function is called only once as:  Exploded_BOM_List =
>> get_BOM(first_num)  in which I pass it a single part number to start with.
>> The webscraping bit goes to a particular webpage about that part where it
>> then picks up more part numbers and repeats the process.
>>
>> So can anyone help me understand why this actually works?  Certainly no
>> complaints here about it, but I would like to better understand why this
>> changes the behavior so profoundly.  All the print statements are just so I could
>> follow out the data flow while working on this.  By following the data flow,
>> I am finding that part_list is actually having values added to it during the
>> time the function is running.   Problem is, I don't see clearly why that
>> should be so.
>>
>> def get_BOM(part_list):
>>     x=re.compile('part='+'.*?'+'>')
>>     BOM_List = []
>>
>> #    out_list = []
>>     out_list = part_list
>>     print("called get_BOM")
>>     pass_num = 0
>>     for part_num in part_list:
>>         mypath = "http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part=" + part_num
>>         mylines = urllib.urlopen(mypath).readlines()
>>         print("pass number ", pass_num)
>>         print(mypath)
>>         print("PL:",part_list)
>>         for item in mylines:
>>             if "http://" in item:
>>                 if "part=" in item:
>>                     xstring=str(x.findall(item)).strip('"[\'part=>\']"')
>>                     BOM_List.append(xstring)
>>                     print("BL:",BOM_List)
>>         for bom_item in BOM_List:
>>             if bom_item not in out_list:
>>                 out_list.append(bom_item)
>>                 print("OL:",out_list)
>>         pass_num += 1
>>     return(out_list)
>>
>