Your list is great. I've been lurking for the past two weeks while I learned the basics. Thanks.

I am trying to loop through two files and scrape some data, but the loops are not working. The script never gets past the first URL from state_list, as the test print shows.
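Stripped of the scraping, the looping itself boils down to this (paths shortened here, just to show the structure):

Code:
    # simplified sketch of just the two loops -- short hypothetical paths
    _states = open('state_list.csv', 'r')
    _missing = open('missing_topo_list.csv', 'r')

    for each_state in _states:
        print each_state.replace("\n", "")
        # inner loop over the SKU file
        for each_sku in _missing:
            print "    " + each_sku.replace("\n", "")

    _states.close()
    _missing.close()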
If someone could point me in the right direction, I'd appreciate it.

I would also like to know the difference between open() and csv.reader(). I had similar issues with csv.reader() when opening these files.
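For what it's worth, here is a simplified version of both ways I tried reading the file (file name shortened):

Code:
    import csv

    # plain open(): iterating gives one string per line,
    # with the trailing "\n" still attached
    _states = open('state_list.csv', 'r')
    for each_state in _states:
        print each_state.replace("\n", "")
    _states.close()

    # csv.reader() wraps an already-open file: iterating gives a
    # list of fields per row instead of a single string
    _states = open('state_list.csv', 'r')
    for row in csv.reader(_states):
        print row[0]
    _states.close()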
Any help greatly appreciated.

Roy

Code:
    # DOWNLOAD USGS MISSING FILES

    import mechanize
    import BeautifulSoup as B_S
    import re
    # import urllib
    import csv
    # OPEN FILES
    # LOOKING FOR THESE SKUs
    _missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
    # IN THESE STATES
    _states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
    # IF NOT FOUND, LIST THEM HERE
    _missing_files = []
    # APPEND THIS FILE WITH META
    _topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')

    # OPEN PAGE
    for each_state in _states:
        each_state = each_state.replace("\n", "")
        print each_state
        html = mechanize.urlopen(each_state)
        _soup = B_S.BeautifulSoup(html)

        # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
        _table = _soup.find("table", "tabledata")
        print _table  # test: this is returning 'None'

        for each_sku in _missing:
            each_sku = each_sku.replace("\n", "")
            print each_sku  # test
            try:
                _row = _table.find('tr', text=re.compile(each_sku))
            except (IOError, AttributeError):
                _missing_files.append(each_sku)
                continue
            else:
                _row = _row.previous
                _row = _row.parent
                _fields = _row.findAll('td')
                _name = _fields[1].string
                _state = _fields[2].string
                _lat = _fields[4].string
                _long = _fields[5].string
                _sku = _fields[7].string

                _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")

            print each_sku + ': ' + _name

    print "Missing Files:"
    print _missing_files
    _topo_meta.close()
    _missing.close()
    _states.close()

The message I am getting is:

Code:
    >>>
    http://libremap.org/data/state/Colorado/drg/
    None
    33087c2
    Traceback (most recent call last):
      File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
        _row = _table.find('tr', text=re.compile(each_sku))
    AttributeError: 'NoneType' object has no attribute 'find'
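Since the script prints None for the table, I figured I could dump whatever tables the page really has, if that would help narrow it down (my best guess at the right BeautifulSoup calls):

Code:
    # list every table on the page and its class attribute
    _tables = _soup.findAll('table')
    print len(_tables)
    for t in _tables:
        print t.get('class')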
    <a href="http://libremap.org/data/state/Connecticut/drg/">http://libremap.org/data/state/Connecticut/drg/</a><br>    <a href="http://libremap.org/data/state/Pennsylvania/drg/">http://libremap.org/data/state/Pennsylvania/drg/</a><br>
    <a href="http://libremap.org/data/state/South_Dakota/drg/">http://libremap.org/data/state/South_Dakota/drg/</a><br><br>    missing_topo_list<br>    33087c2<br>    34087b2<br>    33086b7<br>    34086c2<br><br><br>