<br>Your list is great. I've been lurking for the past two weeks while I learned the basics. Thanks.<br><br>I am trying to loop through two files and scrape some data, but the loops are not working.<br><br>The script is not getting past the first URL from state_list, as the test print shows.<br>
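Separately from the AttributeError in the traceback, the nested loops have a second problem. A file object returned by open() is an iterator, so the inner `for each_sku in _missing:` loop consumes it completely while handling the first state. Every later state then sees an empty file. Below is a minimal Python 3 sketch that compares iterating the file directly with reading the SKUs into a list first. It assumes `io.StringIO` stands in for the two CSV files, uses a shortened copy of their contents from the post, and uses hypothetical function names.

```python
import io

# Stand-ins for state_list.csv and missing_topo_list.csv (first lines from the post).
STATES = ("http://libremap.org/data/state/Colorado/drg/\n"
          "http://libremap.org/data/state/Connecticut/drg/\n")
SKUS = "33087c2\n34087b2\n"


def pairs_iterating_file(states_text, skus_text):
    """Hypothetical helper: the inner loop walks the file object itself, as in the script."""
    states, skus = io.StringIO(states_text), io.StringIO(skus_text)
    pairs = []
    for state in states:
        for sku in skus:  # exhausted after the first state; later passes do nothing
            pairs.append((state.strip(), sku.strip()))
    return pairs


def pairs_from_list(states_text, skus_text):
    """Hypothetical helper: read the SKUs into a list once so every state sees every SKU."""
    states, skus_file = io.StringIO(states_text), io.StringIO(skus_text)
    skus = [line.strip() for line in skus_file]
    pairs = []
    for state in states:
        for sku in skus:
            pairs.append((state.strip(), sku))
    return pairs


print(len(pairs_iterating_file(STATES, SKUS)))  # 2: only the first state is paired
print(len(pairs_from_list(STATES, SKUS)))       # 4: both states, both SKUs
```

Calling `_missing.seek(0)` before the inner loop would also rewind the file. Reading the list once, however, avoids re-reading the file for every state.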
<br>If someone could point me in the right direction, I'd appreciate it.<br><br>I would also like to know the difference between open() and csv.reader(). I had similar issues with csv.reader() when opening these files.<br>
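On open() versus csv.reader(): open() returns a file object, and iterating it yields each line as a string with its trailing newline still attached (hence the `replace("\n", "")` calls in the script). csv.reader() opens nothing itself. It wraps an already-open file object and yields each row as a list of fields split on the delimiter. Below is a minimal Python 3 sketch. It assumes `io.StringIO` stands in for missing_topo_list.csv, and the helper names are hypothetical.

```python
import csv
import io

# Stand-in for missing_topo_list.csv, using the SKUs shown in the post.
SAMPLE = "33087c2\n34087b2\n33086b7\n34086c2\n"


def read_with_open(f):
    """Iterating a file object yields raw lines, newline included."""
    return [line for line in f]


def read_with_csv(f):
    """csv.reader wraps a file object and yields each row as a list of fields."""
    return [row for row in csv.reader(f)]


raw_lines = read_with_open(io.StringIO(SAMPLE))
rows = read_with_csv(io.StringIO(SAMPLE))
print(raw_lines)  # ['33087c2\n', '34087b2\n', '33086b7\n', '34086c2\n']
print(rows)       # [['33087c2'], ['34087b2'], ['33086b7'], ['34086c2']]
```

With a real file on disk, the csv module documentation recommends opening it with `newline=''` in Python 3 (binary mode, `'rb'`, in Python 2).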
<br>Any help greatly appreciated.<br><br>Roy<br><br>Code:<br> # DOWNLOAD USGS MISSING FILES<br><br> import mechanize<br> import BeautifulSoup as B_S<br> import re<br> # import urllib<br> import csv<br>
<br> # OPEN FILES<br> # LOOKING FOR THESE SKUs<br> _missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')<br> # IN THESE STATES<br> _states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')<br>
# IF NOT FOUND, LIST THEM HERE<br> _missing_files = []<br> # APPEND THIS FILE WITH META<br> _topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')<br>
<br> # OPEN PAGE<br> for each_state in _states:<br>     each_state = each_state.replace("\n", "")<br>     print each_state<br>     html = mechanize.urlopen(each_state)<br>     _soup = B_S.BeautifulSoup(html)<br>
<br>     # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU<br>     _table = _soup.find("table", "tabledata")<br>     print _table #test This is returning 'None'<br><br>
     for each_sku in _missing:<br>         each_sku = each_sku.replace("\n","")<br>         print each_sku #test<br>         try:<br>             _row = _table.find('tr', text=re.compile(each_sku))<br>
         except (IOError, AttributeError):<br>             _missing_files.append(each_sku)<br>             continue<br>         else:<br>             _row = _row.previous<br>             _row = _row.parent<br>
             _fields = _row.findAll('td')<br>             _name = _fields[1].string<br>             _state = _fields[2].string<br>             _lat = _fields[4].string<br>             _long = _fields[5].string<br>
             _sku = _fields[7].string<br><br>             _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")<br><br>             print x +': ' + _name<br>
<br> print "Missing Files:"<br> print _missing_files<br> _topo_meta.close()<br> _missing.close()<br> _states.close()<br><br><br>The message I am getting is:<br><br>Code: <br> >>><br> <a href="http://libremap.org/data/state/Colorado/drg/">http://libremap.org/data/state/Colorado/drg/</a><br>
None<br> 33087c2<br> Traceback (most recent call last):<br> File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module><br> _row = _table.find('tr', text=re.compile(each_sku))<br>
AttributeError: 'NoneType' object has no attribute 'find'<br><br><br>And the files look like:<br><br>Code: <br> state_list<br> <a href="http://libremap.org/data/state/Colorado/drg/">http://libremap.org/data/state/Colorado/drg/</a><br>
<a href="http://libremap.org/data/state/Connecticut/drg/">http://libremap.org/data/state/Connecticut/drg/</a><br> <a href="http://libremap.org/data/state/Pennsylvania/drg/">http://libremap.org/data/state/Pennsylvania/drg/</a><br>
<a href="http://libremap.org/data/state/South_Dakota/drg/">http://libremap.org/data/state/South_Dakota/drg/</a><br><br> missing_topo_list<br> 33087c2<br> 34087b2<br> 33086b7<br> 34086c2<br><br><br>
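Because `print _table` shows None, `_soup.find("table", "tabledata")` found no table with class tabledata on the first page. BeautifulSoup's find() returns None when nothing matches, so the next `_table.find(...)` call fails with the AttributeError shown above. One quick check is to list the class attribute of every table the page contains. The minimal sketch below does that with Python 3's standard-library html.parser instead of BeautifulSoup. The class name, the function name, and the sample markup are hypothetical and not taken from libremap.org.

```python
from html.parser import HTMLParser


class TableClassCollector(HTMLParser):
    """Hypothetical helper: record the class attribute of every <table> tag seen."""

    def __init__(self):
        super().__init__()
        self.table_classes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tables without a class give None.
        if tag == "table":
            self.table_classes.append(dict(attrs).get("class"))


def table_classes(html_text):
    """Return the class attribute of each table in html_text, in document order."""
    parser = TableClassCollector()
    parser.feed(html_text)
    parser.close()
    return parser.table_classes


# Hypothetical markup, not the real libremap.org page.
sample = ('<html><body><table class="listing"><tr><td>x</td></tr></table>'
          '<table><tr><td>y</td></tr></table></body></html>')
print(table_classes(sample))  # ['listing', None]
```

Running this on the downloaded page's HTML (as a string) would show whether a table with that class exists at all. If it does not, the script could test `if _table is None:` and move on to the next state instead of crashing.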