[Tutor] Can't loop thru file and don't see the problem
Roy Hinkelman
royhink at gmail.com
Thu Dec 3 09:09:05 CET 2009
Your list is great. I've been lurking for the past two weeks while I learned
the basics. Thanks.
I am trying to loop through two files and scrape some data, but the loops are
not working.
The script does not get past the first URL from state_list, as the test
print shows.
If someone could point me in the right direction, I'd appreciate it.
I would also like to know the difference between open() and csv.reader(). I
had similar issues with csv.reader() when opening these files.
Any help greatly appreciated.
Roy
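On the open() versus csv.reader() question, a minimal sketch may help frame it. It assumes a small in-memory stand-in for state_list.csv (io.StringIO behaves like a file opened in text mode) and is written to run under Python 3, unlike the Python 2 script in this post. The names sample, raw_lines and rows are illustrative only.

```python
import csv
import io

# Hypothetical stand-in for the first two lines of state_list.csv.
sample = (
    "http://libremap.org/data/state/Colorado/drg/\n"
    "http://libremap.org/data/state/Connecticut/drg/\n"
)

# Iterating an open file yields each raw line as one string,
# with the trailing newline still attached.
raw_lines = [line for line in io.StringIO(sample)]

# csv.reader wraps an already-open file and yields each line
# as a list of fields, with the newline removed.
rows = [row for row in csv.reader(io.StringIO(sample))]

print(repr(raw_lines[0]))
print(repr(rows[0]))
```

So csv.reader() does not replace open(); it sits on top of the file object and splits each line into fields. With a single-column file the value of interest would be row[0] rather than the row itself.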
Code:
# DOWNLOAD USGS MISSING FILES
import mechanize
import BeautifulSoup as B_S
import re
# import urllib
import csv
# OPEN FILES
# LOOKING FOR THESE SKUs
_missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
# IN THESE STATES
_states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
# IF NOT FOUND, LIST THEM HERE
_missing_files = []
# APPEND THIS FILE WITH META
_topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')
# OPEN PAGE
for each_state in _states:
    each_state = each_state.replace("\n", "")
    print each_state
    html = mechanize.urlopen(each_state)
    _soup = B_S.BeautifulSoup(html)
    # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
    _table = _soup.find("table", "tabledata")
    print _table #test This is returning 'None'
    for each_sku in _missing:
        each_sku = each_sku.replace("\n","")
        print each_sku #test
        try:
            _row = _table.find('tr', text=re.compile(each_sku))
        except (IOError, AttributeError):
            _missing_files.append(each_sku)
            continue
        else:
            _row = _row.previous
            _row = _row.parent
            _fields = _row.findAll('td')
            _name = _fields[1].string
            _state = _fields[2].string
            _lat = _fields[4].string
            _long = _fields[5].string
            _sku = _fields[7].string
            _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" +
                             _long + "|" + _sku + "||")
            print x +': ' + _name
print "Missing Files:"
print _missing_files
_topo_meta.close()
_missing.close()
_states.close()
The message I am getting is:
Code:
>>>
http://libremap.org/data/state/Colorado/drg/
None
33087c2
Traceback (most recent call last):
  File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
    _row = _table.find('tr', text=re.compile(each_sku))
AttributeError: 'NoneType' object has no attribute 'find'
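The traceback fits the 'None' printed just before it: BeautifulSoup's find() returns None when nothing on the page matches, and calling .find on None raises this AttributeError. A minimal sketch of that failure and one possible guard follows. It uses a plain None in place of a real parsed page, and the names table, url and missing_pages are hypothetical.

```python
# Stand-in for the result of _soup.find("table", "tabledata")
# on a page that has no matching table.
table = None

# Reproduce the error from the traceback.
try:
    table.find("tr")
except AttributeError as err:
    message = str(err)

# One possible guard: check for None before searching the table,
# and record the page so it can be inspected by hand later.
url = "http://libremap.org/data/state/Colorado/drg/"
missing_pages = []
if table is None:
    missing_pages.append(url)

print(message)
print(missing_pages)
```

Whether the page really lacks a table with class "tabledata", or the markup differs from what the script expects, would still need checking against the page source.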
And the files look like:
Code:
state_list
http://libremap.org/data/state/Colorado/drg/
http://libremap.org/data/state/Connecticut/drg/
http://libremap.org/data/state/Pennsylvania/drg/
http://libremap.org/data/state/South_Dakota/drg/
missing_topo_list
33087c2
34087b2
33086b7
34086c2
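One thing that may matter once the table lookup works: an open file can be iterated only once, so an inner loop over the SKU file would find nothing left for the second state. A sketch of reading both lists into memory first, assuming in-memory stand-ins for the two files and Python 3 (the names states_file, missing_file, skus, states and pairs are illustrative):

```python
import io

# Demonstrate that a file-like object is exhausted after one pass.
f = io.StringIO("33087c2\n34087b2\n")
first_pass = list(f)
second_pass = list(f)  # nothing left to read

# Hypothetical stand-ins for state_list.csv and missing_topo_list.csv.
states_file = io.StringIO(
    "http://libremap.org/data/state/Colorado/drg/\n"
    "http://libremap.org/data/state/Connecticut/drg/\n"
)
missing_file = io.StringIO("33087c2\n34087b2\n")

# Read each file once into a list, stripping newlines and blank lines.
states = [line.strip() for line in states_file if line.strip()]
skus = [line.strip() for line in missing_file if line.strip()]

# A list can be looped over again for every state.
pairs = [(state, sku) for state in states for sku in skus]

print(len(first_pass), len(second_pass), len(pairs))
```

An alternative would be calling seek(0) on the file before each inner loop, but holding two short lists in memory is usually simpler.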