Thank you very much!<br><br>I had forgotten that URLs served from Unix hosts are case-sensitive. <br><br>Also, I changed my 'for' statements as you suggested, tweaked the exception code a little, and it's working.<br><br>So, there are obviously several ways to open files. Do you have a standard practice, or does it depend on the file format? <br>
<br>I will eventually be working with Excel and possibly MS SQL Server tables. <br><br>Thanks again for your help.<br><br>Roy<br><br><br><br><div class="gmail_quote">On Thu, Dec 3, 2009 at 3:46 AM, Christian Witts <span dir="ltr">&lt;<a href="mailto:cwitts@compuscan.co.za">cwitts@compuscan.co.za</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="h5">Roy Hinkelman wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
Your list is great. I've been lurking for the past two weeks while I learned the basics. Thanks.<br>
<br>
I am trying to loop thru 2 files and scrape some data, and the loops are not working.<br>
<br>
The script is not getting past the first URL from state_list, as the test print shows.<br>
<br>
If someone could point me in the right direction, I'd appreciate it.<br>
<br>
I would also like to know the difference between open() and csv.reader(). I had similar issues with csv.reader() when opening these files.<br>
<br>
Any help greatly appreciated.<br>
<br>
Roy<br>
<br>
Code:<br>
# DOWNLOAD USGS MISSING FILES<br>
<br>
import mechanize<br>
import BeautifulSoup as B_S<br>
import re<br>
# import urllib<br>
import csv<br>
<br>
# OPEN FILES<br>
# LOOKING FOR THESE SKUs<br>
_missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')<br>
# IN THESE STATES<br>
_states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')<br>
# IF NOT FOUND, LIST THEM HERE<br>
_missing_files = []<br>
# APPEND THIS FILE WITH META<br>
_topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')<br>
<br>
# OPEN PAGE<br>
for each_state in _states:<br>
each_state = each_state.replace("\n", "")<br>
print each_state<br>
html = mechanize.urlopen(each_state)<br>
_soup = B_S.BeautifulSoup(html)<br>
# SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU<br>
_table = _soup.find("table", "tabledata")<br>
print _table #test This is returning 'None'<br>
<br>
</blockquote></div></div>
If you look at the page you are opening, you will notice that it contains no tables, which is why `_table` comes back as None. Are you certain you are using the correct URLs for this?<div class="im"><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
for each_sku in _missing:<br>
</blockquote></div>
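The inner loop `for each_sku in _missing:` will only run through the file once. After the first pass the file object is exhausted, so for every later state the loop body never executes. You can either pre-read the file into a list / dictionary / set (whichever you prefer) or change it to<br>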
_missing_filename = 'C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv'<br>
for each_sku in open(_missing_filename):<br>
# carry on here<br>
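<br>
To make the difference concrete, here is a minimal sketch of both options next to the original behaviour. It is written for Python 3 (the script in this thread is Python 2), and the temporary file and the `states` list are placeholders standing in for missing_topo_list.csv and state_list.csv, not the real data. It also shows that csv.reader() is just a wrapper around an open file, so it is consumed after one pass in the same way.<br>

```python
import csv
import os
import tempfile

# Hypothetical stand-in for missing_topo_list.csv, written to a temp file
# so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "missing_topo_list.csv")
with open(path, "w", newline="") as handle:
    handle.write("33087c2\n34087b2\n")

states = ["Colorado", "Connecticut"]  # placeholder for the state URLs

# Problem: one file object shared by every pass of the outer loop.
# The second state sees an already exhausted file and gets no lines.
shared = open(path)
pairs_shared = [(state, line.strip()) for state in states for line in shared]
shared.close()

# Option 1: read the SKUs into a list once, then loop over the list.
# csv.reader() wraps the open file and yields each row as a list of fields.
with open(path, newline="") as handle:
    skus = [row[0] for row in csv.reader(handle) if row]
pairs_list = [(state, sku) for state in states for sku in skus]

# Option 2: reopen the file on every pass of the outer loop.
pairs_reopen = []
for state in states:
    with open(path) as handle:
        for line in handle:
            pairs_reopen.append((state, line.strip()))

print(len(pairs_shared), len(pairs_list), len(pairs_reopen))
```

For a short list of SKUs, reading into a list once is usually the simpler choice, since it avoids reopening the file for every state.<br>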
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="h5">
each_sku = each_sku.replace("\n","")<br>
print each_sku #test<br>
try:<br>
_row = _table.find('tr', text=re.compile(each_sku))<br>
except (IOError, AttributeError):<br>
_missing_files.append(each_sku)<br>
continue<br>
else:<br>
_row = _row.previous<br>
_row = _row.parent<br>
_fields = _row.findAll('td')<br>
_name = _fields[1].string<br>
_state = _fields[2].string<br>
_lat = _fields[4].string<br>
_long = _fields[5].string<br>
_sku = _fields[7].string<br>
<br>
_topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")<br>
print x +': ' + _name<br>
<br>
print "Missing Files:"<br>
print _missing_files<br>
_topo_meta.close()<br>
_missing.close()<br>
_states.close()<br>
<br>
<br>
The message I am getting is:<br>
<br>
Code:<br>
>>><br>
<a href="http://libremap.org/data/state/Colorado/drg/" target="_blank">http://libremap.org/data/state/Colorado/drg/</a><br>
None<br>
33087c2<br>
Traceback (most recent call last):<br>
File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module><br>
_row = _table.find('tr', text=re.compile(each_sku))<br>
AttributeError: 'NoneType' object has no attribute 'find'<br>
<br>
<br>
And the files look like:<br>
<br>
Code:<br>
state_list<br>
<a href="http://libremap.org/data/state/Colorado/drg/" target="_blank">http://libremap.org/data/state/Colorado/drg/</a><br>
<a href="http://libremap.org/data/state/Connecticut/drg/" target="_blank">http://libremap.org/data/state/Connecticut/drg/</a><br>
<a href="http://libremap.org/data/state/Pennsylvania/drg/" target="_blank">http://libremap.org/data/state/Pennsylvania/drg/</a><br>
<a href="http://libremap.org/data/state/South_Dakota/drg/" target="_blank">http://libremap.org/data/state/South_Dakota/drg/</a><br>
<br>
missing_topo_list<br>
33087c2<br>
34087b2<br>
33086b7<br>
34086c2<br>
<br>
<br></div></div>
</blockquote>
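On the AttributeError in the traceback: `_soup.find("table", "tabledata")` returned None, so there is nothing to call `.find` on. One way to handle that is to test for None explicitly instead of relying on the exception. The sketch below is written for Python 3 and assumes nothing about the real pages: `FakeTable` and `lookup` are hypothetical names, and `FakeTable` is only a stand-in for the BeautifulSoup table object (its `find` takes a single pattern, unlike the real method) so the example runs without BeautifulSoup or network access.<br>

```python
import re


class FakeTable:
    """Stand-in for a BeautifulSoup table: find() returns a match or None."""

    def __init__(self, cells):
        self.cells = cells

    def find(self, pattern):
        for cell in self.cells:
            if pattern.search(cell):
                return cell
        return None


def lookup(table, skus):
    """Return (found, missing) without ever calling .find on None."""
    found, missing = [], []
    for sku in skus:
        # No table on the page means every SKU counts as missing.
        cell = table.find(re.compile(re.escape(sku))) if table is not None else None
        if cell is None:
            missing.append(sku)
        else:
            found.append(cell)
    return found, missing


# A page without a table, then a page whose table holds one of two SKUs.
print(lookup(None, ["33087c2"]))
print(lookup(FakeTable(["33087c2 Example Quad"]), ["33087c2", "34087b2"]))
```

The same two checks (no table at all, table present but no matching row) would apply to the real `_table` object before walking to the row's parent.<br>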
Hope the comments above help in your endeavours.<br>
<br>
-- <br>
Kind Regards,<br><font color="#888888">
Christian Witts<br>
<br>
<br>
</font></blockquote></div><br>