[Tutor] Can't loop thru file and don't see the problem
Christian Witts
cwitts at compuscan.co.za
Thu Dec 3 12:46:43 CET 2009
Roy Hinkelman wrote:
>
> Your list is great. I've been lurking for the past two weeks while I
> learned the basics. Thanks.
>
> I am trying to loop thru 2 files and scrape some data, and the loops
> are not working.
>
> The script is not getting past the first URL from state_list, as the
> test print shows.
>
> If someone could point me in the right direction, I'd appreciate it.
>
> I would also like to know the difference between open() and
> csv.reader(). I had similar issues with csv.reader() when opening
> these files.
>
> Any help greatly appreciated.
>
> Roy
>
> Code: Select all
> # DOWNLOAD USGS MISSING FILES
>
> import mechanize
> import BeautifulSoup as B_S
> import re
> # import urllib
> import csv
>
> # OPEN FILES
> # LOOKING FOR THESE SKUs
> _missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
> # IN THESE STATES
> _states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
> # IF NOT FOUND, LIST THEM HERE
> _missing_files = []
> # APPEND THIS FILE WITH META
> _topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')
>
> # OPEN PAGE
> for each_state in _states:
>     each_state = each_state.replace("\n", "")
>     print each_state
>     html = mechanize.urlopen(each_state)
>     _soup = B_S.BeautifulSoup(html)
>
>     # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
>     _table = _soup.find("table", "tabledata")
>     print _table  # test: this is returning 'None'
>
If you take a look at the web page you open, you will notice there are
no tables on it. Are you certain you are using the correct URLs for this?
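Either way, it is worth guarding against pages with no matching table:
BeautifulSoup's find() returns None when nothing matches, and calling
.find() on None is exactly the AttributeError in your traceback. A
minimal sketch of the guard (the FakeSoup class is a hypothetical
stand-in for a parsed page, not part of your script; it only mimics
find()'s return-None behaviour):

```python
class FakeSoup(object):
    """Hypothetical stub for a parsed page; only mimics BeautifulSoup's
    habit of returning None from find() when no tag matches."""
    def __init__(self, tables):
        self._tables = tables

    def find(self, name, css_class=None):
        # BeautifulSoup returns None, not an empty result, on no match
        return self._tables.get(css_class)


def data_table_or_none(soup):
    """Return the data table, or None so the caller can skip this URL."""
    table = soup.find("table", "tabledata")
    if table is None:
        # skip this state rather than crash on table.find(...)
        return None
    return table


good_page = FakeSoup({"tabledata": "<table>rows...</table>"})
empty_page = FakeSoup({})
```

With that check in place, a state whose page lacks the table is skipped
instead of killing the whole run.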
> for each_sku in _missing:
The inner loop `for each_sku in _missing:` will only run in full on the
first pass: a file object is an iterator, so once it has been read to
the end it yields nothing for the remaining states. You can either
pre-read it into a list / dictionary / set (whichever you prefer) or
re-open the file on every pass:

_missing_filename = 'C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv'
for each_sku in open(_missing_filename):
    # carry on here
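To see the exhaustion in isolation, here is a small sketch using
io.StringIO as a stand-in for your _missing file handle (the two SKUs
are taken from your sample data):

```python
import io

# io.StringIO behaves like an open file handle for this purpose.
fake_file = io.StringIO("33087c2\n34087b2\n")

first_pass = [line.strip() for line in fake_file]
second_pass = [line.strip() for line in fake_file]  # already exhausted: empty

# Pre-reading into a set fixes it: the set can be scanned once per state,
# and membership tests on it are cheap.
fake_file.seek(0)  # rewind the fake file; a real script would just re-open
missing_skus = set(line.strip() for line in fake_file)
```

The first pass sees both SKUs, the second sees nothing, and the set can
be iterated as many times as you have state URLs.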
>         each_sku = each_sku.replace("\n", "")
>         print each_sku  # test
>         try:
>             _row = _table.find('tr', text=re.compile(each_sku))
>         except (IOError, AttributeError):
>             _missing_files.append(each_sku)
>             continue
>         else:
>             _row = _row.previous
>             _row = _row.parent
>             _fields = _row.findAll('td')
>             _name = _fields[1].string
>             _state = _fields[2].string
>             _lat = _fields[4].string
>             _long = _fields[5].string
>             _sku = _fields[7].string
>
>             _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")
>
>             print x + ': ' + _name
>
> print "Missing Files:"
> print _missing_files
> _topo_meta.close()
> _missing.close()
> _states.close()
>
>
> The message I am getting is:
>
> Code:
> >>>
> http://libremap.org/data/state/Colorado/drg/
> None
> 33087c2
> Traceback (most recent call last):
>   File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
>     _row = _table.find('tr', text=re.compile(each_sku))
> AttributeError: 'NoneType' object has no attribute 'find'
>
>
> And the files look like:
>
> Code:
> state_list
> http://libremap.org/data/state/Colorado/drg/
> http://libremap.org/data/state/Connecticut/drg/
> http://libremap.org/data/state/Pennsylvania/drg/
> http://libremap.org/data/state/South_Dakota/drg/
>
> missing_topo_list
> 33087c2
> 34087b2
> 33086b7
> 34086c2
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Tutor maillist - Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
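As for your open() vs csv.reader() question: open() gives you a file
object that yields raw lines, trailing newline included, while
csv.reader() wraps an open file (or any iterable of lines) and yields a
list of field strings per line, with the line ending already stripped.
For one-column files like yours either works; csv.reader just saves the
manual .replace("\n", "") step. A quick sketch, again with io.StringIO
standing in for a real file:

```python
import csv
import io

# Plain iteration over a file-like object yields raw lines,
# trailing newline and all...
raw_lines = list(io.StringIO("33087c2\n34087b2\n"))

# ...while csv.reader yields a list of fields per line, newline stripped.
# Each row here has a single field, hence row[0].
source = io.StringIO("33087c2\n34087b2\n")
rows = [row[0] for row in csv.reader(source)]
```

For your multi-column output file, csv.writer would likewise handle the
delimiters and line endings for you.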
Hope the comments above help in your endeavours.
--
Kind Regards,
Christian Witts