[Tutor] Can't loop thru file and don't see the problem

Thu Dec 3 09:09:05 CET 2009

Your list is great. I've been lurking for the past two weeks while I learned
the basics. Thanks.

I am trying to loop through two files and scrape some data, but the loops
are not working.

The script is not getting past the first URL from state_list, as the test
print shows.

If someone could point me in the right direction, I'd appreciate it.

I would also like to know the difference between open() and csv.reader(). I
had similar issues with csv.reader() when opening these files.
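
For reference, here is a minimal sketch of what I see when I try both on the same data. It is written in Python 3 syntax (my script above is Python 2), and io.StringIO is only a stand-in for my real state_list.csv so the snippet runs on its own. The names sample, raw_lines and rows are made up for illustration.

```python
import csv
import io

# Stand-in for state_list.csv (assumption: one URL per line, no header).
sample = (
    "http://libremap.org/data/state/Colorado/drg/\n"
    "http://libremap.org/data/state/Connecticut/drg/\n"
)

# Iterating over a file-like object yields one string per line,
# with the trailing newline still attached.
raw_lines = [line for line in io.StringIO(sample)]

# csv.reader wraps a file-like object and yields one list of fields per row,
# with the line ending already removed.
rows = [row for row in csv.reader(io.StringIO(sample))]

print(repr(raw_lines[0]))
print(repr(rows[0]))
```

So with open() alone I get strings that I have to strip myself, while csv.reader() hands me lists and I would need row[0] to get the URL.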

Any help greatly appreciated.

Roy

Code:
    # DOWNLOAD USGS MISSING FILES

    import mechanize
    import BeautifulSoup as B_S
    import re
    # import urllib
    import csv

    # OPEN FILES
    # LOOKING FOR THESE SKUs
    _missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
    # IN THESE STATES
    _states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
    # IF NOT FOUND, LIST THEM HERE
    _missing_files = []
    # APPEND THIS FILE WITH META
    _topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')

    # OPEN PAGE
    for each_state in _states:
        each_state = each_state.replace("\n", "")
        print each_state
        html = mechanize.urlopen(each_state)
        _soup = B_S.BeautifulSoup(html)

        # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
        _table = _soup.find("table", "tabledata")
        print _table #test This is returning 'None'

        for each_sku in _missing:
            each_sku = each_sku.replace("\n","")
            print each_sku #test
            try:
                _row = _table.find('tr', text=re.compile(each_sku))
            except (IOError, AttributeError):
                _missing_files.append(each_sku)
                continue
            else:
                _row = _row.previous
                _row = _row.parent
                _fields = _row.findAll('td')
                _name = _fields[1].string
                _state = _fields[2].string
                _lat = _fields[4].string
                _long = _fields[5].string
                _sku = _fields[7].string

                _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")

            print each_sku + ': ' + _name

    print "Missing Files:"
    print _missing_files
    _topo_meta.close()
    _missing.close()
    _states.close()

The message I am getting is:

Code:
    >>>
    http://libremap.org/data/state/Colorado/drg/
    None
    33087c2
    Traceback (most recent call last):
      File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
        _row = _table.find('tr', text=re.compile(each_sku))
    AttributeError: 'NoneType' object has no attribute 'find'
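
As I read the traceback, _table is None because find() returns None when nothing on the page matches, so the failure happens before any SKU is searched for. Below is a minimal sketch of checking for None first. It is Python 3 syntax, and plain strings stand in for the BeautifulSoup objects so it runs on its own. find_sku_row and table_text are hypothetical names for illustration, not part of my script or of BeautifulSoup.

```python
import re


def find_sku_row(table_text, sku):
    """Return the first line of table_text that mentions sku, or None.

    table_text is a hypothetical stand-in for the parsed table. It is None
    when the page had no matching table, as in the traceback above.
    """
    if table_text is None:
        # Nothing to search, so report "not found" instead of raising.
        return None
    pattern = re.compile(re.escape(sku))
    for line in table_text.splitlines():
        if pattern.search(line):
            return line
    return None


missing_files = []
for sku in ["33087c2", "34087b2"]:
    row = find_sku_row(None, sku)  # None simulates a page with no table
    if row is None:
        missing_files.append(sku)

print(missing_files)
```

With a check like this, a page without the expected table would only add its SKUs to the missing list rather than stop the script.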

And the files look like:

Code:
    state_list
    http://libremap.org/data/state/Colorado/drg/
    http://libremap.org/data/state/Connecticut/drg/
    http://libremap.org/data/state/Pennsylvania/drg/
    http://libremap.org/data/state/South_Dakota/drg/

    missing_topo_list
    33087c2
    34087b2
    33086b7
    34086c2
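
One thing that may be related: with two files shaped like these, a nested loop over open file objects runs the inner loop only once, because a file can be read through a single time unless it is rewound. A minimal sketch of the effect, in Python 3 syntax, with io.StringIO standing in for my two files. The variable names are made up for illustration.

```python
import io

# Stand-ins for state_list.csv and missing_topo_list.csv (shortened).
states_file = io.StringIO("Colorado\nConnecticut\n")
skus_file = io.StringIO("33087c2\n34087b2\n")

# Nested loop directly over the file objects: the inner file is used up
# during the first outer pass, so later states see no SKUs at all.
pairs_from_file = []
for state in states_file:
    for sku in skus_file:
        pairs_from_file.append((state.strip(), sku.strip()))

# Reading the inner file into a list once lets every state see every SKU.
skus = [line.strip() for line in io.StringIO("33087c2\n34087b2\n")]
pairs_from_list = []
for state in io.StringIO("Colorado\nConnecticut\n"):
    for sku in skus:
        pairs_from_list.append((state.strip(), sku))

print(len(pairs_from_file), len(pairs_from_list))
```

Calling seek(0) on the inner file before each inner loop would be another way to get the same result.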