[Tutor] Can't loop thru file and don't see the problem
Roy Hinkelman
royhink at gmail.com
Fri Dec 4 01:01:28 CET 2009
Thank you very much!
I had forgotten that URLs on Unix servers are case-sensitive.
Also, I changed my 'for' statements to your suggestion, tweaked the
exception code a little, and it's working.
So, there are obviously several ways to open files. Do you have a standard
practice, or does it depend on the file format?
I will eventually be working with Excel and possibly mssql tables.
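[Archive note: a short sketch of the open() vs. csv.reader() difference asked about above, using an in-memory string as a stand-in for a real file so the snippet is self-contained. Written in Python 3 syntax, unlike the 2009-era code below.]

```python
import csv
import io

# open() yields raw lines (trailing newline included); csv.reader wraps an
# open file-like object and splits each line into a list of fields for you.
data = "name,state\nDenver,CO\nHartford,CT\n"  # stand-in for a real .csv file

raw_lines = io.StringIO(data).readlines()
parsed_rows = list(csv.reader(io.StringIO(data)))

print(raw_lines[1])    # 'Denver,CO\n' -- one string per line
print(parsed_rows[1])  # ['Denver', 'CO'] -- one list of fields per line
```

In practice the choice follows the format: csv.reader for delimited text, plain open() for line-oriented files like the URL list in this thread.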
Thanks again for your help.
Roy
On Thu, Dec 3, 2009 at 3:46 AM, Christian Witts <cwitts at compuscan.co.za> wrote:
> Roy Hinkelman wrote:
>
>>
>> Your list is great. I've been lurking for the past two weeks while I
>> learned the basics. Thanks.
>>
>> I am trying to loop through two files and scrape some data, and the loops
>> are not working.
>>
>> The script is not getting past the first URL from state_list, as the test
>> print shows.
>>
>> If someone could point me in the right direction, I'd appreciate it.
>>
>> I would also like to know the difference between open() and csv.reader().
>> I had similar issues with csv.reader() when opening these files.
>>
>> Any help greatly appreciated.
>>
>> Roy
>>
>> Code:
>> # DOWNLOAD USGS MISSING FILES
>>
>> import mechanize
>> import BeautifulSoup as B_S
>> import re
>> # import urllib
>> import csv
>>
>> # OPEN FILES
>> # LOOKING FOR THESE SKUs
>> _missing = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv', 'r')
>> # IN THESE STATES
>> _states = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv', 'r')
>> # IF NOT FOUND, LIST THEM HERE
>> _missing_files = []
>> # APPEND THIS FILE WITH META
>> _topo_meta = open('C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv', 'a')
>>
>> # OPEN PAGE
>> for each_state in _states:
>>     each_state = each_state.replace("\n", "")
>>     print each_state
>>     html = mechanize.urlopen(each_state)
>>     _soup = B_S.BeautifulSoup(html)
>>     # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU
>>     _table = _soup.find("table", "tabledata")
>>     print _table  # test: this is returning 'None'
>>
> If you take a look at the webpage you open up, you will notice there are
> no tables. Are you certain you are using the correct URLs for this?
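[Archive note: this is why the traceback below ends in AttributeError -- BeautifulSoup's find() returns None when nothing matches, and calling a method on None fails. A minimal sketch of the guard pattern, using a plain dict as a stand-in for the soup object so the snippet runs without BeautifulSoup:]

```python
def lookup_row(table, sku):
    # table is None when the page had no matching <table class="tabledata">;
    # bail out early instead of calling a method on None.
    if table is None:
        return None
    return table.get(sku)  # stand-in for table.find('tr', text=...)

print(lookup_row(None, "33087c2"))                     # None, no crash
print(lookup_row({"33087c2": "row-data"}, "33087c2"))  # 'row-data'
```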
>
>> for each_sku in _missing:
>>
> The for loop `for each_sku in _missing:` will only iterate once, because a
> file object is exhausted after the first pass over it. You can either
> pre-read it into a list / dictionary / set (whichever you prefer) or
> change it to
>
> _missing_filename = 'C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv'
> for each_sku in open(_missing_filename):
>     # carry on here
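[Archive note: the pre-read approach suggested above can be sketched like this, in Python 3 syntax, with a literal list standing in for the open file so the snippet is self-contained:]

```python
def load_skus(lines):
    # Strip trailing newlines and skip blanks; a set gives O(1) membership
    # tests and, unlike a file object, can be iterated over many times.
    return {line.strip() for line in lines if line.strip()}

# In the real script: with open(_missing_filename) as f: skus = load_skus(f)
skus = load_skus(["33087c2\n", "34087b2\n", "33086b7\n", "\n"])

for state in ["Colorado", "Connecticut"]:  # outer loop stand-in
    for sku in skus:                       # works on every pass, unlike
        pass                               # re-iterating an exhausted file

print(sorted(skus))  # ['33086b7', '33087c2', '34087b2']
```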
>
>>         each_sku = each_sku.replace("\n", "")
>>         print each_sku  # test
>>         try:
>>             _row = _table.find('tr', text=re.compile(each_sku))
>>         except (IOError, AttributeError):
>>             _missing_files.append(each_sku)
>>             continue
>>         else:
>>             _row = _row.previous
>>             _row = _row.parent
>>             _fields = _row.findAll('td')
>>             _name = _fields[1].string
>>             _state = _fields[2].string
>>             _lat = _fields[4].string
>>             _long = _fields[5].string
>>             _sku = _fields[7].string
>>
>>             _topo_meta.write(_name + "|" + _state + "|" + _lat + "|" + _long + "|" + _sku + "||")
>>             print each_sku + ': ' + _name
>>
>> print "Missing Files:"
>> print _missing_files
>> _topo_meta.close()
>> _missing.close()
>> _states.close()
>>
>>
>> The message I am getting is:
>>
>> Code:
>> >>>
>> http://libremap.org/data/state/Colorado/drg/
>> None
>> 33087c2
>> Traceback (most recent call last):
>> File "//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py", line 34, in <module>
>> _row = _table.find('tr', text=re.compile(each_sku))
>> AttributeError: 'NoneType' object has no attribute 'find'
>>
>>
>> And the files look like:
>>
>> Code:
>> state_list
>> http://libremap.org/data/state/Colorado/drg/
>> http://libremap.org/data/state/Connecticut/drg/
>> http://libremap.org/data/state/Pennsylvania/drg/
>> http://libremap.org/data/state/South_Dakota/drg/
>>
>> missing_topo_list
>> 33087c2
>> 34087b2
>> 33086b7
>> 34086c2
>>
>>
>>
>> _______________________________________________
>> Tutor maillist - Tutor at python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>
> Hope the comments above help in your endeavours.
>
> --
> Kind Regards,
> Christian Witts
>
>
>