Thank you very much!<br><br>I had forgotten that URLs on Unix servers are case-sensitive. <br><br>Also, I changed my &#39;for&#39; statements as you suggested, tweaked the exception code a little, and it&#39;s working.<br><br>So, there are obviously several ways to open files. Do you have a standard practice, or does it depend on the file format? <br>

<br>I will eventually be working with Excel and possibly MS SQL tables. <br><br>Thanks again for your help.<br><br>Roy<br><br><br><br><div class="gmail_quote">On Thu, Dec 3, 2009 at 3:46 AM, Christian Witts <span dir="ltr">&lt;<a href="mailto:cwitts@compuscan.co.za">cwitts@compuscan.co.za</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="h5">Roy Hinkelman wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Your list is great. I&#39;ve been lurking for the past two weeks while I learned the basics. Thanks.<br>

<br>

I am trying to loop thru 2 files and scrape some data, and the loops are not working.<br>

<br>

The script is not getting past the first URL from state_list, as the test print shows.<br>

<br>

If someone could point me in the right direction, I&#39;d appreciate it.<br>

<br>

I would also like to know the difference between open() and csv.reader(). I had similar issues with csv.reader() when opening these files.<br>

<br>

Any help greatly appreciated.<br>

<br>

Roy<br>

<br>

Code: Select all<br>

    # DOWNLOAD USGS MISSING FILES<br>

<br>

    import mechanize<br>

    import BeautifulSoup as B_S<br>

    import re<br>

    # import urllib<br>

    import csv<br>

<br>

    # OPEN FILES<br>

    # LOOKING FOR THESE SKUs<br>

    _missing = open(&#39;C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv&#39;, &#39;r&#39;)<br>

    # IN THESE STATES<br>

    _states = open(&#39;C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\state_list.csv&#39;, &#39;r&#39;)<br>

    # IF NOT FOUND, LIST THEM HERE<br>

    _missing_files = []<br>

    # APPEND THIS FILE WITH META<br>

    _topo_meta = open(&#39;C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\topo_meta.csv&#39;, &#39;a&#39;)<br>

<br>

    # OPEN PAGE<br>

    for each_state in _states:<br>

        each_state = each_state.replace(&quot;\n&quot;, &quot;&quot;)<br>

        print each_state<br>

        html = mechanize.urlopen(each_state)<br>

        _soup = B_S.BeautifulSoup(html)<br>

              # SEARCH THRU PAGE AND FIND ROW CONTAINING META MATCHING SKU<br>

        _table = _soup.find(&quot;table&quot;, &quot;tabledata&quot;)<br>

        print _table #test This is returning &#39;None&#39;<br>

<br>

</blockquote></div></div>

If you look at the page you are opening, you will notice that it contains no tables. Are you certain you are using the correct URLs?<div class="im"><br>
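One way to confirm this before parsing any rows is to test whether the page contains the expected table at all. The following is only a rough sketch: it uses Python 3's standard-library html.parser instead of BeautifulSoup so that it runs without extra packages, and the names TableFinder and has_table, as well as the two sample pages, are made up for illustration.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Records whether a <table> carrying the wanted CSS class appears."""

    def __init__(self, wanted_class):
        HTMLParser.__init__(self)
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.found = True


def has_table(html, wanted_class):
    """Return True if html contains <table class="...wanted_class...">."""
    finder = TableFinder(wanted_class)
    finder.feed(html)
    return finder.found


# Hypothetical sample pages, not taken from the real site.
page_with = '<html><body><table class="tabledata"><tr><td>x</td></tr></table></body></html>'
page_without = "<html><body><p>No tables here</p></body></html>"

print(has_table(page_with, "tabledata"))
print(has_table(page_without, "tabledata"))
```

If a check like this comes back False for a URL, the URL (or the class name being searched for) is the first thing to look at.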

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

        for each_sku in _missing:<br>

</blockquote></div>

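The inner loop `for each_sku in _missing:` will only run through the file once: a file object is exhausted after a single pass, so for every state after the first it yields nothing. You can either pre-read the file into a list / dictionary / set (whichever you prefer) or change it to<br>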

_missing_filename = &#39;C:\\Documents and Settings\\rhinkelman\\Desktop\\working DB files\\missing_topo_list.csv&#39;<br>

for each_sku in open(_missing_filename):<br>

   # carry on here<br>
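To see the effect in isolation, here is a small sketch that uses in-memory stand-ins instead of the real CSV files. The sample contents and the variable names are assumptions made for the illustration, and it is written so that it runs on Python 3.

```python
import io

# In-memory stand-ins for state_list.csv and missing_topo_list.csv
# (hypothetical, shortened contents).
states = io.StringIO("Colorado\nConnecticut\n")
missing = io.StringIO("33087c2\n34087b2\n")

# Nested iteration over the same file object: the inner file is
# exhausted after the first pass of the outer loop.
pairs_exhausted = []
for state in states:
    for sku in missing:
        pairs_exhausted.append((state.strip(), sku.strip()))

# Pre-reading the inner file into a list lets every outer pass see all SKUs.
states.seek(0)
missing.seek(0)
sku_list = [line.strip() for line in missing]
pairs_full = []
for state in states:
    for sku in sku_list:
        pairs_full.append((state.strip(), sku))

print(len(pairs_exhausted))  # 2: only the first state was paired with SKUs
print(len(pairs_full))       # 4: every state paired with every SKU
```

Re-opening the file inside the outer loop, as suggested above, gives the same result as the pre-read list; the list just avoids reading the file from disk once per state.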

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="h5">

            each_sku = each_sku.replace(&quot;\n&quot;,&quot;&quot;)<br>

            print each_sku #test<br>

            try:<br>

                _row = _table.find(&#39;tr&#39;, text=re.compile(each_sku))<br>

            except (IOError, AttributeError):<br>

                _missing_files.append(each_sku)<br>

                continue<br>

            else:<br>

                _row = _row.previous<br>

                _row = _row.parent<br>

                _fields = _row.findAll(&#39;td&#39;)<br>

                _name = _fields[1].string<br>

                _state = _fields[2].string<br>

                _lat = _fields[4].string<br>

                _long = _fields[5].string<br>

                _sku = _fields[7].string<br>

<br>

                _topo_meta.write(_name + &quot;|&quot; + _state + &quot;|&quot; + _lat + &quot;|&quot; + _long + &quot;|&quot; + _sku + &quot;||&quot;)<br>

                      print x +&#39;: &#39; + _name<br>

<br>

    print &quot;Missing Files:&quot;<br>

    print _missing_files<br>

    _topo_meta.close()<br>

    _missing.close()<br>

    _states.close()<br>

<br>

<br>

The message I am getting is:<br>

<br>

Code:<br>

    &gt;&gt;&gt;<br>

    <a href="http://libremap.org/data/state/Colorado/drg/" target="_blank">http://libremap.org/data/state/Colorado/drg/</a><br>

    None<br>

    33087c2<br>

    Traceback (most recent call last):<br>

      File &quot;//Dc1/Data/SharedDocs/Roy/_Coding Vault/Python code samples/usgs_missing_file_META.py&quot;, line 34, in &lt;module&gt;<br>

        _row = _table.find(&#39;tr&#39;, text=re.compile(each_sku))<br>

    AttributeError: &#39;NoneType&#39; object has no attribute &#39;find&#39;<br>

<br>

<br>

And the files look like:<br>

<br>

Code:<br>

    state_list<br>

    <a href="http://libremap.org/data/state/Colorado/drg/" target="_blank">http://libremap.org/data/state/Colorado/drg/</a><br>

    <a href="http://libremap.org/data/state/Connecticut/drg/" target="_blank">http://libremap.org/data/state/Connecticut/drg/</a><br>

    <a href="http://libremap.org/data/state/Pennsylvania/drg/" target="_blank">http://libremap.org/data/state/Pennsylvania/drg/</a><br>

    <a href="http://libremap.org/data/state/South_Dakota/drg/" target="_blank">http://libremap.org/data/state/South_Dakota/drg/</a><br>

<br>

    missing_topo_list<br>

    33087c2<br>

    34087b2<br>

    33086b7<br>

    34086c2<br>

<br>

<br></div></div>


</blockquote>
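On the open() versus csv.reader() question: open() gives you a file object that yields raw lines as strings, while csv.reader() wraps any iterable of lines (usually a file object) and yields each line already split into a list of fields. A minimal sketch, written for Python 3 and using made-up pipe-delimited sample data in the same layout the script writes to topo_meta.csv:

```python
import csv
import io

# Hypothetical sample rows: name|state|lat|long|sku
sample = "Name A|CO|39.0|-104.0|33087c2\nName B|CO|40.0|-105.0|34087b2\n"

# Iterating a file-like object yields raw lines, newline included.
raw_lines = [line for line in io.StringIO(sample)]

# csv.reader yields one list of fields per line, split on the delimiter.
rows = [row for row in csv.reader(io.StringIO(sample), delimiter="|")]

print(repr(raw_lines[0]))
print(rows[0])
```

So for one-value-per-line files such as state_list, plain iteration over open() is enough; csv.reader() earns its keep once a line holds several delimited or quoted fields.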

Hope the comments above help in your endeavours.<br>

<br>

-- <br>

Kind Regards,<br><font color="#888888">

Christian Witts<br>

<br>

<br>

</font></blockquote></div><br>