[Tutor] re and more ... read on

Bob Gailer bgailer at alum.rpi.edu
Sun Aug 14 16:35:00 CEST 2005


At 06:17 PM 8/13/2005, Jesse Lands wrote:
>I am trying to create a simple GUI that will track a network connection 
>using ping.  It's a starter project for me so I am sure there are other 
>easier ways of doing it, but I am learning, so bare with me.

Are we taking our clothes off? Reminds me of the old Adventure (Colossal 
Cave) game where the player decides to kill the bear. The response was 
"What! your bare hands against his *bear* hands?"

>I figured out the first part
>
>import os
>ping = os.popen('ping -c 4 10.0.8.200')
>ping_result = ping.read()
>
>
>I assume I have to use Regular Expression to read the results, but I don't 
>quite understand it.  Can someone explain it or point me in a direction to 
>where it's explained better then what I could find with google.

I don't know what links you found, but here are a few:
http://www.amk.ca/python/howto/regex/
http://www.regular-expressions.info/quickstart.html - this looks like a 
nice easy start tutorial

I suggest you post the output from ping and tell us what part of that you 
want. Then we can help more.
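
In the meantime, here is a minimal sketch of the idea, assuming ping prints something like the Linux (iputils) format. The sample text below is made up for illustration and is not real output from your machine; parse_ping is a hypothetical helper name, and the sketch is written for Python 3. Your ping may format its lines differently, in which case the patterns need adjusting.

```python
import re

# Made-up sample in the style of Linux ping output (an assumption, not real data).
SAMPLE_OUTPUT = """\
PING 10.0.8.200 (10.0.8.200) 56(84) bytes of data.
64 bytes from 10.0.8.200: icmp_seq=1 ttl=64 time=0.412 ms
64 bytes from 10.0.8.200: icmp_seq=2 ttl=64 time=0.389 ms
64 bytes from 10.0.8.200: icmp_seq=3 ttl=64 time=0.401 ms
64 bytes from 10.0.8.200: icmp_seq=4 ttl=64 time=0.395 ms

--- 10.0.8.200 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
"""

def parse_ping(text):
    """Pull round-trip times and packet counts out of ping's text output."""
    # One number per reply line: the digits and dots between 'time=' and ' ms'.
    times = [float(t) for t in re.findall(r'time=([\d.]+) ms', text)]
    # The summary line; some ping versions say 'packets received'.
    m = re.search(r'(\d+) packets transmitted, (\d+) (?:packets )?received', text)
    sent, received = (int(m.group(1)), int(m.group(2))) if m else (None, None)
    return {'times_ms': times, 'sent': sent, 'received': received}

result = parse_ping(SAMPLE_OUTPUT)
print(result)
```

To use it on the real thing you would pass ping_result from your os.popen() call instead of SAMPLE_OUTPUT. If sent and received come back as None, the summary line did not match and the pattern needs to be adapted to your ping.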

I also suggest learning re as it is a most useful tool. Yesterday I used 
urllib2 to grab a web page with many links to pdf files on it. Since my 
browser insists on opening such files instead of downloading them, I used 
re to extract the urls from the links, then read and wrote each file. Since 
I am delighted with this code I post it in full. Notice especially the 
re.findall().

import urllib2, re
x = urllib2.urlopen(r'http://www.cde.state.co.us/cdeassess/csap/as_filelayouts.htm')
p = x.readlines()
# get just the lines containing '.pdf' the "easy way"
pdfs = [l for l in p if '.pdf' in l]
# pdfs[0] = '      <td><a href="2005/Mathematics%203-10%20DST%20grt%202005.pdf">Grades 3-10 CSAP Mathematics \r\n'
linkText = '\n'.join(pdfs) # convert the list to a string for re

# here I use re to grab text between the 2 "s
pdfs2 = re.findall(r'"(.*)"', linkText)

# findall = repeatedly apply the re until end of string
# r = treat following string literal as "raw" so one does not need to escape \
# the pattern "(.*)":
#   "  = find a "
#   (  = begin a group
#   .  = match any character (except a newline)
#   *  = repeat matching the . as many times as possible
#   )  = end the group
#   "  = the match must end at a "; since * is greedy, that is the last " on
#        the line, which works here as long as each line holds only one
#        quoted string
# using () to group ensures that just the part of the string that matches
# .* will be kept
# result is a list of desired strings
# pdfs2[0] = '2005/Mathematics%203-10%20DST%20grt%202005.pdf'

# In retrospect I could also have used
# re.findall(r'<td><a href="(.*)\.pdf', '\n'.join(pdfs))
# on the entire page contents. I would have to reattach the ".pdf" later.

# obtain pdf files
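for pdf in pdfs2:
    outfile = 'j:\\lssg\\' + pdf.replace('%20', '_')[5:] # drop the 2005/
    url = r'http://www.cde.state.co.us/cdeassess/csap/' + pdf
    try: # report and skip bad urls
        pn = urllib2.urlopen(url)
    except urllib2.URLError: # also covers HTTPError, which is a subclass
        print "Bad url:", url
        continue
    data = pn.read() # the whole pdf as one string
    pn.close()
    z = open(outfile, 'wb') # binary mode so the pdf is written unchanged
    z.write(data)
    z.close()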
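
One caveat about the "(.*)" pattern above: it grabs too much if a line ever carries more than one quoted attribute. The sketch below shows the difference and one possible alternative. It assumes Python 3; the first sample line is pdfs[0] from above, the second is a hypothetical line invented to show the problem, and local_name is a hypothetical helper that gives the same file name as the replace('%20', '_')[5:] trick without counting characters.

```python
import re
from posixpath import basename
from urllib.parse import unquote

SAMPLE_LINES = [
    # pdfs[0] as shown above
    '      <td><a href="2005/Mathematics%203-10%20DST%20grt%202005.pdf">Grades 3-10 CSAP Mathematics \r\n',
    # hypothetical line with two quoted attributes
    '      <td><a class="doc" href="2005/Reading.pdf">Reading</a>\r\n',
]

# Greedy .* runs from the first " to the LAST " on the line.
greedy = re.findall(r'"(.*)"', SAMPLE_LINES[1])
print(greedy)

# Anchoring on href= and using the non-greedy .*? stops at the first '.pdf"'.
link_text = '\n'.join(SAMPLE_LINES)
links = re.findall(r'href="(.*?\.pdf)"', link_text)
print(links)

def local_name(link):
    """Drop the directory part, decode %20 and friends, use _ for spaces."""
    return unquote(basename(link)).replace(' ', '_')

names = [local_name(link) for link in links]
print(names)
```

The greedy pattern returns 'doc" href="2005/Reading.pdf' for the second line, which is not a usable url, while the href-anchored pattern returns just the two pdf paths.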

BTW if you answered YES to the "What!" question you got "Congratulations - 
you just killed a bear."

Bob Gailer
303 442 2625 home
720 938 2625 cell 


