pattern matching

Dr Vangel 470e8b8c35950 at
Thu Feb 24 07:26:21 CET 2011

>if I have a string such as '<td>01/12/2011</td>' and i want
>to reformat it as '20110112', how do i pull out the components
>of the string and reformat them into a YYYYDDMM format?
>I have:
>import re
>test = re.compile('dd/')
>f = open('test.html')  # This file contains the html dates
>for line in f:
>     if
>         # I need to pull the date components here

I am no Python guru, but you could use BeautifulSoup to parse the HTML, as
it is much easier.

Some untested pseudocode below. Adapt it to your needs.

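from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path

# Read the HTML data from a file or whatever source
html_data = open('/yourwebsite/page.html', 'r').read()

# Create the soup object from the HTML data (no 'new' keyword in Python)
soup = BeautifulSoup(html_data)
# Find the proper tag; see the BeautifulSoup docs.
# Attribute filters go in attrs, since 'name' is already the tag-name argument.
someData = soup.find('td', attrs={'name': 'someTable'})
# The value of the 3rd attribute of the tag, just an example
# (in BeautifulSoup 3, attrs is a list of (name, value) pairs)
value = someData.attrs[2][1]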


Now that you have the date as a string, the next step is the date
conversion. For this, refer to dateutil's parse.
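
For the conversion step, here is a rough sketch that sticks to the standard
library (re and datetime.strptime) instead of dateutil, so it is a swapped-in
technique and not what dateutil does internally. It assumes each date sits
alone in a <td> cell and is written DD/MM/YYYY, which is what the requested
YYYYDDMM output for '01/12/2011' implies. The names TD_DATE and
reformat_td_date are made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical pattern: assumes a <td> cell holding only a DD/MM/YYYY date.
TD_DATE = re.compile(r'<td>\s*(\d{2}/\d{2}/\d{4})\s*</td>')


def reformat_td_date(line):
    """Return the date as YYYYDDMM, or None if the line has no date cell."""
    match = TD_DATE.search(line)
    if match is None:
        return None
    # Assumption: the source order is day/month/year.
    parsed = datetime.strptime(match.group(1), '%d/%m/%Y')
    # Year first, then day, then month, as asked for in the question.
    return parsed.strftime('%Y%d%m')


print(reformat_td_date('<td>01/12/2011</td>'))
```

Going through strptime instead of just reshuffling the regex groups means an
impossible date such as 31/02/2011 raises ValueError instead of passing
through silently.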

Hope it helps.

