pattern matching
Dr Vangel
470e8b8c35950 at poster.grepler.com
Thu Feb 24 01:26:21 EST 2011
>
>if I have a string such as '<td>01/12/2011</td>' and i want
>to reformat it as '20110112', how do i pull out the components
>of the string and reformat them into a YYYYDDMM format?
>
>I have:
>
>import re
>
>test = re.compile(r'\d\d/')
>f = open('test.html') # This file contains the html dates
>for line in f:
>    if test.search(line):
>        # I need to pull the date components here
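If you would rather stay with re, one minimal sketch is to put a capture group around each part of the date and then reorder the groups. It assumes every date cell looks exactly like <td>01/12/2011</td> (two digits, two digits, four digits, no spaces). The names DATE_CELL and reformat_line, and the sample lines list standing in for your loop over test.html, are hypothetical. Putting the year in front of the two leading fields, in their original order, gives '20110112' for your example. That is YYYYDDMM if the page writes DD/MM/YYYY and YYYYMMDD if it writes MM/DD/YYYY.

```python
import re

# Hypothetical pattern: a <td> whose whole content is NN/NN/NNNN.
# Assumption: no whitespace or attributes inside the cell.
DATE_CELL = re.compile(r'<td>(\d{2})/(\d{2})/(\d{4})</td>')


def reformat_line(line):
    """Return the first date cell in `line` as year + field1 + field2, or None."""
    m = DATE_CELL.search(line)
    if m is None:
        return None
    first, second, year = m.groups()
    # Keep the two leading fields in their original order after the year.
    return year + first + second


# Stand-in for iterating over the lines of test.html.
lines = ['<tr>', '<td>01/12/2011</td>', '</tr>']
for line in lines:
    result = reformat_line(line)
    if result is not None:
        print(result)  # prints 20110112
```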
I am no Python guru, but you could use BeautifulSoup to parse the HTML, as it's
much easier.
Some untested code below; adapt it to your needs.
from bs4 import BeautifulSoup  # BeautifulSoup 4 (the bs4 package)

# read the HTML data (or whatever your source is)
with open('/yourwebsite/page.html') as f:
    html_data = f.read()
# create the soup object from the HTML data
soup = BeautifulSoup(html_data, 'html.parser')
# find the proper tag (see the BeautifulSoup docs); here the first <td>
someData = soup.find('td')
# the date is the tag's text, e.g. '01/12/2011';
# an attribute would be read as someData['attrname'] instead
value = someData.get_text()
##end
Once you have the date as a string, the next step is the date conversion.
For this, refer to dateutil's parser: http://labix.org/python-dateutil
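If you only need one fixed input format, the standard library's datetime.strptime/strftime can do the conversion without dateutil; this is a swapped-in alternative, not dateutil's API. A minimal sketch follows, assuming the page writes month/day/year; the helper name to_compact is hypothetical. For what it's worth, dateutil.parser.parse also reads '01/12/2011' month first unless you pass dayfirst=True.

```python
from datetime import datetime


def to_compact(date_text, in_format='%m/%d/%Y', out_format='%Y%m%d'):
    """Parse date_text with in_format and re-emit it with out_format."""
    # strptime raises ValueError on text that does not match in_format,
    # which also rejects impossible dates such as 02/30/2011.
    return datetime.strptime(date_text, in_format).strftime(out_format)


print(to_compact('01/12/2011'))  # prints 20110112
```

If the page is really day/month/year and you want YYYYDDMM, pass the formats explicitly: to_compact('01/12/2011', '%d/%m/%Y', '%Y%d%m') also gives '20110112'. Unlike plain string slicing, this validates the date along the way.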
Hope it helps.
----------------------------
posted via Grepler.com -- poster is authenticated.
More information about the Python-list mailing list