joncle at googlemail.com
Thu Feb 24 11:04:22 EST 2011
On Feb 24, 2:11 am, monkeys paw <mon... at joemoney.net> wrote:
> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
> I have:
> import re
> test = re.compile('\d\d\/')
> f = open('test.html') # This file contains the html dates
> for line in f:
>     if test.search(line):
>         # I need to pull the date components here
I second using an HTML parser to extract the content of the TDs, but I
would also go one step further when reformatting and do something such as:
>>> from time import strptime, strftime
>>> d = '01/12/2011'
>>> strftime('%Y%m%d', strptime(d, '%m/%d/%Y'))
'20110112'
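That way you get some validation of the data, i.e., if you get
'13/12/2011', strptime raises a ValueError (there is no month 13), which
suggests you've probably got mixed date formats (e.g. DD/MM/YYYY
alongside MM/DD/YYYY).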
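
Putting the two suggestions together, here is a rough sketch of how it might look, assuming Python 3 and only the standard library (html.parser plus time). The names TDTextCollector, reformat_date and dates_from_html are made up for illustration and are not from the original post; treat this as one possible approach rather than the definitive one.

```python
from html.parser import HTMLParser
from time import strptime, strftime


class TDTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside each <td> element."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')  # start a new cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self.in_td:
            self.cells[-1] += data


def reformat_date(text):
    """Return YYYYMMDD for an MM/DD/YYYY string, or None if it fails to parse."""
    try:
        return strftime('%Y%m%d', strptime(text.strip(), '%m/%d/%Y'))
    except ValueError:
        # e.g. '13/12/2011': no month 13, so probably a different format
        return None


def dates_from_html(html):
    """Pair each <td> text with its reformatted date (None when invalid)."""
    parser = TDTextCollector()
    parser.feed(html)
    parser.close()
    return [(cell, reformat_date(cell)) for cell in parser.cells]
```

Calling dates_from_html(open('test.html').read()) would give a list of (original, reformatted) pairs; any pair whose second item is None is a cell that did not parse as MM/DD/YYYY and is worth inspecting by hand.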