pattern matching

Jon Clements joncle at
Thu Feb 24 11:04:22 EST 2011

On Feb 24, 2:11 am, monkeys paw <mon... at> wrote:
> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
> I have:
> import re
> test = re.compile('\d\d\/')
> f = open('test.html')  # This file contains the html dates
> for line in f:
>      if
>          # I need to pull the date components here

I second using an HTML parser to extract the content of the TDs, but I
would also go one step further when reformatting and do something such as:

>>> from time import strptime, strftime
>>> d = '01/12/2011'
>>> strftime('%Y%m%d', strptime(d, '%m/%d/%Y'))
'20110112'

That way you get some validation of the data, i.e., if you get
'13/12/2011', strptime raises a ValueError (there is no month 13), which
probably means you've got mixed date formats.
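
Putting the two suggestions together, here is a rough sketch of how the
parsing and the validation might be combined. It assumes Python 3's
standard-library html.parser module and MM/DD/YYYY input; the names
TDCollector, reformat_date and dates_from_html are made up for
illustration and are not from the original post.

```python
from html.parser import HTMLParser
from time import strptime, strftime


class TDCollector(HTMLParser):
    """Collect the text of every <td> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')  # start a new cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data  # text may arrive in several chunks


def reformat_date(text):
    """Return YYYYMMDD for an MM/DD/YYYY string, or None if it does not parse."""
    try:
        return strftime('%Y%m%d', strptime(text.strip(), '%m/%d/%Y'))
    except ValueError:
        # e.g. '13/12/2011': no month 13, so the data is probably in another format
        return None


def dates_from_html(html):
    """Return (raw cell text, reformatted date or None) for each <td>."""
    parser = TDCollector()
    parser.feed(html)
    parser.close()
    return [(cell, reformat_date(cell)) for cell in parser.cells]


# Assumed sample input: one valid date and one that fails validation.
sample = '<tr><td>01/12/2011</td><td>13/12/2011</td></tr>'
for raw, formatted in dates_from_html(sample):
    print(raw, '->', formatted)
```

Returning None for cells that fail to parse is just one choice; you could
equally let the ValueError propagate so that mixed formats stop the run.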
