Regular expression fun. Repeated matching of a group Q
Larry Bates
larry.bates at websafe.com
Fri Feb 24 18:49:48 EST 2006
matteosartori at gmail.com wrote:
> Hi all,
>
> I've spent all morning trying to work this one out:
>
> I've got the following string:
>
> <td>04/01/2006</td><td>Wednesday</td><td> </td><td>09:14</td><td>12:44</td><td>12:50</td><td>17:58</td><td> </td><td> </td><td> </td><td> </td><td>08:14</td>
>
> from which I'm attempting to extract the date, and the five times from
> into a list. Only the very last time is guaranteed to be there so it
> should also work for a line like:
>
> <td>03/01/2006</td><td>Tuesday</td><td>Annual_Holiday</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td><td>08:00</td>
>
> My Python regular expression to match that is currently:
>
> digs = re.compile(
> r'<td>(\d{2}\/\d{2}\/\d{4})</td>.*?(?:<td>(\d+\:\d+)</td>).*$' )
>
> which first extracts the date into group 1
> then matches the tags between the date and the first instance of a time
> into group 2
> then matches the first instance of a time into group 3
> but then group 4 grabs all the remaining string.
>
> I've tried changing the time pattern into
>
> (?:<td>(\d+\:\d+)</td>)+
>
> but that doesn't seem to mean "grab one or more cases of the previous
> regexp."
>
> Any Python regexp gurus with a hint would be greatly appreciated.
>
> M@
>
This works, using BeautifulSoup to parse the cells rather than a regular expression:
import BeautifulSoup
test = '<td>04/01/2006</td>' \
'<td>Wednesday</td>' \
'<td> </td>' \
'<td>09:14</td>' \
'<td>12:44</td>' \
'<td>12:50</td>' \
'<td>17:58</td>' \
'<td> </td>' \
'<td> </td>' \
'<td> </td>' \
'<td> </td>' \
'<td>08:14</td>'
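# Parse the row; each top-level child of the soup is one <td> cell.
c = BeautifulSoup.BeautifulSoup(test)
times = []
for i in c.childGenerator():
    # Skip cells that contain only a blank.
    if i.contents[0] == " ":
        continue
    times.append(i.contents[0])
# The first two non-blank cells are the date and the day name.
date = times.pop(0)
day = times.pop(0)
print "date=", date
print "day=", day
print "times=", times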
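
If you would rather stay with the re module: a repeated group such as
(?:<td>(\d+\:\d+)</td>)+ keeps only the text from its last repetition, so
it cannot hand back every time on its own. One possible alternative is to
search for the date once and let findall collect every time cell. The
following is only a rough sketch, written in Python 3 syntax; the names
DATE, TIME and extract_date_and_times are made up for illustration, and it
assumes each row has the same <td>...</td> layout as the two sample lines.

```python
import re

# Hypothetical patterns: one for the dd/mm/yyyy cell, one for an hh:mm cell.
DATE = re.compile(r'<td>(\d{2}/\d{2}/\d{4})</td>')
TIME = re.compile(r'<td>(\d+:\d+)</td>')

def extract_date_and_times(row):
    """Return (date, times) for one row; date is None if no date cell is found."""
    match = DATE.search(row)
    date = match.group(1) if match else None
    # findall returns the captured group for every non-overlapping match,
    # so blank cells and text cells such as the day name are skipped.
    times = TIME.findall(row)
    return date, times

row = ('<td>04/01/2006</td><td>Wednesday</td><td> </td>'
       '<td>09:14</td><td>12:44</td><td>12:50</td><td>17:58</td>'
       '<td> </td><td> </td><td> </td><td> </td><td>08:14</td>')
date, times = extract_date_and_times(row)
print("date=", date)
print("times=", times)
```

The same function should also cope with the holiday line, where only the
last time is present, since findall simply returns a one-element list.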
-Larry Bates