[Tutor] re.findall parentheses problem
Michael Scharf
mnshtb at gmail.com
Tue Sep 14 18:41:09 CEST 2010
Hi all,
I have a regex that matches dates in various formats. I've tested the regex
in a reliable testbed, and it seems to match what I want (dates in formats
like "1 Jan 2010" and "January 1, 2010" and also "January 2008"). It's just
that using re.findall with it is giving me weird output. I'm using Python
2.6.5 here, and I've put in line breaks for clarity's sake:
>>> import re
>>> date_regex =
re.compile(r"([0-3]?[0-9])?((\s*)|(\t*))((Jan\.?u?a?r?y?)|(Feb\.?r?u?a?r?y?)|(Mar\.?c?h?)|(Apr\.?i?l?)|(May)|(Jun[e.]?)|(Jul[y.]?)|(Aug\.?u?s?t?)|(Sep[t.]?\.?e?m?b?e?r?)|(Oct\.?o?b?e?r?)|(Nov\.?e?m?b?e?r?)|(Dec\.?e?m?b?e?r?))((\s*)|(\t*))(2?0?[0-3]?[0-9]\,?)?((\s*)|(\t*))(2?0?[01][0-9])")
>>> test_output = re.findall(date_regex, 'January 2008')
>>> print test_output
[('', '', '', '', 'January', 'January', '', '', '', '', '', '', '', '', '',
'', '', ' ', ' ', '', '20', '', '', '', '08')]
>>> test_output = re.findall(date_regex, 'January 1, 2008')
>>> print test_output
[('', '', '', '', 'January', 'January', '', '', '', '', '', '', '', '', '',
'', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008')]
>>> test_output = re.findall(date_regex, "The date was January 1, 2008. But
it was not January 2, 2008.")
>>> print test_output
[('', ' ', ' ', '', 'January', 'January', '', '', '', '', '', '', '', '',
'', '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'), ('', ' ', ' ', '',
'January', 'January', '', '', '', '', '', '', '', '', '', '', '', ' ', ' ',
'', '2,', ' ', ' ', '', '2008')]
A friend says: "I think that the problem is that every time that you have a
parenthesis you get an output. Maybe there is a way to suppress this."
My friend's explanation speaks to the empties, but maybe not to the two
Januaries. Either way, what I want is for re.findall, or some other re
method that perhaps I haven't properly explored, to return the matches and
just the matches.
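A much smaller pattern appears to reproduce both effects, the empty strings
and the doubled month. This is only a minimal sketch: the two-month toy
pattern and the names toy_regex, with_groups and without_groups are made up
for illustration and are not the regex above.

```python
import re

# Toy pattern, made up for illustration: an outer group wrapping two
# alternatives, each in its own group, followed by a year group.
toy_regex = re.compile(r"((Jan)|(Feb)) (\d{4})")

# One tuple per match, one entry per pair of parentheses. The outer group
# and the inner group that matched both report the month; the alternative
# that did not take part in the match comes back as ''.
with_groups = toy_regex.findall("Jan 2008 and Feb 2009")
print(with_groups)
# [('Jan', 'Jan', '', '2008'), ('Feb', '', 'Feb', '2009')]

# The same text with no parentheses in the pattern: plain matched strings.
without_groups = re.findall(r"Jan \d{4}|Feb \d{4}", "Jan 2008 and Feb 2009")
print(without_groups)
# ['Jan 2008', 'Feb 2009']
```

So the nesting seems to account for the two Januaries as well: the outer
parentheses and the inner ones each contribute their own entry.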
I've read the documentation, googled various permutations, etc., and I can't
figure it out. Any help much appreciated.
Thanks,
Mike