[Tutor] help with regular expressions
dyoo at hkn.eecs.berkeley.edu
Thu Feb 5 19:34:25 EST 2004
On Thu, 5 Feb 2004, Christopher Spears wrote:
> I'm trying to figure out regular expressions and am completely baffled!
> I understand the concept because there is something similar in UNIX, but
> for some reason, Python regular expressions don't make any sense to me!
> Are there some good tutorials that can help explain this subject to me?
Yes, there's a tutorial-style Regular Expression HOWTO by A.M. Kuchling:
Regular expressions allow us to define text patterns. For example, we can
define a pattern of a bunch of 'a's:
>>> import re
>>> pattern = re.compile('a+')
>>> pattern
<_sre.SRE_Pattern object at 0x8126060>
'pattern' is a regular expression that recognizes any continuous run of
one or more 'a's. That is, if we give it a string containing 'a's, it'll
find exactly where they are.
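To see the "exactly where" part for yourself, here's a small sketch using finditer(), which hands back one match object per run of 'a's. The example string is made up, and this assumes a modern Python 3 rather than the 2.x interpreter shown in the session above.

```python
import re

pattern = re.compile('a+')

# finditer() yields one match object per run of 'a's; group() is the
# matched text and span() is its (start, end) slice in the string.
text = 'banana and aardvark'
for match in pattern.finditer(text):
    print(match.group(), match.span())
```

Every run is reported with its position, so 'aa' in "aardvark" comes back as a single match spanning two characters.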
Let's see what it does on a simple example:
>>> pattern.findall('this is a test')
['a']
Here, it found the single letter 'a'.
Let's try something else:
['aaa', 'a', 'a', 'aaaa', 'aa', 'aaaa']
And here, it found all 'a' sequences in that string.
Does this make sense so far? The pattern above is deliberately simple,
but regular expressions can get a little more complicated.
For example, here's a regular expression that tries to detect date strings
of the form '2/5/2004':
>>> date_regex = re.compile('[0-9]+/[0-9]+/[0-9]+')
>>> date_regex.findall("this is a test on 02/05/2004, right?")
['02/05/2004']
The regular expression is trying to say "a bunch of digits, followed by
a slash, followed by another bunch of digits, followed by a slash, and
then topped with another bunch of digits". Whew. *grin*
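If that one-liner is hard to read, Python's re.VERBOSE flag lets you spread the very same pattern out and comment each piece, almost word for word like the description above. This is just a sketch of that idea (Python 3 assumed); in verbose mode, whitespace and '#' comments inside the pattern are ignored.

```python
import re

# The same date pattern as before, written with re.VERBOSE so that
# each piece can carry its own comment.
date_regex = re.compile(r"""
    [0-9]+   # a bunch of digits
    /        # followed by a slash
    [0-9]+   # followed by another bunch of digits
    /        # followed by a slash
    [0-9]+   # topped with another bunch of digits
""", re.VERBOSE)

print(date_regex.findall("this is a test on 02/05/2004, right?"))
```

It behaves exactly like the compact version; only the layout changes.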
Caveat: the pattern above is too lenient to match only date strings. It
also catches stuff like 2005/2/5, or even things like:
>>> date_regex.findall("looky 1/2/3 or /4/5/6/")
['1/2/3', '4/5/6']
So there's something of an art to writing good regular expressions that
are both general and specific.
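As one illustration of that balancing act, here's a stricter sketch. It assumes month/day/year with a 1-2 digit month and day and a 4-digit year (an assumption, not the only sensible choice), and uses \b word boundaries so it won't match in the middle of a longer run of digits. It rejects the examples above, but it's still not perfect: something like 99/99/2004 would sail right through.

```python
import re

# Stricter (but still imperfect) date pattern, assuming month/day/year
# with 1-2 digit month and day and a 4-digit year. \b marks a word
# boundary, so the match can't start or end inside a longer digit run.
strict_date = re.compile(r'\b[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}\b')

tests = [
    "this is a test on 02/05/2004, right?",
    "looky 1/2/3 or /4/5/6/",
    "2005/2/5",
]
for text in tests:
    print(repr(text), '->', strict_date.findall(text))
```

Tightening the pattern further (say, limiting months to 1-12) is possible, but each extra rule makes the regex harder to read, which is exactly the trade-off described above.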
If you have questions, please feel free to ask.