# [Tutor] help with regular expressions

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Thu Feb 5 19:34:25 EST 2004

On Thu, 5 Feb 2004, Christopher Spears wrote:

> I'm trying to figure out regular expressions and am completely baffled!
> I understand the concept because there is something similar in UNIX, but
> for some reason, Python regular expressions don't make any sense to me!
> Are there some good tutorials that can help explain this subject to me?

Hi Chris,

Yes, there's a tutorial-style Regular Expression HOWTO by A.M. Kuchling:

http://www.amk.ca/python/howto/regex/

Regular expressions allow us to define text patterns.  For example, we can
define a pattern of a bunch of 'a's:

###
>>> import re
>>> pattern = re.compile('a+')
>>> pattern
<_sre.SRE_Pattern object at 0x8126060>
###

'pattern' is a regular expression that recognizes runs of one or more
consecutive 'a's.  That is, if we give it a string with 'a's in it, it'll
find exactly where each run is.

Let's see what it does on a simple example:

###
>>> pattern.findall('this is a test')
['a']
###

Here, it found the letter 'a'.

Let's try something else:

###
['aaa', 'a', 'a', 'aaaa', 'aa', 'aaaa']
###

And here, it found all 'a' sequences in that string.
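To see where each run sits, here is a small sketch using finditer(), which
yields match objects rather than plain strings.  The input string is an
illustrative one of my own, not the one from the session above:

```python
import re

pattern = re.compile('a+')

# Illustrative input of my own (an assumption, not the original string).
text = 'aardvark banana baa'

# findall() returns just the matched substrings, left to right.
runs = pattern.findall(text)

# finditer() yields match objects, so we can also ask where each run is.
spans = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]

print(runs)
for start, end, run in spans:
    print(start, end, run)
```

Each run is as long as possible: 'aa' at the start of 'aardvark' comes back
as one match, not as two separate 'a's.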

Does this make sense so far?  The pattern above is deliberately simple,
but regular expressions can get a little more complicated.

For example, here's a regular expression that tries to detect date strings
of the form '2/5/2004':

###
>>> date_regex = re.compile('[0-9]+/[0-9]+/[0-9]+')
>>> date_regex.findall("this is a test on 02/05/2004, right?")
['02/05/2004']
###

The regular expression is trying to say "a bunch of digits, followed by a
slash, followed by another bunch of digits, followed by another slash, and
then topped with another bunch of digits".  Whew.  *grin*
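If it helps, the same pattern can be spelled out piece by piece with the
re.VERBOSE flag, which lets us put whitespace and comments inside the
pattern.  This is only a sketch; the name 'date_regex_verbose' is made up
for illustration:

```python
import re

# Same pattern as date_regex, written with re.VERBOSE so that each
# piece can carry its own comment.  Whitespace in the pattern is ignored.
date_regex_verbose = re.compile(r"""
    [0-9]+   # a bunch of digits
    /        # a slash
    [0-9]+   # another bunch of digits
    /        # another slash
    [0-9]+   # a final bunch of digits
""", re.VERBOSE)

found = date_regex_verbose.findall("this is a test on 02/05/2004, right?")
print(found)
```

It should behave the same way as the one-line version; it is just easier to
read back later.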

Caveat: the pattern above is too lenient for catching date strings. It
also catches stuff like 2005/2/5, or even things like:

###
>>> date_regex.findall("looky 1/2/3 or /4/5/6/")
['1/2/3', '4/5/6']
###

So there's something of an art to writing good regular expressions that
are general enough to catch what we want, yet specific enough to reject
what we don't.
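As one possible tightening (a sketch, not the only way to do it), we could
insist on one or two digits, one or two digits, and then exactly four
digits, and refuse to match if the candidate is glued to another digit or
slash on either side.  The name 'strict_date_regex' and the choice of a
four-digit year are assumptions made for illustration:

```python
import re

# A stricter sketch: 1-2 digits / 1-2 digits / exactly 4 digits.
# The lookbehind and lookahead reject candidates that touch another
# digit or slash, such as pieces of '2005/2/5' or '/4/5/6/'.
strict_date_regex = re.compile(
    r'(?<![0-9/])[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}(?![0-9/])'
)

good = strict_date_regex.findall("this is a test on 02/05/2004, right?")
bad = strict_date_regex.findall("looky 1/2/3 or /4/5/6/")
also_bad = strict_date_regex.findall("what about 2005/2/5?")

print(good)
print(bad)
print(also_bad)
```

This is still too lenient in other ways: it will happily accept something
like 99/99/9999, since it only checks the shape of the string and not
whether the numbers make sense as a date.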