[Tutor] Re.findall()

Andreas Perstinger andipersti at gmail.com
Thu Apr 12 20:51:59 CEST 2012


On Thu, 12 Apr 2012 09:06:53 -0700 
Michael Lewis <mjolewis at gmail.com> wrote:

> Here's the "pattern" portion that I don't understand:
> 
> re.findall("[^A-Z]+[A-Z]{3}([a-z])[A-Z]{3}[^A-Z]+"
> 

You have 5 different parts here:
1) [^A-Z]+ - this matches one or more non-uppercase characters.
The brackets [] describe a set of wanted characters. A-Z would match
any uppercase character, but the caret ^ at the first position inside
the brackets inverts the set (i.e., matches any character not in the
set). + means to match one or more of whatever is described directly
before it.
2) [A-Z]{3} - this matches exactly three uppercase characters.
With the braces {} you can define how many times the preceding item
should match: {3} matches exactly 3, {3,} matches at least 3, {,3}
matches 0 to 3 and {3,6} matches 3 to 6.
3) ([a-z]) - this matches exactly one lowercase character.
The parens () capture the matched character for later use (via the
group()/groups() methods of the match object, see the docs).
4) [A-Z]{3} - again matches exactly three uppercase characters.
5) [^A-Z]+ - again matches at least one non-uppercase character.
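
If it helps, here is a small sketch that puts the five parts back
together and tries them out. The sample strings are made up for
illustration and are not from your original code:

```python
import re

# The pattern from the question, split into its five parts.
pattern = (
    r"[^A-Z]+"   # 1) one or more characters that are not A-Z
    r"[A-Z]{3}"  # 2) exactly three uppercase letters
    r"([a-z])"   # 3) one lowercase letter, captured as group 1
    r"[A-Z]{3}"  # 4) exactly three uppercase letters
    r"[^A-Z]+"   # 5) one or more characters that are not A-Z
)

# Hypothetical sample strings, chosen only to exercise the pattern.
good = "aaXYZbXYZaa"       # fits all five parts
four_upper = "aaWXYZbXYZaa"  # four uppercase letters before the "b"
no_border = "XYZbXYZ"      # nothing non-uppercase around the block

print(re.findall(pattern, good))        # ['b']
print(re.findall(pattern, four_upper))  # []
print(re.findall(pattern, no_border))   # []

# With a match object you can reach the saved character yourself.
m = re.search(pattern, good)
print(m.group(0))  # the whole match: aaXYZbXYZaa
print(m.group(1))  # the captured character: b
print(m.groups())  # ('b',)
```

Because the pattern contains exactly one group, findall() returns only
the captured letters and not the whole matched text.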

HTH, Andreas

