[Tutor] Regex confusion
Jeffrey N. Shelton
Fri, 17 Dec 1999 10:52:29 -0500
Michael P. Reilly wrote:
> > I'm looking at some Python code (from the DT_HTML.py module of Zope,
> > actually) that uses a regex expression that I can't figure out. Perhaps
> > someone could point me in the right direction?
> > name_match=regex.compile('[\0- ]*[a-zA-Z]+[\0- ]*').match
> > end_match=regex.compile('[\0- ]*\(/\|end\)',regex.casefold).match
> > start_search=regex.compile('[<&]').search
> > The phrase "[\0- ]" in the first two lines confuses me. Is this a group
> > reference? And if so, what is group 0? I thought the numbering for match
> > groups started at 1. Logically, it would seem that this phrase is
> > for grabbing up leading and trailing dashes and whitespace. But it's beyond
> > me to figure out how "\0" figures into this.
> > Also, the ampersand (&) in the third line is a problem for me. I've
> > looked through my copy of "Mastering Regular Expressions" and can't find any
> > reference to a "&" metacharacter. What am I overlooking?
> Hi Jeffrey,
> The regular expression [...] is commonly called a character class; it
> matches any one character from the set listed inside the brackets.
> [<&] - one of the two characters "<" or "&"
> [\0- ] - any character with ASCII value between 0 ('\0') and 32 (' ')
> If you include a caret (^) immediately after the left bracket ([), then
> matching is against characters not in the class.
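The character classes described above can be checked with a minimal sketch. It assumes the current `re` module rather than the old `regex` module used in the quoted code, which is no longer available in current Python. It also assumes that the Emacs-style `\(`, `\|`, `\)` of the old syntax correspond to `(`, `|`, `)` in `re`, and that `regex.casefold` corresponds to `re.IGNORECASE`.

```python
import re

# [\0- ] is a range from NUL (code 0) through space (code 32):
# the ASCII control characters plus the space itself.
low_ascii = re.compile(r'[\0- ]')
matched = [code for code in range(128) if low_ascii.match(chr(code))]

# [<&] matches exactly one character, either "<" or "&".
start_search = re.compile(r'[<&]').search

# A caret right after the opening bracket negates the class.
not_low_ascii = re.compile(r'[^\0- ]')

# Assumed `re` equivalents of the first two quoted patterns.
name_match = re.compile(r'[\0- ]*[a-zA-Z]+[\0- ]*').match
end_match = re.compile(r'[\0- ]*(/|end)', re.IGNORECASE).match

print(matched == list(range(33)))         # codes 0 through 32 inclusive
print(start_search('a & b').start())      # index of the "&"
print(not_low_ascii.match(' '))           # None: space is inside the range
print(repr(name_match('  var \n').group()))
```

Nothing here involves group references: inside brackets, `\0` is simply the NUL character used as the low end of a range.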
Thanks for the help!
The "&" usage is now obvious in the light of day. (Doh!) As is the "\0- "
phrase, although I think I would have caught on a lot faster if it had been
"\0-\40" or, even better, "\000-\040".
Anyhow, I'm back on my feet again. Thanks!
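
For reference, the numeric spellings of that range can be checked with a short sketch. It assumes the current `re` module, and the helper name `class_members` is made up for illustration. Backslash-digit escapes are octal, so the space (decimal 32) is written `\040`, while `\032` is character 26.

```python
import re

def class_members(pattern):
    """Return the ASCII codes (0-127) that a one-character pattern matches."""
    rx = re.compile(pattern)
    return [code for code in range(128) if rx.fullmatch(chr(code))]

# Backslash-digit escapes are octal: space is decimal 32, octal 40, hex 20.
space_is_octal_40 = ('\040' == ' ' == '\x20') and ord(' ') == 32
octal_032_is_26 = '\032' == chr(26)

original = class_members(r'[\0- ]')       # spelling used in the Zope code
octal = class_members(r'[\000-\040]')     # explicit octal endpoints
hexa = class_members(r'[\x00-\x20]')      # explicit hexadecimal endpoints

print(original == octal == hexa)          # all three describe the same class
print(len(original))                      # 33 characters, codes 0 through 32
```

The hexadecimal form is arguably the least ambiguous of the three, since it cannot be mistaken for a decimal number.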