[Tutor] Regex confusion

Michael P. Reilly arcege@shore.net
Fri, 17 Dec 1999 09:41:17 -0500 (EST)


> I'm looking at some Python code (from the DT_HTML.py module of Zope,
> actually) that uses a regex expression that I can't figure out. Perhaps you
> could point me in the right direction?
> 
>      name_match=regex.compile('[\0- ]*[a-zA-Z]+[\0- ]*').match
>      end_match=regex.compile('[\0- ]*\(/\|end\)',regex.casefold).match
>      start_search=regex.compile('[<&]').search
> 
> The phrase "[\0- ]" in the first two lines confuses me. Is this a group
> reference? And if so, what is group 0? I thought the numbering for match
> groups started at 1. Logically, it would seem that this phrase is a means
> for grabbing up leading and trailing dashes and whitespace. But it's beyond
> me to figure out how "\0" figures into this.
> 
> Also, the ampersand (&) in the third line is a problem for me. I've looked
> through my copy of "Mastering Regular Expressions" and can't find any
> reference to a "&" metacharacter. What am I overlooking?

Hi Jeffrey,

The regular expression construct [...] is commonly called a character
class. It matches any single character that appears inside the brackets.
  [<&]    - one of the two characters "<" or "&"
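  [\0- ]  - any character with an ASCII value from 0 ('\0') through 32 (' ')

So "\0" here is the NUL character, not a group reference, and the "-"
marks a range rather than a literal dash. The "&" is not a metacharacter;
inside the brackets it is just a literal character.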
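
Here is a small sketch of the two classes above. Note the assumption: it
uses the newer re module instead of the regex module the Zope code imports,
since the old regex module is no longer shipped with Python. The variable
name low_chars and the sample strings are made up for illustration.

```python
import re

# [\0- ] is a range from NUL (code 0) to space (code 32):
# control characters and whitespace, nothing else.
low_chars = re.compile('[\0- ]')

# Collect every ASCII code whose character the class matches.
matched = [c for c in range(128) if low_chars.match(chr(c))]
print(matched == list(range(33)))

# A literal dash (code 45) lies outside the range, so it does not match.
print(low_chars.match('-'))

# [<&] matches a literal "<" or "&"; neither is special here.
start_search = re.compile('[<&]').search
print(start_search('a &amp; b').start())
```

So the class picks up leading or trailing whitespace and control
characters, but not dashes.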

If you include a caret (^) immediately after the left bracket ([), the
class is negated: it matches any one character that is not listed.
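
A quick sketch of the negated form, again assuming the newer re module
rather than regex. The name not_low and the test string are hypothetical.

```python
import re

# [^\0- ] matches any one character NOT in the range 0 through 32.
not_low = re.compile('[^\0- ]')

# search() skips the leading space, tab, and newline and stops at "n".
first = not_low.search(' \t\nname').group()
print(first)

# A caret that is not the first character in the class is a literal "^".
literal_caret = re.compile('[<^]')
print(literal_caret.match('^') is not None)
```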

  -Arcege


-- 
------------------------------------------------------------------------
| Michael P. Reilly, Release Engineer | Email: arcege@shore.net        |
| Salem, Mass. USA  01970             |                                |
------------------------------------------------------------------------