a n00b regex question
Tim Chase
python.list at tim.thechases.com
Mon Dec 3 16:39:13 EST 2007
> I tried this:
>
> string = string.replace('<tr>\s*<th class="table">Field One</th>\s*<td>
> %FieldOneValue%</td>\s*</tr>', '')
>
>
> But this doesn't work. The doco for Python's regex suggests that \s
> should match any whitespace including newlines which is what I
> wanted,
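One thing to note first: str.replace() treats its first argument as literal text, so \s is never interpreted as a regex there; pattern matching needs the re module. A minimal sketch, assuming the goal is to delete that table row and that the line wrap in the quoted pattern hides no extra whitespace (the sample html string is made up for illustration):

```python
import re

# Hypothetical sample input; the real page is not shown in the thread.
html = (
    '<table>\n'
    '<tr>\n'
    '  <th class="table">Field One</th>\n'
    '  <td>%FieldOneValue%</td>\n'
    '</tr>\n'
    '</table>'
)

# Raw string, so \s reaches the re module untouched.
pattern = r'<tr>\s*<th class="table">Field One</th>\s*<td>%FieldOneValue%</td>\s*</tr>'

# str.replace looks for the literal characters "\s*", which are not in html,
# so nothing is replaced.
unchanged = html.replace(pattern, '')

# re.sub compiles the pattern as a regex; \s* spans the newlines and indentation.
cleaned = re.sub(pattern, '', html)
```

Here unchanged is still equal to html, while cleaned no longer contains the row.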
from http://docs.python.org/lib/module-re.html
"""
Regular expressions use the backslash character ("\") to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python's usage
of the same character for the same purpose in string literals;
for example, to match a literal backslash, one might have to
write '\\\\' as the pattern string, because the regular
expression must be "\\", and each backslash must be expressed as
"\\" inside a regular Python string literal.
The solution is to use Python's raw string notation for regular
expression patterns; backslashes are not handled in any special
way in a string literal prefixed with "r". So r"\n" is a
two-character string containing "\" and "n", while "\n" is a
one-character string containing a newline. Usually patterns will
be expressed in Python code using this raw string notation.
"""
and from http://docs.python.org/lib/re-syntax.html
"""
If you're not using a raw string to express the pattern, remember
that Python also uses the backslash as an escape sequence in
string literals; if the escape sequence isn't recognized by
Python's parser, the backslash and subsequent character are
included in the resulting string. However, if Python would
recognize the resulting sequence, the backslash should be
repeated twice. This is complicated and hard to understand, so
it's highly recommended that you use raw strings for all but the
simplest expressions.
"""
And if you don't know about raw strings, you can read about them
here:
http://docs.python.org/ref/strings.html
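
To make the difference concrete, here is a small sketch using only the standard re module (the sample text is made up). It shows the r"\n" case from the docs and what happens with \b, an escape that Python's parser does recognize:

```python
import re

# "\n" is one character (a newline); r"\n" is two (backslash and "n").
lengths = (len("\n"), len(r"\n"))

# Both spellings produce the same two-backslash pattern string,
# which the re module reads as "one literal backslash".
same_pattern = ('\\\\' == r'\\')

text = "a cat sat"

# In a regular literal "\b" is a backspace character, so this never matches.
without_raw = re.findall("\bcat\b", text)

# In a raw literal the re module sees \b, a word boundary.
with_raw = re.findall(r"\bcat\b", text)

# Doubling each backslash also works, but is harder to read.
doubled = re.findall("\\bcat\\b", text)
```

Only the raw and doubled forms find "cat"; the plain "\b" version silently returns an empty list, which is why the docs recommend raw strings.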
-tkc