regex walkthrough

Hans Mulder hansmu at xs4all.nl
Sat Dec 8 20:33:37 CET 2012


On 8/12/12 18:48:13, rh wrote:
>  Looking through some code I found this and wondered about what it does:
> ^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$
> 
> Here's my walk through:
> 
> 1) ^ match at start of string
> 2) ?P<salsipuedes> if a match is found it will be accessible in a
> variable salsipuedes

I wouldn't call it a variable.  If m is a match-object produced
by this regex, then m.group('salsipuedes') will return the part
that was captured.
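
For illustration, here is a minimal sketch of how the captured text
can be retrieved, assuming Python 3's re module.  The sample input
string is made up and not part of the original code.

```python
import re

# Pattern from the original post, written as a raw string.
pattern = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")

# "docs/read-me_v2.txt" is a made-up sample input.
m = pattern.match("docs/read-me_v2.txt")

captured = m.group("salsipuedes")   # look the group up by name
by_number = m.group(1)              # the same group, by position
as_dict = m.groupdict()             # all named groups as a dict

print(captured)
print(by_number)
print(as_dict)
```

Note that pattern.match() returns None when there is no match, so
real code should check m before calling m.group().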

I'm not sure, though, why you'd want to define a group that
effectively spans the whole regex.  If there's a match, then
m.group(0) will return the matching substring, and
m.group('salsipuedes') will return the substring that matched
the parenthesized part of the pattern.  These two substrings
will be equal, since the only bits of the pattern outside the
parentheses are zero-width assertions.
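
A quick way to check this, sketched under the assumption of Python 3
and an invented sample string:

```python
import re

pattern = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")
m = pattern.match("some/path.txt")   # made-up sample input

whole = m.group(0)                   # everything the regex matched
named = m.group("salsipuedes")       # what the named group matched

# ^ and $ consume no characters, so text and positions coincide.
same_text = whole == named
same_span = m.span(0) == m.span("salsipuedes")

print(same_text, same_span)
```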

> 3) [0-9A-Za-z-_.//] this is the one that looks wrong to me, see below
> 4) + one or more from the preceding char class
> 5) () the grouping we want returned (see #2)
> 6) $ end of the string to match against but before any newline
> 
> more on #3
> the z-_ part looks wrong and seems that the - should be at the start
> of the char set otherwise we get another range z-_ or does the a-z
> preceding the z-_ negate the z-_ from becoming a range?

The latter: a-z is a range, and since the z is already used up as
the end of that range, it blocks the z-_ from being a range.
Consequently, the -_ bit matches only - and _.
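
The following sketch, assuming Python 3's re module, probes the char
class one character at a time.  The backtick is chosen as a test
character only because it sits between _ and a in ASCII, so it would
match if some range starting at _ had crept in.

```python
import re

char_class = re.compile(r"[0-9A-Za-z-_.//]")

hyphen_ok = char_class.fullmatch("-") is not None
underscore_ok = char_class.fullmatch("_") is not None
# "`" (0x60) lies between "_" (0x5F) and "a" (0x61) in ASCII.
backtick_ok = char_class.fullmatch("`") is not None

# By contrast, a bare z-_ inside a set really is read as a range,
# and a reversed one (z is 0x7A, _ is 0x5F), which is rejected.
try:
    re.compile(r"[z-_]")
    reversed_range_error = False
except re.error:
    reversed_range_error = True

print(hyphen_ok, underscore_ok, backtick_ok, reversed_range_error)
```

Putting the - first or last in the set, or escaping it as \-, would
make the intent clearer to the next reader.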

> The "." might be ok inside a char set.

It is.  Most special characters lose their special meaning
inside a char set.
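
A small sketch of the difference, assuming Python 3.  The exceptions
to keep in mind inside a set are the backslash, a closing ], a ^ in
first position, and a - between two characters.

```python
import re

dot_in_set = re.compile(r"[.]")    # literal dot only
dot_outside = re.compile(r".")     # any character except newline

in_set_dot = dot_in_set.fullmatch(".") is not None
in_set_x = dot_in_set.fullmatch("x") is not None
outside_x = dot_outside.fullmatch("x") is not None

print(in_set_dot, in_set_x, outside_x)
```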

> The two slashes look wrong but maybe it has some special meaning
> in some case? I think only one slash is needed.

You're correct: there's no special meaning and only one slash
is needed.  But then, a char set is a set and duplicates are
simply ignored, so it does no harm.
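
To see that the extra slash changes nothing, one can compare the
pattern with a single-slash variant.  This is only a sketch, assuming
Python 3; the sample strings are invented, and a handful of samples
illustrates the point rather than proving it.

```python
import re

one_slash = re.compile(r"^[0-9A-Za-z-_./]+$")
two_slashes = re.compile(r"^[0-9A-Za-z-_.//]+$")

# Made-up samples; "a\\b" contains a backslash, which is not in the set.
samples = ["a/b", "a//b", "a\\b", "plain", ""]

results = [
    (s, bool(one_slash.match(s)), bool(two_slashes.match(s)))
    for s in samples
]
agree = all(a == b for _, a, b in results)

for row in results:
    print(row)
print(agree)
```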

Perhaps the person who wrote this was confusing slashes and
backslashes.

> I've looked at pydoc re, but it's cursory.

That's one way of putting it.


Hope this helps,

-- HansM




