[Tutor] Re: Need help with multi-line regex identification

Jorge Godoy godoy at ieee.org
Wed Apr 21 07:13:02 EDT 2004


On Wed 21 Apr 2004 03:54, Tony Cappellini wrote:

> Could someone help point me in the right direction for this ?

In Perl we used to use multi-line/extended matching for that.

Either we made the line terminator irrelevant and handled everything as if
it were on one line (roughly that, but not exactly), or we used some of the
regexp extensions (you can read Perl's documentation on regular expressions
for that: perldoc perlre). It came down to something along the lines of
what's in this part of the docs:

---------------------------------------------------------------------
       m   Treat string as multiple lines.  That is, change "^" and "$" from 
           matching the start or end of the string to matching the start or 
           end of any line anywhere within the string.

       s   Treat string as single line.  That is, change "." to match any 
           character whatsoever, even a newline, which normally it would not 
           match.

           The "/s" and "/m" modifiers both override the $* setting.  That 
           is, no matter what $* contains, "/s" without "/m" will force "^" 
           to match only at the beginning of the string and "$" to match 
           only at the end (or just before a newline at the end) of the 
           string.  Together, as /ms, they let the "." match any character 
           whatsoever, while still allowing "^" and "$" to match, 
           respectively, just after and just before newlines within the
           string.
---------------------------------------------------------------------

That is, the regexp became "/<something>/ms".
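
For comparison, Python's re module has flags that, as far as I know, behave
like those two Perl modifiers: re.MULTILINE for /m and re.DOTALL for /s. A
minimal sketch, assuming a made-up three-line string (the sample text and
variable names are only for illustration):

```python
import re

# Made-up sample input, purely for illustration.
text = "first line\nsecond line\nthird line"

# No flags: "^" matches only at the very start of the string.
plain_starts = re.findall(r"^\w+", text)

# re.MULTILINE (like Perl's /m): "^" also matches right after each newline.
multi_starts = re.findall(r"^\w+", text, re.MULTILINE)

# No flags: "." does not match a newline, so this cannot span lines.
plain_span = re.search(r"first.*third", text)

# re.DOTALL (like Perl's /s): "." matches newlines too.
dotall_span = re.search(r"first.*third", text, re.DOTALL)

# Both together (like Perl's /ms): "." crosses lines while "^" still
# anchors at the start of an inner line.
both = re.search(r"^second.*^third", text, re.MULTILINE | re.DOTALL)

print(plain_starts)
print(multi_starts)
print(plain_span)
print(repr(dotall_span.group()))
print(repr(both.group()))
```

The flags can be combined with "|", which is the closest thing to writing
"/ms" at the end of the Perl pattern.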

Even in Perl, where regexps are highly recommended for lots of things, we
ran into problems with this kind of matching, and there's a "more
recommended" approach: using a lexical parser.

So my question to you is: wouldn't it be a lot easier for you to change the
language, expand it, and also parse it if you had a parser for it?
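
To give an idea of what a lexical scanner does, here is a rough sketch
using only the standard re module. It is not TPG or Plex, and since I
don't know your actual language, the token names and the C-style
"/* ... */" comment syntax are assumptions made up for the example:

```python
import re

# Hypothetical token set for a toy language; adjust it to the real one.
# Order matters: COMMENT must be tried before OP so "/*" is not read as "/".
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),     # block comment, may span several lines
    ("NUMBER",  r"\d+"),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("OP",      r"[=+\-*/;]"),
    ("SKIP",    r"\s+"),           # whitespace, including newlines
    ("ERROR",   r"."),             # anything else is reported as an error
]

# One big alternation of named groups; re.DOTALL lets the comment
# pattern run across line breaks.
MASTER = re.compile(
    "|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC), re.DOTALL
)


def tokenize(source):
    """Yield (kind, text) pairs, dropping whitespace and comments."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "ERROR":
            raise ValueError("unexpected character %r" % match.group())
        yield kind, match.group()


source = "x = 1 /* a comment\n spanning lines */ + y;"
tokens = list(tokenize(source))
print(tokens)
```

Adding a new construct to the language then means adding one entry to the
token table instead of reworking one large multi-line regexp. A real parser
generator goes further and handles the grammar on top of the tokens.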

A quick Google search ("python lexical parser") turned up this:
http://christophe.delord.free.fr/en/tpg/

There's also the parser-sig (whose page links to
http://www.python.org/topics/parsing.html) where you can get other options
if the above doesn't satisfy your needs. 

As for Plex, there's even an example where the author handles "comments" in
code. That one might be of more interest to you. The docs are at
http://www.cosc.canterbury.ac.nz/~greg/python/Plex/version/doc/index.html
and the page the parser-sig points to is at
http://www.cosc.canterbury.ac.nz/~greg/python/Plex/


Take a look at the other ones too... And use a parser. It will be better and
easier, IMHO.


Be seeing you,
-- 
Godoy.      <godoy at ieee.org>



