[Moin-devel] Re: header_re patch for Page.getPageText() (Thomas Werschlein)

Wed May 4 23:38:27 EDT 2005

> I noticed that Page.getPageText() does not limit its header search for
> pragmas and comments to the beginning of the page. So if a user
> enters, for example, a Java comment in a verbatim section such as
>     {{{
>     # java comment
>     }}}
>
> AND there are no pragmas/comments before this hash line, the body
> will start only after the Java comment.
>
> Removing the re.MULTILINE flag when compiling the regexp solved the
> problem for me.
>
> diff -u -r1.1 -r1.2
> --- Page.py     12 Apr 2005 15:00:23 -0000      1.1
> +++ Page.py     4 May 2005 11:32:00 -0000       1.2
> @@ -1389,7 +1389,7 @@
>           # Lazy compile regex on first use. All instances share the
>           # same regex, compiled once when the first call in an instance is done.
>           if isinstance(self.__class__.header_re, (str, unicode)):
> -            self.__class__.header_re = re.compile(self.__class__.header_re, re.MULTILINE | re.UNICODE)
> +            self.__class__.header_re = re.compile(self.__class__.header_re, re.UNICODE)
>
>           body = self.get_raw_body() or ''
>           header = self.header_re.search(body)

The search is done in the raw body of the page. The header_re is used
to show the first interesting search result, which the current search
defines as the first result in the page body.

I'm not sure that removing re.M does not break that regex in other
cases. In any case, I suspect the regex itself is wrong, and I have
another, well-tested regex that is used by the section parser.
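
To make the concern about dropping re.M concrete, here is a minimal
Python 3 sketch. The pattern is only an assumed stand-in for header_re
(the real one is not quoted in this thread), and split_header() with its
"anchored" argument is a hypothetical helper, not MoinMoin API. It
compares search() with and without re.MULTILINE, plus one possible
alternative: keeping re.MULTILINE but calling match(), which can only
start at the beginning of the page.

```python
import re

# Assumed stand-in for Page.header_re; the actual pattern is not shown
# in this thread.
HEADER_PATTERN = r'(^#+.*(?:\n\s*)+)+'


def split_header(text, flags, anchored=False):
    """Return (header, body) for text.

    Uses search() by default, or match() when anchored is True, so that
    a header is only recognised at the very start of the page.
    """
    regex = re.compile(HEADER_PATTERN, flags)
    found = regex.match(text) if anchored else regex.search(text)
    if found is None:
        return '', text
    return found.group(0), text[found.end():]


# A page with no pragmas, but a hash line inside a verbatim section.
verbatim_page = "Intro\n{{{\n# java comment\n}}}\nMore\n"

# A page that starts with two pragma lines.
pragma_page = "#format wiki\n#language en\nBody text\n"

for flags, anchored in (
    (re.MULTILINE | re.UNICODE, False),  # behaviour before the patch
    (re.UNICODE, False),                 # behaviour after the patch
    (re.MULTILINE | re.UNICODE, True),   # hypothetical alternative
):
    for page in (verbatim_page, pragma_page):
        print(repr(split_header(page, flags, anchored)))
```

Under these assumptions, dropping re.MULTILINE fixes the verbatim case
but cuts a two-line pragma header down to its first line, because "^"
then matches only at the start of the string. Using match() with
re.MULTILINE handles both sample pages.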

Best Regards,

Nir Soffer