[Mailman-Users] Pipermail URL handling in archives
Mark Sapiro
mark at msapiro.net
Sat Feb 23 01:37:24 CET 2008
Jim Popovitch wrote:
>On Fri, Feb 22, 2008 at 4:03 PM, Mark Sapiro <mark at msapiro.net> wrote:
>> You could try to find the line
>>
>> urlpat = re.compile(r'(\w+://[^>)\s]+)') # URLs in text
>>
>> near the beginning of Mailman/Archiver/HyperArch.py and change it to
>>
>> urlpat = re.compile(r'(\w+://[^>)\s]+?)\.?(\s|$)') # URLs in text
>
>Mark, that works well for the case I described. I did find something
>else similar that doesn't work:
>
> this is another url http://www.yahoo.com, and so is this
>http://www.google.com.
>
>Gets converted into:
> this is another url <A
>HREF="http://www.yahoo.com,">http://www.yahoo.com,</A>
> and so is this <A
>HREF="http://www.ibm.com">http://www.google.com</A>.
I assume that's a typo and 'ibm' should be 'google'.
>So, the problem seems to appear with commas too which makes me wonder
>if this can be resolved with this:
>
> urlpat = re.compile(r'(\w+://[^>)\s]+?)(\.|,)?(\s|$)') # URLs in text
>
>but then I got to thinking about any other punctuation mark that
>follows a URL... and my mind started spinning :-)
I think your suggestion above, (\.|,)?, would work for a comma, but
there are other ways to do it, e.g.
urlpat = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)') # URLs in text
to handle '.', ',' and ';'. You could extend the character class with
more characters, but you really need to be careful. Consider, for example,
<http://www.example.com/some/page#anchor.>, which could be a valid URL
ending in '.', so stripping a trailing '.' would change the link target.
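
To see how these patterns behave on the sample text, here is a minimal
standalone sketch that uses only Python's re module and runs outside
Mailman. The helper name urls and the variable names are made up for
illustration; only the two patterns come from this thread, and this says
nothing about how HyperArch.py uses the match afterwards.

```python
import re

# The two patterns quoted in this thread (variable names are illustrative).
original = re.compile(r'(\w+://[^>)\s]+)')
with_punct = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)')


def urls(pattern, text):
    """Return group(1), the captured URL, for every match in text."""
    return [m.group(1) for m in pattern.finditer(text)]


sample = ('this is another url http://www.yahoo.com, and so is this '
          'http://www.google.com.')

# The original pattern keeps the trailing ',' and '.' inside the URL.
print(urls(original, sample))
# ['http://www.yahoo.com,', 'http://www.google.com.']

# The revised pattern leaves one trailing '.', ',' or ';' out of the URL.
print(urls(with_punct, sample))
# ['http://www.yahoo.com', 'http://www.google.com']

# The caveat: a URL that really ends in '.' loses that character.
print(urls(with_punct, 'see http://www.example.com/some/page#anchor. now'))
# ['http://www.example.com/some/page#anchor']

# A URL wrapped in <...> is followed by '>', not whitespace or end of
# string, so the revised pattern finds nothing here.
print(urls(with_punct, 'see <http://www.example.com/some/page#anchor.> now'))
# []
```

The last call is worth noting: because the revised pattern requires
whitespace or end of string right after the URL, a URL delimited by
<...> or followed by ')' no longer matches in this standalone test, so
that case should be checked before changing HyperArch.py.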
--
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan