[Mailman-Users] Pipermail URL handling in archives

Sat Feb 23 01:57:40 CET 2008

On Fri, Feb 22, 2008 at 7:37 PM, Mark Sapiro <mark at msapiro.net> wrote:
>  >Gets converted into:
>  >   this is another url <A
>  >HREF="http://www.yahoo.com,">http://www.yahoo.com,</A>
>  >            and so is this <A
>  >HREF="http://www.ibm.com">http://www.google.com</A>.
>
>
>  I assume that's a typo and 'ibm' should be 'google'.

:-) Yep. I had used www.ibm.com and www.mbi.com in my test and
changed them to G! and Y! (Google and Yahoo) for the email, but
missed one reference.
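For reference, the quoted conversion can be reproduced with a small Python sketch. It assumes a URL pattern of the form `(\w+://[^>)\s]+?)\.?(\s|$)`. That pattern is consistent with the output above, where a trailing '.' stays outside the link and a trailing ',' gets swallowed, but it is an assumption and is not quoted from the Mailman source. `linkify()` is a hypothetical helper. It links only group 1 of each match and keeps everything after it as plain text.

```python
import html
import re

# ASSUMED pattern, chosen to match the archive output quoted above:
# non-greedy URL body, optional trailing '.', then whitespace or end of line.
OLD_URLPAT = re.compile(r'(\w+://[^>)\s]+?)\.?(\s|$)')


def linkify(line, pattern):
    """Hypothetical helper: wrap group 1 of each match in an <A> tag.

    Only group 1 becomes the link. Any punctuation or whitespace the
    pattern consumed after it is emitted as ordinary text, so nothing is lost.
    """
    out = []
    pos = 0
    for m in pattern.finditer(line):
        url = m.group(1)
        out.append(html.escape(line[pos:m.start(1)]))
        out.append('<A HREF="%s">%s</A>' % (html.escape(url), html.escape(url)))
        pos = m.end(1)  # resume right after the URL itself
    out.append(html.escape(line[pos:]))
    return ''.join(out)


text = 'this is another url http://www.yahoo.com, and so is this http://www.google.com.'
print(linkify(text, OLD_URLPAT))
```

The result mirrors the archived line. The comma ends up inside both the HREF and the link text, while the final '.' is left after the `</A>`.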

>  >So, the problem seems to appear with commas too which makes me wonder
>  >if this can be resolved with this:
>  >
>  >   urlpat = re.compile(r'(\w+://[^>)\s]+?)(\.|,)?(\s|$)') # URLs in text
>  >
>  >but then I got to thinking about any other punctuation mark that
>  >follows a URL... and my mind started spinning :-)
>
>
>  I think the suggestion above - (\.|,)? would work for comma, but you
>  could do it other ways - e.g.
>
>
>    urlpat = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)') # URLs in text
>
>  to handle '.', ',' and ';', and you could extend that with more
>  characters, but you really need to be careful. Consider, for example,
>  <http://www.example.com/some/page#anchor.>, which could be a valid URL
>  ending in '.'.
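The sketch below compares the two suggested patterns in Python. `urls()` is a hypothetical helper that returns only group 1 of each match, which is the text that would become the link. The sample line is made up. The sketch assumes the surrounding code links group 1 and leaves whatever the pattern consumed after it in the text. It also shows the caveat above: the genuine trailing '.' of the `#anchor.` URL is stripped.

```python
import re

# Character-class version: drop one trailing '.', ',' or ';' from the link.
NEW_URLPAT = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)')
# Alternation version: same effect on group 1 for '.' and ',', but the extra
# capturing group moves the trailing whitespace from group 2 to group 3.
ALT_URLPAT = re.compile(r'(\w+://[^>)\s]+?)(\.|,)?(\s|$)')


def urls(pattern, line):
    """Hypothetical helper: return the part of each match that would be linked."""
    return [m.group(1) for m in pattern.finditer(line)]


line = 'see http://www.yahoo.com, http://www.google.com; and http://www.ibm.com.'
print(urls(NEW_URLPAT, line))
# The caveat: a URL whose last character really is '.' loses it.
print(urls(NEW_URLPAT, 'http://www.example.com/some/page#anchor.'))
print(NEW_URLPAT.groups, ALT_URLPAT.groups)
```

Because `[.,;]` is a character class rather than a group, the pattern keeps just two groups (the URL and the trailing whitespace). That matters if any code elsewhere refers to groups by number. Adding more characters to the class widens the same trade-off: each one becomes a character that a real URL can no longer end with.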

Understood. I think "[.,;]" would cover 99% of the punctuation that
follows a URL in a sentence.

Thanks again!

-Jim P.