[Mailman-Users] Pipermail URL handling in archives
Jim Popovitch
yahoo at jimpop.com
Sat Feb 23 01:57:40 CET 2008
On Fri, Feb 22, 2008 at 7:37 PM, Mark Sapiro <mark at msapiro.net> wrote:
> >Gets converted into:
> > this is another url <A
> >HREF="http://www.yahoo.com,">http://www.yahoo.com,</A>
> > and so is this <A
> >HREF="http://www.ibm.com">http://www.google.com</A>.
>
>
> I assume that's a typo and 'ibm' should be 'google'.
:-) Yep. I had used www.ibm.com and www.mbi.com in my test and
changed them to G! and Y! for the email, but
missed one reference.
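The comma ends up inside the HREF because the URL match is greedy. Below is a minimal sketch of that behavior. It assumes the archiver matches URLs with a greedy pattern of the form `(\w+://[^>)\s]+)`, which is an illustrative assumption, so check the actual `urlpat` in your Mailman version. Such a pattern stops only at whitespace, '>' or ')', so any trailing punctuation is swallowed into the URL.

```python
import re

# Assumption: a greedy pattern of this shape. The URL runs until whitespace,
# '>' or ')', so a trailing ',' or '.' becomes part of the matched URL.
greedy_urlpat = re.compile(r'(\w+://[^>)\s]+)')

def find_urls(text):
    # findall returns the contents of the single capturing group
    return greedy_urlpat.findall(text)

print(find_urls("this is another url http://www.yahoo.com, and so is this http://www.google.com."))
# ['http://www.yahoo.com,', 'http://www.google.com.']
```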
> >So, the problem seems to appear with commas too which makes me wonder
> >if this can be resolved with this:
> >
> > urlpat = re.compile(r'(\w+://[^>)\s]+?)(\.|,)?(\s|$)') # URLs in text
> >
> >but then I got to thinking about any other punctuation mark that
> >follows a URL... and my mind started spinning :-)
>
>
> I think the suggestion above - (\.|,)? would work for comma, but you
> could do it other ways - e.g.
>
>
> urlpat = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)') # URLs in text
>
> to handle '.', ',' and ';', and you could extend that with more
> characters, but you really need to be careful. Consider for example,
> <http://www.example.com/some/page#anchor.> which could be a valid URL
> ending in '.'.
Understood. I think "[.,;]" would cover 99% of the cases where a URL
is followed by punctuation in a sentence.
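Here is a minimal sketch of the `[.,;]?` pattern in use. The `linkify` helper is hypothetical and is not Mailman's code. It wraps each URL in an `<A HREF>` tag and re-emits whatever the match consumed after the URL, so the trailing punctuation stays in the text but outside the link. The last example shows the caveat raised above: a URL that really ends in '.' loses that dot from the link.

```python
import re

# The suggested pattern: a non-greedy URL, an optional trailing
# '.', ',' or ';', then whitespace or end of string.
urlpat = re.compile(r'(\w+://[^>)\s]+?)[.,;]?(\s|$)')

def linkify(text):
    """Hypothetical helper: wrap each URL in an <A HREF> tag, keeping any
    trailing punctuation and whitespace outside the link."""
    def repl(m):
        url = m.group(1)
        # Everything the match consumed after the URL (punctuation + space)
        tail = m.group(0)[len(url):]
        return '<A HREF="%s">%s</A>%s' % (url, url, tail)
    return urlpat.sub(repl, text)

print(linkify("this is another url http://www.yahoo.com, and so is this http://www.google.com."))
# this is another url <A HREF="http://www.yahoo.com">http://www.yahoo.com</A>, and so is this <A HREF="http://www.google.com">http://www.google.com</A>.

# The caveat: a URL that genuinely ends in '.' has the dot left out of the link.
print(linkify("see http://www.example.com/some/page#anchor. for details"))
# see <A HREF="http://www.example.com/some/page#anchor">http://www.example.com/some/page#anchor</A>. for details
```

Because the non-greedy `+?` stops as soon as an optional punctuation mark plus whitespace (or end of string) can follow, extending the character class to more punctuation trades correctness on URLs like the `#anchor.` one for cleaner links in prose.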
Thanks again!
-Jim P.