Ann: Validating Emails and HTTP URLs in Python
Philip Semanchuk
philip at semanchuk.com
Mon May 3 09:24:49 EDT 2010
On May 3, 2010, at 9:06 AM, andrew cooke wrote:
>
> Hi,
>
> The latest Lepl release includes an implementation of RFC 3696 - the
> RFC that describes how best to validate email addresses and HTTP
> URLs. For more information please see http://www.acooke.org/lepl/rfc3696.html
>
> Lepl's main page is http://www.acooke.org/lepl
>
> Because Lepl compiles to regular expressions wherever possible, the
> library is quite fast - in testing I was seeing about 1ms needed to
> validate a URL.
>
> Please bear in mind that this is the very first release of this
> module, so it may have some bugs... If you find any problems contact
> me and I'll fix them ASAP.
Thanks, Andrew, for contributing that to the open source community.
FYI, Fourthought's PyXML has a module called uri.py that contains
regexes for URL validation. I've run over a million URLs (harvested from
the Internet) through their code. I can't say I checked each and every
result, but I never saw anything that would lead me to believe it was
misbehaving.
It might be interesting to compare the results of running a large list
of URLs through your code and theirs.
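A minimal sketch of such a comparison, assuming each library can be wrapped in a function that takes a URL string and returns True or False. The two validators below are hypothetical stand-ins (a deliberately simple regex and a check built on the standard library's urlsplit), not the actual Lepl or uri.py APIs; swap in thin wrappers around the real calls.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in #1: a deliberately simple regex, NOT the uri.py regexes.
_URL_RE = re.compile(r"^https?://[A-Za-z0-9.-]+(?::\d+)?(?:/\S*)?$")


def validate_regex(url):
    """Return True if the URL matches the simple stand-in regex."""
    return _URL_RE.match(url) is not None


def validate_urlsplit(url):
    """Hypothetical stand-in #2: accept any http(s) URL with a host part."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs.
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def find_disagreements(urls, validator_a, validator_b):
    """Return (url, result_a, result_b) for every URL the validators disagree on."""
    disagreements = []
    for url in urls:
        result_a = validator_a(url)
        result_b = validator_b(url)
        if result_a != result_b:
            disagreements.append((url, result_a, result_b))
    return disagreements


if __name__ == "__main__":
    sample = ["http://example.com/", "ftp://example.com/", "http://example.com/a b"]
    for url, a, b in find_disagreements(sample, validate_regex, validate_urlsplit):
        print(f"{url!r}: regex={a} urlsplit={b}")
```

Only the disagreements need a manual look, which keeps a million-URL run manageable.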
Good luck
Philip